Bug 1853734 - cluster-image-registry-operator calls ListKeys excessively
Summary: cluster-image-registry-operator calls ListKeys excessively
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Ricardo Maraschini
QA Contact: XiuJuan Wang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-03 17:41 UTC by Jim Minter
Modified: 2020-10-27 16:12 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Reduce the number of calls to Azure endpoints. Reason: Azure enforces quotas, and the operator was constantly querying for storage account keys. Result: The keys are now cached locally for a period of time, so the operator no longer fetches them remotely on every use.
Clone Of:
Environment:
Last Closed: 2020-10-27 16:12:04 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-image-registry-operator pull 603 | 0 | None | closed | Bug 1853734: Reusing Azure API key | 2020-10-19 15:18:46 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:12:26 UTC

Description Jim Minter 2020-07-03 17:41:18 UTC
Description of problem:
  In IPI mode, cluster-image-registry-operator calls ListKeys on the Azure storage account incessantly. This is bad behaviour, especially because the quotas on ListKeys() calls are low (https://aka.ms/srpthrottlinglimits) and other callers could be affected.

  Ideally, cluster-image-registry-operator should call ListKeys once at start-up, and then subsequently in a rate-limited way if it detects an authentication error using the key that it has cached (e.g. in the unlikely event that the end user has rotated the storage key).


Version-Release number of selected component (if applicable):
  4.3.27


How reproducible:
  Always.


Steps to Reproduce:
1. Run IPI cluster.
2. Check audit logs on registry storage account in Azure portal.

Actual results:
  Lots of ListKeys calls seen.

Expected results:
  Few ListKeys calls seen.

Additional info:
  This BZ is closely related to https://bugzilla.redhat.com/show_bug.cgi?id=1853643.

Comment 4 Ricardo Maraschini 2020-08-21 12:37:15 UTC
No progress due to higher severity bugs.

Comment 8 XiuJuan Wang 2020-09-11 10:13:38 UTC
In the Azure portal console, the "List Storage Account Keys" event now appears once every 5 minutes instead of several times a minute.
Tested with 4.6.0-0.nightly-2020-09-10-195619 on an Azure cluster.

Comment 9 XiuJuan Wang 2020-09-14 03:08:20 UTC
The image_registry_operator_azure_key_cache_requests_total metric has also been added:
image_registry_operator_azure_key_cache_requests_total{endpoint="60000",instance="10.129.0.13:60000",job="image-registry-operator",namespace="openshift-image-registry",pod="cluster-image-registry-operator-f5f4469b4-l8k2f",result="hit",service="image-registry-operator"}	2586
image_registry_operator_azure_key_cache_requests_total{endpoint="60000",instance="10.129.0.13:60000",job="image-registry-operator",namespace="openshift-image-registry",pod="cluster-image-registry-operator-f5f4469b4-l8k2f",result="miss",service="image-registry-operator"}	18
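
To put the two counters above in perspective, the cache hit ratio can be computed from them. A small Go sketch follows; hitRatio is a hypothetical helper written for this illustration, not part of the operator. With 2586 hits and 18 misses, about 99.3% of key lookups were served from the cache.

```go
package main

import "fmt"

// hitRatio returns the fraction of lookups served from the cache.
// It returns 0 when no lookups have been recorded.
func hitRatio(hits, misses uint64) float64 {
	total := hits + misses
	if total == 0 {
		return 0
	}
	return float64(hits) / float64(total)
}

func main() {
	// Values taken from the result="hit" and result="miss" series above.
	fmt.Printf("hit ratio: %.2f%%\n", 100*hitRatio(2586, 18))
}
```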

Comment 11 errata-xmlrpc 2020-10-27 16:12:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

