Bug 1853734 - cluster-image-registry-operator calls ListKeys excessively
Summary: cluster-image-registry-operator calls ListKeys excessively
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.6.0
Assignee: Ricardo Maraschini
QA Contact: XiuJuan Wang
Depends On:
Reported: 2020-07-03 17:41 UTC by Jim Minter
Modified: 2020-09-15 12:21 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Avoid calling Azure endpoints too many times. Reason: Azure enforces quotas, and the operator was constantly querying for storage account keys. Result: The keys are now cached locally for a period of time, so the operator no longer fetches them remotely on every use.
Clone Of:
Last Closed:
Target Upstream Version:

System | ID | Priority | Status | Summary | Last Updated
Github | openshift cluster-image-registry-operator pull 603 | None | closed | Bug 1853734: Reusing Azure API key | 2020-09-14 02:50:36 UTC

Description Jim Minter 2020-07-03 17:41:18 UTC
Description of problem:
  In IPI mode, cluster-image-registry-operator calls ListKeys on the Azure storage account incessantly.  This is bad behaviour, especially so because the quotas on ListKeys() calls are low (https://aka.ms/srpthrottlinglimits) and others could be affected.

  Ideally, cluster-image-registry-operator should call ListKeys once at start-up, and then subsequently in a rate-limited way if it detects an authentication error using the key that it has cached (e.g. in the unlikely event that the end user has rotated the storage key).

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Run IPI cluster.
2. Check audit logs on registry storage account in Azure portal.

Actual results:
  Lots of ListKeys calls seen.

Expected results:
  Few ListKeys calls seen.

Additional info:
  This BZ is closely related to https://bugzilla.redhat.com/show_bug.cgi?id=1853643.

Comment 4 Ricardo Maraschini 2020-08-21 12:37:15 UTC
No progress due to higher severity bugs.

Comment 8 XiuJuan Wang 2020-09-11 10:13:38 UTC
From the Azure portal console, the "List Storage Account Keys" event is now logged about every 5 minutes, no longer several times a minute.
Tested with a 4.6.0-0.nightly-2020-09-10-195619 Azure cluster.

Comment 9 XiuJuan Wang 2020-09-14 03:08:20 UTC
The image_registry_operator_azure_key_cache_requests_total metric has also been added:
image_registry_operator_azure_key_cache_requests_total{endpoint="60000",instance="",job="image-registry-operator",namespace="openshift-image-registry",pod="cluster-image-registry-operator-f5f4469b4-l8k2f",result="hit",service="image-registry-operator"}	2586
image_registry_operator_azure_key_cache_requests_total{endpoint="60000",instance="",job="image-registry-operator",namespace="openshift-image-registry",pod="cluster-image-registry-operator-f5f4469b4-l8k2f",result="miss",service="image-registry-operator"}	18
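
To put the two counters above in proportion, the cache hit ratio can be computed as hits / (hits + misses). The helper below is a small sketch for that arithmetic; hitRatio is a hypothetical name and not part of the operator.

```go
package main

// hitRatio returns hits/(hits+misses), or 0 when no requests were recorded.
func hitRatio(hits, misses uint64) float64 {
	total := hits + misses
	if total == 0 {
		return 0
	}
	return float64(hits) / float64(total)
}
```

For the values reported here, hitRatio(2586, 18) is 2586/2604, roughly 0.993, so about 99.3% of key requests were served from the cache.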
