Created attachment 1699829
registry operator configuration

Description of problem:
Registry fails to upgrade with an authentication error.

Version-Release number of selected component (if applicable):
upgrade 4.3.18 -> 4.3.27

How reproducible:
Not sure.

Additional info:
Registry logs do not have anything abnormal. Operator logs have the same message as errors in the attached config.yaml.
Secrets are not changed:

```
containers:
- env:
  - name: REGISTRY_STORAGE
    value: azure
  - name: REGISTRY_STORAGE_AZURE_CONTAINER
    value: aro4cluster-xxxxx-image-registry-xxxxxxxxxxxxx
  - name: REGISTRY_STORAGE_AZURE_ACCOUNTNAME
    value: aro4clustermxxxxxxx
  - name: REGISTRY_STORAGE_AZURE_ACCOUNTKEY
    valueFrom:
      secretKeyRef:
        key: REGISTRY_STORAGE_AZURE_ACCOUNTKEY
        name: image-registry-private-configuration
```
We can see from the k8s audit logs that the end user created an image-registry-private-configuration-user Secret, and I can see from the Azure-side logs that the operator stopped calling ListKeys at the same time. We believe that the image-registry-private-configuration-user Secret contains an invalid storage account key. This is problematic on ARO because this condition blocks cluster upgrades. Could the cluster-image-registry-operator detect authentication errors when using the key in the image-registry-private-configuration-user Secret and fall back to calling ListKeys() on the storage account? (See also https://bugzilla.redhat.com/show_bug.cgi?id=1853734, which details a separate issue with the way ListKeys() is currently used.)
That's expected. The presence of the image-registry-private-configuration-user secret is an explicit signal to the operator that credentials are managed by the user. I don't think a fallback to ListKeys is appropriate; the operator shouldn't guess which credentials to use. You need to address the root cause of the problem: the configuration is being changed when it shouldn't be. Any operator may prevent your cluster from upgrading if its config is not valid.
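To make the precedence described in this comment concrete, here is a minimal Go sketch. It is not the operator's actual code: `resolveAccountKey`, `accountKeyField`, and the `listKeys` callback are hypothetical names for illustration, and the sketch assumes the user secret stores the key under the same field name as the private configuration secret shown earlier.

```go
package main

import (
	"errors"
	"fmt"
)

// CredentialSource records where the storage account key came from.
type CredentialSource string

const (
	SourceUserSecret CredentialSource = "user-secret"
	SourceListKeys   CredentialSource = "list-keys"
)

// accountKeyField is assumed to match the key name used in the
// image-registry-private-configuration secret.
const accountKeyField = "REGISTRY_STORAGE_AZURE_ACCOUNTKEY"

// resolveAccountKey is a hypothetical helper. userSecret is nil when the
// image-registry-private-configuration-user secret does not exist.
// listKeys stands in for the Azure ListKeys call on the storage account.
func resolveAccountKey(userSecret map[string]string, listKeys func() (string, error)) (string, CredentialSource, error) {
	if userSecret != nil {
		// The secret exists, so credentials are user-managed.
		// There is deliberately no fallback to ListKeys from here.
		key, ok := userSecret[accountKeyField]
		if !ok || key == "" {
			return "", SourceUserSecret, errors.New("user secret is present but has no account key")
		}
		return key, SourceUserSecret, nil
	}
	// No user secret: the operator manages the credentials itself.
	key, err := listKeys()
	if err != nil {
		return "", SourceListKeys, fmt.Errorf("list keys: %w", err)
	}
	return key, SourceListKeys, nil
}

func main() {
	listKeys := func() (string, error) { return "key-from-azure", nil }

	// A user secret with a stale key is still used as-is.
	_, src, err := resolveAccountKey(map[string]string{accountKeyField: "stale-key"}, listKeys)
	fmt.Println(src, err)

	// Without a user secret, the key comes from ListKeys.
	_, src, err = resolveAccountKey(nil, listKeys)
	fmt.Println(src, err)
}
```

Under this model, an invalid key in the user secret surfaces later as an authentication error against the storage account rather than triggering a silent switch of credentials, which matches the behavior described above.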