Bug 1853643

Summary: Registry upgrade fails with ServiceCode=AuthenticationFailed
Product: OpenShift Container Platform
Reporter: Mangirdas Judeikis <mjudeiki>
Component: Image Registry
Assignee: Oleg Bulatov <obulatov>
Status: CLOSED NOTABUG
QA Contact: Wenjing Zheng <wzheng>
Severity: medium
Priority: unspecified
Version: 4.3.z
CC: aos-bugs, jminter
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-07-07 12:59:40 UTC
Type: Bug
Attachments:
registry operator configuration (flags: none)

Description Mangirdas Judeikis 2020-07-03 12:54:15 UTC
Created attachment 1699829
registry operator configuration

Description of problem:

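Registry fails to upgrade with an authentication error (ServiceCode=AuthenticationFailed). The error messages are recorded in the attached operator configuration.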


Version-Release number of selected component (if applicable):

upgrade 4.3.18 -> 4.3.27


How reproducible:

Not sure.

Additional info:


Registry logs show nothing abnormal.

Operator logs show the same message as the errors in the attached config.yaml.

Comment 1 Mangirdas Judeikis 2020-07-03 13:13:41 UTC
Secrets are not changed:
```
containers:
  - env:
    - name: REGISTRY_STORAGE
      value: azure
    - name: REGISTRY_STORAGE_AZURE_CONTAINER
      value: aro4cluster-xxxxx-image-registry-xxxxxxxxxxxxx
    - name: REGISTRY_STORAGE_AZURE_ACCOUNTNAME
      value: aro4clustermxxxxxxx
    - name: REGISTRY_STORAGE_AZURE_ACCOUNTKEY
      valueFrom:
        secretKeyRef:
          key: REGISTRY_STORAGE_AZURE_ACCOUNTKEY
          name: image-registry-private-configuration
```

Comment 2 Jim Minter 2020-07-03 17:48:02 UTC
We can see from the k8s audit logs that the end user created an image-registry-private-configuration-user Secret, and I can see from the Azure-side logs that the operator stopped calling ListKeys at the same time.

We believe that the image-registry-private-configuration-user contains an invalid storage account key.

This is problematic on ARO because this condition blocks cluster upgrades.

Please can the cluster-image-registry-operator detect authentication errors when using the key in the image-registry-private-configuration-user Secret, and fall back to trying to call ListKeys() on the storage account?

(Also cf. https://bugzilla.redhat.com/show_bug.cgi?id=1853734, which details a separate issue with the way ListKeys() is currently used.)
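
To make the request concrete, the fallback could look roughly like the sketch below. It is a hypothetical Python illustration, not operator code: `pick_account_key`, `AuthenticationFailed`, `list_keys` and `key_works` are names invented here, and the two callables stand in for the Azure ListKeys() call and for whatever probe would detect a rejected key.

```python
from typing import Callable, Optional


class AuthenticationFailed(Exception):
    """Hypothetical stand-in for ServiceCode=AuthenticationFailed."""


def pick_account_key(
    user_key: Optional[str],
    list_keys: Callable[[], str],
    key_works: Callable[[str], None],
) -> str:
    """Return a storage account key, preferring the user-supplied one.

    user_key  -- key from image-registry-private-configuration-user, or None
    list_keys -- hypothetical wrapper around ListKeys() on the storage account
    key_works -- hypothetical probe; raises AuthenticationFailed for a bad key
    """
    if user_key is not None:
        try:
            key_works(user_key)
            return user_key
        except AuthenticationFailed:
            # Requested behaviour: do not stop here, try ListKeys() instead.
            pass
    return list_keys()
```

With this shape, an invalid key in the user Secret would no longer block the upgrade as long as ListKeys() still succeeds.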

Comment 3 Oleg Bulatov 2020-07-07 12:59:40 UTC
That's expected. The presence of the image-registry-private-configuration-user secret is an explicit signal to the operator that credentials are managed by the user. I don't think falling back to ListKeys is appropriate; the operator shouldn't guess which credentials to use.

You need to address the root cause of the problem: the configuration is being changed when it shouldn't be. Any operator may prevent your cluster from upgrading if its config is not valid.
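
The precedence described above can be summarised in a few lines. This is only an illustrative Python sketch under assumed names (`resolve_account_key` and `list_keys` are hypothetical, and the real operator is not written this way): when the user secret exists its key is used as given, and ListKeys is consulted only when the secret is absent.

```python
from typing import Callable, Optional


def resolve_account_key(
    user_key: Optional[str],
    list_keys: Callable[[], str],
) -> str:
    """Choose the storage account key without guessing.

    user_key  -- key from image-registry-private-configuration-user, or None
    list_keys -- hypothetical wrapper around ListKeys on the storage account
    """
    if user_key is not None:
        # Secret present: credentials are user-managed, so use them as-is,
        # even if they later turn out to be invalid.
        return user_key
    # Secret absent: credentials are operator-managed.
    return list_keys()
```

An invalid user-supplied key is therefore returned unchanged and surfaces as an authentication error, which is the behaviour reported in this bug.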