Bug 1853643 - Registry upgrade fails with ServiceCode=AuthenticationFailed
Summary: Registry upgrade fails with ServiceCode=AuthenticationFailed
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Oleg Bulatov
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-03 12:54 UTC by Mangirdas Judeikis
Modified: 2020-07-07 12:59 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-07 12:59:40 UTC
Target Upstream Version:
Embargoed:


Attachments:
registry operator configuration (5.83 KB, text/plain), 2020-07-03 12:54 UTC, Mangirdas Judeikis

Description Mangirdas Judeikis 2020-07-03 12:54:15 UTC
Created attachment 1699829
registry operator configuration

Description of problem:

The registry fails to upgrade with an authentication error, ServiceCode=AuthenticationFailed.


Version-Release number of selected component (if applicable):

upgrade 4.3.18 -> 4.3.27


How reproducible:

Not sure.

Additional info:


The registry logs show nothing abnormal.

The operator logs contain the same error messages as the attached config.yaml.

Comment 1 Mangirdas Judeikis 2020-07-03 13:13:41 UTC
Secrets are not changed:
```
containers:
  - env:
    - name: REGISTRY_STORAGE
      value: azure
    - name: REGISTRY_STORAGE_AZURE_CONTAINER
      value: aro4cluster-xxxxx-image-registry-xxxxxxxxxxxxx
    - name: REGISTRY_STORAGE_AZURE_ACCOUNTNAME
      value: aro4clustermxxxxxxx
    - name: REGISTRY_STORAGE_AZURE_ACCOUNTKEY
      valueFrom:
        secretKeyRef:
          key: REGISTRY_STORAGE_AZURE_ACCOUNTKEY
          name: image-registry-private-configuration
```

Comment 2 Jim Minter 2020-07-03 17:48:02 UTC
We can see from the k8s audit logs that the end user created an image-registry-private-configuration-user Secret, and from the Azure-side logs that the operator stopped calling ListKeys at the same time.

We believe that the image-registry-private-configuration-user Secret contains an invalid storage account key.

This is problematic on ARO because this condition blocks cluster upgrades.

Please can the cluster-image-registry-operator detect authentication errors when using the key in the image-registry-private-configuration-user Secret, and fall back to calling ListKeys() on the storage account?

(See also https://bugzilla.redhat.com/show_bug.cgi?id=1853734, which details a separate issue with the way ListKeys() is currently used.)
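
To make the request concrete, here is a minimal Go sketch of the fallback behaviour being asked for. It is not the operator's code: `resolveKeyWithFallback`, `errAuthenticationFailed`, and the `validate` and `listKeys` callbacks are hypothetical names standing in for "try this key against the storage account" and "call ListKeys on the storage account".

```go
package main

import "errors"

// errAuthenticationFailed is a hypothetical stand-in for the Azure
// ServiceCode=AuthenticationFailed error reported in this bug.
var errAuthenticationFailed = errors.New("AuthenticationFailed")

// resolveKeyWithFallback sketches the requested behaviour: prefer the
// user-supplied key, but if the storage account rejects it with an
// authentication error, fall back to a key obtained via listKeys.
// validate and listKeys are assumptions made for illustration only.
func resolveKeyWithFallback(
	userKey string,
	haveUserKey bool,
	validate func(key string) error,
	listKeys func() (string, error),
) (string, error) {
	if haveUserKey {
		err := validate(userKey)
		if err == nil {
			return userKey, nil
		}
		// Only an authentication failure triggers the fallback;
		// any other error is returned to the caller unchanged.
		if !errors.Is(err, errAuthenticationFailed) {
			return "", err
		}
	}
	key, err := listKeys()
	if err != nil {
		return "", err
	}
	if err := validate(key); err != nil {
		return "", err
	}
	return key, nil
}
```

With a stale user key and a working `listKeys`, this sketch returns the key from `listKeys` instead of failing, so an upgrade would not be blocked by the invalid Secret.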

Comment 3 Oleg Bulatov 2020-07-07 12:59:40 UTC
That's expected. The presence of the image-registry-private-configuration-user secret is an explicit signal to the operator that credentials are managed by the user. I don't think falling back to ListKeys is appropriate: the operator shouldn't guess which credentials to use.
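
The policy described here can be sketched in Go as follows. This is an illustration under stated assumptions, not the operator's implementation: `pickAccountKey` and `listKeys` are hypothetical names, and the sketch assumes the user-managed Secret stores the key under the same field name shown in comment 1.

```go
package main

import "errors"

// accountKeyField is the Secret data key shown in comment 1. That the
// user-managed Secret uses the same field name is an assumption.
const accountKeyField = "REGISTRY_STORAGE_AZURE_ACCOUNTKEY"

// pickAccountKey is a hypothetical helper. userSecret is nil when the
// image-registry-private-configuration-user Secret does not exist.
// It returns the key, whether the key is user-managed, and an error.
func pickAccountKey(
	userSecret map[string]string,
	listKeys func() (string, error),
) (string, bool, error) {
	if userSecret != nil {
		// The Secret exists, so credentials are user-managed. The key
		// is used as-is: no validation and no fallback to listKeys.
		key, ok := userSecret[accountKeyField]
		if !ok || key == "" {
			return "", true, errors.New("user-managed Secret has no " + accountKeyField)
		}
		return key, true, nil
	}
	// No user Secret: the operator manages credentials itself.
	key, err := listKeys()
	return key, false, err
}
```

In this sketch an invalid key in the user Secret is passed through unchanged, so the authentication failure surfaces to the user rather than being masked by a silent switch of credentials.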

You need to address the root cause of the problem: the configuration was changed when it shouldn't have been. Any operator may prevent your cluster from upgrading if its configuration is not valid.

