Bug 1937464

Summary: openstack cloud credentials are not getting configured with correct user_domain_name across the cluster
Product: OpenShift Container Platform Reporter: Sudarshan Chaudhari <suchaudh>
Component: Cloud Compute    Assignee: Mike Fedosin <mfedosin>
Cloud Compute sub component: OpenStack Provider    QA Contact: Itzik Brown <itbrown>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: adduarte, aos-bugs, arane, egarcia, itbrown, jaeichle, jdiaz, jkaur, jrouth, lwan, m.andre, mbooth, mfedosin, obulatov, pprinett
Version: 4.5    Keywords: Triaged
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version:    Doc Type: Bug Fix
Doc Text:
Cause: The Cluster Image Registry Operator treated user_domain_name as an immutable field and did not modify it after installation. Consequence: After user_domain_name was updated in the main secret, the operator did not pick up the change and could not work with the updated credentials. Fix: Mark user_domain_name and the other related domain fields as mutable and do not store them in the image registry config. Result: Updating user_domain_name and all other auth parameters is now supported.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-07-27 22:52:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Sudarshan Chaudhari 2021-03-10 17:28:22 UTC
Description of problem:
The customer deployed an OpenShift cluster on OpenStack using a local user and, post deployment, is switching the credentials to an LDAP-based OpenStack user.

To make the change, the customer updated the credentials and user_domain_name in clouds.yaml and clouds.conf in the openstack-credentials secret. The new credentials validate correctly with the openstack CLI.
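For illustration, the edited auth section of clouds.yaml inside the openstack-credentials secret would look roughly like this (all values here are placeholders, not taken from the customer environment):

```
clouds:
  openstack:
    auth:
      auth_url: https://keystone.example.com:13000/v3
      username: ldap-user                 # new LDAP-backed user
      password: "***"
      project_name: my-project
      user_domain_name: ldap-domain       # changed from "Default"
      project_domain_name: Default
```

The edited file can be checked locally before re-applying it to the secret, e.g. with `openstack --os-cloud openstack token issue` pointed at the edited file.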

After updating openstack-credentials, the username and password were propagated correctly across the cluster, but the user_domain_name was not. Components still refer to the "Default" user_domain_name and hit authentication failures when mounting Swift and Cinder storage, as well as in other operations.

$ date ; oc -n openshift-cloud-credential-operator get credentialsrequest -o json | jq -c '.items[] | select(.status.provisioned) | .spec.secretRef, .metadata.creationTimestamp, .status.lastSyncTimestamp'
Wed Feb 24 18:07:59 UTC 2021

Check the attached resource configs. 

Error observed:
$ oc logs cluster-image-registry-operator-748f8f9855-fhblt -c cluster-image-registry-operator --tail=20

I0303 17:32:15.100153      14 controller.go:291] object changed: *v1.Config, Name=cluster (status=true): changed:status.conditions.0.lastTransitionTime={"2021-03-03T17:32:14Z" -> "2021-03-03T17:32:15Z"}
I0303 17:32:15.603731      14 controller.go:291] object changed: *v1.Config, Name=cluster (status=true): 
E0303 17:32:15.610609      14 controller.go:330] unable to sync: unable to sync storage configuration: Failed to authenticate provider client: Authentication failed, requeuing
I0303 17:32:15.644501      14 controller.go:291] object changed: *v1.Config, Name=cluster (status=true): 
E0303 17:32:15.650130      14 controller.go:330] unable to sync: unable to sync storage configuration: Failed to authenticate provider client: Authentication failed, requeuing

openshift-monitoring       23s         Warning   FailedMount                    pod/prometheus-k8s-0                                     Unable to attach or mount volumes: unmounted volumes=[cinder], unattached volumes=[config secret-grpc-tls tls-assets secret-kube-etcd-client-certs secret-prometheus-k8s-proxy prometheus-k8s-token-l9cvb cinder secret-prometheus-k8s-tls config-out configmap-kubelet-serving-ca-bundle secret-prometheus-k8s-htpasswd prometheus-trusted-ca-bundle configmap-serving-certs-ca-bundle secret-kube-rbac-proxy prometheus-k8s-rulefiles-0]: timed out waiting for the condition

openshift-logging          21s         Warning   FailedMount                    pod/elasticsearch-cdm-ntdbo1fj-1-675d7c8bd8-mn644        Unable to attach or mount volumes: unmounted volumes=[elasticsearch-storage], unattached volumes=[certificates elasticsearch-token-bdclj elasticsearch-metrics elasticsearch-storage elasticsearch-config]: timed out waiting for the condition

Version-Release number of selected component (if applicable):
OCP 4.5

Steps to Reproduce:
1. configure OCP cluster with default user_domain_name
2. post deployment change the OSP user credentials in openstack-credentials with LDAP based auth and add different user_domain_name
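Step 2 can be sketched as follows (a hypothetical local edit; the user names and domain names below are placeholders, and the fetch/re-apply commands in the comments must be adapted to the real environment):

```shell
# Work on a local copy of the secret's clouds.yaml. In a live cluster it
# would be fetched first, e.g.:
#   oc -n kube-system get secret openstack-credentials \
#     -o jsonpath='{.data.clouds\.yaml}' | base64 -d > clouds.yaml
cat > clouds.yaml <<'EOF'
clouds:
  openstack:
    auth:
      username: local-user
      user_domain_name: Default
EOF

# Switch to the LDAP-backed user and its domain (placeholder names).
sed -i -e 's/username: local-user/username: ldap-user/' \
       -e 's/user_domain_name: Default/user_domain_name: ldap-domain/' \
       clouds.yaml
```

The edited file would then be re-applied to the secret, for example with `oc -n kube-system create secret generic openstack-credentials --from-file=clouds.yaml --dry-run=client -o yaml | oc apply -f -` (adjust names and data keys to match the actual secret).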

Actual results:
OpenShift is not able to authenticate with OpenStack using the LDAP-based user.

Expected results:
The credentials should be configured correctly across multiple components in OpenShift.

Additional info:
Check the provided details:
- openstack-credentials secret yaml
- must-gather access.
- image-registry access which shows that the user_domain_name is missing which is causing authentication failure while mounting SWIFT storage.

Comment 2 Janine Eichler 2021-03-11 20:08:44 UTC
I tested with a fresh installation as well.

Used clouds.yaml for IPI installation (OCP 4.5, RHOSP13):

      auth_url: https://osp.acme.cloud:13000/v3
      username: my-user
      password: xxxxxxxxx
      project_name: myprojectname
      user_domain_name: myuserdoamin.name
      project_domain_name: default
    cacert: "/home/ansible_deployer/my_root_ca.pem"
    region_name: myregion
    interface: "public"

Result: cluster installation failed. Registry is not coming up.

Four Observations:
a) the registry falls back to Cinder instead of using Swift (the user has the swift operator role; with the CLI the user can create containers)
b) the cinder volume cannot be mounted due to: "Failed to provision volume with StorageClass "standard": unable to initialize cinder client for region: myregion, err: cloud provider is not initialized: cannot initialize cloud provider using data from the secret: Authentication failed"
c) when checking the openstack-credentials secret in kube-system, I can see that the clouds.yaml looks correct; however, the clouds.conf looks like this:

auth-url = "https://osp.acme.cloud:13000/v3"
username = "myuser"
password = "xxxxxxxx"
tenant-name = "myprojectname"
domain-name = "myuserdoamin.name"
region = "myregion"
ca-file = /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem

---> user-domain-name is not there; domain-name is used instead, which looks surprising to me, but I might be wrong.

d) when changing this clouds.conf in the secret by setting user-domain-name instead of domain-name and then trying to create a PVC, I get an error that user-domain-name is not a valid config option under the [Global] section.
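The renaming in observation c) can be summarized with a small sketch (the mapping below is inferred from the rendered clouds.conf shown in this comment, not taken from the operator's source):

```shell
# Maps a clouds.yaml auth key to the corresponding clouds.conf [Global] key,
# as observed in the rendered secret above.
map_auth_key() {
  case "$1" in
    auth_url)         echo "auth-url" ;;
    project_name)     echo "tenant-name" ;;
    user_domain_name) echo "domain-name" ;;   # note: not "user-domain-name"
    region_name)      echo "region" ;;
    *)                echo "$1" ;;            # username, password pass through
  esac
}

map_auth_key user_domain_name   # prints: domain-name
```

This is consistent with observation d): the rendered [Global] section only knows domain-name, so hand-editing user-domain-name into clouds.conf is rejected as an unknown option.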

Comment 26 Itzik Brown 2021-06-13 17:38:55 UTC
Deployed with domain 'shiftstack' and verified that under 'swift' there is only the container entry:

$ oc get configs.imageregistry.operator.openshift.io/cluster -o json | jq .status.storage
{
  "managementState": "Managed",
  "swift": {
    "container": "ostest-f228c-image-registry-yjdyrxtxocvqouruapcoegyqcpdkfsdyms"
  }
}

OCP: 4.8.0-0.nightly-2021-06-13-101614
OSP: RHOS-16.1-RHEL-8-20210506.n.1

Comment 40 errata-xmlrpc 2021-07-27 22:52:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Comment 44 Sudarshan Chaudhari 2021-10-04 20:10:59 UTC

Do we have any update on backporting this to 4.6?

The recent errata do not include this issue, and it appears to have been fixed for a while.

Comment 45 Martin André 2021-10-05 07:47:36 UTC
(In reply to Sudarshan Chaudhari from comment #44)
> Hello, 
> Do we have any update for this to be backported to 4.6?
> The recent erratas do not includes this issue and it seems to be fixed since
> a while.

Hi Sudarshan, the fix is available starting from 4.8 and will not be backported to 4.6 due to risks associated with the backport. We've however documented a workaround to get past the issue in versions prior to 4.8: