4.6 fixed bug 1884558 around a broken cacert file path by bumping the path in the installer. But that didn't fix born-before-4.6 clusters, which were initialized with the broken path. Many OpenStack providers apparently work around the broken path, but when those clusters update to 4.7 and get the new Cinder CSI handler, they stick on update with the storage ClusterOperator Available=False with:
Message: OpenStackCinderCSIDriverOperatorCRAvailable: OpenStackCinderDriverControllerServiceControllerAvailable: Waiting for Deployment to deploy the CSI Controller Service
W0316 15:02:07.788864 1 main.go:108] Failed to GetOpenStackProvider: Post "https://.../v3/auth/tokens": x509: certificate signed by unknown authority
in the crash-looping csi-driver container.
We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context and the UpgradeBlocker keyword has been added to this bug. The expectation is that the assignee answers these questions.
Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?
* example: Customers upgrading from 4.y.Z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
* example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time
What is the impact? Is it serious enough to warrant blocking edges?
* example: Up to 2 minute disruption in edge routing
* example: Up to 90 seconds of API downtime
* example: etcd loses quorum and you have to restore from backup
How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
* example: Issue resolves itself after five minutes
* example: Admin uses oc to fix things
* example: Admin must SSH to hosts, restore from backups, or perform other non-standard admin activities
Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
* example: No, it’s always been like this, we just never noticed
* example: Yes, from 4.y.z to 4.y+1.z Or 4.y.z to 4.y.z+1
Who is impacted?
Customers who deployed an OCP cluster at a version earlier than 4.6 on OpenStack with self-signed certificates can't upgrade to 4.7.
What is the impact?
The Cinder CSI driver gets an incorrect CA cert path from the clouds.yaml file and can't start.
How involved is remediation?
The immediate workaround is to manually modify the `clouds.yaml` key in the `openstack-credentials` secret in the `kube-system` namespace, replacing `cacert: <some value>` with `cacert: /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem`. The long-term solution is to update CCO to generate a correct clouds.yaml.
Is this a regression?
The issue happens only when upgrading from 4.6 to 4.7; other versions are not affected.
*** Bug 1940395 has been marked as a duplicate of this bug. ***
I'm adding ImpactStatementProposed, because comment 1 gives us an impact statement, and we just need to make a call on whether we need to block edges to protect folks while we get this fix out.
Without knowing the actual number or percentage of clusters that will be impacted, it is not possible to mark this as an upgrade blocker, as this is very specific to clusters on OpenStack with self-signed certificates.
Ok, I'm going to say we don't block edges on this, but if folks hear about more of this sort of thing going on, we can revisit.
The CA cert path issue has been fixed in 4.8.0-0.nightly-2021-04-09-222447
1. Install a self-signed cert cluster on OpenStack
2. Edit the secret openstack-credentials in the kube-system namespace, change the CA cert path to a wrong one, and save
3. Check the secret openstack-credentials again and verify that the path has been changed back to `/etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem`
oc get secret -n kube-system openstack-credentials -o json | jq -r ".data"
"clouds.yaml": "BASE64 encode string"
4. The component secrets are the same as the root credential
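The check in step 3 can be sketched in Python as follows. This is only an illustration: it assumes the JSON object printed by the `jq -r ".data"` command in step 3 is passed in as a string, and the helper names `cacert_values` and `is_reconciled` are hypothetical. It decodes the `clouds.yaml` entry and confirms that every `cacert` value matches the expected path.

```python
import base64
import json
import re
from typing import List

# Path the secret is expected to be reset to, as stated in step 3.
EXPECTED_CACERT = "/etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem"


def cacert_values(secret_data_json: str) -> List[str]:
    """Extract every cacert value from the base64-encoded clouds.yaml entry."""
    data = json.loads(secret_data_json)
    clouds_yaml = base64.b64decode(data["clouds.yaml"]).decode("utf-8")
    found = re.findall(r"^[ \t]*cacert:[ \t]*(\S+)[ \t]*$", clouds_yaml, re.MULTILINE)
    # Drop surrounding quotes in case the value is quoted in the YAML.
    return [value.strip("'\"") for value in found]


def is_reconciled(secret_data_json: str) -> bool:
    """True if at least one cacert entry exists and all point at EXPECTED_CACERT."""
    values = cacert_values(secret_data_json)
    return bool(values) and all(value == EXPECTED_CACERT for value in values)
```

Returning False when no `cacert` entry is found is a deliberate choice for this scenario, since a self-signed cert cluster is expected to carry one.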
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
*** Bug 2027597 has been marked as a duplicate of this bug. ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days