Description of problem:
It is hard to tell when the CSI driver is not working. Due to bug 1918140, openstack-cinder-csi-driver-controller was not installed on OSP, but no status indicates that the CSI driver is not working. When checking clustercsidrivers/cinder.csi.openstack.org, nothing is reported as degraded or unavailable, and there is no condition type "OpenStackCinderDriverControllerServiceController":

$ oc get clustercsidrivers cinder.csi.openstack.org -o json | jq .status
{
  "conditions": [
    {
      "lastTransitionTime": "2021-01-20T09:48:43Z",
      "status": "False",
      "type": "ManagementStateDegraded"
    },
    {
      "lastTransitionTime": "2021-01-20T09:48:52Z",
      "status": "True",
      "type": "OpenStackCinderDriverNodeServiceControllerAvailable"
    },
    {
      "lastTransitionTime": "2021-01-20T10:02:44Z",
      "status": "False",
      "type": "OpenStackCinderDriverNodeServiceControllerProgressing"
    },
    {
      "lastTransitionTime": "2021-01-20T16:03:02Z",
      "reason": "AsExpected",
      "status": "False",
      "type": "OpenStackCinderDriverNodeServiceControllerDegraded"
    },
    {
      "lastTransitionTime": "2021-01-20T21:36:58Z",
      "reason": "AsExpected",
      "status": "False",
      "type": "OpenStackCinderDriverStaticResourcesControllerDegraded"
    }
  ],

There is also no explicit error or warning from openstack-cinder-csi-driver-operator:

$ oc -n openshift-cluster-csi-drivers logs openstack-cinder-csi-driver-operator-557ffdc94d-r9pst | grep "^E"
$ oc -n openshift-cluster-csi-drivers logs openstack-cinder-csi-driver-operator-557ffdc94d-r9pst | grep "^W"
W0120 08:56:20.426994 1 cmd.go:204] Using insecure, self-signed certificates
W0120 08:56:21.457470 1 secure_serving.go:69] Use of insecure cipher 'TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256' detected.
W0120 08:56:21.457499 1 secure_serving.go:69] Use of insecure cipher 'TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256' detected.

When checking the CSO, it is in normal status most of the time. It only shows as degraded for a very short moment with "oc get co storage -w", so it is hard to find the issue early.
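For illustration, the gap described above could be detected mechanically from the "jq .status" output. The following is a minimal sketch, not part of any operator. The two prefixes come from the condition types mentioned in this report, while the Available/Progressing/Degraded suffix set for the controller service is an assumption inferred from the NodeService conditions shown above.

```python
# Condition-type prefixes taken from this report. The ControllerService prefix
# is the one reported as absent.
EXPECTED_PREFIXES = [
    "OpenStackCinderDriverNodeServiceController",
    "OpenStackCinderDriverControllerServiceController",
]
# Assumption: each controller publishes these three conditions, as the
# NodeService controller does in the output above.
SUFFIXES = ("Available", "Progressing", "Degraded")


def missing_condition_types(status, prefixes=EXPECTED_PREFIXES, suffixes=SUFFIXES):
    """Return the expected condition types that are absent from status['conditions']."""
    present = {c.get("type") for c in status.get("conditions", [])}
    return [p + s for p in prefixes for s in suffixes if p + s not in present]


# Condition types as they appear in the report (other fields omitted).
reported_status = {
    "conditions": [
        {"type": "ManagementStateDegraded", "status": "False"},
        {"type": "OpenStackCinderDriverNodeServiceControllerAvailable", "status": "True"},
        {"type": "OpenStackCinderDriverNodeServiceControllerProgressing", "status": "False"},
        {"type": "OpenStackCinderDriverNodeServiceControllerDegraded", "status": "False"},
        {"type": "OpenStackCinderDriverStaticResourcesControllerDegraded", "status": "False"},
    ]
}

for condition_type in missing_condition_types(reported_status):
    print("missing condition:", condition_type)
```

With the status from this report, only the ControllerService condition types are listed as missing, which matches the observation that nothing is marked degraded even though the controller is not running.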
Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-01-19-095812

How reproducible:
On condition

Steps to Reproduce:
See Description

Actual results:

Expected results:
Note that the error message shows up in the operator logs only after 10 minutes. Here's an example [1]:

"F0826 15:07:40.104515 1 base_controller.go:96] unable to sync caches for ConfigObserver"

Since the error message is being recorded in the logs, I'm moving this back to ON_QA.

A note about the issue: to trigger this error, the developer working on the CSI operator needs to NOT start the informers. Even though this happened once, it is unlikely to happen again and should be caught by code review. If it does happen again, the mistake would be caught by the presubmit job recently added for Cinder (not sure if the Manila operator has that too). That's because the absence of the CSI controller Deployment would cause volume provisioning to fail, which would definitely be caught by the CI job [2].

Other than that, we could add a check in CSO to make sure the CSI controller Deployment has started correctly. However, I believe it's not worth the effort given the odds of this happening again.

[1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_openstack-cinder-csi-driver-operator/39/pull-ci-openshift-openstack-cinder-csi-driver-operator-master-e2e-openstack-csi/1430884632865280000/artifacts/e2e-openstack-csi/gather-extra/artifacts/pods/openshift-cluster-csi-drivers_openstack-cinder-csi-driver-operator-bddfdc65b-9sdnn_openstack-cinder-csi-driver-operator_previous.log
[2] https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_openstack-cinder-csi-driver-operator/39/pull-ci-openshift-openstack-cinder-csi-driver-operator-master-e2e-openstack-csi/1430884632865280000
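To make the idea of such a check concrete, here is a rough sketch of the logic only. It is not how CSO would actually implement it. It assumes the Deployment object has been fetched as JSON (for example with "oc get deployment ... -o json") and parsed into a dict, and that None is passed when the Deployment does not exist. The function names are hypothetical.

```python
def deployment_available(deployment):
    """Return True if the Deployment dict reports an Available=True condition."""
    conditions = deployment.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Available" and c.get("status") == "True"
        for c in conditions
    )


def controller_problem(deployment):
    """Return a short problem description, or None if the controller looks healthy.

    `deployment` is None when the Deployment does not exist at all, which is
    the situation described in this bug.
    """
    if deployment is None:
        return "CSI controller Deployment not found"
    if not deployment_available(deployment):
        return "CSI controller Deployment is not Available"
    return None
```

A caller could surface the returned string as a degraded condition, so that a missing controller Deployment is visible without waiting for the fatal log line.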
*** Bug 1918564 has been marked as a duplicate of this bug. ***
Verified pass.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days