Description of problem (please be detailed as possible and provide log snippets):
After upgrading the external mode cluster from OCS 4.5 to OCS 4.6, it was observed that the RGW storage class was no longer present.

Before upgrade:
===============
NAME                                   PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ocs-external-storagecluster-ceph-rbd   openshift-storage.rbd.csi.ceph.com      Delete          Immediate           true                   67m
ocs-external-storagecluster-ceph-rgw   openshift-storage.ceph.rook.io/bucket   Delete          Immediate           false                  67m
ocs-external-storagecluster-cephfs     openshift-storage.cephfs.csi.ceph.com   Delete          Immediate           true                   67m
openshift-storage.noobaa.io            openshift-storage.noobaa.io/obc         Delete          Immediate           false                  65m

After upgrade:
==============
NAME                                   PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ocs-external-storagecluster-ceph-rbd   openshift-storage.rbd.csi.ceph.com      Delete          Immediate           true                   92m
ocs-external-storagecluster-cephfs     openshift-storage.cephfs.csi.ceph.com   Delete          Immediate           true                   92m
openshift-storage.noobaa.io            openshift-storage.noobaa.io/obc         Delete          Immediate           false                  89m

Version of all relevant components (if applicable):
$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES                     PHASE
ocs-operator.v4.6.0-593.ci   OpenShift Container Storage   4.6.0-593.ci   ocs-operator.v4.5.0-560.ci   Succeeded

OCP version: 4.6.0-0.nightly-2020-10-08-210814

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, creation of RGW OBCs fails because the RGW storageclass is not present

Is there any workaround available to the best of your knowledge?
Create a storageclass manually

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Can this issue reproducible?
Tried it only once

Can this issue reproduce from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Upgrade external mode OCS from 4.5 to 4.6
2. Check storageclass list

Actual results:
RGW storageclass is not listed

Expected results:
RGW storageclass should be listed
Rook does not create SC, ocs-op does, moving to ocs-op :)
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1873580?
(In reply to Mudit Agarwal from comment #4)
> Related to https://bugzilla.redhat.com/show_bug.cgi?id=1873580?

It is not. That bug is about the RGW SC never being created at all.

I'm not sure what is going on here. There is nothing in the ocs-operator code that would delete the SC without recreating it, and even then it would only be triggered if the StorageClusterInitialization was removed. Does this happen immediately after upgrade? Is it reliably reproducible?
Sorry for the delay, I missed the notification. Looks like the problem is this:

2020-10-14T17:26:59.952188571Z {"level":"error","ts":"2020-10-14T17:26:59.952Z","logger":"controller_storagecluster","msg":"failed to create needed StorageClasses","Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster","error":"resourceVersion should not be set on objects to be created"}

Being caused by these lines:

https://github.com/openshift/ocs-operator/blob/master/pkg/controller/storagecluster/storageclasses.go#L69-L70

I'm not sure why we're doing this here, I'll have to investigate.
Based on today's bug triage, providing qa ack. QE team will validate this BZ during upgrade testing.
I think this is just intermittent; nothing should have changed to resolve the issue. This PR includes the fix to fully resolve the issue: https://github.com/openshift/ocs-operator/pull/856
@pbyregow Though this bug was reported for an external mode upgrade, it would be good to verify this BZ for upgrades in both internal and external mode, just so we can be sure that the storageclass doesn't disappear in any mode. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5605
Removing AutomationBackLog keyword. Presence of storageclasses is verified in the automated upgrade test.
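
A presence check of this kind can be sketched in Go as a comparison of the storageclass names listed before and after the upgrade. This is a minimal illustration only, not the actual automated test. The function name missingStorageClasses is hypothetical, and the name lists are copied from the description of this bug rather than read from a cluster.

```go
package main

import (
	"fmt"
	"sort"
)

// missingStorageClasses returns the names present in before but absent
// from after, sorted for stable output. Hypothetical helper.
func missingStorageClasses(before, after []string) []string {
	present := make(map[string]struct{}, len(after))
	for _, name := range after {
		present[name] = struct{}{}
	}
	var missing []string
	for _, name := range before {
		if _, ok := present[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Names taken from the "Before upgrade" and "After upgrade" listings.
	before := []string{
		"ocs-external-storagecluster-ceph-rbd",
		"ocs-external-storagecluster-ceph-rgw",
		"ocs-external-storagecluster-cephfs",
		"openshift-storage.noobaa.io",
	}
	after := []string{
		"ocs-external-storagecluster-ceph-rbd",
		"ocs-external-storagecluster-cephfs",
		"openshift-storage.noobaa.io",
	}
	// Prints: [ocs-external-storagecluster-ceph-rgw]
	fmt.Println(missingStorageClasses(before, after))
}
```

An upgrade test built on this idea would fail whenever the returned slice is non-empty.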