Bug 1886709 - [External] RGW storageclass disappears after upgrade from OCS 4.5 to 4.6
Summary: [External] RGW storageclass disappears after upgrade from OCS 4.5 to 4.6
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.6.0
Assignee: Jose A. Rivera
QA Contact: Parikshith
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-09 07:55 UTC by Rachael
Modified: 2021-08-23 14:37 UTC
CC List: 12 users

Fixed In Version: 4.6.0-148.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-17 06:24:47 UTC
Embargoed:


Links
System                  ID                               Private  Priority  Status  Last Updated             Summary
Github                  openshift ocs-operator pull 856  0        None      closed  2020-12-01 10:46:57 UTC  Fix errors in managed resources reconciliation
Github                  openshift ocs-operator pull 871  0        None      closed  2020-12-01 10:46:57 UTC  Bug 1886709: [release-4.6]: Fix errors in managed resources reconciliation
Red Hat Product Errata  RHSA-2020:5605                   0        None      None    2020-12-17 06:25:07 UTC  None

Description Rachael 2020-10-09 07:55:41 UTC
Description of problem (please be as detailed as possible and provide log snippets):

After upgrading the external mode cluster from OCS 4.5 to OCS 4.6, it was observed that the RGW storage class was no longer present.

Before upgrade:
===============

NAME                                   PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ocs-external-storagecluster-ceph-rbd   openshift-storage.rbd.csi.ceph.com      Delete          Immediate           true                   67m
ocs-external-storagecluster-ceph-rgw   openshift-storage.ceph.rook.io/bucket   Delete          Immediate           false                  67m
ocs-external-storagecluster-cephfs     openshift-storage.cephfs.csi.ceph.com   Delete          Immediate           true                   67m
openshift-storage.noobaa.io            openshift-storage.noobaa.io/obc         Delete          Immediate           false                  65m


After upgrade: 
==============
NAME                                   PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ocs-external-storagecluster-ceph-rbd   openshift-storage.rbd.csi.ceph.com      Delete          Immediate           true                   92m
ocs-external-storagecluster-cephfs     openshift-storage.cephfs.csi.ceph.com   Delete          Immediate           true                   92m
openshift-storage.noobaa.io            openshift-storage.noobaa.io/obc         Delete          Immediate           false                  89m


Version of all relevant components (if applicable):

$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES                     PHASE
ocs-operator.v4.6.0-593.ci   OpenShift Container Storage   4.6.0-593.ci   ocs-operator.v4.5.0-560.ci   Succeeded

OCP version: 4.6.0-0.nightly-2020-10-08-210814

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, creation of RGW OBCs fails because the RGW storageclass is not present


Is there any workaround available to the best of your knowledge?
Create the RGW storageclass manually.


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2


Is this issue reproducible?
Tried it only once

Can this issue reproduce from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Upgrade external mode OCS from 4.5 to 4.6
2. Check storageclass list


Actual results:
RGW storageclass is not listed

Expected results:
RGW storageclass should be listed

Comment 2 Sébastien Han 2020-10-09 08:12:20 UTC
Rook does not create the SC; ocs-op does. Moving to ocs-op :)

Comment 4 Mudit Agarwal 2020-10-12 14:22:37 UTC
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1873580?

Comment 5 Jose A. Rivera 2020-10-14 14:52:55 UTC
(In reply to Mudit Agarwal from comment #4)
> Related to https://bugzilla.redhat.com/show_bug.cgi?id=1873580?

It is not. That is the RGW SC never being created at all.

I'm not sure what is going on here. There is nothing in the ocs-operator code that would delete the SC without recreating it, and even that would only be triggered if the StorageClusterInitialization was removed. Does this happen immediately after upgrade? Is it reliably reproducible?

Comment 7 Jose A. Rivera 2020-10-19 14:35:26 UTC
Sorry for the delay, I missed the notification.

Looks like the problem is this:


2020-10-14T17:26:59.952188571Z {"level":"error","ts":"2020-10-14T17:26:59.952Z","logger":"controller_storagecluster","msg":"failed to create needed StorageClasses","Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster","error":"resourceVersion should not be set on objects to be created"}

Being caused by these lines:

https://github.com/openshift/ocs-operator/blob/master/pkg/controller/storagecluster/storageclasses.go#L69-L70

I'm not sure why we're doing this here, I'll have to investigate.
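The Kubernetes API server refuses a create request whose object already carries `metadata.resourceVersion`, which is what the log line above reports. The linked lines in `storageclasses.go` are not quoted in this report, so the following is only a hypothetical reconstruction of how that error can leave the storageclass missing: an existing object is deleted, and the copy read back from the cluster (resourceVersion still set) is then passed to create. The in-memory `fakeStore` is an assumption that imitates just that one API-server rule; it is not ocs-operator or client-go code.

```go
package main

import (
	"errors"
	"fmt"
)

// ObjectMeta is a hypothetical, stripped-down stand-in for Kubernetes object metadata.
type ObjectMeta struct {
	Name            string
	ResourceVersion string
}

// StorageClass is a hypothetical minimal model with only the fields needed here.
type StorageClass struct {
	ObjectMeta
	Provisioner string
}

// errResourceVersionOnCreate reuses the message from the operator log.
var errResourceVersionOnCreate = errors.New("resourceVersion should not be set on objects to be created")

// fakeStore imitates one API-server rule: Create must not receive a
// resourceVersion, because the store assigns one itself.
type fakeStore struct {
	objects map[string]StorageClass
	nextRV  int
}

func newFakeStore() *fakeStore { return &fakeStore{objects: map[string]StorageClass{}} }

func (s *fakeStore) Create(sc StorageClass) error {
	if sc.ResourceVersion != "" {
		return errResourceVersionOnCreate
	}
	if _, exists := s.objects[sc.Name]; exists {
		return fmt.Errorf("storageclass %q already exists", sc.Name)
	}
	s.nextRV++
	sc.ResourceVersion = fmt.Sprint(s.nextRV)
	s.objects[sc.Name] = sc
	return nil
}

func (s *fakeStore) Get(name string) (StorageClass, bool) {
	sc, ok := s.objects[name]
	return sc, ok
}

func (s *fakeStore) Delete(name string) { delete(s.objects, name) }

// recreate deletes the existing object and creates the desired one, clearing
// ResourceVersion first so the create request is accepted.
func recreate(s *fakeStore, desired StorageClass) error {
	s.Delete(desired.Name)
	desired.ResourceVersion = ""
	return s.Create(desired)
}

func main() {
	desired := StorageClass{
		ObjectMeta:  ObjectMeta{Name: "ocs-external-storagecluster-ceph-rgw"},
		Provisioner: "openshift-storage.ceph.rook.io/bucket",
	}

	// Hypothetical faulty sequence: delete, then create the fetched copy as-is.
	buggy := newFakeStore()
	_ = buggy.Create(desired)
	fetched, _ := buggy.Get(desired.Name)
	buggy.Delete(fetched.Name)
	err := buggy.Create(fetched)
	_, present := buggy.Get(desired.Name)
	fmt.Println("buggy:", err, "| present:", present)

	// Same sequence through recreate, which clears ResourceVersion first.
	fixed := newFakeStore()
	_ = fixed.Create(desired)
	fetched, _ = fixed.Get(desired.Name)
	err = recreate(fixed, fetched)
	_, present = fixed.Get(desired.Name)
	fmt.Println("fixed:", err, "| present:", present)
}
```

In the faulty sequence the delete succeeds and the create fails, so the object is simply gone, which matches the symptom in this report. Clearing `ResourceVersion` (or building the desired object from scratch instead of reusing the fetched one) avoids the error; the actual change in the linked pull requests may differ.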

Comment 8 Martin Bukatovic 2020-10-20 15:03:44 UTC
Based on today's bug triage, providing qa ack. QE team will validate this BZ during upgrade testing.

Comment 13 Jose A. Rivera 2020-10-23 14:37:28 UTC
I think this is just intermittent; nothing should have changed to resolve the issue.

This PR includes the fix to fully resolve the issue: https://github.com/openshift/ocs-operator/pull/856

Comment 14 Neha Berry 2020-10-27 10:17:59 UTC
@pbyregow Though this bug was reported in an external mode upgrade, it would be good if we could verify this BZ for upgrades in both internal and external mode, just so we can be sure that the storageclass doesn't disappear in either mode.

thanks

Comment 20 errata-xmlrpc 2020-12-17 06:24:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605

Comment 21 Jilju Joy 2021-08-23 14:37:15 UTC
Removing AutomationBackLog keyword. The presence of storageclasses is verified in the automated upgrade test.

