Bug 1860418 - OCS 4.5 Uninstall: Deleting StorageCluster leaves Noobaa-db PV in Released state(secret not found)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: OCS 4.5.0
Assignee: Raghavendra Talur
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-24 14:00 UTC by Neha Berry
Modified: 2023-09-14 06:04 UTC
CC List: 11 users

Fixed In Version: 4.5.0-518.ci
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-15 10:18:25 UTC
Embargoed:




Links:
Red Hat Product Errata RHBA-2020:3754 (Last Updated: 2020-09-15 10:18:59 UTC)

Description Neha Berry 2020-07-24 14:00:07 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
----------------------------------------------------------------------
The noobaa-db PV stays behind in Released state after deleting the StorageCluster and then the openshift-storage namespace.


Reason: With Bug 1849105 and Bug 1849532 comment 5, automatic deletion of the StorageClasses is expected on StorageCluster deletion. But in the absence of the StorageClass, the noobaa-db PV stays behind in Released state, with the message shown in the events below:

Events:
  Type     Reason              Age                 From                                                                                                               Message
  ----     ------              ----                ----                                                                                                               -------
  Warning  VolumeFailedDelete  20s (x12 over 13m)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-bfd6f845d-mh7vd_751011e3-143e-477b-9389-91ca947f8313  rpc error: code = Internal desc = provided secret is empty
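To confirm the root cause, one can check that the PV still references the now-deleted StorageClass, whose parameters carried the provisioner secret reference. A hedged check, using the PV name from this cluster:

$ oc get pv pvc-c57b8874-4869-4861-a104-b050c90ceec0 -o jsonpath='{.spec.storageClassName}'
# prints ocs-storagecluster-ceph-rbd
$ oc get sc ocs-storagecluster-ceph-rbd
# returns NotFound once the uninstall has removed the class, so the RBD
# provisioner can no longer resolve the csi.storage.k8s.io/provisioner-secret-*
# parameters at delete time, hence "provided secret is empty"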


$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                            STORAGECLASS                  REASON   AGE
pvc-36be30b5-a90a-42a7-93ed-144b7ecc31e9   512Gi      RWO            Delete           Bound      openshift-storage/ocs-deviceset-2-data-0-2x4w2   thin                                   2d2h
pvc-acee0891-6f46-4996-874b-c68922e5e804   512Gi      RWO            Delete           Bound      openshift-storage/ocs-deviceset-1-data-0-gg4hr   thin                                   2d2h
pvc-b41c4457-ace7-4749-bae8-3e05c02c43f5   512Gi      RWO            Delete           Bound      openshift-storage/ocs-deviceset-0-data-0-5hjlx   thin                                   2d2h
>> pvc-c57b8874-4869-4861-a104-b050c90ceec0   60Gi       RWO            Delete           Released   openshift-storage/db-noobaa-db-0                 ocs-storagecluster-ceph-rbd            2d2h



Version of all relevant components (if applicable):
----------------------------------------------------------------------
OCS = 4.5.0-494.ci
OCP = 4.5.0-0.nightly-2020-07-22-074214
Ceph = RHCS 4.1.z1 (14.2.8-81.el8cp)


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
----------------------------------------------------------------------
No

Is there any workaround available to the best of your knowledge?
----------------------------------------------------------------------
We can delete the PV manually, but then this workaround would need to be documented.
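A minimal sketch of the manual cleanup, assuming the backing RBD image either no longer matters or is cleaned up separately (deleting the PV object alone does not remove the image, since the driver has no secret to do so):

$ oc delete pv pvc-c57b8874-4869-4861-a104-b050c90ceec0
# last resort, only if the delete hangs on a finalizer:
$ oc patch pv pvc-c57b8874-4869-4861-a104-b050c90ceec0 -p '{"metadata":{"finalizers":null}}'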

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
----------------------------------------------------------------------
3

Is this issue reproducible?
----------------------------------------------------------------------
Yes

Can this issue be reproduced from the UI?
----------------------------------------------------------------------
Yes

If this is a regression, please provide more details to justify this:
----------------------------------------------------------------------
Uninstall has undergone changes. Earlier, we used to delete the namespace (which deleted the PVC and PV) before deleting the RBD StorageClass, as sketched below.
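For reference, a hedged reconstruction of the old (OCS 4.4) ordering, which is why the PV never outlived its StorageClass:

$ oc delete project openshift-storage --wait=true
# PVC and PV are reclaimed here, while the RBD StorageClass still exists
$ oc delete storageclass ocs-storagecluster-ceph-rbd
# classes are removed only after the namespace (and its volumes) are gone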


Steps to Reproduce:
----------------------------------------------------------------------

Doc link for reference: [1], as the OCS 4.5 uninstall doc is not yet ready.

1. Labelled the StorageCluster with cleanup.ocs.openshift.io=yes-really-destroy-data

$  oc label -n openshift-storage storagecluster --all cleanup.ocs.openshift.io=yes-really-destroy-data
storagecluster.ocs.openshift.io/ocs-storagecluster labeled

2. Deleted all PVCs and OBCs as per the current Step #3 in the OCS 4.4 docs, as sketched below.
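A hedged sketch of step 2; the actual PVC/OBC names depend on the workloads consuming OCS storage:

$ oc get pvc --all-namespaces | grep ocs-storagecluster
$ oc delete pvc <pvc-name> -n <app-namespace>
$ oc get obc --all-namespaces   # assumes the obc shortname for ObjectBucketClaims is registered
$ oc delete obc <obc-name> -n <app-namespace>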

3. Followed the uninstall as per https://bugzilla.redhat.com/show_bug.cgi?id=1849532#c5 and deleted the StorageCluster from the UI (the StorageCluster was already labelled with cleanup.ocs.openshift.io=yes-really-destroy-data).

4. Deleted the StorageCluster from the UI: Installed Operators -> OCS Operator -> Storage Cluster -> Delete StorageCluster Service -> OK
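The CLI equivalent of the UI deletion in step 4 (a hedged sketch):

$ oc delete -n openshift-storage storagecluster --all --wait=true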

5. Checked the state of the items deleted due to the StorageCluster deletion. The PVC db-noobaa-db-0 gets deleted, but the corresponding PV fails to be deleted due to the absence of the StorageClass (the provisioner secret is empty).
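Hedged commands to observe the failure state described in step 5:

$ oc get pvc db-noobaa-db-0 -n openshift-storage   # NotFound: the PVC is gone
$ oc get pv | grep Released                        # the noobaa-db PV remains
$ oc describe pv pvc-c57b8874-4869-4861-a104-b050c90ceec0
# the Events section shows the VolumeFailedDelete warning quoted above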





[1] - https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.4/html-single/deploying_openshift_container_storage/index?lb_target=preview#assembly_uninstalling-openshift-container-storage_aws-vmware

Actual results:
----------------------------------------------------------------------
The noobaa-db PVC is deleted (along with the noobaa-db pod), but the PV stays behind in Released state.

Expected results:
----------------------------------------------------------------------
With these new changes in the uninstall process, the deletion of the noobaa PVC and PV needs to be handled more gracefully.

Additional info:
----------------------------------------------------------------------

The following things are expected to be removed on deleting a StorageCluster (see the verification sketch after the list):

1. Default StorageClass
2. OCS node labels
3. OCS node taints
4. Clean up the cluster namespace on the dataDirHostPath
5. Delete all the Ceph monitor directories on the dataDirHostPath, for example mon-a, mon-b, etc.
6. Clean up the devices on each node.
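A hedged way to verify items 1-3 after the StorageCluster deletion; treat the exact label and taint keys below as assumptions based on the keys OCS applies:

$ oc get sc                                                    # default OCS StorageClasses should be gone
$ oc get nodes -l cluster.ocs.openshift.io/openshift-storage   # should list no nodes once labels are removed
$ oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
# no node.ocs.openshift.io taints expected

Items 4-6 are host-level and are expected to be handled by the cleanup jobs triggered by the cleanup.ocs.openshift.io=yes-really-destroy-data label.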

Comment 4 Mudit Agarwal 2020-08-04 12:47:34 UTC
Talur, if I am not wrong, one of the commits in https://github.com/openshift/ocs-operator/pull/645 will fix this issue as well.

Comment 5 Raghavendra Talur 2020-08-04 14:10:05 UTC
(In reply to Mudit Agarwal from comment #4)
> Talur, if I am not wrong one of the commits in
> https://github.com/openshift/ocs-operator/pull/645 will fix this issue as
> well.

Not yet. I did attempt to fix it but the PV is still seen in the released state.
I am still debugging.

Comment 6 Jose A. Rivera 2020-08-05 16:33:35 UTC
As discussed in a meeting today between engineering and QE, moving this to OCS 4.6. We will document a workaround for OCS 4.5.

Comment 9 Anat Eyal 2020-08-10 17:19:09 UTC
jrivera, rtalur, per Comment 7 and Comment 8, it seems that this BZ is already fixed in OCS 4.5. Is this correct? Was it fixed by Bug 1849105?

Comment 17 errata-xmlrpc 2020-09-15 10:18:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3754

Comment 18 Red Hat Bugzilla 2023-09-14 06:04:19 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

