Bug 1868060 - [External Cluster] Noobaa-default-backingstore PV in released state upon OCS 4.5 uninstall (Secret not found)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.6.0
Assignee: Raghavendra Talur
QA Contact: Sidhant Agrawal
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-08-11 14:29 UTC by Neha Berry
Modified: 2020-12-17 06:24 UTC
CC List: 9 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-17 06:23:13 UTC
Embargoed:




Links:
Red Hat Product Errata RHSA-2020:5605 (Last Updated: 2020-12-17 06:24:28 UTC)

Description Neha Berry 2020-08-11 14:29:19 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
-------------------------------------------------------------------------
Created an external mode OCS cluster without providing any RGW details while uploading the JSON. In the absence of RGW, the noobaa-default-backingstore fell back to the PV-pool backed by the RBD storage class.

Following the uninstall procedure in [1], performed the following steps:

1. Did not delete the default backingstore (BS), even though it was using an RBD PVC, since it is a default NooBaa resource.

2. Deleted the StorageCluster, which deleted the default BS, pod, and PVC:
  $ oc delete storagecluster --all -n openshift-storage
storagecluster.ocs.openshift.io "ocs-external-storagecluster" deleted


3. Deleted the namespace openshift-storage
$ oc delete project openshift-storage
project.project.openshift.io "openshift-storage" deleted

Observations
--------------------

a) Even though the default BS, PVC, and pod (prefix noobaa-default-backing-store) were removed as part of Step #2, the PV remains in the Released state.

======= PV ====
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                                                STORAGECLASS                           REASON   AGE
pvc-e7c62946-69c5-4459-a61d-11ac22225c77   50Gi       RWO            Delete           Released   openshift-storage/noobaa-default-backing-store-noobaa-pvc-091f4acd   ocs-external-storagecluster-ceph-rbd            18h
Tue Aug 11 11:00:15 UTC 2020

Events:
  Type     Reason              Age                  From                                                                                                                Message
  ----     ------              ----                 ----                                                                                                                -------
  Warning  VolumeFailedDelete  6m4s (x8 over 7m8s)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-848b585c9c-snnb4_1261a91e-9556-474a-8ef4-f51e8a1ff371  rpc error: code = Internal desc = provided secret is empty
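For context, the RBD CSI provisioner reads its volume-deletion credentials from the secret referenced in the StorageClass parameters (csi.storage.k8s.io/provisioner-secret-name and -namespace). Once the openshift-storage namespace is deleted, that secret is gone, which matches the "provided secret is empty" error above. A quick way to see which secret the driver expects, assuming the StorageClass still exists:

$ oc get sc ocs-external-storagecluster-ceph-rbd -o yaml | grep provisioner-secret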



[1] https://docs.google.com/document/d/1BYMZFdyhXC8FMEe3lKlonKkUDmeDM_JsQ1uzLAP-Gns/edit#heading=h.hnqudcywbkuf

Logs before Uninstall - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bug-1866155-C5/
Logs after uninstall were not collected, as the openshift-storage namespace was deleted.


Version of all relevant components (if applicable):
-----------------------------------------------------
OCS = ocs-operator.v4.5.0-521.ci
OCP = 4.5.0-0.nightly-2020-08-07-024812

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
----------------------------------------------------
No. But we either need to fix this or document manual removal of the PV (not a good user experience).

Is there any workaround available to the best of your knowledge?
----------------------------------------------------------
Delete the Released PV manually:
$ oc delete pv <pv name>
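Note that this only removes the PV object from Kubernetes; because DeleteVolume never succeeded, the backing RBD image is left behind in the external pool. A cleanup sketch for the external RHCS cluster, assuming the usual Ceph-CSI convention that the image is named csi-vol-<last segment of the PV's VolumeHandle> (verify the image name before removing anything):

# List CSI-created images in the data pool and confirm the orphan
$ rbd ls -p ocs-cephblockpool-dc8 | grep csi-vol
# Remove the image that corresponds to the Released PV's VolumeHandle
$ rbd rm ocs-cephblockpool-dc8/csi-vol-6c21e347-db29-11ea-bc96-0a580a80021e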

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
--------------------------------------------------
4

Is this issue reproducible?
-----------------------------
Yes. Reproduced on a bare metal (BM) based external cluster too.

Can this issue be reproduced from the UI?
---------------------------------
NA

If this is a regression, please provide more details to justify this:
----------------------------------------
Not a regression; the PV-pool backingstore is new in OCS 4.5.

Steps to Reproduce:
1. Generate the JSON on the external RHCS cluster without passing any RGW details:
   # python3 ceph_exporter.py --rbd-data-pool-name ocs-cephblockpool-dc8

2. Upload the JSON on the OCS Create Storage Cluster page.
3. In the absence of RGW details, NooBaa uses the PV-pool for backingstore creation.
4. Uninstall OCS using the deployment guide.
5. Check the PV status for noobaa-default-backing-store-noobaa-pvc***; it is left in the Released state (see the check below).
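One way to spot the leftover volume after step 5 is to filter for Released PVs; the output should match the PV shown in the description:

$ oc get pv | grep Released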



Actual results:
--------------------
The PV bound to noobaa-default-backing-store-noobaa-pvc*** is in the Released state after uninstall.

Expected results:
------------------------
The PV bound to noobaa-default-backing-store-noobaa-pvc*** should be deleted as part of OCS uninstall.

Additional info:
----------------------
Before uninstall:
---------------------


======= PVC ==========
NAME                                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
db-noobaa-db-0                                     Bound    pvc-a6e336e5-b7a6-4807-b395-4d4fb46871af   50Gi       RWO            ocs-external-storagecluster-ceph-rbd   25m
noobaa-default-backing-store-noobaa-pvc-091f4acd   Bound    pvc-e7c62946-69c5-4459-a61d-11ac22225c77   50Gi       RWO            ocs-external-storagecluster-ceph-rbd   23m


=====PODS===

noobaa-db-0                                        1/1     Running   0          25m   10.131.0.34   compute-0   <none>           <none>
noobaa-default-backing-store-noobaa-pod-091f4acd   1/1     Running   0          23m   10.131.0.35   compute-0   <none>           <none>

======= backingstore ==========
NAME                           TYPE      PHASE   AGE
noobaa-default-backing-store   pv-pool   Ready   23m

spec:
  pvPool:
    numVolumes: 1
    resources:
      requests:
        storage: 50Gi
    secret: {}
  type: pv-pool
status:



======= PV ====
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                                STORAGECLASS                           REASON   AGE
pvc-a6e336e5-b7a6-4807-b395-4d4fb46871af   50Gi       RWO            Delete           Bound    openshift-storage/db-noobaa-db-0                                     ocs-external-storagecluster-ceph-rbd            25m
pvc-e7c62946-69c5-4459-a61d-11ac22225c77   50Gi       RWO            Delete           Bound    openshift-storage/noobaa-default-backing-store-noobaa-pvc-091f4acd   ocs-external-storagecluster-ceph-rbd            24m


After uninstall
---------------------

$ oc describe pv pvc-e7c62946-69c5-4459-a61d-11ac22225c77
Name:            pvc-e7c62946-69c5-4459-a61d-11ac22225c77
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: openshift-storage.rbd.csi.ceph.com
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    ocs-external-storagecluster-ceph-rbd
Status:          Released
Claim:           openshift-storage/noobaa-default-backing-store-noobaa-pvc-091f4acd
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        50Gi
Node Affinity:   <none>
Message:         
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            openshift-storage.rbd.csi.ceph.com
    FSType:            ext4
    VolumeHandle:      0001-0011-openshift-storage-000000000000000a-6c21e347-db29-11ea-bc96-0a580a80021e
    ReadOnly:          false
    VolumeAttributes:      clusterID=openshift-storage
                           imageFeatures=layering
                           imageFormat=2
                           journalPool=ocs-cephblockpool-dc8
                           pool=ocs-cephblockpool-dc8
                           storage.kubernetes.io/csiProvisionerIdentity=1597078034671-8081-openshift-storage.rbd.csi.ceph.com
Events:
  Type     Reason              Age                  From                                                                                                                Message
  ----     ------              ----                 ----                                                                                                                -------
  Warning  VolumeFailedDelete  6m4s (x8 over 7m8s)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-848b585c9c-snnb4_1261a91e-9556-474a-8ef4-f51e8a1ff371  rpc error: code = Internal desc = provided secret is empty

Comment 2 Mudit Agarwal 2020-08-11 14:35:32 UTC
How is this different from https://bugzilla.redhat.com/show_bug.cgi?id=1860418?

Comment 3 Neha Berry 2020-08-11 15:38:02 UTC
(In reply to Mudit Agarwal from comment #2)
> How is this different from
> https://bugzilla.redhat.com/show_bug.cgi?id=1860418?

In Bug 1860418, it was the noobaa-db PV.

In this BZ, it is the noobaa-default-backingstore PV.

This will be seen only when the noobaa-default-BS is backed by the PV-pool and not by RGW.
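A quick way to tell which case a cluster is in before uninstalling is to check the backingstore type; this bug applies only when it reports pv-pool, as in the spec shown in the description:

$ oc get backingstore noobaa-default-backing-store -n openshift-storage -o jsonpath='{.spec.type}'
pv-pool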

Comment 5 Elad 2020-08-18 09:23:59 UTC
Hi Talur, is there a bug where we track the change needed in the 4.5 uninstall procedure?

Comment 6 Raghavendra Talur 2020-08-19 13:26:13 UTC
Hi Elad,

The changes have been made in the uninstall doc.

In most cases, using the default uninstall procedure should be sufficient.
In case the bug is hit, the steps to be followed are provided as a link to the troubleshooting guide.

The BZ used to make the doc changes is https://bugzilla.redhat.com/show_bug.cgi?id=1849532

Comment 7 Elad 2020-08-19 13:34:47 UTC
Ack. Thanks Talur

Comment 8 Jose A. Rivera 2020-10-05 13:41:13 UTC
Hey Talur, are we ready to fix this one in OCS 4.6?

Comment 9 Raghavendra Talur 2020-10-08 16:32:29 UTC
(In reply to Jose A. Rivera from comment #8)
> Hey Talur, are we ready to fix this one in OCS 4.6?

Yes, this should be fixed in 4.6 now with the serialization in the delete process.
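For verification, the manual analogue of that serialization is to wait for the backingstore PV to actually be deleted before the namespace (and the CSI secrets in it) goes away. A minimal check after the fix, with the PV name taken from this report as an illustration:

$ oc delete storagecluster --all -n openshift-storage
$ oc wait --for=delete pv/pvc-e7c62946-69c5-4459-a61d-11ac22225c77 --timeout=120s
$ oc get pv | grep noobaa-default-backing-store   # expect no output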

Comment 10 Raghavendra Talur 2020-10-08 16:34:38 UTC
Sidhant,

Please provide QA ack.

Comment 13 Mudit Agarwal 2020-10-09 11:09:56 UTC
This should have been fixed with recent changes, moving it to ON_QA

Comment 17 errata-xmlrpc 2020-12-17 06:23:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605

