Bug 1927338 - Uninstall OCS: Include events for major CRs to know the cause of deletion getting stuck
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: OCS 4.7.0
Assignee: Nitin Goyal
QA Contact: Mugdha Soni
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-02-10 14:56 UTC by Neha Berry
Modified: 2021-05-19 09:19 UTC

Fixed In Version: 4.7.0-272.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-19 09:19:24 UTC
Embargoed:


Attachments
Events from console. (71.22 KB, image/png)
2021-03-23 15:07 UTC, Mugdha Soni


Links
System ID Private Priority Status Summary Last Updated
Github openshift ocs-operator pull 1058 0 None closed storagecluster: Update uninstall error as an events 2021-02-17 05:14:53 UTC
Github openshift ocs-operator pull 1083 0 None open Bug 1927338: [release-4.7] storagecluster: Update uninstall error as an events 2021-02-17 05:22:55 UTC
Red Hat Product Errata RHSA-2021:2041 0 None None None 2021-05-19 09:19:48 UTC

Description Neha Berry 2021-02-10 14:56:13 UTC
Description of problem (please be as detailed as possible and provide log
snippets):


Version of all relevant components (if applicable):


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1.
2.
3.


Actual results:


Expected results:


Additional info:

Comment 10 Mugdha Soni 2021-03-23 15:07:42 UTC
Created attachment 1765577
Events from console.

Comment 11 Mugdha Soni 2021-03-23 15:09:56 UTC
Hi Nitin,

I performed the following steps to reproduce the issue:

1. Created an OCS 4.7 cluster.
   Job link: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/1517/ (accessible only for a few hours, as it will be deleted automatically by our AWS cleanup job)

2. Created a few PVCs and waited for them to reach the Bound state.

   [root@localhost ocs_b_23]# oc get pvc -n openshift-storage
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-pg-0             Bound    pvc-afdbd9c9-e439-4202-9079-19efd703cc22   50Gi       RWO            ocs-storagecluster-ceph-rbd   44m
ocs-deviceset-0-data-0h7tdg   Bound    pvc-8bebc774-1512-40cb-8bb4-cc27c6abd66d   512Gi      RWO            gp2                           45m
ocs-deviceset-1-data-0fflmx   Bound    pvc-c8f1dfd4-6a45-4576-8bf2-d5166223cdeb   512Gi      RWO            gp2                           45m
ocs-deviceset-2-data-0s68fk   Bound    pvc-b0428f00-1857-4aa6-928a-05e36c7486e5   512Gi      RWO            gp2                           45m
rook-ceph-mon-a               Bound    pvc-929bde63-3bf8-4edf-8d97-12afba731e52   10Gi       RWO            gp2                           47m
rook-ceph-mon-b               Bound    pvc-1f5b1105-c91a-42ff-904c-b5650c1d787b   10Gi       RWO            gp2                           47m
rook-ceph-mon-c               Bound    pvc-f99b891b-4758-4ab1-99db-378a119c096d   10Gi       RWO            gp2                           47m
test                          Bound    pvc-0893baaf-1813-4b27-b120-4d828b715231   5Gi        RWO            ocs-storagecluster-ceph-rbd   19m
testt                         Bound    pvc-2d51e2a4-c4df-43ef-bbb2-170991e68824   10Gi       RWO            ocs-storagecluster-cephfs     19m

'test' and 'testt' are the PVCs that were created for this test.

3. Deleted the StorageCluster from the UI and checked the events listed under the storage cluster. Events stating "Uninstall: Waiting for cephCluster to be deleted" were present. (A screenshot from the UI is attached for reference.)

4. Checked the events in the output of "oc describe storagecluster -n openshift-storage" and found the same event, "Uninstall: Waiting for cephCluster to be deleted".

[root@localhost ocs_b_23]# oc describe storagecluster -n openshift-storage
Name:         ocs-storagecluster
Namespace:    openshift-storage
Labels:       <none>
Annotations:  uninstall.ocs.openshift.io/cleanup-policy: delete
              uninstall.ocs.openshift.io/mode: graceful
API Version:  ocs.openshift.io/v1
Kind:         StorageCluster
Metadata:
  Creation Timestamp:             2021-03-23T12:16:48Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2021-03-23T13:42:01Z
  Finalizers:
    storagecluster.ocs.openshift.io
  Generation:  3
  Managed Fields:
    API Version:  ocs.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
    Manager:      kubectl-create
    Operation:    Update
    Time:         2021-03-23T12:16:48Z
    API Version:  ocs.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:uninstall.ocs.openshift.io/cleanup-policy:
          f:uninstall.ocs.openshift.io/mode:
        f:finalizers:
      f:spec:
        f:arbiter:
        f:encryption:
          .:
          f:kms:
        f:externalStorage:
        f:managedResources:
          .:
          f:cephBlockPools:
          f:cephConfig:
          f:cephFilesystems:
          f:cephObjectStoreUsers:
          f:cephObjectStores:
        f:storageDeviceSets:
        f:version:
      f:status:
        .:
        f:conditions:
        f:failureDomain:
        f:failureDomainKey:
        f:failureDomainValues:
        f:images:
          .:
          f:ceph:
            .:
            f:actualImage:
            f:desiredImage:
          f:noobaaCore:
            .:
            f:actualImage:
            f:desiredImage:
          f:noobaaDB:
            .:
            f:actualImage:
            f:desiredImage:
        f:nodeTopologies:
          .:
          f:labels:
            .:
            f:kubernetes.io/hostname:
            f:topology.kubernetes.io/region:
            f:topology.kubernetes.io/zone:
        f:phase:
        f:relatedObjects:
    Manager:         ocs-operator
    Operation:       Update
    Time:            2021-03-23T12:19:51Z
  Resource Version:  68798
  Self Link:         /apis/ocs.openshift.io/v1/namespaces/openshift-storage/storageclusters/ocs-storagecluster
  UID:               99fe5cd5-1fc9-412c-91f4-2fc3e0a56668
Spec:
  Arbiter:
  Encryption:
    Kms:
  External Storage:
  Managed Resources:
    Ceph Block Pools:
    Ceph Config:
    Ceph Filesystems:
    Ceph Object Store Users:
    Ceph Object Stores:
  Storage Device Sets:
    Config:
    Count:  1
    Data PVC Template:
      Metadata:
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:         512Gi
        Storage Class Name:  gp2
        Volume Mode:         Block
      Status:
    Name:  ocs-deviceset
    Placement:
    Portable:  true
    Prepare Placement:
    Replica:  3
    Resources:
  Version:  4.7.0
Status:
  Conditions:
    Last Heartbeat Time:   2021-03-23T13:41:59Z
    Last Transition Time:  2021-03-23T12:16:49Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  ReconcileComplete
    Last Heartbeat Time:   2021-03-23T13:41:59Z
    Last Transition Time:  2021-03-23T12:21:30Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  Available
    Last Heartbeat Time:   2021-03-23T13:41:59Z
    Last Transition Time:  2021-03-23T12:21:30Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                False
    Type:                  Progressing
    Last Heartbeat Time:   2021-03-23T13:41:59Z
    Last Transition Time:  2021-03-23T12:16:48Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                False
    Type:                  Degraded
    Last Heartbeat Time:   2021-03-23T13:41:59Z
    Last Transition Time:  2021-03-23T12:21:30Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  Upgradeable
  Failure Domain:          zone
  Failure Domain Key:      topology.kubernetes.io/zone
  Failure Domain Values:
    us-east-2a
    us-east-2b
    us-east-2c
  Images:
    Ceph:
      Actual Image:   quay.io/rhceph-dev/rhceph@sha256:a334f5429bc9c5ff1175e616fd0c9d1765457ead727a036005125ba3747cc5b3
      Desired Image:  quay.io/rhceph-dev/rhceph@sha256:a334f5429bc9c5ff1175e616fd0c9d1765457ead727a036005125ba3747cc5b3
    Noobaa Core:
      Actual Image:   quay.io/rhceph-dev/mcg-core@sha256:54d2ea9d4e18f6c4bb1a11dfec741d1adb62c34d98ca4c488f9b06c070a794d3
      Desired Image:  quay.io/rhceph-dev/rhceph@sha256:a334f5429bc9c5ff1175e616fd0c9d1765457ead727a036005125ba3747cc5b3
    Noobaa DB:
      Actual Image:   registry.redhat.io/rhel8/postgresql-12@sha256:ed859e2054840467e9a0ffc310ddf74ff64a8743c236598ca41c7557d8cdc767
      Desired Image:  registry.redhat.io/rhel8/postgresql-12@sha256:ed859e2054840467e9a0ffc310ddf74ff64a8743c236598ca41c7557d8cdc767
  Node Topologies:
    Labels:
      kubernetes.io/hostname:
        ip-10-0-136-40
        ip-10-0-168-2
        ip-10-0-193-239
      topology.kubernetes.io/region:
        us-east-2
      topology.kubernetes.io/zone:
        us-east-2a
        us-east-2b
        us-east-2c
  Phase:  Deleting
  Related Objects:
    API Version:       ceph.rook.io/v1
    Kind:              CephCluster
    Name:              ocs-storagecluster-cephcluster
    Namespace:         openshift-storage
    Resource Version:  68518
    UID:               4ffbb78c-068f-461c-9d70-1931007af13c
    API Version:       noobaa.io/v1alpha1
    Kind:              NooBaa
    Name:              noobaa
    Namespace:         openshift-storage
    Resource Version:  68764
    UID:               26e295a8-f883-4b6b-a019-86b82ef86e16
Events:
  Type     Reason            Age                     From                       Message
  ----     ------            ----                    ----                       -------
  Warning  UninstallPending  44m (x2 over 44m)       controller_storagecluster  Uninstall: Waiting on NooBaa system noobaa to be deleted
  Warning  UninstallPending  4m45s (x4373 over 44m)  controller_storagecluster  Uninstall: Waiting for cephCluster to be deleted
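
For anyone who wants to pull the repeat counts out of an Events table like the one above, here is a minimal sketch in Python. It only parses text in the layout shown in this comment. The names Event, ROW, parse_events, and SAMPLE are made up for illustration and are not part of ocs-operator or the oc CLI, and the regular expression assumes the column layout printed above.

```python
import re
from typing import List, NamedTuple


class Event(NamedTuple):
    kind: str      # "Normal" or "Warning"
    reason: str    # e.g. "UninstallPending"
    count: int     # number of occurrences; 1 when no "(xN over ...)" suffix
    source: str    # the "From" column
    message: str


# One row of the Events table. The Age column is either "44m" or
# "4m45s (x4373 over 44m)"; this layout is assumed from the output above.
ROW = re.compile(
    r"^\s*(?P<kind>Normal|Warning)\s+"
    r"(?P<reason>\S+)\s+"
    r"(?P<age>\S+)(?:\s+\(x(?P<count>\d+) over \S+\))?\s+"
    r"(?P<source>\S+)\s+"
    r"(?P<message>.+?)\s*$"
)


def parse_events(describe_output: str) -> List[Event]:
    """Return the rows that follow the 'Events:' line of describe output."""
    events = []
    in_events = False
    for line in describe_output.splitlines():
        if line.startswith("Events:"):
            in_events = True
            continue
        if not in_events:
            continue
        m = ROW.match(line)
        if m:  # header and separator rows do not match and are skipped
            events.append(Event(m["kind"], m["reason"], int(m["count"] or 1),
                                m["source"], m["message"]))
    return events


# The two rows are copied from the describe output in this comment.
SAMPLE = """Events:
  Type     Reason            Age                     From                       Message
  ----     ------            ----                    ----                       -------
  Warning  UninstallPending  44m (x2 over 44m)       controller_storagecluster  Uninstall: Waiting on NooBaa system noobaa to be deleted
  Warning  UninstallPending  4m45s (x4373 over 44m)  controller_storagecluster  Uninstall: Waiting for cephCluster to be deleted
"""

if __name__ == "__main__":
    # Show the most frequently repeated event, i.e. the step that is stuck.
    worst = max(parse_events(SAMPLE), key=lambda e: e.count)
    print(worst.count, worst.message)
```

In this run the most repeated row is the cephCluster one (x4373), which matches what the UI shows. The sketch only tells which uninstall step keeps repeating, not why it is blocked.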


Thanks for the enhancement. However, the events still do not show why the CephCluster deletion is stuck.
To find that out, does the user still need to check logs, Rook's CephCluster events, or something else?


Thanks 
Mugdha Soni

Comment 13 Mugdha Soni 2021-03-24 08:29:58 UTC
Based on comment#11 and comment#12, moving this bug to the verified state.

Comment 15 errata-xmlrpc 2021-05-19 09:19:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041

