Bug 1885428

Summary: panic seen in rook-ceph during uninstall - "close of closed channel"
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Raghavendra Talur <rtalur>
Component: rook
Assignee: Raghavendra Talur <rtalur>
Status: CLOSED ERRATA
QA Contact: Anna Sandler <asandler>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.6
CC: ebenahar, kramdoss, madam, muagarwa, nberry, nigoyal, ocs-bugs, rtalur
Target Milestone: ---
Keywords: AutomationBackLog
Target Release: OCS 4.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 4.6.0-116.ci
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-12-17 06:24:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description              Flags
rook-ceph-operator log   none

Description Raghavendra Talur 2020-10-05 22:33:17 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

During the uninstall process, rook closes the monitoring channel for daemons. It may attempt to close a channel that is already closed, which causes a panic and a restart of the rook operator pod.
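
For illustration only, here is a minimal Go sketch of how a double close can be made safe. This is not the code from rook or from the upstream fix; the type daemonMonitor and its methods are hypothetical names. It assumes a monitor that is stopped by closing a channel and guards the close with sync.Once so that a second Stop call is a no-op instead of a panic.

```go
package main

import (
	"fmt"
	"sync"
)

// daemonMonitor is a hypothetical stand-in for a per-daemon health monitor
// that is stopped by closing a channel. It is not a rook type.
type daemonMonitor struct {
	stopCh   chan struct{}
	stopOnce sync.Once
}

func newDaemonMonitor() *daemonMonitor {
	return &daemonMonitor{stopCh: make(chan struct{})}
}

// Stop signals the monitoring goroutine to exit. sync.Once guarantees that
// close(stopCh) runs at most once, so repeated calls do not panic.
func (m *daemonMonitor) Stop() {
	m.stopOnce.Do(func() { close(m.stopCh) })
}

// Stopped reports whether Stop has been called. A receive on a closed
// channel returns immediately, so the first case is taken after Stop.
func (m *daemonMonitor) Stopped() bool {
	select {
	case <-m.stopCh:
		return true
	default:
		return false
	}
}

func main() {
	m := newDaemonMonitor()
	m.Stop()
	m.Stop() // a bare close(m.stopCh) here would panic: close of closed channel
	fmt.Println("stopped:", m.Stopped())
}
```

An alternative under the same assumptions is to set the channel to nil after closing it and check for nil before closing, but that needs its own locking if Stop can be called from more than one goroutine.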

Version of all relevant components (if applicable):
4.6

Comment 4 Nitin Goyal 2020-10-06 06:04:38 UTC
Upstream PR https://github.com/rook/rook/pull/6369

Comment 5 krishnaram Karthick 2020-10-06 10:10:36 UTC
@Talur,
We haven't seen this issue in our test beds. Can you please share the steps to reproduce?

Comment 10 Anna Sandler 2020-10-08 02:22:22 UTC
Added a PVC from the UI, then deleted the storagecluster from the UI. The storagecluster is stuck in Deleting; no panic is seen in the rook-ceph-operator pod.
[asandler@redhat ~]$ oc get storagecluster
NAME                 AGE   PHASE      EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   8h    Deleting              2020-10-07T17:51:22Z   4.6.0
[asandler@redhat ~]$ oc logs -n openshift-storage rook-ceph-operator-57459f5464-8fcfk | grep panic
[asandler@redhat ~]$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.0-116.ci   OpenShift Container Storage   4.6.0-116.ci              Succeeded

Is it OK that the storagecluster is stuck and not deleted? (P.S. I did not run the full uninstall procedure from the start, only deleted the storagecluster from the UI.)

Comment 11 Anna Sandler 2020-10-08 02:24:02 UTC
Created attachment 1719872 [details]
rook-ceph-operator log

Comment 12 Raghavendra Talur 2020-10-08 05:34:32 UTC
(In reply to Anna Sandler from comment #10)
> Added a PVC from the UI, then deleted the storagecluster from the UI. The
> storagecluster is stuck in Deleting; no panic is seen in the
> rook-ceph-operator pod.
> [asandler@redhat ~]$ oc get storagecluster
> NAME                 AGE   PHASE      EXTERNAL   CREATED AT             VERSION
> ocs-storagecluster   8h    Deleting              2020-10-07T17:51:22Z   4.6.0
> [asandler@redhat ~]$ oc logs -n openshift-storage rook-ceph-operator-57459f5464-8fcfk | grep panic
> [asandler@redhat ~]$ oc get csv
> NAME                         DISPLAY                       VERSION        REPLACES   PHASE
> ocs-operator.v4.6.0-116.ci   OpenShift Container Storage   4.6.0-116.ci              Succeeded
> 
> Is it OK that the storagecluster is stuck and not deleted? (P.S. I did not
> run the full uninstall procedure from the start, only deleted the
> storagecluster from the UI.)


Yes, the delete request will be stuck because of the graceful delete feature. I confirmed by looking at the attached logs.
Relevant line - ""ocs-storagecluster-cephblockpool" has rbd images: pool "ocs-storagecluster-cephblockpool" contains images/snapshosts"


Once you delete the rbd/fs PVCs, the delete request should proceed.

The panic should have occurred already if the bug was not fixed.

Comment 13 Anna Sandler 2020-10-08 16:03:04 UTC
The bug is no longer seen. Moving to verified.

Comment 16 errata-xmlrpc 2020-12-17 06:24:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605