Bug 1885428 - panic seen in rook-ceph during uninstall - "close of closed channel"
Summary: panic seen in rook-ceph during uninstall - "close of closed channel"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: rook
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.6.0
Assignee: Raghavendra Talur
QA Contact: Anna Sandler
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-05 22:33 UTC by Raghavendra Talur
Modified: 2021-06-01 08:47 UTC

Fixed In Version: 4.6.0-116.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-17 06:24:44 UTC
Embargoed:


Attachments
rook-ceph-operator log (67.82 KB, text/plain)
2020-10-08 02:24 UTC, Anna Sandler


Links
System ID Private Priority Status Summary Last Updated
Github openshift rook pull 132 0 None closed Bug 1885428: ceph: prevent closing of channel more than once 2020-11-29 11:00:15 UTC
Red Hat Product Errata RHSA-2020:5605 0 None None None 2020-12-17 06:25:26 UTC

Description Raghavendra Talur 2020-10-05 22:33:17 UTC
Description of problem (please be as detailed as possible and provide log snippets):

During the uninstall process, rook closes the monitoring channel for daemons. It may attempt to close a channel that is already closed, which leads to a panic and a restart of the rook operator pod.
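
For illustration, here is a minimal Go sketch of the failure mode and one common way to guard against it. The type monitorStopper and its methods are hypothetical names made up for this example; they are not rook's code, and this is not necessarily how the linked PR fixes the problem. The sketch only assumes that several teardown paths can reach the same close call.

```go
package main

import (
	"fmt"
	"sync"
)

// monitorStopper is a hypothetical wrapper (not a rook type) around a stop
// channel that more than one teardown path may try to close.
type monitorStopper struct {
	stopCh chan struct{}
	once   sync.Once
}

func newMonitorStopper() *monitorStopper {
	return &monitorStopper{stopCh: make(chan struct{})}
}

// Stop closes the channel at most once; later calls are no-ops.
func (m *monitorStopper) Stop() {
	m.once.Do(func() { close(m.stopCh) })
}

// Stopped reports whether the stop channel has been closed.
func (m *monitorStopper) Stopped() bool {
	select {
	case <-m.stopCh:
		return true
	default:
		return false
	}
}

// unsafeDoubleClose reproduces the unguarded case and returns the value
// recovered from the resulting panic.
func unsafeDoubleClose() (recovered interface{}) {
	defer func() { recovered = recover() }()
	ch := make(chan struct{})
	close(ch)
	close(ch) // second close panics at runtime
	return nil
}

func main() {
	fmt.Println("unguarded double close recovered:", unsafeDoubleClose())

	m := newMonitorStopper()
	m.Stop()
	m.Stop() // safe: sync.Once ensures close runs only once
	fmt.Println("guarded, stopped:", m.Stopped())
}
```

In an operator the unguarded panic would not be recovered as it is here, so the process would exit and the pod would restart, which matches the symptom described above.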

Version of all relevant components (if applicable):
4.6

Comment 4 Nitin Goyal 2020-10-06 06:04:38 UTC
Upstream PR https://github.com/rook/rook/pull/6369

Comment 5 krishnaram Karthick 2020-10-06 10:10:36 UTC
@Talur,
We haven't seen this issue in our test beds. Can you please share the steps to reproduce?

Comment 10 Anna Sandler 2020-10-08 02:22:22 UTC
Added a PVC from the UI, then deleted the storagecluster from the UI. The storagecluster is stuck in Deleting, but no panic is seen in the rook-ceph-operator pod.
[asandler@redhat ~]$ oc get storagecluster
NAME                 AGE   PHASE      EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   8h    Deleting              2020-10-07T17:51:22Z   4.6.0
[asandler@redhat ~]$ oc logs -n openshift-storage rook-ceph-operator-57459f5464-8fcfk | grep panic
[asandler@redhat ~]$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.0-116.ci   OpenShift Container Storage   4.6.0-116.ci              Succeeded

Is it OK that the storagecluster is stuck and not deleted? (P.S. I didn't run the full uninstall procedure from the start; I only deleted the storage cluster from the UI.)

Comment 11 Anna Sandler 2020-10-08 02:24:02 UTC
Created attachment 1719872
rook-ceph-operator log

Comment 12 Raghavendra Talur 2020-10-08 05:34:32 UTC
(In reply to Anna Sandler from comment #10)
> Added PVC from UI, deleted storagecluster from UI - storagecluster stuck on
> deleting, panic in the rook-ceph-operator pod is not seen 
> [asandler@redhat ~]$ oc get storagecluster
> NAME                 AGE   PHASE      EXTERNAL   CREATED AT            
> VERSION
> ocs-storagecluster   8h    Deleting              2020-10-07T17:51:22Z   4.6.0
> [asandler@redhat ~]$ oc logs -n openshift-storage
> rook-ceph-operator-57459f5464-8fcfk | grep panic
> [asandler@redhat ~]$ oc get csv
> NAME                         DISPLAY                       VERSION       
> REPLACES   PHASE
> ocs-operator.v4.6.0-116.ci   OpenShift Container Storage   4.6.0-116.ci     
> Succeeded
> 
> is it ok that the storagecluster is not deleted and stuck? (p.s. didn't do
> all uninstall procedure from start - only deleteing storage cluster from UI)


Yes, the delete request will be stuck because of the graceful delete feature. I confirmed by looking at the attached logs.
Relevant line - ""ocs-storagecluster-cephblockpool" has rbd images: pool "ocs-storagecluster-cephblockpool" contains images/snapshosts"


Once you delete the rbd/fs PVCs, the delete request should proceed.

If the bug were not fixed, the panic would already have occurred by this point.

Comment 13 Anna Sandler 2020-10-08 16:03:04 UTC
The bug is not seen anymore. Moving to verified.

Comment 16 errata-xmlrpc 2020-12-17 06:24:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605

