1885428 – panic seen in rook-ceph during uninstall - "close of closed channel"

Summary: panic seen in rook-ceph during uninstall - "close of closed channel"

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Container Storage
Classification:	Red Hat Storage
Component:	rook
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	OCS 4.6.0
Assignee:	Raghavendra Talur
QA Contact:	Anna Sandler
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-10-05 22:33 UTC by Raghavendra Talur
Modified:	2021-06-01 08:47 UTC
CC List:	8 users
Fixed In Version:	4.6.0-116.ci
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-12-17 06:24:44 UTC
Embargoed:
Dependent Products:

Attachments
rook-ceph-operator log (67.82 KB, text/plain) 2020-10-08 02:24 UTC, Anna Sandler

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift rook pull 132	0	None	closed	Bug 1885428: ceph: prevent closing of channel more than once	2020-11-29 11:00:15 UTC
Red Hat Product Errata	RHSA-2020:5605	0	None	None	None	2020-12-17 06:25:26 UTC

Description Raghavendra Talur 2020-10-05 22:33:17 UTC

Description of problem (please be as detailed as possible and provide log
snippets):

During the uninstall process, rook closes the monitoring channel for daemons. It may attempt to close a channel that is already closed, which causes a panic and a restart of the rook operator pod.
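
For illustration, a minimal Go sketch of this failure mode and one common way to guard against it. This is not the code from the rook fix; the `monitor` type, its fields, and the use of `sync.Once` are assumptions chosen only to show why a second `close` panics and how a close-once guard avoids it.

```go
package main

import (
	"fmt"
	"sync"
)

// monitor is a hypothetical stand-in for a daemon health checker.
// The type and field names are illustrative and not taken from rook.
type monitor struct {
	stopCh   chan struct{}
	stopOnce sync.Once
}

func newMonitor() *monitor {
	return &monitor{stopCh: make(chan struct{})}
}

// Stop closes stopCh at most once, so repeated calls are safe.
func (m *monitor) Stop() {
	m.stopOnce.Do(func() { close(m.stopCh) })
}

// unsafeDoubleClose closes a channel twice and returns the text of the
// recovered panic, or "no panic" if none occurred.
func unsafeDoubleClose() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	ch := make(chan struct{})
	close(ch)
	close(ch) // second close of the same channel panics
	return "no panic"
}

func main() {
	fmt.Println("unguarded:", unsafeDoubleClose())

	m := newMonitor()
	m.Stop()
	m.Stop() // guarded: the second call is a no-op
	fmt.Println("guarded: stopped twice without panic")
}
```

In the operator there is no `recover` around the close, so the unguarded case takes down the process and the pod restarts.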

Version of all relevant components (if applicable):
4.6

Comment 4 Nitin Goyal 2020-10-06 06:04:38 UTC

Upstream PR https://github.com/rook/rook/pull/6369

Comment 5 krishnaram Karthick 2020-10-06 10:10:36 UTC

@Talur,
We haven't seen this issue in our test beds. Can you please share the steps to reproduce?

Comment 10 Anna Sandler 2020-10-08 02:22:22 UTC

Added a PVC from the UI, then deleted the storagecluster from the UI. The storagecluster is stuck in Deleting, and no panic is seen in the rook-ceph-operator pod.
[asandler@redhat ~]$ oc get storagecluster
NAME                 AGE   PHASE      EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   8h    Deleting              2020-10-07T17:51:22Z   4.6.0
[asandler@redhat ~]$ oc logs -n openshift-storage rook-ceph-operator-57459f5464-8fcfk | grep panic
[asandler@redhat ~]$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.0-116.ci   OpenShift Container Storage   4.6.0-116.ci              Succeeded

Is it OK that the storagecluster is not deleted and is stuck? (P.S. I didn't go through the whole uninstall procedure from the start, only deleted the storage cluster from the UI.)

Comment 11 Anna Sandler 2020-10-08 02:24:02 UTC

Created attachment 1719872
rook-ceph-operator log

Comment 12 Raghavendra Talur 2020-10-08 05:34:32 UTC

(In reply to Anna Sandler from comment #10)
> Added PVC from UI, deleted storagecluster from UI - storagecluster stuck on
> deleting, panic in the rook-ceph-operator pod is not seen
> [asandler@redhat ~]$ oc get storagecluster
> NAME                 AGE   PHASE      EXTERNAL   CREATED AT             VERSION
> ocs-storagecluster   8h    Deleting              2020-10-07T17:51:22Z   4.6.0
> [asandler@redhat ~]$ oc logs -n openshift-storage rook-ceph-operator-57459f5464-8fcfk | grep panic
> [asandler@redhat ~]$ oc get csv
> NAME                         DISPLAY                       VERSION        REPLACES   PHASE
> ocs-operator.v4.6.0-116.ci   OpenShift Container Storage   4.6.0-116.ci              Succeeded
>
> is it ok that the storagecluster is not deleted and stuck? (p.s. didn't do
> all uninstall procedure from start - only deleting storage cluster from UI)


Yes, the delete request will be stuck because of the graceful delete feature. I confirmed by looking at the attached logs.
Relevant line - ""ocs-storagecluster-cephblockpool" has rbd images: pool "ocs-storagecluster-cephblockpool" contains images/snapshosts"


Once you delete the rbd/fs PVCs, the delete request should proceed.

If the bug had not been fixed, the panic would already have occurred by this point.

Comment 13 Anna Sandler 2020-10-08 16:03:04 UTC

The bug is no longer seen. Moving to verified.

Comment 16 errata-xmlrpc 2020-12-17 06:24:44 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605
