Bug 1885175

Summary:	Handle disappeared underlying device for encrypted OSD
Product:	[Red Hat Storage] Red Hat OpenShift Container Storage	Reporter:	Sébastien Han <shan>
Component:	rook	Assignee:	Sébastien Han <shan>
Status:	CLOSED ERRATA	QA Contact:	Oded <oviner>
Severity:	high	Docs Contact:
Priority:	high
Version:	4.6	CC:	ebenahar, fbalak, madam, muagarwa, ocs-bugs, pbalogh, ratamir
Target Milestone:	---	Keywords:	Automation
Target Release:	OCS 4.6.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	4.6.0-116.ci	Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-12-17 06:24:44 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Sébastien Han 2020-10-05 10:46:53 UTC

During the encrypted OSD initialization sequence, we check
for the presence of the encrypted container. If it exists, we don't try
to open it again since this would result in an error.
However, there is another case we need to handle: the underlying
device may be gone. For instance, the pod/PV couple may have been drained
and moved back, leaving an orphaned dm behind. Once the pod comes back,
the dm is still present and perfectly matches. Unfortunately, the
underlying disk is different and thus the dm must be removed and the disk
re-opened.
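
The decision described above can be sketched as follows. This is only an illustration and not necessarily how the fix is implemented in rook. It assumes the mapping's table line comes from `dmsetup table <name>`, that the backing device appears there as major:minor, and that a live block device has a directory under /sys/dev/block. The names `backing_device` and `plan_open` are hypothetical.

```python
import os
import re

# A dm-crypt table line has the form:
#   <start> <length> crypt <cipher> <key> <iv_offset> <device> <offset> [...]
MAJ_MIN = re.compile(r"^\d+:\d+$")


def backing_device(table_line):
    """Return the major:minor backing a dm-crypt table line, or None."""
    fields = table_line.split()
    if len(fields) < 8 or fields[2] != "crypt":
        return None
    dev = fields[6]
    return dev if MAJ_MIN.match(dev) else None


def plan_open(dm_exists, table_line, sysfs_root="/sys/dev/block"):
    """Decide what the init sequence should do with the encrypted block.

    Returns one of:
      "open"              - no mapping yet, open the encrypted device
      "skip"              - mapping exists and its backing device is present
      "remove-and-reopen" - mapping is stale, its backing device is gone
    """
    if not dm_exists:
        return "open"
    dev = backing_device(table_line)
    if dev is None:
        # Assumption: if the table cannot be parsed, leave the mapping alone.
        return "skip"
    if os.path.isdir(os.path.join(sysfs_root, dev)):
        return "skip"
    return "remove-and-reopen"
```

A caller would then run the real commands for each outcome, for example `dmsetup remove` followed by `cryptsetup luksOpen` for "remove-and-reopen". Keeping the decision separate from the commands makes the stale-mapping case easy to test without a real device.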

Comment 4 Sébastien Han 2020-10-08 08:27:44 UTC

*** Bug 1885666 has been marked as a duplicate of this bug. ***

Comment 5 Oded 2020-10-26 08:59:33 UTC

Setup:
Provider: VMware
OCP version: 4.6.0-0.nightly-2020-10-22-034051
OCS version: ocs-operator.v4.6.0-141.ci

Test Process:
1. Verify that all 3 OSDs are encrypted:
`-ocs-deviceset-0-data-0-skmxl-block-dmcrypt
         253:1    0   256G  0 crypt 

`-ocs-deviceset-1-data-0-vmnvt-block-dmcrypt
         253:1    0   256G  0 crypt 

`-ocs-deviceset-2-data-0-br2xj-block-dmcrypt
         253:1    0   256G  0 crypt 


2. Scale one of the OSDs to 0.
$ oc -n openshift-storage scale --replicas=0 deployment/rook-ceph-osd-2

3. Get the OSD pods.
osd-2 no longer exists.
$ oc get pods -n openshift-storage | grep -i osd
rook-ceph-osd-0-7ffffbdf78-qkn9v                                  1/1     Running     0          8h
rook-ceph-osd-1-66678f8bcc-pbxlm                                  1/1     Running     0          8h

4. Wait 15 minutes.

5. Scale the OSD back to 1.
$ oc -n openshift-storage scale --replicas=1 deployment/rook-ceph-osd-2

6. Check the osd-2 pod status.
$ oc get pods rook-ceph-osd-2-69555cd8cc-blqmb -n openshift-storage 
NAME                               READY   STATUS    RESTARTS   AGE
rook-ceph-osd-2-69555cd8cc-blqmb   1/1     Running   0          90s

7. Check ceph health.
$ oc -n openshift-storage exec rook-ceph-tools-6c7c4c65d9-q5xs2 -- ceph health
HEALTH_OK
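
The scale-down/scale-up check above could be automated roughly as follows. This is a minimal sketch and not the actual QE automation. It reuses only the `oc` invocations shown in the steps. The function names, the polling parameters, and the injectable `run`/`sleep` arguments are assumptions made for illustration, and the tools pod name has to be supplied by the caller.

```python
import subprocess
import time

NAMESPACE = "openshift-storage"


def run_oc(args):
    """Run `oc -n openshift-storage <args>` and return its stdout."""
    result = subprocess.run(
        ["oc", "-n", NAMESPACE, *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def restart_osd_and_check(osd_id, tools_pod, downtime_s=900,
                          attempts=20, interval_s=15,
                          run=run_oc, sleep=time.sleep):
    """Scale an OSD deployment to 0 and back to 1, then poll ceph health.

    Returns True once `ceph health` reports HEALTH_OK, or False if it
    never does within `attempts` polls.
    """
    deployment = f"deployment/rook-ceph-osd-{osd_id}"
    run(["scale", "--replicas=0", deployment])
    sleep(downtime_s)  # step 4: leave the OSD down for 15 minutes by default
    run(["scale", "--replicas=1", deployment])
    for _ in range(attempts):
        health = run(["exec", tools_pod, "--", "ceph", "health"])
        if health.strip().startswith("HEALTH_OK"):
            return True
        sleep(interval_s)
    return False
```

For the setup in this comment, the call would be `restart_osd_and_check(2, "rook-ceph-tools-6c7c4c65d9-q5xs2")`. The sketch only polls cluster health, so the pod status check from step 6 would still need to be done separately.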

Comment 6 Oded 2020-10-26 09:06:24 UTC

Bug not reproduced.
The encrypted OSD comes up when scaled from 0 to 1.

Comment 9 errata-xmlrpc 2020-12-17 06:24:44 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605