Bug 1885175

Summary: Handle disappeared underlying device for encrypted OSD
Product: [Red Hat Storage] Red Hat OpenShift Container Storage Reporter: Sébastien Han <shan>
Component: rook Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA QA Contact: Oded <oviner>
Severity: high Docs Contact:
Priority: high    
Version: 4.6 CC: ebenahar, fbalak, madam, muagarwa, ocs-bugs, pbalogh, ratamir
Target Milestone: --- Keywords: Automation
Target Release: OCS 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.6.0-116.ci Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-12-17 06:24:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sébastien Han 2020-10-05 10:46:53 UTC
During the encrypted OSD initialization sequence, we check
for the presence of the encrypted container. If it exists, we don't try
to open it again, since that would result in an error.
However, there is another case we need to handle: when the underlying
device is gone. For instance, if the pod/PV couple was drained and moved
back, leaving an orphaned dm. Once the pod comes back, the dm is still
present and perfectly matches. Unfortunately, the underlying disk is
different, and thus the dm must be removed and the disk re-opened.
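
The decision described above can be sketched in shell roughly as follows. This is only an illustration, not the code from the fix: the function names `underlying_devno` and `decide_action` are hypothetical, and the sketch assumes that the backing device can be identified by the `major:minor` field of a `dmsetup table` line and that a vanished device no longer has an entry under `/sys/dev/block`.

```shell
# Sketch only: the function names are hypothetical, not taken from the Rook source.

# Print the major:minor of the backing device found in one line of
# `dmsetup table <name>` output. Returns 1 if no such field is present.
underlying_devno() {
    for _field in $1; do
        case $_field in
            *[!0-9:]* | *:*:* | :* | *:) ;;      # not of the form N:M, skip
            *:*) echo "$_field"; return 0 ;;     # digits:digits
        esac
    done
    return 1
}

# Decide what to do with an encrypted OSD block device.
#   $1: "yes" if the dm mapping already exists, anything else otherwise
#   $2: dmsetup table line of the mapping (may be empty when $1 is not "yes")
#   $3: directory listing block devices by major:minor
#       (assumption: /sys/dev/block on a real node)
# Prints one of:
#   open    no mapping yet, open the encrypted disk
#   reuse   mapping exists and its backing device is still there
#   reopen  mapping is stale: remove it, then open the disk again
decide_action() {
    if [ "$1" != "yes" ]; then
        echo open
        return 0
    fi
    if _devno=$(underlying_devno "$2") && [ -e "${3:-/sys/dev/block}/$_devno" ]; then
        echo reuse
    else
        echo reopen
    fi
}
```

On a real node, the `reopen` case would presumably map to removing the stale mapping (for example with `dmsetup remove`) followed by opening the disk again with `cryptsetup luksOpen`; the exact commands and options used by the fix are not stated in this report.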

Comment 4 Sébastien Han 2020-10-08 08:27:44 UTC
*** Bug 1885666 has been marked as a duplicate of this bug. ***

Comment 5 Oded 2020-10-26 08:59:33 UTC
Setup:
Provider: VMware
OCP version: 4.6.0-0.nightly-2020-10-22-034051
OCS version: ocs-operator.v4.6.0-141.ci

Test Process:
1. Verify that the 3 OSDs are encrypted:
`-ocs-deviceset-0-data-0-skmxl-block-dmcrypt
         253:1    0   256G  0 crypt 

`-ocs-deviceset-1-data-0-vmnvt-block-dmcrypt
         253:1    0   256G  0 crypt 

`-ocs-deviceset-2-data-0-br2xj-block-dmcrypt
          253:1    0   256G  0 crypt 


2. Scale one of the OSDs to 0.
$ oc -n openshift-storage scale --replicas=0 deployment/rook-ceph-osd-2

3. Get the OSD pods.
osd-2 doesn't exist.
$ oc get pods -n openshift-storage | grep -i osd
rook-ceph-osd-0-7ffffbdf78-qkn9v                                  1/1     Running     0          8h
rook-ceph-osd-1-66678f8bcc-pbxlm                                  1/1     Running     0          8h

4. Wait 15 minutes.

5. Scale the OSD to 1.
$ oc -n openshift-storage scale --replicas=1 deployment/rook-ceph-osd-2

6. Check the OSD-2 pod status.
$ oc get pods rook-ceph-osd-2-69555cd8cc-blqmb -n openshift-storage 
NAME                               READY   STATUS    RESTARTS   AGE
rook-ceph-osd-2-69555cd8cc-blqmb   1/1     Running   0          90s

7. Check ceph health.
$ oc -n openshift-storage exec rook-ceph-tools-6c7c4c65d9-q5xs2 -- ceph health
HEALTH_OK
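
For repeated runs, steps 2 to 7 could be wrapped in two small shell helpers. This is a rough sketch, not part of the original test: the function names `restart_osd` and `ceph_is_healthy` and the 300 second rollout timeout are assumptions, while the namespace, deployment name, and `oc` subcommands match the ones used above.

```shell
# Hypothetical helpers; namespace and deployment names come from the steps above.
NS=${NS:-openshift-storage}

# Scale an OSD deployment to 0, wait, scale it back to 1, and wait for the rollout.
#   $1: deployment name, e.g. rook-ceph-osd-2
#   $2: seconds to stay scaled down (default 900, the 15 minutes used above)
restart_osd() {
    oc -n "$NS" scale --replicas=0 "deployment/$1" || return 1
    sleep "${2:-900}"
    oc -n "$NS" scale --replicas=1 "deployment/$1" || return 1
    # Assumption: 300s is enough for the OSD pod to become ready again.
    oc -n "$NS" rollout status "deployment/$1" --timeout=300s
}

# Succeed only if `ceph health`, run in the given toolbox pod, reports HEALTH_OK.
#   $1: toolbox pod name
ceph_is_healthy() {
    _health=$(oc -n "$NS" exec "$1" -- ceph health) || return 1
    case $_health in
        HEALTH_OK*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A run matching the steps above would then be `restart_osd rook-ceph-osd-2 && ceph_is_healthy rook-ceph-tools-6c7c4c65d9-q5xs2`, with the toolbox pod name adjusted to the cluster under test.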

Comment 6 Oded 2020-10-26 09:06:24 UTC
Bug not reproduced.
The encrypted OSD comes up when scaled from 0 to 1.

Comment 9 errata-xmlrpc 2020-12-17 06:24:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift
Container Storage 4.6.0 security, bug fix, enhancement update),
and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605