Bug 1885175 - Handle disappeared underlying device for encrypted OSD
Summary: Handle disappeared underlying device for encrypted OSD
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: rook
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: OCS 4.6.0
Assignee: Sébastien Han
QA Contact: Oded
URL:
Whiteboard:
Duplicates: 1885666
Depends On:
Blocks:
 
Reported: 2020-10-05 10:46 UTC by Sébastien Han
Modified: 2020-12-17 06:25 UTC
CC List: 7 users

Fixed In Version: 4.6.0-116.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-17 06:24:44 UTC
Embargoed:




Links
System: Github openshift/rook, ID: pull 130, Status: closed, Summary: Bug 1885175: ceph: check underlaying block status, Last Updated: 2020-11-22 10:49:40 UTC
System: Github rook/rook, ID: pull 6367, Status: closed, Summary: ceph: check underlaying block status, Last Updated: 2020-11-22 10:49:40 UTC
System: Red Hat Product Errata, ID: RHSA-2020:5605, Last Updated: 2020-12-17 06:25:26 UTC

Description Sébastien Han 2020-10-05 10:46:53 UTC
During the encrypted OSD initialization sequence, we check for the presence
of the encrypted container. If it already exists, we don't try to open it
again, since doing so would result in an error.
However, there is another case we need to handle: the underlying device is
gone. For instance, if the pod/PV pair was drained and then moved back, an
orphan dm is left behind. Once the pod comes back, the dm is still present
and appears to match perfectly. Unfortunately, the underlying disk is
different, so the dm must be removed and the disk re-opened.
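
To illustrate the intended handling, here is a minimal Go sketch of such a check, assuming cryptsetup is available on the host PATH. The function name ensureEncryptedBlockOpen, the key-file path, and the device paths are hypothetical examples; this is not the code merged in the pull requests linked above. The point is only to distinguish "mapping exists and its disk is still there" (reuse it) from "mapping exists but its disk is gone" (close the stale mapping, then open the new disk).

// Minimal sketch only; not the merged rook code. Names and paths below are
// hypothetical examples.
package main

import (
    "fmt"
    "os"
    "os/exec"
    "strings"
)

// backingDevice parses the "device:" line from `cryptsetup status <name>` and
// returns the block device currently backing the dm-crypt mapping, or "" when
// the mapping does not exist.
func backingDevice(dmName string) string {
    out, err := exec.Command("cryptsetup", "status", dmName).CombinedOutput()
    if err != nil {
        return ""
    }
    for _, line := range strings.Split(string(out), "\n") {
        fields := strings.Fields(line)
        if len(fields) == 2 && fields[0] == "device:" {
            return fields[1]
        }
    }
    return ""
}

// ensureEncryptedBlockOpen reuses an existing mapping when its backing disk is
// still present, removes an orphaned mapping whose disk disappeared, and then
// (re-)opens the encrypted device.
func ensureEncryptedBlockOpen(dmName, blockPath, keyFile string) error {
    if dev := backingDevice(dmName); dev != "" {
        if _, err := os.Stat(dev); err == nil {
            // Mapping exists and its underlying device is still there:
            // opening it again would only fail, so leave it alone.
            return nil
        }
        // Orphaned mapping: the dm survived the pod/PV move but the disk
        // behind it is gone, so drop the mapping before re-opening.
        if out, err := exec.Command("cryptsetup", "luksClose", dmName).CombinedOutput(); err != nil {
            return fmt.Errorf("failed to remove stale mapping %q: %v (%s)", dmName, err, out)
        }
    }
    // Open (or re-open) the encrypted device on the current disk.
    if out, err := exec.Command("cryptsetup", "luksOpen", "--key-file", keyFile, blockPath, dmName).CombinedOutput(); err != nil {
        return fmt.Errorf("failed to open %q as %q: %v (%s)", blockPath, dmName, err, out)
    }
    return nil
}

func main() {
    // Hypothetical values for illustration only.
    err := ensureEncryptedBlockOpen(
        "ocs-deviceset-0-data-0-example-block-dmcrypt",
        "/dev/sdb",
        "/etc/ceph/luks.key",
    )
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}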

Comment 4 Sébastien Han 2020-10-08 08:27:44 UTC
*** Bug 1885666 has been marked as a duplicate of this bug. ***

Comment 5 Oded 2020-10-26 08:59:33 UTC
Setup:
Provider: VMware
OCP version: 4.6.0-0.nightly-2020-10-22-034051
OCS version: ocs-operator.v4.6.0-141.ci

Test Process:
1. Verify that the 3 OSDs are encrypted:
`-ocs-deviceset-0-data-0-skmxl-block-dmcrypt
         253:1    0   256G  0 crypt 

`-ocs-deviceset-1-data-0-vmnvt-block-dmcrypt
         253:1    0   256G  0 crypt 

`-ocs-deviceset-2-data-0-br2xj-block-dmcrypt
          253:1    0   256G  0 crypt 


2. Scale one of the OSD deployments down to 0 replicas.
$ oc -n openshift-storage scale --replicas=0 deployment/rook-ceph-osd-2

3. Get the OSD pods; osd-2 no longer exists:
$ oc get pods -n openshift-storage | grep -i osd
rook-ceph-osd-0-7ffffbdf78-qkn9v                                  1/1     Running     0          8h
rook-ceph-osd-1-66678f8bcc-pbxlm                                  1/1     Running     0          8h

4. Wait 15 minutes.

5. Scale the OSD deployment back up to 1 replica.
$ oc -n openshift-storage scale --replicas=1 deployment/rook-ceph-osd-2

6. Check the osd-2 pod status:
$ oc get pods rook-ceph-osd-2-69555cd8cc-blqmb -n openshift-storage 
NAME                               READY   STATUS    RESTARTS   AGE
rook-ceph-osd-2-69555cd8cc-blqmb   1/1     Running   0          90s

7. Check ceph health.
$ oc -n openshift-storage exec rook-ceph-tools-6c7c4c65d9-q5xs2 -- ceph health
HEALTH_OK

Comment 6 Oded 2020-10-26 09:06:24 UTC
Bug not reproduced.
The encrypted OSD comes back up when scaled from 0 to 1.

Comment 9 errata-xmlrpc 2020-12-17 06:24:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605

