Bug 1947482

Summary: The device replacement process when deleting the metadata volume needs to be fixed or modified
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Itzhak <ikave>
Component: rook
Assignee: Travis Nielsen <tnielsen>
Status: CLOSED ERRATA
QA Contact: Itzhak <ikave>
Severity: medium
Priority: unspecified
Version: 4.7
CC: bniver, ebenahar, kramdoss, madam, muagarwa, nberry, ocs-bugs, odf-bz-bot, rcyriac, tnielsen
Target Release: ODF 4.11.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: 4.10.0-175
Doc Type: Bug Fix
Type: Bug
Last Closed: 2022-08-24 13:48:17 UTC

Description Itzhak 2021-04-08 14:54:47 UTC
Description of problem (please be detailed as possible and provide log
snippets):
When trying to delete the metadata volume with the "Multiple device type deployment for OSD" feature and following the device replacement procedure, the PV that was released was the one associated with the data instead of the one associated with the metadata.

Version of all relevant components (if applicable):
OCP version:
Client Version: 4.6.0-0.nightly-2021-01-12-112514
Server Version: 4.7.0-0.nightly-2021-03-27-082615
Kubernetes Version: v1.20.0+bafe72f

OCS version:
ocs-operator.v4.7.0-324.ci   OpenShift Container Storage   4.7.0-324.ci              Succeeded

cluster version
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-03-27-082615   True        False         8d      Cluster version is 4.7.0-0.nightly-2021-03-27-082615

Rook version
rook: 4.7-121.436d4ed74.release_4.7
go: go1.15.7

Ceph version
ceph version 14.2.11-138.el8cp (18a95d26e01b87abf3e47e9f01f615b8d2dd03c4) nautilus (stable)


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes. It makes the process of device replacement more complicated.


Is there any workaround available to the best of your knowledge?
Yes. 
First, follow the device replacement process described in the documentation.
After that, you will see that the OSD prepare pod associated with the OSD is stuck in the Init state, and that the OSD prepare job associated with the OSD does not complete.

Delete the OSD prepare job and the OSD prepare pod, then delete the PVC associated with the OSD. You will then see one of the PVs associated with the OSD in the Released state.
Delete the PV in the Released state, and the new OSD should come up after a few moments. A rough command sketch follows.
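
For illustration, a minimal oc sketch of this workaround, using the resource names that appear later in this bug (names on another cluster will differ, so treat these as examples):

$ oc -n openshift-storage delete job rook-ceph-osd-prepare-ocs-deviceset-0-data-04lgrj
$ oc -n openshift-storage delete pod rook-ceph-osd-prepare-ocs-deviceset-0-data-04lgrj-2f7bq
$ oc -n openshift-storage delete pvc ocs-deviceset-0-data-04lgrj
# Find the PV left in the Released state and delete it so a new OSD can be prepared:
$ oc get pv | grep Released
$ oc delete pv <released-pv-name>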

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
No

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
I tested it on a vSphere LSO 4.7 cluster with multiple device type deployment for OSDs (data on HDD and metadata on SSD/NVMe).

Steps I followed to reproduce the bug:

1. Go to the vSphere platform hosting the cluster and delete one of the metadata disks.

2. Check from the terminal that one of the OSDs is down:
$ oc get pods | grep osd
rook-ceph-osd-0-d8dc85899-kmfsw                                   1/2     CrashLoopBackOff   5          8d
rook-ceph-osd-1-65876f5857-7kqc6                                  2/2     Running            0          8d
rook-ceph-osd-2-98cdf8584-5d9bj                                   2/2     Running            0          17h
rook-ceph-osd-prepare-ocs-deviceset-0-data-0s92pg-gllvd           0/1     Completed          0          8d
rook-ceph-osd-prepare-ocs-deviceset-1-data-0ctw6g-gxm4x           0/1     Completed          0          8d
rook-ceph-osd-prepare-ocs-deviceset-2-data-0n4g49-lfw5z           0/1     Completed          0          17h

3. Follow the device replacement process as described in the vSphere LSO documentation (a rough command sketch follows the steps).
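
For reference, the documented replacement flow looks roughly like the following. This is a sketch only, assuming the failed OSD is osd.0; template and parameter names can differ between releases, so follow the official docs for your version:

$ oc -n openshift-storage scale deployment rook-ceph-osd-0 --replicas=0
$ oc -n openshift-storage process ocs-osd-removal -p FAILED_OSD_IDS=0 | oc -n openshift-storage create -f -
# After the removal job completes, check which PV was released:
$ oc get pv | grep Released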
Actual results:
The PV that was released was the one associated with the data instead of the one associated with the metadata, and Ceph health is not OK at the end of the process.

Expected results:
The released PV should be the one associated with the metadata, or the process should be simplified. Ceph health should be OK at the end of the process.


Additional info:

Comment 2 Itzhak 2021-04-08 15:01:56 UTC
Additional info: 

Here are some of the outputs after I finished with the device replacement process: 

$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                               STORAGECLASS                  REASON   AGE
local-pv-16b477a8                          8Gi        RWO            Delete           Bound       openshift-storage/ocs-deviceset-2-metadata-0djkkm   localblock-metadata                    21h
local-pv-27e19a3a                          200Gi      RWO            Delete           Bound       openshift-storage/ocs-deviceset-2-data-0n4g49       localblock                             21h
local-pv-71e9b59c                          200Gi      RWO            Delete           Bound       openshift-storage/ocs-deviceset-0-data-04lgrj       localblock                             120m
local-pv-8b3c930                           200Gi      RWO            Delete           Bound       openshift-storage/ocs-deviceset-1-data-0ctw6g       localblock                             8d
local-pv-ad86817c                          8Gi        RWO            Delete           Bound       openshift-storage/ocs-deviceset-1-metadata-0hkrsl   localblock-metadata                    8d
local-pv-c5440a6f                          8Gi        RWO            Delete           Available                                                       localblock-metadata                    111m
local-pv-d1771d53                          8Gi        RWO            Delete           Bound       openshift-storage/ocs-deviceset-0-metadata-0n4wbl   localblock-metadata                    8d
pvc-ba427b93-4d0b-40b3-90ec-44692abfb7cb   50Gi       RWO            Delete           Bound       openshift-storage/db-noobaa-db-pg-0                 ocs-storagecluster-ceph-rbd            8d


$ oc get pods | grep osd
rook-ceph-osd-1-65876f5857-7kqc6                                  2/2     Running     0          8d
rook-ceph-osd-2-98cdf8584-5d9bj                                   2/2     Running     0          17h
rook-ceph-osd-prepare-ocs-deviceset-0-data-04lgrj-2f7bq           0/1     Init:0/3    0          32m
rook-ceph-osd-prepare-ocs-deviceset-1-data-0ctw6g-gxm4x           0/1     Completed   0          8d
rook-ceph-osd-prepare-ocs-deviceset-2-data-0n4g49-lfw5z           0/1     Completed   0          17h


$ oc get jobs
NAME                                                COMPLETIONS   DURATION   AGE
rook-ceph-osd-prepare-ocs-deviceset-0-data-04lgrj   0/1           34m        34m
rook-ceph-osd-prepare-ocs-deviceset-1-data-0ctw6g   1/1           13s        8d
rook-ceph-osd-prepare-ocs-deviceset-2-data-0n4g49   1/1           13s        17h


As you can see, Ceph didn't recognize the newly available metadata PV after the process finished.
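
A quick way to confirm the stuck state (a sketch, assuming the resource names shown above):

$ oc -n openshift-storage describe pod rook-ceph-osd-prepare-ocs-deviceset-0-data-04lgrj-2f7bq   # init containers appear to be waiting on a volume
$ oc -n openshift-storage get pvc | grep deviceset-0                                             # the old metadata PVC for device set 0 is still present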

Comment 3 Mudit Agarwal 2021-06-09 17:21:07 UTC
Doesn't look like a blocker for 4.8, moving it out.

Comment 6 Kesavan 2021-09-27 07:15:00 UTC
Hi Mudit, 
Currently, I don't have a solution for this and we need someone from the Ceph team.

@Travis We need help with the device replacement process in the case of a multiple device deployment (OSD + metadata + WAL device). Does Rook already support it?

Comment 7 Mudit Agarwal 2021-09-27 07:20:06 UTC
Thanks Kesavan, will request Ceph team to take a look as well.

Comment 8 Travis Nielsen 2021-09-27 19:47:01 UTC
(In reply to Kesavan from comment #6)
> Hi Mudit, 
> Currently, I don't have a solution for this and we need someone from the
> Ceph team.
> 
> @Travis We need help with the device replacement process in the case of a
> multiple device deployment (OSD + metadata + WAL device). Does Rook already
> support it?

When using a metadata/WAL device, you would have to replace all OSDs using the same metadata/WAL device, which usually means:
- Replace all OSDs on a node if the metadata device dies
- If a single OSD dies, leave it dead until all OSDs on the node can be wiped and replaced
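
To see which devices an OSD is using for data and for its DB/WAL (and therefore which OSDs share a metadata device), the OSD metadata can be queried through the rook-ceph toolbox, roughly like this (a sketch; it assumes the toolbox deployment is enabled):

$ oc -n openshift-storage exec -it deploy/rook-ceph-tools -- ceph osd metadata 0
# For BlueStore OSDs the output includes fields such as "devices", "bluefs_db_devices" and "hostname",
# which indicate the data and DB/WAL devices backing the OSD.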

Comment 9 Scott Ostapovicz 2022-02-23 14:51:32 UTC
Travis, are you suggesting that the proper process was not followed?

Comment 10 Travis Nielsen 2022-02-23 18:47:01 UTC
Taking another look, I see that the OSD purge job does not delete the metadata PVC associated with the OSD that is being purged; Rook only deletes the data PVC, as observed in the bug. In my previous response I must have missed that the OSD purge job was being used.
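
Until a fix is available, the leftover metadata PVC has to be removed manually after the purge, something like the following (a sketch, using the device set 0 metadata PVC name shown earlier in this bug; substitute the PVC of the purged OSD):

$ oc -n openshift-storage get pvc | grep metadata
$ oc -n openshift-storage delete pvc ocs-deviceset-0-metadata-0n4wbl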

Comment 11 Travis Nielsen 2022-02-28 16:56:47 UTC
Moving back to 4.10 since we have a fix before dev freeze.

Comment 15 Mudit Agarwal 2022-03-03 09:58:01 UTC
Should we add doc text?

Comment 16 Travis Nielsen 2022-03-03 18:29:35 UTC
(In reply to Mudit Agarwal from comment #15)
> Should we add doc text?

Do we need doc text for 4.10 issues? Removing an OSD on a cluster with metadata devices is such an uncommon case that I'm not sure it's worth adding doc text.

Comment 17 Itzhak 2022-04-05 14:10:07 UTC
Can we move it to 4.11? I think this BZ is not so important at the moment.

Comment 18 Travis Nielsen 2022-04-05 14:15:33 UTC
Can it be verified with 4.10.z? While the scenario isn't critical, the risk of regression remains until the BZ is verified.

Comment 19 Mudit Agarwal 2022-04-15 07:22:41 UTC
Travis already answered.

Comment 23 Itzhak 2022-08-18 09:00:17 UTC
Okay. I am moving it to Verified.

Comment 25 errata-xmlrpc 2022-08-24 13:48:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6156