Description of problem (please be as detailed as possible and provide log snippets):

1. We have 1 Optane drive and 8 NVMe drives in each worker node of the OCP 4.11 bare-metal cluster.
2. As part of drive-detach testing, after abruptly detaching the metadata Optane drive from one of the workers, 7 OSDs went down, which is expected.
3. We also noticed that 7 pods went into CrashLoopBackOff state. The PVs are based on LSO, with the metadata device partitioned.
4. Before reattaching the drive, we performed steps 1 to 10 (skipping step 8) from the link below and then restarted the rook-ceph-operator pod.
https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.11/html/replacing_devices/openshift_data_foundation_deployed_using_local_storage_devices#replacing-operational-or-failed-storage-devices-on-clusters-backed-by-local-storage-devices_rhodf
5. In step 10, we deleted the PVs corresponding to those 7 OSDs. After deleting the PVCs and PVs, the PVs are in the Available state.
6. Ceph health remains in the same 'WARN' state, showing 7 OSDs as down.

Version of all relevant components (if applicable): 4.11

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)? Yes

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)? 3

Can this issue be reproduced? Yes

Can this issue be reproduced from the UI? No

Actual results: The OSD count did not increase after performing the steps in the above link.

Expected results: The OSD count should increase after performing these steps.

Additional info:
Please find the Dropbox links for the must-gather logs:
must-gather: https://www.dropbox.com/s/3s32abqfo063i9a/must-gather%201.zip?dl=0
ocs must-gather logs: https://www.dropbox.com/s/jty896cuiz2spfw/ocsmustgatherlogs.zip?dl=0
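For reference, the down OSDs and the crashlooping pods were identified with commands along these lines (assuming the rook-ceph-tools toolbox deployment is enabled; output omitted):

  oc get pods -n openshift-storage -l app=rook-ceph-osd -o wide
  # Ceph's own view of the down OSDs, via the toolbox pod
  oc rsh -n openshift-storage deploy/rook-ceph-tools ceph osd tree
  oc rsh -n openshift-storage deploy/rook-ceph-tools ceph health detail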
Created attachment 1927364 [details] oc get pv
Created attachment 1927365 [details] oc get pvc
Created attachment 1927366 [details] oc get csv
Created attachment 1927367 [details] oc get jobs
Created attachment 1927368 [details] oc logs for rook ceph operator pod
Created attachment 1927369 [details] oc describe pod for rook ceph operator pod
Created attachment 1927370 [details] oc describe for one osd pod which is in crashloopback state
Created attachment 1927371 [details] oc describe storagecluster -n openshift-storage
Created attachment 1927372 [details] ceph status
Created attachment 1927373 [details] oc version
Created attachment 1927374 [details] oc get cepcluster , oc get storagecluster
From the ceph status in comment 13, it appears the old OSDs were not fully removed from the cluster. Ceph should not show the existence of those 7 removed OSDs anymore. In steps 6 and 7, what did the logs show for the removed OSDs? There must have been a failure at that step.
Hi Travis,
In step 6, the osd-removal job stayed in the 'Running' state rather than 'Completed', and the corresponding pod's log (checked as per step 7) showed that one of the OSDs is 'NOT ok to destroy'. Screenshot attached.
Created attachment 1928368 [details] osd_removal_job_pod_describe
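For reference, the job status and log in steps 6-7 can be checked with something like the following (job name as used in the 4.11 replacement document):

  oc get job ocs-osd-removal-job -n openshift-storage
  oc logs -n openshift-storage -l job-name=ocs-osd-removal-job --tail=-1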
If the OSD is not ok to destroy, you will either need to wait until the OSD is safe to destroy, or set the force-destroy flag when you run the OSD removal job.
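As a sketch of what that looks like (parameter names follow the ODF 4.11 replacement document; <failed_osd_id> is a placeholder, and the available parameters can be confirmed with 'oc process --parameters -n openshift-storage ocs-osd-removal'):

  # remove any previous removal job before re-running it
  oc delete -n openshift-storage job ocs-osd-removal-job --ignore-not-found
  oc process -n openshift-storage ocs-osd-removal \
    -p FAILED_OSD_IDS=<failed_osd_id> -p FORCE_OSD_REMOVAL=true | oc create -n openshift-storage -f -
  # confirm the job reaches Completed
  oc get job ocs-osd-removal-job -n openshift-storage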
Not a 4.12 blocker if the previous OSD wasn't fully removed, moving to 4.13 to complete investigation.
We performed the drive-detach test again in another cluster (OCP 4.11.12), which has 2 NVMe drives and 1 Optane drive in each worker node. After detaching the Optane drive from the worker2 node, we proceeded with step 6 with the force-destroy flag set, and the osd-removal job reached the 'Completed' state. After deleting the corresponding PV, which was in the Released state, a new OSD pod was not created automatically. Describing the corresponding PV shows no errors. Screenshots attached for reference.
Created attachment 1929993 [details] oc -n openshift-local-storage describe localvolume local-metadata
Created attachment 1929994 [details] oc -n openshift-local-storage describe localvolume local-wal
Created attachment 1930018 [details] ceph status
Created attachment 1930020 [details] ceph osd tree
Please find the Dropbox links for the must-gather logs:
must-gather: https://www.dropbox.com/s/uas7d5626wl4e97/must-gather.tar.gz?dl=0
ocs must-gather logs: https://www.dropbox.com/s/rkwovpydl8111ld/ocsmustgather.zip?dl=0
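For reference, after the purge job completed, the released PV cleanup and the operator restart were along these lines (PV name is illustrative):

  oc get pv | grep Released
  oc delete pv <released-pv-name>
  # restart the operator so it reconciles the storageClassDeviceSets again
  oc delete pod -n openshift-storage -l app=rook-ceph-operator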
What is not working now? From comment https://bugzilla.redhat.com/show_bug.cgi?id=2147526#c23 it seems you were able to remove the OSD.
Hi Subham,
As per comment https://bugzilla.redhat.com/show_bug.cgi?id=2147526#c23, we were able to delete the old OSD successfully by passing the force parameter; the OSD count dropped from 3 to 2, which is expected. But after attaching the drive back, the OSD count did not increase. Per the RH link (https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.11/html/replacing_devices/openshift_data_foundation_deployed_using_local_storage_devices#replacing-operational-or-failed-storage-devices-on-clusters-backed-by-local-storage-devices_rhodf), the OSD should be added automatically once the OSD pods are created, but here the OSD pod never got created. We even tried deleting the rook-ceph-operator pod, with no change. We waited many hours, even a day, and the new OSD pod still did not appear. There were no errors in the rook-ceph-rgw pod, and neither 'oc -n openshift-local-storage describe localvolume local-metadata' nor 'oc -n openshift-local-storage describe localvolume local-wal' showed any errors.
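For reference, the OSD prepare pods and their logs (analyzed in the next comment) can be located with commands along these lines (pod name is illustrative; the log container is named 'provision' in the must-gather):

  oc get pods -n openshift-storage -l app=rook-ceph-osd-prepare
  oc logs -n openshift-storage rook-ceph-osd-prepare-ocs-deviceset-<n>-data-<suffix> -c provision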
The OSD prepare log [1] shows that the metadata device may be the issue:

2022-12-04T14:36:37.883434079Z 2022-12-04 14:36:37.883431 D | exec: Running command: stdbuf -oL ceph-volume --log-path /var/log/ceph/ocs-deviceset-0-data-0bk76s raw prepare --bluestore --data /mnt/ocs-deviceset-0-data-0bk76s --block.db /srv/ocs-deviceset-0-metadata-0qlwlc --block.wal /wal/ocs-deviceset-0-wal-0rdxnr
2022-12-04T14:36:38.523696491Z 2022-12-04 14:36:38.523557 I | cephosd: stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
2022-12-04T14:36:38.523696491Z stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
2022-12-04T14:36:38.523696491Z --> Raw device /srv/ocs-deviceset-0-metadata-0qlwlc is already prepared.

The last line shows that the metadata device is already prepared, so creation of the OSD is skipped. Were the metadata and wal devices also wiped in addition to the data device? To create the new OSD, all three devices need to be cleaned of the previous OSD. Also, do you have the log from the osd purge job, to show whether the metadata and db PVCs were also deleted?

[1] https://www.dropbox.com/s/rkwovpydl8111ld/ocsmustgather.zip?dl=0&file_subpath=%2Focsmustgather%2Fregistry-redhat-io-odf4-ocs-must-gather-rhel8-sha256-5816523a486a0363410fcdb581457e65b993db0c52829c1b8e5124d78b9abd90%2Fnamespaces%2Fopenshift-storage%2Fpods%2Frook-ceph-osd-prepare-ocs-deviceset-0-data-0bk76s-2jmhk%2Fprovision%2Fprovision%2Flogs%2Fcurrent.log
Hi Travis,
This is a mixed-media setup where we have 2 NVMe (slower) drives and 1 Optane (faster) drive connected to each worker (3 workers in total). The data devices are created on the 6 NVMe drives, and the metadata and wal devices are created on the 3 Optane drives. The drive that was detached was the Optane drive, i.e., the metadata & wal device. Is it necessary to remove the corresponding data device as well when we detach the metadata/wal device? Since the data device is on a separate NVMe drive, it is not related to the pulled-out metadata device. Please advise on how to add the OSD back.
Thanks
Ishwarya M
The fundamental issue is that an OSD needs to be created on clean device(s). In this case, since there are separate wal, db, and data devices, if any of them is replaced, all of them must be wiped or replaced. Otherwise, the remnants of the previous OSD will prevent the creation of a new OSD on them.
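As a rough sketch of what "wiped" means here (device paths are illustrative; run from a debug shell on the affected node, and only against devices confirmed to belong to the removed OSD):

  oc debug node/<worker-node>
  chroot /host
  # clear filesystem/BlueStore signatures on each of the data, db, and wal block devices
  wipefs --all --force /dev/<osd-device>
  # remove any partition table remnants on the disk
  sgdisk --zap-all /dev/<osd-disk>
  # zero the start of each device so the old BlueStore label is gone
  dd if=/dev/zero of=/dev/<osd-device> bs=1M count=100 oflag=direct,dsync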
Hi Travis,
We have the below queries related to wiping the drives in the drive-detach scenario:
1. How do we find the data drive corresponding to the detached metadata drive?
2. In our cluster, 'osd-0' was the one that did not come up after re-attaching the drive, so we wiped (dd command) the data drive corresponding to deviceset-0. Before this, we removed its PVC and its PV. On applying the local-data.yaml file, no error was thrown, yet the local-data PV was not added back.
3. On restarting the rook-ceph-operator pod, the rook-ceph-osd-1 pod went into the 'CrashLoopBackOff' state since the PV of the corresponding data drive was not added back. ODF must-gather logs are placed at https://www.dropbox.com/s/0zu8orbxfwf8b2y/odf_must_gather_12dec.tar.gz?dl=0
4. Can you please provide your inputs on how to recover the OSDs from this state?
Also, if it is expected to wipe/replace all data, metadata, and wal devices when replacing one of the drives, this will not be possible in a customer environment. If any one drive goes faulty and needs a replacement, we cannot expect the customer to identify the related drives to wipe. Anyone would expect the data to be recovered automatically on replacing only the faulty drive. Can you please explain why ODF is designed to behave this way?
Thanks
Ishwarya M
(In reply to Ishwarya Munesh from comment #34)
> Hi Travis,
> We have the below queries related to wiping the drives in the drive-detach scenario:
> 1. How do we find the data drive corresponding to the detached metadata drive?

When you purged the OSD, you ran the job template to purge the OSD, correct? This job should have purged the OSDs and deleted all the related PVCs for data, metadata, and wal. While the purge doesn't wipe the devices, it is expected that their PVCs and PVs were deleted, and then the next time the operator reconciles the OSDs, it would create new PVCs in their place. If there are no new PVs available, the OSD creation should wait for a new, clean PV to which the PVC can be bound.

But since it sounds like something is missing in that flow for you: the PVCs for the metadata, wal, and data devices are all named in a related way. You should see PVCs named similar to:
- set1-data-0-<suffix>
- set1-metadata-0-<suffix>
- set1-wal-0-<suffix>

Here the suffix is a random ID, and "set1" is the name of the storageClassDeviceSet. In this example "0" is a simple index for OSDs that were created from the same storageClassDeviceSet. From the PVCs, you can see which PVs they are bound to.

If the PVCs had already been deleted by the purge job, then finding the PVs for the deleted PVCs would be more difficult.

> 2. In our cluster, 'osd-0' was the one that did not come up after re-attaching the drive, so we wiped (dd command) the data drive corresponding to deviceset-0. Before this, we removed its PVC and its PV. On applying the local-data.yaml file, no error was thrown, yet the local-data PV was not added back.

What is the local-data.yaml file? Is it for creating the local PVs? Did you clean the metadata and wal PVs yet? The OSD prepare log is just indicating that they cannot be re-used.

> 3. On restarting the rook-ceph-operator pod, the rook-ceph-osd-1 pod went into the 'CrashLoopBackOff' state since the PV of the corresponding data drive was not added back. ODF must-gather logs are placed at https://www.dropbox.com/s/0zu8orbxfwf8b2y/odf_must_gather_12dec.tar.gz?dl=0
> 4. Can you please provide your inputs on how to recover the OSDs from this state?
> Also, if it is expected to wipe/replace all data, metadata, and wal devices when replacing one of the drives, this will not be possible in a customer environment. If any one drive goes faulty and needs a replacement, we cannot expect the customer to identify the related drives to wipe. Anyone would expect the data to be recovered automatically on replacing only the faulty drive. Can you please explain why ODF is designed to behave this way?

ODF is designed to keep OSDs running. If there is ever a question about whether an OSD should be removed or wiped, ODF expects the admin to be involved in that decision so that ODF doesn't automatically remove any data and accidentally cause data loss.

This scenario of having metadata, wal, and data PVs for the OSD is not common, so we should certainly improve it, at least to make it easier to identify the corresponding devices.

Ultimately, this isn't a managed service where the cloud storage can be fully and automatically managed when hardware dies. When underlying devices are replaced, it is a disruptive change where storage admins will need to be involved.

> Thanks
> Ishwarya M
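To map a PVC to its PV and the underlying device on this LSO-based setup, something like the following should work (assuming the local PVs expose the device under spec.local.path; names are illustrative):

  oc get pvc -n openshift-storage | grep ocs-deviceset
  # find the PV bound to a given PVC
  oc get pvc -n openshift-storage <pvc-name> -o jsonpath='{.spec.volumeName}{"\n"}'
  # find the device path behind that local PV
  oc get pv <pv-name> -o jsonpath='{.spec.local.path}{"\n"}'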
(In reply to Travis Nielsen from comment #35)
> (In reply to Ishwarya Munesh from comment #34)
> > 1. How do we find the data drive corresponding to the detached metadata drive?
>
> When you purged the OSD, you ran the job template to purge the OSD, correct? This job should have purged the OSDs and deleted all the related PVCs for data, metadata, and wal. While the purge doesn't wipe the devices, it is expected that their PVCs and PVs were deleted, and then the next time the operator reconciles the OSDs, it would create new PVCs in their place. If there are no new PVs available, the OSD creation should wait for a new, clean PV to which the PVC can be bound.
>
> But since it sounds like something is missing in that flow for you: the PVCs for the metadata, wal, and data devices are all named in a related way. You should see PVCs named similar to:
> - set1-data-0-<suffix>
> - set1-metadata-0-<suffix>
> - set1-wal-0-<suffix>
>
> Here the suffix is a random ID, and "set1" is the name of the storageClassDeviceSet. In this example "0" is a simple index for OSDs that were created from the same storageClassDeviceSet. From the PVCs, you can see which PVs they are bound to.
>
> If the PVCs had already been deleted by the purge job, then finding the PVs for the deleted PVCs would be more difficult.
>
> > 2. In our cluster, 'osd-0' was the one that did not come up after re-attaching the drive, so we wiped (dd command) the data drive corresponding to deviceset-0. Before this, we removed its PVC and its PV. On applying the local-data.yaml file, no error was thrown, yet the local-data PV was not added back.
>
> What is the local-data.yaml file? Is it for creating the local PVs? Did you clean the metadata and wal PVs yet? The OSD prepare log is just indicating that they cannot be re-used.

>>> Yes, the local-data.yaml file is for creating the local-data PVs. The metadata and wal PVs were deleted while deleting the previous OSDs after detaching the drive.

> > 3. On restarting the rook-ceph-operator pod, the rook-ceph-osd-1 pod went into the 'CrashLoopBackOff' state since the PV of the corresponding data drive was not added back. ODF must-gather logs are placed at https://www.dropbox.com/s/0zu8orbxfwf8b2y/odf_must_gather_12dec.tar.gz?dl=0
> > 4. Can you please provide your inputs on how to recover the OSDs from this state?
> > Also, if it is expected to wipe/replace all data, metadata, and wal devices when replacing one of the drives, this will not be possible in a customer environment. If any one drive goes faulty and needs a replacement, we cannot expect the customer to identify the related drives to wipe. Anyone would expect the data to be recovered automatically on replacing only the faulty drive. Can you please explain why ODF is designed to behave this way?
>
> ODF is designed to keep OSDs running. If there is ever a question about whether an OSD should be removed or wiped, ODF expects the admin to be involved in that decision so that ODF doesn't automatically remove any data and accidentally cause data loss.
>
> This scenario of having metadata, wal, and data PVs for the OSD is not common,

>>> Can you provide more clarity on what the other possible ways of configuring data, metadata, and wal are?

> so we should certainly improve it, at least to make it easier to identify the corresponding devices.
>
> Ultimately, this isn't a managed service where the cloud storage can be fully and automatically managed when hardware dies. When underlying devices are replaced, it is a disruptive change where storage admins will need to be involved.

Also, we performed the same steps in OCP 4.10 and the OSD was added back after reattaching the drive. Was there any change in OCP 4.11 that requires the data, metadata, and wal devices to be wiped for the OSD to be added?
> Also, we performed the same steps in OCP 4.10 and the OSD was added back after reattaching the drive. Was there any change in OCP 4.11 that requires the data, metadata, and wal devices to be wiped for the OSD to be added?

I cannot think of a change between 4.10 and 4.11 that would have affected this OSD replacement. If you can gather the logs of the osd purge job from 4.10 and 4.11, hopefully we can see why the cleanup differs in a way that allowed it to work in 4.10.
Hi Travis,
We once again performed a complete ODF cleanup, re-installed ODF and the storage cluster, and performed the drive-detach scenario again.

Steps followed (initial configuration of the mixed-media setup, 3 worker nodes):
1. Each worker has 3 drives (2 NVMe and 1 Optane (faster drive)).
2. 12 PVs were created on these drives - 3 data, 3 metadata, 3 wal, 3 for mon pods.
3. The metadata and wal PVs were created on each Optane drive by creating 2 partitions.
4. 12 PVCs were created.
5. The storage cluster was created with replica 3.
6. Number of OSDs - 3; Ceph was healthy.
7. Detached one Optane drive (metadata & wal partitions) from one worker node.
8. The number of OSDs reduced to 2, as expected.
9. Purged the old OSD as per the steps provided here - https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.11/html/replacing_devices/openshift_data_foundation_deployed_using_local_storage_devices#replacing-operational-or-failed-storage-devices-on-clusters-backed-by-local-storage-devices_rhodf
10. After deleting the PV (step 10 in the above link), its corresponding data PV was also removed. We hope this is expected, as per your previous comment. (In the attached file getpvafter.txt, the deviceset-2 related PVs are not present.)
11. After re-attaching the drive, the data, metadata, and wal PVs were created, and the PVCs were created as well.
12. As per your earlier comments, we cleaned up the data and wal/metadata drives with the dd command and also the sgdisk zap command, but OSD-2 was still not added back.
13. ODF must-gather logs are placed at https://www.dropbox.com/s/ss11mz1i982oifi/odf_mustgat_20dec.tar.gz?dl=0 for reference.

Can you please check the attached logs and let us know what is missing here and how to recover the OSD? Also, can you let us know how to get the logs of the OSD purge job?
Thanks
Ishwarya M
Created attachment 1933764 [details] pv list before detaching the drive
Created attachment 1933766 [details] pv list after detaching the drive and osd purge
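Regarding how to get the OSD purge job logs: assuming the job created from the template is still present and is named ocs-osd-removal-job (as in the replacement document), something like this should capture them before the job is cleaned up:

  oc get pods -n openshift-storage -l job-name=ocs-osd-removal-job
  oc logs -n openshift-storage -l job-name=ocs-osd-removal-job --tail=-1 > osd-purge-job.log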
From the OSD prepare log for the pod [1], I see that the OSD is not being created because the metadata device appears not to be clean:

Raw device /srv/ocs-deviceset-2-metadata-0gfzvc is already prepared

Here is more of the OSD prepare log:

2022-12-20T07:39:29.188095716Z 2022-12-20 07:39:29.188089 D | cephosd: device "/srv/ocs-deviceset-2-metadata-0gfzvc" is a metadata or wal device, skipping this iteration it will be used in the next one
2022-12-20T07:39:29.188095716Z 2022-12-20 07:39:29.188093 D | cephosd: device "/wal/ocs-deviceset-2-wal-0vwsnk" is a metadata or wal device, skipping this iteration it will be used in the next one
2022-12-20T07:39:29.188101190Z 2022-12-20 07:39:29.188097 I | cephosd: configuring new device "/mnt/ocs-deviceset-2-data-04ck6p"
2022-12-20T07:39:29.188101190Z 2022-12-20 07:39:29.188099 I | cephosd: devlink names:
2022-12-20T07:39:29.188105289Z 2022-12-20 07:39:29.188101 I | cephosd: /dev/disk/by-id/nvme-INTEL_SSDPF2KX076TZ_PHAC112401YW7P6CGN
2022-12-20T07:39:29.188105289Z 2022-12-20 07:39:29.188102 I | cephosd: /dev/disk/by-path/pci-0000:31:00.0-nvme-1
2022-12-20T07:39:29.188109150Z 2022-12-20 07:39:29.188104 I | cephosd: /dev/disk/by-id/nvme-eui.01000000000000005cd2e41a4d3e5351
2022-12-20T07:39:29.188112860Z 2022-12-20 07:39:29.188109 D | exec: Running command: stdbuf -oL ceph-volume --log-path /var/log/ceph/ocs-deviceset-2-data-04ck6p raw prepare --bluestore --data /mnt/ocs-deviceset-2-data-04ck6p --block.db /srv/ocs-deviceset-2-metadata-0gfzvc --block.wal /wal/ocs-deviceset-2-wal-0vwsnk
2022-12-20T07:39:30.298429727Z 2022-12-20 07:39:30.298392 I | cephosd: stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
2022-12-20T07:39:30.298429727Z stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
2022-12-20T07:39:30.298429727Z --> Raw device /srv/ocs-deviceset-2-metadata-0gfzvc is already prepared.

[1] rook-ceph-osd-prepare-ocs-deviceset-2-data-04ck6p-t72td pod logs in the must-gather
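For what it's worth, one way to check on the node whether a device still carries a BlueStore label from the previous OSD (as far as I recall, a raw BlueStore device starts with a 'bluestore block device' text label; device paths are illustrative):

  oc debug node/<worker-node>
  chroot /host
  # a leftover label from the old OSD shows up as the string
  # "bluestore block device" plus the old OSD fsid in the first bytes
  dd if=/dev/<metadata-partition> bs=4K count=1 2>/dev/null | strings | head
  lsblk -f /dev/<metadata-disk>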
Hi Travis,
We tried cleaning up the mentioned metadata device via the dd and zap commands and restarted the rook-ceph-operator pod, but the OSD still did not come up. Is there any step to be performed after cleaning up the drive?
Thanks
Ishwarya M
Travis,
The ODF must-gather collected after cleaning up the metadata device is placed here for reference: https://www.dropbox.com/s/048fvqmoytu4g6w/odf_mustgather_21dec.tar.gz?dl=0
We still see the 'already prepared' message in the logs. Let us know what step should be done after cleaning up the drives.
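As a hedged suggestion (the operator normally recreates the prepare jobs on its own during reconcile), one way to force a fresh prepare attempt after wiping, rather than re-reading a stale job's log, is roughly:

  oc get jobs -n openshift-storage | grep osd-prepare
  oc delete job -n openshift-storage rook-ceph-osd-prepare-ocs-deviceset-2-data-<suffix>
  # restart the operator so it reconciles and recreates the prepare pod
  oc delete pod -n openshift-storage -l app=rook-ceph-operator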
Hello Travis, Just making sure you saw the latest comment from Ishwarya at https://bugzilla.redhat.com/show_bug.cgi?id=2147526#c43
(In reply to Bertrand from comment #47)
> Hello Travis,
>
> Just making sure you saw the latest comment from Ishwarya at
> https://bugzilla.redhat.com/show_bug.cgi?id=2147526#c43

Yes, back from break now... The latest must-gather still shows the metadata device was already prepared for a prior OSD. Something on the metadata device is still not fully wiped, so Ceph still will not create the new volume.

2022-12-21T05:13:20.570792887Z 2022-12-21 05:13:20.570787 D | exec: Running command: stdbuf -oL ceph-volume --log-path /var/log/ceph/ocs-deviceset-2-data-04ck6p raw prepare --bluestore --data /mnt/ocs-deviceset-2-data-04ck6p --block.db /srv/ocs-deviceset-2-metadata-0gfzvc --block.wal /wal/ocs-deviceset-2-wal-0vwsnk
2022-12-21T05:13:21.632710555Z 2022-12-21 05:13:21.632574 I | cephosd: stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
2022-12-21T05:13:21.632710555Z stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
2022-12-21T05:13:21.632710555Z --> Raw device /srv/ocs-deviceset-2-metadata-0gfzvc is already prepared.

From the steps in comment 38, the OSDs were successfully created after the reinstall; the issue is only during OSD replacement. What steps were different when cleaning up for the full reinstall? Something must have cleaned the PV for the full install that was missed for the OSD replacement. The requirement for a clean metadata PV/device is the same in either case.
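One possibility worth ruling out (an assumption on my part, not something confirmed by the logs): if only the start of the parent Optane disk was zeroed, the BlueStore labels at the start of the metadata and wal partitions could survive. A sketch of wiping the exact block devices the local PVs reference:

  # resolve the partition paths from the LSO PVs (assumes spec.local.path is set)
  oc get pv <metadata-pv> -o jsonpath='{.spec.local.path}{"\n"}'
  oc get pv <wal-pv> -o jsonpath='{.spec.local.path}{"\n"}'
  # then, from a debug shell on the node, wipe each resolved partition directly
  oc debug node/<worker-node>
  chroot /host
  wipefs --all --force <resolved-partition-path>
  dd if=/dev/zero of=<resolved-partition-path> bs=1M count=100 oflag=direct,dsync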
Is this solved now so we can close this issue?
Please reopen if there is still an issue