Bug 2053490 - Red Hat OpenShift Data Foundation deployment issue [NEEDINFO]
Summary: Red Hat OpenShift Data Foundation deployment issue
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Blaine Gardner
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-02-11 12:13 UTC by adrian.podlawski
Modified: 2023-08-09 17:03 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-21 16:43:59 UTC
Embargoed:
brgardne: needinfo? (adrian.podlawski)
tnielsen: needinfo? (adrian.podlawski)


Attachments
ocs-logs (269.59 KB, text/plain)
2022-02-11 12:13 UTC, adrian.podlawski
odf-logs (123.70 KB, text/plain)
2022-02-11 12:14 UTC, adrian.podlawski
ocs-error (3.49 KB, text/plain)
2022-02-11 12:15 UTC, adrian.podlawski
cluster setup (1.16 KB, text/plain)
2022-02-11 12:17 UTC, adrian.podlawski
osd-logs (1.95 KB, text/plain)
2022-02-11 12:25 UTC, adrian.podlawski

Description adrian.podlawski 2022-02-11 12:13:50 UTC
Created attachment 1860594 [details]
ocs-logs

Description of problem:
I created the StorageCluster with a YAML file updated for ODF version 4.9 (odf-cluster.yaml).
On the first attempt, 3 OSD-prepare pods were marked as Completed, but only 2 OSD pods were running. The OSD-prepare pod logs reported that the device was already provisioned. The same situation occurred a few times with the 4.8 setup.
I performed a cleanup and created the StorageCluster again with the same YAML file. On the second attempt all OSDs were provisioned, but the StorageCluster was stuck in the "Progressing" state. I also found errors in the ODF operator and in the OCS operator (logs in attachments). Could you help us find the root cause?


Version-Release number of selected component (if applicable): 4.9

How reproducible: 100%

Steps to Reproduce:
1. Create a StorageCluster with the YAML file (see the command sketch at the end of this comment).

Actual results:
StorageCluster is stuck in the Progressing state, or an OSD is missing.
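
For reference, a rough sketch of the kind of commands used to deploy and inspect the cluster. The openshift-storage namespace and the Rook pod labels below are assumptions for illustration; the actual YAML is in the attached cluster-setup file.

  # create the StorageCluster from the attached YAML
  oc apply -f odf-cluster.yaml
  # watch the StorageCluster phase (stuck in "Progressing" here)
  oc -n openshift-storage get storagecluster
  # check how many OSD pods actually came up
  oc -n openshift-storage get pods -l app=rook-ceph-osd
  # check the OSD-prepare logs for skipped / already-provisioned devices
  oc -n openshift-storage logs -l app=rook-ceph-osd-prepare --tail=100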

Comment 1 adrian.podlawski 2022-02-11 12:14:28 UTC
Created attachment 1860595 [details]
odf-logs

Comment 2 adrian.podlawski 2022-02-11 12:15:52 UTC
Created attachment 1860596 [details]
ocs-error

Comment 3 adrian.podlawski 2022-02-11 12:17:20 UTC
Created attachment 1860597 [details]
cluster setup

Comment 4 adrian.podlawski 2022-02-11 12:25:32 UTC
Created attachment 1860600 [details]
osd-logs

Comment 7 Blaine Gardner 2022-02-14 16:32:01 UTC
I believe this is a case of Rook operating as intended. From the OSD prepare pod logs shared (relevant line copied below), Rook is reporting that it found an OSD belonging to a different Ceph cluster. Rook will not clobber existing data on a disk in order to deploy an OSD, so that user data on the disk is preserved. If you want Rook to deploy successfully on that disk, you must wipe it.

2022-02-08 19:41:02.828518 I | cephosd: skipping device "/wal/ocs-deviceset-2-wal-0rswt2": failed to detect if there is already an osd. osd.7: "17498860-4536-42fc-981e-c6e8df6d7d89" belonging to a different ceph cluster "77dca5b1-3d9b-436a-94a8-6c35fef679a8".

Generally `sgdisk --zap` is sufficient to wipe the disk. I have also recommended using `dd` to zero out the first 2 MB of the disk to ensure LVM metadata is removed.
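
For example, a minimal sketch of such a wipe, assuming /dev/sdX stands in for the affected device (this destroys the data on it, so double-check the device path first):

  # destroy GPT/MBR partition structures on the device
  sgdisk --zap-all /dev/sdX
  # zero out the first 2 MB to clear leftover LVM/OSD metadata
  dd if=/dev/zero of=/dev/sdX bs=1M count=2 oflag=direct
  # ask the kernel to re-read the (now empty) partition table
  partprobe /dev/sdX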

Comment 8 Mudit Agarwal 2022-02-15 13:42:20 UTC
Lowering the severity; please justify the urgent severity if it is required.

Comment 9 Travis Nielsen 2022-02-28 16:23:49 UTC
Moving to 4.11 while waiting for confirmation of whether this is an issue.

Comment 10 Travis Nielsen 2022-03-07 16:24:10 UTC
Did the previous comment help resolve the issue? If we don't hear back within the next week, we will close the issue. Thanks.

Comment 11 Sébastien Han 2022-03-21 16:43:59 UTC
Closing due to lack of information. Please reopen if you encounter this issue again.
Thanks!

