+++ This bug was initially created as a clone of Bug #2274757 +++
+++ This bug was initially created as a clone of Bug #2273398 +++

Description of problem (please be detailed as possible and provide log snippets):
After the customer upgraded from 4.12.47 to 4.14.16, we noticed that all OSDs are in a crash loop, with the expand-bluefs container showing errors about devices that cannot be found.

Version of all relevant components (if applicable):
ODF 4.14.6

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, all OSDs are down.

Is there any workaround available to the best of your knowledge?
No

Is this issue reproducible?
Yes, in the customer environment.

--- Additional comment from RHEL Program Management on 2024-04-04 15:14:00 UTC ---

This bug having no release flag set previously, is now set with release flag 'odf-4.16.0' to '?', and so is being proposed to be fixed at the ODF 4.16.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any previously set while release flag was missing, have now been reset since the Acks are to be set against a release flag.

--- Additional comment from RHEL Program Management on 2024-04-04 15:14:00 UTC ---

Since this bug has severity set to 'urgent', it is being proposed as a blocker for the currently set release flag. Please resolve ASAP.

--- Additional comment from RHEL Program Management on 2024-04-04 15:14:00 UTC ---

The 'Target Release' is not to be set manually at the Red Hat OpenShift Data Foundation product. The 'Target Release' will be auto set appropriately, after the 3 Acks (pm,devel,qa) are set to "+" for a specific release flag and that release flag gets auto set to "+".

--- Additional comment from Manjunatha on 2024-04-04 15:15:20 UTC ---

+ Attached osd.4 pod description here
+ ceph osd pods are in crashloop state with the below messages.
Error: ceph-username is required for osd
rook error: ceph-username is required for osd
Usage:
  rook ceph osd init [flags]

Flags:
      --cluster-id string                the UID of the cluster CR that owns this cluster
      --cluster-name string              the name of the cluster CR that owns this cluster
      --encrypted-device                 whether to encrypt the OSD with dmcrypt
  -h, --help                             help for init
      --is-device                        whether the osd is a device
      --location string                  location of this node for CRUSH placement
      --node-name string                 the host name of the node (default "rook-ceph-osd-1-6db9cfc7c9-294jn")
      --osd-crush-device-class string    The device class for all OSDs configured on this node
      --osd-crush-initial-weight string  The initial weight of OSD in TiB units
      --osd-database-size int            default size (MB) for OSD database (bluestore)
      --osd-id int                       osd id for which to generate config (default -1)
      --osd-store-type string            the osd store type such as bluestore (default "bluestore")
      --osd-wal-size int                 default size (MB) for OSD write ahead log (WAL) (bluestore) (default 576)
      --osds-per-device int              the number of OSDs per device (default 1)

Global Flags:
      --log-level string                 logging level for logging/tracing output (valid values: ERROR,WARNING,INFO,DEBUG) (default "INFO")

'/usr/local/bin/rook' -> '/rook/rook'
+ PVC_SOURCE=/ocs-deviceset-0-1-78k4w
+ PVC_DEST=/mnt/ocs-deviceset-0-1-78k4w
+ CP_ARGS=(--archive --dereference --verbose)
+ '[' -b /mnt/ocs-deviceset-0-1-78k4w ']'
++ stat --format %t%T /ocs-deviceset-0-1-78k4w
+ PVC_SOURCE_MAJ_MIN=8e0
++ stat --format %t%T /mnt/ocs-deviceset-0-1-78k4w
+ PVC_DEST_MAJ_MIN=8e0
+ [[ 8e0 == \8\e\0 ]]
+ echo 'PVC /mnt/ocs-deviceset-0-1-78k4w already exists and has the same major and minor as /ocs-deviceset-0-1-78k4w: 8e0'
PVC /mnt/ocs-deviceset-0-1-78k4w already exists and has the same major and minor as /ocs-deviceset-0-1-78k4w: 8e0
+ exit 0
inferring bluefs devices from bluestore path
unable to read label for /var/lib/ceph/osd/ceph-1: (2) No such file or directory
2024-04-04T13:22:38.461+0000 7f41cddbf900 -1 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-1/block: (2) No such file or directory

--- Additional comment from Manjunatha on 2024-04-04 15:15:49 UTC ---

--- Additional comment from Manjunatha on 2024-04-04 15:20:40 UTC ---

ODF must-gather logs are in the below path:
/cases/03783266/0020-must-gather-odf.tar.gz/must-gather.local.8213025456072446876/inspect.local.6887047306785235156/namespaces/openshift-storage

This issue looks similar to bz https://bugzilla.redhat.com/show_bug.cgi?id=2254378

---------------------------
Events from osd-0 deployment:

Events:
  Type     Reason                 Age                   From               Message
  ----     ------                 ----                  ----               -------
  Normal   Scheduled              23m                   default-scheduler  Successfully assigned openshift-storage/rook-ceph-osd-0-6764d4c675-f9w2m to storage-00.dev-intranet-01-wob.ocp.vwgroup.com
  Normal   SuccessfulMountVolume  23m                   kubelet            MapVolume.MapPodDevice succeeded for volume "local-pv-abfd62bb" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-abfd62bb"
  Normal   SuccessfulMountVolume  23m                   kubelet            MapVolume.MapPodDevice succeeded for volume "local-pv-abfd62bb" volumeMapPath "/var/lib/kubelet/pods/2ae21d6a-2aa5-43e8-8c0d-7ecb250656a2/volumeDevices/kubernetes.io~local-volume"
  Normal   AddedInterface         23m                   multus             Add eth0 [100.72.0.27/23] from ovn-kubernetes
  Normal   Pulling                23m                   kubelet            Pulling image "registry.redhat.io/odf4/rook-ceph-rhel9-operator@sha256:23041cec90d0c64f043deb9f5b589c2fe3b2e29163cf7576324341ad855affcc"
  Normal   Pulled                 23m                   kubelet            Successfully pulled image "registry.redhat.io/odf4/rook-ceph-rhel9-operator@sha256:23041cec90d0c64f043deb9f5b589c2fe3b2e29163cf7576324341ad855affcc" in 14.120104593s (14.120122047s including waiting)
  Normal   Created                23m                   kubelet            Created container config-init
  Normal   Started                23m                   kubelet            Started container config-init
  Normal   Pulled                 23m                   kubelet            Container image "registry.redhat.io/odf4/rook-ceph-rhel9-operator@sha256:23041cec90d0c64f043deb9f5b589c2fe3b2e29163cf7576324341ad855affcc" already present on machine
  Normal   Created                23m                   kubelet            Created container copy-bins
  Normal   Started                23m                   kubelet            Started container copy-bins
  Normal   Pulled                 23m                   kubelet            Container image "registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9dbd051cfcdb334aad33a536cc115ae1954edaea5f8cb5943ad615f1b41b0226" already present on machine
  Normal   Created                23m                   kubelet            Created container blkdevmapper
  Normal   Started                23m                   kubelet            Started container blkdevmapper
  Normal   Pulled                 23m (x3 over 23m)     kubelet            Container image "registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9dbd051cfcdb334aad33a536cc115ae1954edaea5f8cb5943ad615f1b41b0226" already present on machine
  Normal   Created                23m (x3 over 23m)     kubelet            Created container expand-bluefs
  Normal   Started                23m (x3 over 23m)     kubelet            Started container expand-bluefs
  Warning  BackOff                3m43s (x94 over 23m)  kubelet            Back-off restarting failed container expand-bluefs in pod rook-ceph-osd-0-6764d4c675-f9w2m_openshift-storage(2ae21d6a-2aa5-43e8-8c0d-7ecb250656a2)

--- Additional comment from Andreas Bleischwitz on 2024-04-04 15:29:15 UTC ---

TAM update about the business impact: All developers on this platform are being blocked due to this issue. VW also has a lot of VMs running on this bare-metal cluster on which they do all the testing and simulation of car components. This is now all down and they are not able to work.

--- Additional comment from Manjunatha on 2024-04-04 15:46:36 UTC ---

The customer rebooted the OSD nodes a few times as suggested in the solution below, but that didn't help:
https://access.redhat.com/solutions/7015095

--- Additional comment from Andreas Bleischwitz on 2024-04-04 15:53:31 UTC ---

The customer just informed us that this is a very "old" cluster starting with 4.3.18, and ODF was installed about 3.5 years ago. So there may be a lot of tweaks/leftovers/* in this cluster.
--- Additional comment from Manjunatha on 2024-04-04 16:32:07 UTC ---

Latest mustgather and sosreport from storage node 0 in below supportshell path: /cases/03783266

  cluster:
    id:     18c9800f-7f91-4994-ad32-2a8a330babd6
    health: HEALTH_WARN
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
            7 osds down
            2 hosts (10 osds) down
            2 racks (10 osds) down
            Reduced data availability: 505 pgs inactive

  services:
    mon: 3 daemons, quorum b,f,g (age 4h)
    mgr: a(active, since 4h)
    mds: 1/1 daemons up, 1 standby
    osd: 15 osds: 4 up (since 23h), 11 in (since 23h); 5 remapped pgs

  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   12 pools, 505 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             505 unknown

ID   CLASS  WEIGHT    TYPE NAME                                                   STATUS  REWEIGHT  PRI-AFF
 -1         26.19896  root default
 -4          8.73299      rack rack0
 -3          8.73299          host storage-00-dev-intranet-01-wob-ocp-vwgroup-com
  0    ssd   1.74660              osd.0                                             down   1.00000  1.00000
  1    ssd   1.74660              osd.1                                             down   1.00000  1.00000
  2    ssd   1.74660              osd.2                                             down   1.00000  1.00000
  3    ssd   1.74660              osd.3                                             down   1.00000  1.00000
  4    ssd   1.74660              osd.4                                             down   1.00000  1.00000
-12          8.73299      rack rack1
-11          8.73299          host storage-01-dev-intranet-01-wob-ocp-vwgroup-com
  6    ssd   1.74660              osd.6                                               up   1.00000  1.00000
  7    ssd   1.74660              osd.7                                               up   1.00000  1.00000
  8    ssd   1.74660              osd.8                                               up   1.00000  1.00000
  9    ssd   1.74660              osd.9                                               up   1.00000  1.00000
 10    ssd   1.74660              osd.10                                            down         0  1.00000
 -8          8.73299      rack rack2
 -7          8.73299          host storage-02-dev-intranet-01-wob-ocp-vwgroup-com
  5    ssd   1.74660              osd.5                                             down   1.00000  1.00000
 11    ssd   1.74660              osd.11                                            down         0  1.00000
 12    ssd   1.74660              osd.12                                            down         0  1.00000
 13    ssd   1.74660              osd.13                                            down         0  1.00000
 14    ssd   1.74660              osd.14                                            down   1.00000  1.00000
...

--- Additional comment from on 2024-04-04 18:29:13 UTC ---

Seems like the backing device was removed or moved?:
`````
PVC /mnt/ocs-deviceset-0-1-78k4w already exists and has the same major and minor as /ocs-deviceset-0-1-78k4w: 8e0
+ exit 0
inferring bluefs devices from bluestore path
unable to read label for /var/lib/ceph/osd/ceph-1: (2) No such file or directory
`````

// confirm the device for osd-1
$ omc get pods rook-ceph-osd-1-6db9cfc7c9-294jn -o yaml|grep device
    ceph.rook.io/DeviceSet: ocs-deviceset-0
    ceph.rook.io/pvc: ocs-deviceset-0-1-78k4w
    device-class: ssd
      name: devices
      name: ocs-deviceset-0-1-78k4w-bridge
    - "\nset -xe\n\nPVC_SOURCE=/ocs-deviceset-0-1-78k4w\nPVC_DEST=/mnt/ocs-deviceset-0-1-78k4w\nCP_ARGS=(--archive
      - devicePath: /ocs-deviceset-0-1-78k4w
        name: ocs-deviceset-0-1-78k4w
      name: ocs-deviceset-0-1-78k4w-bridge
      name: ocs-deviceset-0-1-78k4w-bridge
      name: devices
      name: ocs-deviceset-0-1-78k4w-bridge
      name: devices
    - name: ocs-deviceset-0-1-78k4w
        claimName: ocs-deviceset-0-1-78k4w
        path: /var/lib/rook/openshift-storage/ocs-deviceset-0-1-78k4w
      name: ocs-deviceset-0-1-78k4w-bridge

$ omc get pvc ocs-deviceset-0-1-78k4w
NAME                      STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS     AGE
ocs-deviceset-0-1-78k4w   Bound    local-pv-32532e89   1788Gi     RWO            ocs-localblock   3y

$ omc get pv local-pv-32532e89 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: local-volume-provisioner-storage-00.dev-intranet-01-wob.ocp.vwgroup.com-da4c2721-f73c-4626-8c98-7ff9f07f3212
  creationTimestamp: "2020-09-09T14:52:54Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    storage.openshift.com/local-volume-owner-name: ocs-blkvol-storage-00
    storage.openshift.com/local-volume-owner-namespace: local-storage
  name: local-pv-32532e89
  resourceVersion: "194139688"
  uid: fdcb6fab-0a53-49ca-bdb1-6e807e969eb7
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1788Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: ocs-deviceset-0-1-78k4w
    namespace: openshift-storage
    resourceVersion: "194139403"
    uid: 8acac407-b475-46ed-9e49-29b377b80137
  local:
    path: /mnt/local-storage/ocs-localblock/sdr
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - storage-00.dev-intranet-01-wob.ocp.vwgroup.com
  persistentVolumeReclaimPolicy: Delete
  storageClassName: ocs-localblock
  volumeMode: Block
status:
  phase: Bound

For this osd at least, they are using device paths and should be using devices by-id/uuid so the device names never change
~~~
  local:
    path: /mnt/local-storage/ocs-localblock/sdr
~~~

Asking for some more data in the case like:
~~~
$ ls -l /mnt/local-storage/ocs-localblock/

I'd also like to gather the following from LSO:
// namespace might be local-storage
$ oc get localvolume -o yaml -n openshift-local-storage
$ oc get localvolumeset -o yaml -n openshift-local-storage
$ oc get localvolumediscovery -o yaml -n openshift-local-storage
~~~

I have a very strong suspicion the kernel picked up the devices in another order and the osds cannot find their backing device. This was caused by the EUS upgrade of OCP, as it does a rollout of MCPs that will reboot the nodes.

--- Additional comment from on 2024-04-04 21:45:31 UTC ---

Hi,

My suspicion was wrong

// sym links for devices on storage-00 from lso:
[acmdy78@bastion ~]$ oc debug -q node/storage-00.dev-intranet-01-wob.ocp.vwgroup.com -- chroot /host ls -l /mnt/local-storage/ocs-localblock/
total 0
lrwxrwxrwx. 1 root root 50 Sep 9 2020 sdp -> /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02507
lrwxrwxrwx. 1 root root 93 Apr 4 14:34 sdq -> /dev/ceph-e936c994-328c-4f59-8f1d-3a5573a7c64b/osd-block-aaced0de-8884-4551-a5ae-dd86ee436f23
lrwxrwxrwx. 1 root root 50 Sep 9 2020 sdr -> /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02490
lrwxrwxrwx. 1 root root 50 Sep 9 2020 sds -> /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02497
lrwxrwxrwx. 1 root root 50 Sep 9 2020 sdt -> /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02489

They have some weird LSO config where each node has its own spec section with its devices listed. Anyways, they have the proper device defined in their LSO configs for the node
~~~
spec:
  logLevel: Normal
  managementState: Managed
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - storage-00.dev-intranet-01-wob.ocp.vwgroup.com
  storageClassDevices:
  - devicePaths:
    - /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02489
    - /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02490
    - /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02497
    - /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02504
    - /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02507
    storageClassName: ocs-localblock
~~~
and the symlink that lso knows about for sdr (ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02490) points to the correct device. Seems the device names didn't change, and they are using by-id.

Taking a step back...
since the failure is with the expand-bluefs container: ~~~ - containerID: cri-o://c4163c5dbd33cab921c113b80350ec20a3af48a2865f7ea43c68f4cdd61afc19 image: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9dbd051cfcdb334aad33a536cc115ae1954edaea5f8cb5943ad615f1b41b0226 imageID: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:3d7144d5fe515acf3bf4bbf6456ab8877a4f7cd553c933ca6fd4d891add53038 lastState: terminated: containerID: cri-o://c4163c5dbd33cab921c113b80350ec20a3af48a2865f7ea43c68f4cdd61afc19 exitCode: 1 finishedAt: "2024-04-04T15:10:29Z" reason: Error startedAt: "2024-04-04T15:10:29Z" name: expand-bluefs ready: false restartCount: 37 state: waiting: message: back-off 5m0s restarting failed container=expand-bluefs pod=rook-ceph-osd-1-6db9cfc7c9-294jn_openshift-storage(4cab01f5-438d-4ffc-a133-cd427bb1cda5) reason: CrashLoopBackOff ~~~ because it cannot find its block device: ~~~ $ omc logs rook-ceph-osd-1-6db9cfc7c9-294jn -c expand-bluefs 2024-04-04T15:10:29.457626768Z inferring bluefs devices from bluestore path 2024-04-04T15:10:29.457728034Z unable to read label for /var/lib/ceph/osd/ceph-1: (2) No such file or directory 2024-04-04T15:10:29.457728034Z 2024-04-04T15:10:29.456+0000 7fdba942e900 -1 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-1/block: (2) No such file or directory ~~~ what if we removed this expand-bluefs container? Maybe the osd will start? If not, is the only option to replace the osd(s)? We do have 1 node up, so we should still hopefully have some good copies? What if setting replica 1 on the pools? --- Additional comment from Andreas Bleischwitz on 2024-04-05 07:58:58 UTC --- Hi, can we have at least an update that this issue is being investigated by engineering? The customer is now suffering from that outage which affects basically their complete development environment (they are developers, and therefore this cluster is their production environment) since about one day. We currently do not have any idea how to re-enable the OSDs so that they would be able to work again. Customer: Volkswagen AG (#556879) @muagarwa, @gsternag are you able to assist here? Best regards, /Andreas --- Additional comment from Bipin Kunal on 2024-04-05 11:36:28 UTC --- (In reply to Andreas Bleischwitz from comment #13) > Hi, > > can we have at least an update that this issue is being investigated by > engineering? The customer is now suffering from that outage which affects > basically their complete development environment (they are developers, and > therefore this cluster is their production environment) since about one day. > We currently do not have any idea how to re-enable the OSDs so that they > would be able to work again. > > Customer: > Volkswagen AG (#556879) > > > @muagarwa, @gsternag are you able to assist here? > > Best regards, > /Andreas Hi Andreas, Thanks for reaching out to me. I am trying to reach out to engineering team. Meanwhile, it will good to have prio-list email if this is really urgent. Removing the needinfo on Mudit. --- Additional comment from Radoslaw Zarzynski on 2024-04-05 11:58:00 UTC --- On it. --- Additional comment from on 2024-04-05 13:52:56 UTC --- Hello, @bkunal found the KCS https://access.redhat.com/solutions/7026462 I'm going to confirm this is the same for this case. Will post findings when I have any. 
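Coming back to the earlier question about removing the expand-bluefs container: a minimal, hedged sketch of what that could look like in practice, assembled from the steps and the oc patch command that appear later in this bug (backup first, scale down the operators so they don't revert the change, then drop init container index 3 of this 4.3-era deployment layout). Treat it as illustrative, not a validated support procedure:

~~~
# Keep a backup of the OSD deployment before touching it
oc -n openshift-storage get deployment rook-ceph-osd-1 -o yaml > rook-ceph-osd-1.yaml.bak

# Stop the operators from reconciling the OSD deployments back
oc -n openshift-storage scale deployment rook-ceph-operator --replicas=0
oc -n openshift-storage scale deployment ocs-operator --replicas=0

# Remove the expand-bluefs init container (index 3 in this legacy deployment spec)
oc -n openshift-storage patch deployment rook-ceph-osd-1 --type=json \
  -p='[{"op": "remove", "path": "/spec/template/spec/initContainers/3"}]'
~~~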
--- Additional comment from Bipin Kunal on 2024-04-05 13:57:53 UTC --- (In reply to kelwhite from comment #18) > Hello, > > @bkunal found the KCS https://access.redhat.com/solutions/7026462 I'm going > to confirm this is the same for this case. Will post findings when I have > any. Actually Shubham from the Rook team found it and gave it to me. --- Additional comment from on 2024-04-05 15:41:15 UTC --- From the customer: // for osd-1: ~~~~ osd-1 is not using /dev/sdr, but i figured it out see this path: [acmdy78@bastion ~]$ oc get -n openshift-storage -o yaml deployment rook-ceph-osd-1 | grep ceph.rook.io/pvc ceph.rook.io/pvc: ocs-deviceset-0-1-78k4w ceph.rook.io/pvc: ocs-deviceset-0-1-78k4w - key: ceph.rook.io/pvc [acmdy78@bastion ~]$ oc get pvc ocs-deviceset-0-1-78k4w NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE ocs-deviceset-0-1-78k4w Bound local-pv-32532e89 1788Gi RWO ocs-localblock 3y208d [acmdy78@bastion ~]$ oc get pv local-pv-32532e89 -o custom-columns=NAME:.metadata.name,PATH:.spec.local.path,NODE:.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values NAME PATH NODE local-pv-32532e89 /mnt/local-storage/ocs-localblock/sdr [storage-00.dev-intranet-01-wob.ocp.vwgroup.com] [acmdy78@bastion ~]$ oc debug -q node/storage-00.dev-intranet-01-wob.ocp.vwgroup.com sh-4.4# chroot /host sh-5.1# ls -lah /mnt/local-storage/ocs-localblock/sdr lrwxrwxrwx. 1 root root 50 Sep 9 2020 /mnt/local-storage/ocs-localblock/sdr -> /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02490 sh-5.1# ls -lah /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02490 lrwxrwxrwx. 1 root root 9 Apr 5 14:58 /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02490 -> ../../sdo sh-5.1# ls -lah /dev/sdo brw-rw----. 1 root disk 8, 224 Apr 5 14:58 /dev/sdo sh-5.1# lsblk /dev/sdo NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sdo 8:224 0 1.7T 0 disk `-ceph--f557e476--7bd4--41a0--9323--d6061a4318b3-osd--block--7f80e2ac--e21f--4aa6--8886--ec94d0387196 253:5 0 1.7T 0 lvm sh-5.1# head --bytes=60 /dev/sdo sh-5.1# ls -lah /dev/ceph- ceph-309180b2-697b-473a-a19c-d00cec94427a/ ceph-b100dea8-0b24-4d9b-97c8-ed6dba1bd10d/ ceph-f557e476-7bd4-41a0-9323-d6061a4318b3/ ceph-3c187f41-d43e-4bb2-9421-97f78db94d28/ ceph-e936c994-328c-4f59-8f1d-3a5573a7c64b/ sh-5.1# ls -lah /dev/ceph-f557e476-7bd4-41a0-9323-d6061a4318b3/osd-block-7f80e2ac-e21f-4aa6-8886-ec94d0387196 lrwxrwxrwx. 1 root root 7 Apr 4 10:54 /dev/ceph-f557e476-7bd4-41a0-9323-d6061a4318b3/osd-block-7f80e2ac-e21f-4aa6-8886-ec94d0387196 -> ../dm-5 sh-5.1# ls -lah /dev/dm-5 brw-rw----. 1 root disk 253, 5 Apr 4 10:54 /dev/dm-5 sh-5.1# head --bytes=60 /dev/dm-5 bluestore block device 7f80e2ac-e21f-4aa6-8886-ec94d0387196 the bluestore block device id ov the logical volume is the same that you get for osd-1 in the ceph osd dump command. Could the Problem be caused by this lvm layer? On clusters we installed later ODF i don't see that lvm is used. This cluster started with OCS 4.3 or 4.4 ~~~~ --- Additional comment from on 2024-04-05 17:12:25 UTC --- Hi, Update... We've found the block devices dont exist on the nodes (this is from storage-00): ~~~ /var/lib/rook/openshift-storage/ocs-deviceset-0-1-78k4w: total 0 drwxr-xr-x. 2 root root 6 Apr 3 14:50 ceph-1 brw-rw-rw-. 1 root disk 8, 224 Apr 4 10:57 ocs-deviceset-0-1-78k4w /var/lib/rook/openshift-storage/ocs-deviceset-0-1-78k4w/ceph-1: total 0 /var/lib/rook/openshift-storage/ocs-deviceset-0-2-2p2fc: total 0 drwxr-xr-x. 2 root root 6 Apr 3 14:50 ceph-2 brw-rw-rw-. 
1 root disk 65, 0 Apr 4 10:57 ocs-deviceset-0-2-2p2fc /var/lib/rook/openshift-storage/ocs-deviceset-0-2-2p2fc/ceph-2: total 0 /var/lib/rook/openshift-storage/ocs-deviceset-0-3-lh2tq: total 0 drwxr-xr-x. 2 root root 6 Apr 3 14:50 ceph-3 brw-rw-rw-. 1 root disk 65, 16 Apr 4 10:57 ocs-deviceset-0-3-lh2tq /var/lib/rook/openshift-storage/ocs-deviceset-0-3-lh2tq/ceph-3: total 0 /var/lib/rook/openshift-storage/ocs-deviceset-0-4-wfm22: total 0 drwxr-xr-x. 2 root root 10 Apr 3 14:50 ceph-4 brw-rw-rw-. 1 root disk 253, 4 Apr 4 14:49 ocs-deviceset-0-4-wfm22 /var/lib/rook/openshift-storage/ocs-deviceset-0-4-wfm22/ceph-4: total 0 ~~~ We need to confirm why these are gone. The current ask from engineering is why did these devices vanish. Would rook do anything with this? Can we find anything that will help? We're confirming the devices are gone on the other nodes and starting the osd replacement processes via a remote call. --- Additional comment from on 2024-04-05 19:50:49 UTC --- Hello All, On a remote with the customer. We've confirmed no data loss, phew. Seems the issue is with ceph-volume, it's not activating the device. We tried to do this manually via the below and got osd-9 up and running: ~~~ - Creating a backup of the osd-9 deployment, we're going to remove the liveness probe - scaled down the rook-ceph and ocs-operators - oc edit the osd-9 deployment and searched for the expand-bluefs section and removed the container - oc get pods to see if osd-9 came up (still 1/2) and rshed info the container - ceph-volume lvm list - ceph-volume lvm active --no-systemd -- 9 79021ece-c52a-46d1-8e99-69640a926822 // this is the osd fsid from ceph-volume lvm list - The osd was activated and when we viewed the osd data dir, the block device was listed: - ls -l '/var/lib/ceph/osd/ceph-{id} ~~~ We're looking to get some ceph-volume logs to determine what's going on... Might need to create another BZ for ceph-volume, but we will know more once we review the fresh odf must-gather --- Additional comment from Travis Nielsen on 2024-04-05 20:56:28 UTC --- Great to see the OSDs can be brought back up with the workaround and there is no data loss. These old LVM-based OSDs that were created (IIRC only in 4.2 and 4.3) are going to be a problem to maintain. We simply don't have tests that upgrades from OSDs created from 10+ releases ago. For this configuration that has not been supported for so long, the way to keep supporting such an old cluster will be to replace each of the OSDs. By purging each OSD one-at-a-time and bringing up a new one, the OSDs can be in a current configuration. It would not surprise me that in 4.14 there could have been an update to ceph-volume that caused this issue, because we just haven't tested this configuration for so long. Guillaume, agreed that old LVM-based OSDs should be replaced? --- Additional comment from Prashant Dhange on 2024-04-05 21:18:53 UTC --- Additional details for the completeness : (In reply to kelwhite from comment #22) > Hello All, > > On a remote with the customer. We've confirmed no data loss, phew. Seems the > issue is with ceph-volume, it's not activating the device. 
We tried to do this manually via the below and got osd-9 up and running:
>
> ~~~
> - Creating a backup of the osd-9 deployment, we're going to remove the liveness probe
> - scaled down the rook-ceph and ocs-operators
> - oc edit the osd-9 deployment and searched for the expand-bluefs section and removed the container
> - oc get pods to see if osd-9 came up (still 1/2) and rsh'ed into the container
> - ceph-volume lvm list

All LVs associated with the ceph cluster are getting listed here, and lsblk/lvs recognize these LVs.

> - ceph-volume lvm activate --no-systemd -- 9 79021ece-c52a-46d1-8e99-69640a926822 // this is the osd fsid from ceph-volume lvm list
> - The osd was activated and when we viewed the osd data dir, the block device was listed:
> - ls -l /var/lib/ceph/osd/ceph-{id}

- Start osd.9
  # ceph-osd --id 9 --fsid 18c9800f-7f91-4994-ad32-2a8a330babd6 --setuser ceph --setgroup ceph --crush-location="root=default host=storage-01-dev-intranet-01-wob-ocp-vwgroup-com rack=rack1" --log-to-stderr=true --err-to-stderr=true --mon-cluster-log-to-stderr=true --log-stderr-prefix=debug --default-log-to-file=false --default-mon-cluster-log-to-file=false --ms-learn-addr-from-peer=false

  NOTE: The OSD daemon will run in the background and it's safe to exit the container here.

--- Additional comment from Prashant Dhange on 2024-04-05 22:14:32 UTC ---

The latest provided must-gather logs and ceph logs do not shed any light on the failure of OSD directory priming or of ceph-volume activating the OSD device.

The next action plan:
- Apply the workaround for every OSD on the cluster, refer comment#24
- Get all OSDs up/in and all PGs active+clean
- Re-deploy all OSDs one-by-one

For the other clusters which might experience similar issues, the recommendation is to re-deploy all the OSDs and only then go for the cluster upgrade from 4.12.47 to 4.14.16.

Let me know if you need any help on recovering this cluster.

--- Additional comment from Prashant Dhange on 2024-04-05 22:58:31 UTC ---

(In reply to Prashant Dhange from comment #24)
...
> - Start osd.9
>   # ceph-osd --id 9 --fsid 18c9800f-7f91-4994-ad32-2a8a330babd6 --setuser ceph --setgroup ceph --crush-location="root=default host=storage-01-dev-intranet-01-wob-ocp-vwgroup-com rack=rack1" --log-to-stderr=true --err-to-stderr=true --mon-cluster-log-to-stderr=true --log-stderr-prefix=debug --default-log-to-file=false --default-mon-cluster-log-to-file=false --ms-learn-addr-from-peer=false
>   NOTE: The OSD daemon will run in the background and it's safe to exit the container here.

In the ceph-osd run command, change crush-location according to the `ceph osd tree` output, or copy it from the osd deployment config (under the spec.containers section). Do not forget to add double quotes around the crush-location value. e.g.

# oc get deployment rook-ceph-osd-9 -o yaml
spec:
  affinity:
  ...
  containers:
  - args:
    - ceph
    - osd
    - start
    - --
    - --foreground
    - --id
    - "9"
    - --fsid
    - 18c9800f-7f91-4994-ad32-2a8a330babd6
    - --setuser
    - ceph
    - --setgroup
    - ceph
    - --crush-location=root=default host=storage-01-dev-intranet-01-wob-ocp-vwgroup-com rack=rack1

--- Additional comment from Rafrojas on 2024-04-06 07:19:41 UTC ---

Hi Prashant

I joined the call with the customer and we applied the workaround: we edited the deployment of each OSD and removed the expand-bluefs args from it. We have a backup of all the deployments if required.
After that, ceph started the recovery and finished after some time. A new must-gather is collected and available on the case. There's a WARN on ceph:

health: HEALTH_WARN
        15 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats

I also requested to collect the /var/log/ceph ceph-volume logs for the RCA; Donny will collect them along the day. We agreed to wait until the new data is checked before continuing with the next steps. We cannot confirm that the application is working fine, because the developers' shifts are MON-FRI, but we see that the cluster looks in a better shape, with all operators running.

Regards
Rafa

--- Additional comment from Rafrojas on 2024-04-06 08:59:04 UTC ---

Hi Prashant

Ceph logs collected and attached to the case, waiting for your instructions for the next steps.

Regards
Rafa

--- Additional comment from Rafrojas on 2024-04-06 12:12:03 UTC ---

Hi Prashant

The CU is waiting for some feedback; they are running this cluster in an abnormal state. NA will join the shift soon. I'll add the handover from the last call and the status on the case; please let us know the next steps to share with the CU ASAP.

Regards
Rafa

--- Additional comment from Prashant Dhange on 2024-04-07 02:55:59 UTC ---

Hi Rafa,

(In reply to Rafrojas from comment #27)
> Hi Prashant
>
> I joined the call with the customer and we applied the workaround: we edited
> the deployment of each OSD and removed the expand-bluefs args from it. We
> have a backup of all the deployments if required.

Good to know that all OSDs are up and running after applying the workaround.

There is a quick way to patch the OSD deployment to remove the bluefs-expand init container using the oc patch command:
# oc patch deployment rook-ceph-osd-<osdid> --type=json -p='[{"op": "remove", "path": "/spec/template/spec/initContainers/3"}]'

> After that, ceph started the recovery and finished after some time. A new
> must-gather is collected and available on the case. There's a WARN on ceph:
>
> health: HEALTH_WARN
>         15 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats

This warning is because the OSDs were created pre-octopus release. This warning will be addressed as we are re-deploying the OSDs. If we were not planning to re-deploy the OSDs then you need to set `ceph config rm osd bluestore_fsck_quick_fix_on_mount` and restart the OSDs. Refer to KCS solution https://access.redhat.com/solutions/7041554 for more details.

> I also requested to collect the /var/log/ceph ceph-volume logs for the
> RCA; Donny will collect them along the day.

The latest logs have been analyzed and Guillaume was able to find the RCA for the issue. The RCA has been provided in the BZ-2273724#c3 comment.

--------------------------------------------------------------------------------------------------------------------------------------------------
NOTES update (attached files to BZ)

Hi,

I will upload the latest version of my notes and a detailed output of the commands I ran to manually remove the local-storage LVM volumes, for your reference. I have seen in the diskmaker-manager pods (local-storage Operator) that it had problems removing the LVM disks, thus making the procedure necessary to remove the VolumeGroups and LogicalVolumes manually. I include the logs from the diskmaker-manager so you can have a look if there is a bug in the local-storage Operator, about deleting lvm ocs-localblock volumes.

Regards
Donny

--- Additional comment from Prashant Dhange on 2024-04-09 19:53:14 UTC ---

(In reply to Prashant Dhange from comment #30)
> Hi Rafa,
...
> > After that, ceph started the recovery and finished after some time. A new
> > must-gather is collected and available on the case. There's a WARN on ceph:
> >
> > health: HEALTH_WARN
> >         15 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats
> This warning is because the OSDs were created pre-octopus release. This warning
> will be addressed as we are re-deploying the OSDs. If we were not planning
> to re-deploy the OSDs then you need to set `ceph config rm osd
> bluestore_fsck_quick_fix_on_mount` and restart the OSDs.

Correction. Meant to say:

This warning is because the OSDs were created pre-octopus release. This warning will be addressed as we are re-deploying the OSDs. If we were not planning to re-deploy the OSDs then you would need to set `ceph config set osd bluestore_fsck_quick_fix_on_mount true`, restart the OSDs and then `ceph config rm osd bluestore_fsck_quick_fix_on_mount`.

> Refer to KCS solution
> https://access.redhat.com/solutions/7041554 for more details.

--- Additional comment from Prashant Dhange on 2024-04-09 21:11:36 UTC ---

We are still getting more details about the ODF upgrade history from the customer. Based on the available data, here are the steps to reproduce this issue:
- Deploy a 4.3.18 cluster with LVM-based OSDs
- Start upgrading to ODF 4.4 and then to every major release till 4.13.7, e.g. from 4.4 to 4.5 to 4.6 and so on
- Verify that the ODF cluster is healthy and we are not observing any daemon crash (specifically OSDs)
- Upgrade from 4.13.7 to 4.14.16
- Observe the OSDs are stuck in CLBO state

--- Additional comment from Prashant Dhange on 2024-04-09 23:11:46 UTC ---

Okay. The issue is not related to ceph-volume at all.

The problem was that the OSDs were deployed on an OCS 4.3 cluster, so the deployment config has different initContainers compared to later ODF versions (probably 4.9 or later). Init container sequence for the 4.3 deployment config (refer to point [2] below):

Container-1 : ## Init Container 1 : rook ceph osd init
Container-2 : ## Init Container 2 : Copy rook command to OSD pod
Container-3 : ## Init Container 3 : expand-bluefs
Container-4 : ## Init Container 4 : chown ceph directories

then the actual osd container starts, which executes the "ceph osd start" script, which internally calls ceph-volume lvm activate and then the ceph-osd command.

Container-5 : ceph osd start (refer to points [1] and [3] below)

When the customer upgraded to 4.14.16, the "rook ceph osd init" container failed to mount the osd data directory. Due to this, the expand-bluefs container failed to start and exited with the "_read_bdev_label failed to open /var/lib/ceph/osd/ceph-<osdid>/block: (2) No such file or directory" error. When we removed the expand-bluefs init container as a workaround, the ceph osd started successfully, as Container-5 (ceph osd start) was able to execute the lvm activate and start the ceph-osd daemon.

When I was on the remote session for the first time, we were able to start osd.9 manually (after removing the expand-bluefs init container) by executing the lvm activate command and then the ceph-osd command.
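For readability, a consolidated sketch of that manual start, run from inside the affected OSD pod once the expand-bluefs init container has been removed. The osd id, osd fsid and crush location are the values from this cluster (comment #24) and would have to be substituted for other OSDs; the logging flags from the original command are omitted here for brevity:

~~~
# Inside the OSD container (oc rsh): list LVM-based OSDs and note the osd fsid
ceph-volume lvm list

# Activate the OSD without systemd, using the osd id and osd fsid from the listing
ceph-volume lvm activate --no-systemd -- 9 79021ece-c52a-46d1-8e99-69640a926822

# Confirm the block symlink now exists, then start the daemon with the crush
# location copied from the osd deployment spec
ls -l /var/lib/ceph/osd/ceph-9
ceph-osd --id 9 --fsid 18c9800f-7f91-4994-ad32-2a8a330babd6 --setuser ceph --setgroup ceph \
  --crush-location="root=default host=storage-01-dev-intranet-01-wob-ocp-vwgroup-com rack=rack1"
~~~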
More details : [1] ceph osd container : containers: - args: - ceph - osd - start - -- - --foreground - --id - "1" - --fsid - 18c9800f-7f91-4994-ad32-2a8a330babd6 - --setuser - ceph - --setgroup - ceph - --crush-location=root=default host=storage-00-dev-intranet-01-wob-ocp-vwgroup-com rack=rack0 - --osd-op-num-threads-per-shard=2 - --osd-op-num-shards=8 - --osd-recovery-sleep=0 - --osd-snap-trim-sleep=0 - --osd-delete-sleep=0 - --bluestore-min-alloc-size=4096 - --bluestore-prefer-deferred-size=0 - --bluestore-compression-min-blob-size=8192 - --bluestore-compression-max-blob-size=65536 - --bluestore-max-blob-size=65536 - --bluestore-cache-size=3221225472 - --bluestore-throttle-cost-per-io=4000 - --bluestore-deferred-batch-ops=16 - --default-log-to-stderr=true - --default-err-to-stderr=true - --default-mon-cluster-log-to-stderr=true - '--default-log-stderr-prefix=debug ' - --default-log-to-file=false - --default-mon-cluster-log-to-file=false - --ms-learn-addr-from-peer=false command: - bash - -x - -c - "\nset -o nounset # fail if variables are unset\nchild_pid=\"\"\nsigterm_received=false\nfunction sigterm() {\n\techo \"SIGTERM received\"\n\tsigterm_received=true\n\tkill -TERM \"$child_pid\"\n}\ntrap sigterm SIGTERM\n\"${@}\" &\n# un-fixable race condition: if receive sigterm here, it won't be sent to child process\nchild_pid=\"$!\"\nwait \"$child_pid\" # wait returns the same return code of child process when called with argument\nwait \"$child_pid\" # first wait returns immediately upon SIGTERM, so wait again for child to actually stop; this is a noop if child exited normally\nceph_osd_rc=$?\nif [ $ceph_osd_rc -eq 0 ] && ! $sigterm_received; then\n\ttouch /tmp/osd-sleep\n\techo \"OSD daemon exited with code 0, possibly due to OSD flapping. The OSD pod will sleep for $ROOK_OSD_RESTART_INTERVAL hours. 
Restart the pod manually once the flapping issue is fixed\"\n\tsleep \"$ROOK_OSD_RESTART_INTERVAL\"h &\n\tchild_pid=\"$!\"\n\twait \"$child_pid\"\n\twait \"$child_pid\" # wait again for sleep to stop\nfi\nexit $ceph_osd_rc\n" - -- - /rook/rook [2] initContainers: ## Init Container 1 : rook ceph osd init - args: - ceph - osd - init env: - name: ROOK_NODE_NAME value: storage-00.dev-intranet-01-wob.ocp.vwgroup.com - name: ROOK_CLUSTER_ID value: aaba77cf-8f28-437d-b88f-36dcafc3a865 - name: ROOK_CLUSTER_NAME value: ocs-storagecluster-cephcluster - name: ROOK_PRIVATE_IP valueFrom: fieldRef: apiVersion: v1 fieldPath: status.podIP - name: ROOK_PUBLIC_IP valueFrom: fieldRef: apiVersion: v1 fieldPath: status.podIP - name: POD_NAMESPACE value: openshift-storage - name: ROOK_MON_ENDPOINTS valueFrom: configMapKeyRef: key: data name: rook-ceph-mon-endpoints - name: ROOK_CONFIG_DIR value: /var/lib/rook - name: ROOK_CEPH_CONFIG_OVERRIDE value: /etc/rook/config/override.conf - name: NODE_NAME valueFrom: fieldRef: apiVersion: v1 fieldPath: spec.nodeName - name: ROOK_CRUSHMAP_ROOT value: default - name: ROOK_CRUSHMAP_HOSTNAME - name: CEPH_VOLUME_DEBUG value: "1" - name: CEPH_VOLUME_SKIP_RESTORECON value: "1" - name: DM_DISABLE_UDEV value: "1" - name: ROOK_OSD_ID value: "1" - name: ROOK_CEPH_VERSION value: ceph version 17.2.6-196 quincy - name: ROOK_IS_DEVICE value: "true" - name: TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES value: "134217728" envFrom: - configMapRef: name: rook-ceph-osd-env-override optional: true image: registry.redhat.io/odf4/rook-ceph-rhel9-operator@sha256:23041cec90d0c64f043deb9f5b589c2fe3b2e29163cf7576324341ad855affcc imagePullPolicy: IfNotPresent name: config-init resources: {} securityContext: capabilities: drop: - ALL privileged: true readOnlyRootFilesystem: false runAsUser: 0 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/lib/rook name: rook-data - mountPath: /etc/ceph name: rook-config-override readOnly: true - mountPath: /run/ceph name: ceph-daemons-sock-dir - mountPath: /var/log/ceph name: rook-ceph-log - mountPath: /var/lib/ceph/crash name: rook-ceph-crash - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: kube-api-access-xjzbc readOnly: true ## Init Container 2 : Copy rook command to OSD pod - args: - --archive - --force - --verbose - /usr/local/bin/rook - /rook command: - cp image: registry.redhat.io/odf4/rook-ceph-rhel9-operator@sha256:23041cec90d0c64f043deb9f5b589c2fe3b2e29163cf7576324341ad855affcc imagePullPolicy: IfNotPresent name: copy-bins resources: {} securityContext: capabilities: drop: - ALL terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /rook name: rook-binaries - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: kube-api-access-xjzbc readOnly: true - command: - /bin/bash - -c - "\nset -xe\n\nPVC_SOURCE=/ocs-deviceset-0-1-78k4w\nPVC_DEST=/mnt/ocs-deviceset-0-1-78k4w\nCP_ARGS=(--archive --dereference --verbose)\n\nif [ -b \"$PVC_DEST\" ]; then\n\tPVC_SOURCE_MAJ_MIN=$(stat --format '%t%T' $PVC_SOURCE)\n\tPVC_DEST_MAJ_MIN=$(stat --format '%t%T' $PVC_DEST)\n\tif [[ \"$PVC_SOURCE_MAJ_MIN\" == \"$PVC_DEST_MAJ_MIN\" ]]; then\n\t\techo \"PVC $PVC_DEST already exists and has the same major and minor as $PVC_SOURCE: \"$PVC_SOURCE_MAJ_MIN\"\"\n\t\texit 0\n\telse\n\t\techo \"PVC's source major/minor numbers changed\"\n\t\tCP_ARGS+=(--remove-destination)\n\tfi\nfi\n\ncp \"${CP_ARGS[@]}\" \"$PVC_SOURCE\" \"$PVC_DEST\"\n" image: 
registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9dbd051cfcdb334aad33a536cc115ae1954edaea5f8cb5943ad615f1b41b0226 imagePullPolicy: IfNotPresent name: blkdevmapper resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi securityContext: capabilities: add: - MKNOD drop: - ALL privileged: true terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeDevices: - devicePath: /ocs-deviceset-0-1-78k4w name: ocs-deviceset-0-1-78k4w volumeMounts: - mountPath: /mnt name: ocs-deviceset-0-1-78k4w-bridge - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: kube-api-access-xjzbc readOnly: true ## Init Container 3 : expand-bluefs - args: - bluefs-bdev-expand - --path - /var/lib/ceph/osd/ceph-1 command: - ceph-bluestore-tool image: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9dbd051cfcdb334aad33a536cc115ae1954edaea5f8cb5943ad615f1b41b0226 imagePullPolicy: IfNotPresent name: expand-bluefs resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi securityContext: capabilities: drop: - ALL privileged: true runAsUser: 0 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/lib/ceph/osd/ceph-1 name: ocs-deviceset-0-1-78k4w-bridge subPath: ceph-1 - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: kube-api-access-xjzbc readOnly: true ## Init Container 4 : chown ceph directories - args: - --verbose - --recursive - ceph:ceph - /var/log/ceph - /var/lib/ceph/crash - /run/ceph command: - chown image: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9dbd051cfcdb334aad33a536cc115ae1954edaea5f8cb5943ad615f1b41b0226 imagePullPolicy: IfNotPresent name: chown-container-data-dir resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi securityContext: capabilities: drop: - ALL privileged: true readOnlyRootFilesystem: false runAsUser: 0 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/lib/rook name: rook-data - mountPath: /etc/ceph name: rook-config-override readOnly: true - mountPath: /run/ceph name: ceph-daemons-sock-dir - mountPath: /var/log/ceph name: rook-ceph-log - mountPath: /var/lib/ceph/crash name: rook-ceph-crash - mountPath: /dev name: devices - mountPath: /run/udev name: run-udev - mountPath: /rook name: rook-binaries - mountPath: /mnt name: ocs-deviceset-0-1-78k4w-bridge - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: kube-api-access-xjzbc readOnly: true nodeName: storage-00.dev-intranet-01-wob.ocp.vwgroup.com nodeSelector: kubernetes.io/hostname: storage-00.dev-intranet-01-wob.ocp.vwgroup.com preemptionPolicy: PreemptLowerPriority priority: 2000001000 priorityClassName: system-node-critical restartPolicy: Always schedulerName: default-scheduler securityContext: fsGroup: 1000620000 seLinuxOptions: level: s0:c25,c10 serviceAccount: rook-ceph-osd serviceAccountName: rook-ceph-osd shareProcessNamespace: true terminationGracePeriodSeconds: 30 tolerations: - effect: NoSchedule key: node.ocs.openshift.io/storage operator: Equal value: "true" - effect: NoExecute key: node.kubernetes.io/not-ready operator: Exists tolerationSeconds: 300 - effect: NoExecute key: node.kubernetes.io/unreachable operator: Exists tolerationSeconds: 300 - effect: NoSchedule key: node.kubernetes.io/memory-pressure operator: Exists topologySpreadConstraints: - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: kubernetes.io/hostname whenUnsatisfiable: ScheduleAnyway volumes: - 
emptyDir: {} name: rook-data - name: rook-config-override projected: defaultMode: 420 sources: - configMap: items: - key: config mode: 292 path: ceph.conf name: rook-config-override - hostPath: path: /var/lib/rook/exporter type: DirectoryOrCreate name: ceph-daemons-sock-dir - hostPath: path: /var/lib/rook/openshift-storage/log type: "" name: rook-ceph-log - hostPath: path: /var/lib/rook/openshift-storage/crash type: "" name: rook-ceph-crash - hostPath: path: /dev type: "" name: devices - name: ocs-deviceset-0-1-78k4w persistentVolumeClaim: claimName: ocs-deviceset-0-1-78k4w - hostPath: path: /var/lib/rook/openshift-storage/ocs-deviceset-0-1-78k4w type: DirectoryOrCreate name: ocs-deviceset-0-1-78k4w-bridge - hostPath: path: /run/udev type: "" name: run-udev - emptyDir: {} name: rook-binaries - name: kube-api-access-xjzbc projected: defaultMode: 420 sources: - serviceAccountToken: expirationSeconds: 3607 path: token - configMap: items: - key: ca.crt path: ca.crt name: kube-root-ca.crt - downwardAPI: items: - fieldRef: apiVersion: v1 fieldPath: metadata.namespace path: namespace - configMap: items: - key: service-ca.crt path: service-ca.crt name: openshift-service-ca.crt [3] ceph osd start logs 2024-04-06T05:46:41.593349071Z + set -o nounset 2024-04-06T05:46:41.593349071Z + child_pid= 2024-04-06T05:46:41.593427396Z + sigterm_received=false 2024-04-06T05:46:41.593427396Z + trap sigterm SIGTERM 2024-04-06T05:46:41.593576845Z + child_pid=52 2024-04-06T05:46:41.593589922Z + wait 52 2024-04-06T05:46:41.593726159Z + /rook/rook ceph osd start -- --foreground --id 1 --fsid 18c9800f-7f91-4994-ad32-2a8a330babd6 --setuser ceph --setgroup ceph '--crush-location=root=default host=storage-00-dev-intranet-01-wob-ocp-vwgroup-com rack=rack0' --osd-op-num-threads-per-shard=2 --osd-op-num-shards=8 --osd-recovery-sleep=0 --osd-snap-trim-sleep=0 --osd-delete-sleep=0 --bluestore-min-alloc-size=4096 --bluestore-prefer-deferred-size=0 --bluestore-compression-min-blob-size=8192 --bluestore-compression-max-blob-size=65536 --bluestore-max-blob-size=65536 --bluestore-cache-size=3221225472 --bluestore-throttle-cost-per-io=4000 --bluestore-deferred-batch-ops=16 --default-log-to-stderr=true --default-err-to-stderr=true --default-mon-cluster-log-to-stderr=true '--default-log-stderr-prefix=debug ' --default-log-to-file=false --default-mon-cluster-log-to-file=false --ms-learn-addr-from-peer=false 2024-04-06T05:46:41.626980032Z 2024-04-06 05:46:41.626898 I | rookcmd: starting Rook v4.14.6-0.7522dc8ddafd09860f2314db3965ef97671cd138 with arguments '/rook/rook ceph osd start -- --foreground --id 1 --fsid 18c9800f-7f91-4994-ad32-2a8a330babd6 --setuser ceph --setgroup ceph --crush-location=root=default host=storage-00-dev-intranet-01-wob-ocp-vwgroup-com rack=rack0 --osd-op-num-threads-per-shard=2 --osd-op-num-shards=8 --osd-recovery-sleep=0 --osd-snap-trim-sleep=0 --osd-delete-sleep=0 --bluestore-min-alloc-size=4096 --bluestore-prefer-deferred-size=0 --bluestore-compression-min-blob-size=8192 --bluestore-compression-max-blob-size=65536 --bluestore-max-blob-size=65536 --bluestore-cache-size=3221225472 --bluestore-throttle-cost-per-io=4000 --bluestore-deferred-batch-ops=16 --default-log-to-stderr=true --default-err-to-stderr=true --default-mon-cluster-log-to-stderr=true --default-log-stderr-prefix=debug --default-log-to-file=false --default-mon-cluster-log-to-file=false --ms-learn-addr-from-peer=false' 2024-04-06T05:46:41.626980032Z 2024-04-06 05:46:41.626956 I | rookcmd: flag values: 
--block-path=/dev/ceph-f557e476-7bd4-41a0-9323-d6061a4318b3/osd-block-7f80e2ac-e21f-4aa6-8886-ec94d0387196, --help=false, --log-level=INFO, --lv-backed-pv=true, --osd-id=1, --osd-store-type=, --osd-uuid=7f80e2ac-e21f-4aa6-8886-ec94d0387196, --pvc-backed-osd=true 2024-04-06T05:46:41.626980032Z 2024-04-06 05:46:41.626960 I | ceph-spec: parsing mon endpoints: g=100.69.195.205:3300,f=100.70.70.134:6789,b=100.70.78.99:6789 2024-04-06T05:46:41.628815634Z 2024-04-06 05:46:41.628788 I | cephosd: Successfully updated lvm config file "/etc/lvm/lvm.conf" 2024-04-06T05:46:41.925092800Z 2024-04-06 05:46:41.925022 I | exec: Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1 2024-04-06T05:46:41.928518615Z 2024-04-06 05:46:41.928499 I | exec: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1 2024-04-06T05:46:41.931919054Z 2024-04-06 05:46:41.931906 I | exec: Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-f557e476-7bd4-41a0-9323-d6061a4318b3/osd-block-7f80e2ac-e21f-4aa6-8886-ec94d0387196 --path /var/lib/ceph/osd/ceph-1 --no-mon-config 2024-04-06T05:46:41.954830230Z 2024-04-06 05:46:41.954808 I | exec: Running command: /usr/bin/ln -snf /dev/ceph-f557e476-7bd4-41a0-9323-d6061a4318b3/osd-block-7f80e2ac-e21f-4aa6-8886-ec94d0387196 /var/lib/ceph/osd/ceph-1/block 2024-04-06T05:46:41.957864812Z 2024-04-06 05:46:41.957851 I | exec: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block 2024-04-06T05:46:41.961270909Z 2024-04-06 05:46:41.961255 I | exec: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-5 2024-04-06T05:46:41.964681164Z 2024-04-06 05:46:41.964667 I | exec: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1 2024-04-06T05:46:41.967586406Z 2024-04-06 05:46:41.967574 I | exec: --> ceph-volume lvm activate successful for osd ID: 1 2024-04-06T05:46:42.029385070Z 2024-04-06 05:46:42.028473 I | exec: debug 2024-04-06T05:46:42.027+0000 7fa35830c5c0 0 set uid:gid to 167:167 (ceph:ceph) 2024-04-06T05:46:42.029462802Z 2024-04-06 05:46:42.029394 I | exec: debug 2024-04-06T05:46:42.027+0000 7fa35830c5c0 0 ceph version 17.2.6-196.el9cp (cbbf2cfb549196ca18c0c9caff9124d83ed681a4) quincy (stable), process ceph-osd, pid 133 2024-04-06T05:46:42.029462802Z 2024-04-06 05:46:42.029437 I | exec: debug 2024-04-06T05:46:42.027+0000 7fa35830c5c0 0 pidfile_write: ignore empty --pid-file 2024-04-06T05:46:42.029899768Z 2024-04-06 05:46:42.029860 I | exec: debug 2024-04-06T05:46:42.029+0000 7fa35830c5c0 1 bdev(0x55a4d1b87c00 /var/lib/ceph/osd/ceph-1/block) open path /var/lib/ceph/osd/ceph-1/block 2024-04-06T05:46:42.029959756Z 2024-04-06 05:46:42.029947 I | exec: debug 2024-04-06T05:46:42.029+0000 7fa35830c5c0 0 bdev(0x55a4d1b87c00 /var/lib/ceph/osd/ceph-1/block) ioctl(F_SET_FILE_RW_HINT) on /var/lib/ceph/osd/ceph-1/block failed: (22) Invalid argument 2024-04-06T05:46:42.030424427Z 2024-04-06 05:46:42.030409 I | exec: debug 2024-04-06T05:46:42.029+0000 7fa35830c5c0 1 bdev(0x55a4d1b87c00 /var/lib/ceph/osd/ceph-1/block) open size 1920378863616 (0x1bf1f800000, 1.7 TiB) block_size 4096 (4 KiB) non-rotational discard supported 2024-04-06T05:46:42.030649989Z 2024-04-06 05:46:42.030627 I | exec: debug 2024-04-06T05:46:42.029+0000 7fa35830c5c0 1 bluestore(/var/lib/ceph/osd/ceph-1) _set_cache_sizes cache_size 3221225472 meta 0.45 kv 0.45 data 0.06 2024-04-06T05:46:42.030665356Z 2024-04-06 05:46:42.030652 I | exec: debug 2024-04-06T05:46:42.030+0000 7fa35830c5c0 1 bdev(0x55a4d1b87400 /var/lib/ceph/osd/ceph-1/block) 
open path /var/lib/ceph/osd/ceph-1/block
2024-04-06T05:46:42.030775141Z 2024-04-06 05:46:42.030763 I | exec: debug 2024-04-06T05:46:42.030+0000 7fa35830c5c0 0 bdev(0x55a4d1b87400 /var/lib/ceph/osd/ceph-1/block) ioctl(F_SET_FILE_RW_HINT) on /var/lib/ceph/osd/ceph-1/block failed: (22) Invalid argument

So we need to find out why "rook ceph osd init" was failing to mount the OSD data dir.

@Travis Any thoughts on the "rook ceph osd init" failure?

--- Additional comment from Prashant Dhange on 2024-04-09 23:29:39 UTC ---

(In reply to Prashant Dhange from comment #34)
...
> So we need to find out why "rook ceph osd init" was failing to mount the OSD
> data dir.
>
> @Travis Any thoughts on the "rook ceph osd init" failure?

The https://github.com/rook/rook/commit/33e824a323291de1a261b70e9bd255d5049ee02b commit likely caused this issue, as we have removed the fsid and username configs from the env vars.

--- Additional comment from Travis Nielsen on 2024-04-10 02:13:31 UTC ---

(In reply to Prashant Dhange from comment #35)
> (In reply to Prashant Dhange from comment #34)
> ...
> > So we need to find out why "rook ceph osd init" was failing to mount the OSD
> > data dir.
> >
> > @Travis Any thoughts on the "rook ceph osd init" failure?
> The https://github.com/rook/rook/commit/33e824a323291de1a261b70e9bd255d5049ee02b
> commit likely caused this issue, as we have removed the fsid and username
> configs from the env vars.

That commit was also backported all the way to 4.10 [1], so this change was not new in 4.14. The error about the missing ceph-username parameter must be getting ignored despite the error in the init container.

It would be really helpful if we can repro this: first looking at the OSD spec and logs in 4.13, and then upgrading to 4.14 to see what changed in the OSD spec. I suspect that if the "osd init" container fails, the ceph.conf would not be present, which would cause the bluefs expand container to fail. But I am confused why the "osd init" container failure did not abort starting the OSD in the first place. Init containers are not supposed to continue to the next one if they fail. I still need to dig more, but in the meantime the repro would help.

[1] https://github.com/red-hat-storage/rook/commit/673c331a072a9de41ab2aac5405600104bd44ef2

--- Additional comment from RHEL Program Management on 2024-04-12 17:41:42 UTC ---

This bug having no release flag set previously, is now set with release flag 'odf-4.16.0' to '?', and so is being proposed to be fixed at the ODF 4.16.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any previously set while release flag was missing, have now been reset since the Acks are to be set against a release flag.

--- Additional comment from RHEL Program Management on 2024-04-12 17:41:42 UTC ---

Since this bug has severity set to 'urgent', it is being proposed as a blocker for the currently set release flag. Please resolve ASAP.

--- Additional comment from Prashant Dhange on 2024-04-12 18:02:57 UTC ---

If OSDs are backed by LVM then the rook operator should prevent the ODF upgrade and also alert the admin to re-deploy these OSDs before upgrading the ODF cluster.

@Travis, I do not recall exactly, but you had some ideas around it during our VW escalation (BZ#2273398) discussion.

--- Additional comment from Santosh Pillai on 2024-04-15 08:03:20 UTC ---

Should it prevent the entire ODF upgrade or just the upgrade of the LVM-based OSDs?
--- Additional comment from Prashant Dhange on 2024-04-15 22:13:43 UTC --- (In reply to Santosh Pillai from comment #4) > Should It prevent entire ODF upgrade or just the upgrade of the lvm based > OSDs? Prevention of upgrade for lvm based OSDs is preferred but if we alert the end user before the start of the upgrade then we could avoid the unexpected data-loss situation in advance. --- Additional comment from Travis Nielsen on 2024-04-16 20:08:02 UTC --- (In reply to Prashant Dhange from comment #3) > If OSDs are backed by LVM then the root operator should prevent the ODF > upgrade and also alert the admin to re-deploy these OSDs before upgrading > the ODF cluster. > > @Travis, I donot recall exactly but you had some ideas around it during our > VW escalation (BZ#2273398) discussion. Preventing the upgrade will be difficult because we won't detect it until after the upgrade is already in progress. Mons and mgr will be upgraded, then OSDs are reconciled and we would discover these LVM-based OSDs. If we fail the reconcile, then it will be difficult to recover from the situation. Instead of failing/preventing the upgrade, let's consider removing the resize init container from these OSDs. Then separately we can find a way to alert the user that they have these legacy OSDs that should be replaced. This gives the user more time to replace them. --- Additional comment from Travis Nielsen on 2024-04-23 22:45:13 UTC --- Acking for the fix: - Rook will save status on the CephCluster CR that a legacy LVM-based OSD is in the cluster - UI will need to raise an alert based on that status item --- Additional comment from Prasad Desala on 2024-04-24 12:13:30 UTC --- Hi Travis, Since we are unable to deploy a 4.3 cluster to reproduce this issue, could you please provide guidance on the steps to verify this bug on the fix build? Please let us know. --- Additional comment from Prasad Desala on 2024-04-25 05:38:28 UTC --- (In reply to Prasad Desala from comment #8) > Hi Travis, > > Since we are unable to deploy a 4.3 cluster to reproduce this issue, could > you please provide guidance on the steps to verify this bug on the fix > build? Please let us know. Providing qa_ack based on comments https://bugzilla.redhat.com/show_bug.cgi?id=2273398#c39 and https://bugzilla.redhat.com/show_bug.cgi?id=2273398#c41 We may need to verify the fix based on the 4.16 CI regression runs. --- Additional comment from RHEL Program Management on 2024-04-25 05:38:40 UTC --- This BZ is being approved for ODF 4.16.0 release, upon receipt of the 3 ACKs (PM,Devel,QA) for the release flag 'odf‑4.16.0 --- Additional comment from RHEL Program Management on 2024-04-25 05:38:40 UTC --- Since this bug has been approved for ODF 4.16.0 release, through release flag 'odf-4.16.0+', the Target Release is being set to 'ODF 4.16.0 --- Additional comment from Travis Nielsen on 2024-04-29 16:19:50 UTC --- (In reply to Prasad Desala from comment #8) > Hi Travis, > > Since we are unable to deploy a 4.3 cluster to reproduce this issue, could > you please provide guidance on the steps to verify this bug on the fix > build? Please let us know. Discussion in a separate thread is that we will just have to run regression tests, as we have not been able to repro the issue. --- Additional comment from Mudit Agarwal on 2024-05-07 05:57:34 UTC --- Travis, do we have a PR for this? --- Additional comment from Travis Nielsen on 2024-05-07 19:00:20 UTC --- (In reply to Mudit Agarwal from comment #13) > Travis, do we have a PR for this? Not yet. 
And we will need two PRs:
1) Rook to update its status when it finds the legacy OSDs
2) UI to raise an alert based on the status (unless the UI team already has a way to raise an alert based on Rook status; I still need to sync with the UI team on this)

--- Additional comment from Travis Nielsen on 2024-05-09 22:45:33 UTC ---

Rook will add status under status.storage.legacyOSDs to the CephCluster CR such as the following:

status:
  storage:
    deviceClasses:
    - name: hdd
    legacyOSDs:
    - id: 0
      reason: LVM-based OSD on a PVC (id=0) is deprecated and should be replaced
    - id: 1
      reason: LVM-based OSD on a PVC (id=1) is deprecated and should be replaced
    - id: 2
      reason: LVM-based OSD on a PVC (id=2) is deprecated and should be replaced
    osd:
      storeType:
        bluestore: 3

I will clone this BZ to get the needed alert raised based on this status.
Based on feedback during PR review, now the output is:

storage:
  deprecatedOSDs:
    LVM-based OSDs on a PVC are deprecated, see documentation on replacing OSDs:
    - 0
    - 1
    - 2
  deviceClasses:
  - name: hdd

Please confirm if there is any concern with this format for raising the alert.
There are no concerns that I can see; I'm working on the changes based on this output: fetching the status in the OCS metrics exporter, and generating a metric to drive the alert.
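As a rough illustration only of how such an alert could be wired up once the exporter exposes a metric: the metric name `ocs_deprecated_osds_count`, the alert name and the wording below are assumptions for this sketch, not the actual implementation; only the general PrometheusRule mechanism is standard.

~~~
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: odf-deprecated-osd-alert        # hypothetical name
  namespace: openshift-storage
spec:
  groups:
  - name: odf-legacy-osds
    rules:
    - alert: ODFDeprecatedLVMBasedOSDs  # hypothetical alert name
      # ocs_deprecated_osds_count is a placeholder metric that the OCS metrics
      # exporter would derive from the CephCluster status.storage.deprecatedOSDs field.
      expr: ocs_deprecated_osds_count > 0
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: Legacy LVM-based OSDs detected
        description: >-
          LVM-based OSDs on a PVC are deprecated and should be replaced
          before the next ODF upgrade.
~~~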
(In reply to Divyansh Kamboj from comment #9)
> There are no concerns that I can see; I'm working on the changes based on
> this output: fetching the status in the OCS metrics exporter, and generating
> a metric to drive the alert.

Thanks. The Rook changes are now merged downstream in 4.16 with https://github.com/red-hat-storage/rook/pull/648
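Until the alert is available, the new status can also be read directly from the CephCluster CR; a minimal check, assuming the default namespace and the resource name used in this cluster (ocs-storagecluster-cephcluster) and the deprecatedOSDs field name from the merged change:

~~~
oc -n openshift-storage get cephcluster ocs-storagecluster-cephcluster \
  -o jsonpath='{.status.storage.deprecatedOSDs}{"\n"}'
~~~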
Moving this one to MODIFIED, if we need another bug for metrics then please open one.
Sorry, wrong bug.
Updating the RDT on behalf of Divyansh Kamboj.
Moving it to the verified state based on the 4.16 CI regression runs.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4591
Can we please backport this to 4.12, 4.13, 4.14, and 4.15? We're seeing this issue being hit in 4.14 upgrades.