+++ This bug was initially created as a clone of Bug #2273398 +++

Description of problem (please be as detailed as possible and provide log snippets):
The customer upgraded from 4.12.47 to 4.14.16. We have noticed that all OSDs are in a crash loop, with the expand-bluefs container showing errors about devices that cannot be found.

Version of all relevant components (if applicable): ODF 4.14.6

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, all OSDs are down.

Is there any workaround available to the best of your knowledge? No

Is this issue reproducible? Yes, in the customer environment.

--- Additional comment from RHEL Program Management on 2024-04-04 15:14:00 UTC ---

This bug having no release flag set previously, is now set with release flag 'odf-4.16.0' to '?', and so is being proposed to be fixed at the ODF 4.16.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any previously set while release flag was missing, have now been reset since the Acks are to be set against a release flag.

--- Additional comment from RHEL Program Management on 2024-04-04 15:14:00 UTC ---

Since this bug has severity set to 'urgent', it is being proposed as a blocker for the currently set release flag. Please resolve ASAP.

--- Additional comment from RHEL Program Management on 2024-04-04 15:14:00 UTC ---

The 'Target Release' is not to be set manually at the Red Hat OpenShift Data Foundation product. The 'Target Release' will be auto set appropriately, after the 3 Acks (pm,devel,qa) are set to "+" for a specific release flag and that release flag gets auto set to "+".

--- Additional comment from Manjunatha on 2024-04-04 15:15:20 UTC ---

+ Attached the osd.4 pod description here.
+ The ceph OSD pods are in a crashloop state with the messages below:
Error: ceph-username is required for osd rook error: ceph-username is required for osd Usage: rook ceph osd init [flags] Flags: --cluster-id string the UID of the cluster CR that owns this cluster --cluster-name string the name of the cluster CR that owns this cluster --encrypted-device whether to encrypt the OSD with dmcrypt -h, --help help for init --is-device whether the osd is a device --location string location of this node for CRUSH placement --node-name string the host name of the node (default "rook-ceph-osd-1-6db9cfc7c9-294jn") --osd-crush-device-class string The device class for all OSDs configured on this node --osd-crush-initial-weight string The initial weight of OSD in TiB units --osd-database-size int default size (MB) for OSD database (bluestore) --osd-id int osd id for which to generate config (default -1) --osd-store-type string the osd store type such as bluestore (default "bluestore") --osd-wal-size int default size (MB) for OSD write ahead log (WAL) (bluestore) (default 576) --osds-per-device int the number of OSDs per device (default 1) Global Flags: --log-level string logging level for logging/tracing output (valid values: ERROR,WARNING,INFO,DEBUG) (default "INFO") '/usr/local/bin/rook' -> '/rook/rook' + PVC_SOURCE=/ocs-deviceset-0-1-78k4w + PVC_DEST=/mnt/ocs-deviceset-0-1-78k4w + CP_ARGS=(--archive --dereference --verbose) + '[' -b /mnt/ocs-deviceset-0-1-78k4w ']' ++ stat --format %t%T /ocs-deviceset-0-1-78k4w + PVC_SOURCE_MAJ_MIN=8e0 ++ stat --format %t%T /mnt/ocs-deviceset-0-1-78k4w + PVC_DEST_MAJ_MIN=8e0 + [[ 8e0 == \8\e\0 ]] + echo 'PVC /mnt/ocs-deviceset-0-1-78k4w already exists and has the same major and minor as /ocs-deviceset-0-1-78k4w: 8e0' PVC /mnt/ocs-deviceset-0-1-78k4w already exists and has the same major and minor as /ocs-deviceset-0-1-78k4w: 8e0 + exit 0 inferring bluefs devices from bluestore path unable to read label for /var/lib/ceph/osd/ceph-1: (2) No such file or directory 2024-04-04T13:22:38.461+0000 7f41cddbf900 -1 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-1/block: (2) No such file or directory --- Additional comment from Manjunatha on 2024-04-04 15:15:49 UTC --- --- Additional comment from Manjunatha on 2024-04-04 15:20:40 UTC --- odf mustgather logs in below path path /cases/03783266/0020-must-gather-odf.tar.gz/must-gather.local.8213025456072446876/inspect.local.6887047306785235156/namespaces/openshift-storage This issue looks similar to bz https://bugzilla.redhat.com/show_bug.cgi?id=2254378 --------------------------- Events from osd-0 deployment: Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 23m default-scheduler Successfully assigned openshift-storage/rook-ceph-osd-0-6764d4c675-f9w2m to storage-00.dev-intranet-01-wob.ocp.vwgroup.com Normal SuccessfulMountVolume 23m kubelet MapVolume.MapPodDevice succeeded for volume "local-pv-abfd62bb" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-abfd62bb" Normal SuccessfulMountVolume 23m kubelet MapVolume.MapPodDevice succeeded for volume "local-pv-abfd62bb" volumeMapPath "/var/lib/kubelet/pods/2ae21d6a-2aa5-43e8-8c0d-7ecb250656a2/volumeDevices/kubernetes.io~local-volume" Normal AddedInterface 23m multus Add eth0 [100.72.0.27/23] from ovn-kubernetes Normal Pulling 23m kubelet Pulling image "registry.redhat.io/odf4/rook-ceph-rhel9-operator@sha256:23041cec90d0c64f043deb9f5b589c2fe3b2e29163cf7576324341ad855affcc" Normal Pulled 23m kubelet Successfully pulled image 
"registry.redhat.io/odf4/rook-ceph-rhel9-operator@sha256:23041cec90d0c64f043deb9f5b589c2fe3b2e29163cf7576324341ad855affcc" in 14.120104593s (14.120122047s including waiting) Normal Created 23m kubelet Created container config-init Normal Started 23m kubelet Started container config-init Normal Pulled 23m kubelet Container image "registry.redhat.io/odf4/rook-ceph-rhel9-operator@sha256:23041cec90d0c64f043deb9f5b589c2fe3b2e29163cf7576324341ad855affcc" already present on machine Normal Created 23m kubelet Created container copy-bins Normal Started 23m kubelet Started container copy-bins Normal Pulled 23m kubelet Container image "registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9dbd051cfcdb334aad33a536cc115ae1954edaea5f8cb5943ad615f1b41b0226" already present on machine Normal Created 23m kubelet Created container blkdevmapper Normal Started 23m kubelet Started container blkdevmapper Normal Pulled 23m (x3 over 23m) kubelet Container image "registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9dbd051cfcdb334aad33a536cc115ae1954edaea5f8cb5943ad615f1b41b0226" already present on machine Normal Created 23m (x3 over 23m) kubelet Created container expand-bluefs Normal Started 23m (x3 over 23m) kubelet Started container expand-bluefs Warning BackOff 3m43s (x94 over 23m) kubelet Back-off restarting failed container expand-bluefs in pod rook-ceph-osd-0-6764d4c675-f9w2m_openshift-storage(2ae21d6a-2aa5-43e8-8c0d-7ecb250656a2) --- Additional comment from Andreas Bleischwitz on 2024-04-04 15:29:15 UTC --- TAM update about the business impact: All developers on this platform are being blocked due to this issue. VW also has a lot of VMs running on this bare-metal cluster on which they do all the testing and simulation of car components. This is now all down and they are not able to work. --- Additional comment from Manjunatha on 2024-04-04 15:46:36 UTC --- Customer rebooted the osd nodes few times as suggested below solution but that dint help https://access.redhat.com/solutions/7015095 --- Additional comment from Andreas Bleischwitz on 2024-04-04 15:53:31 UTC --- The customer just informed us that this is a very "old" cluster starting with 4.3.18 and ODF was installed about 3.5 years ago. So there may be a lot of tweaks/leftovers/* in this cluster. 
--- Additional comment from Manjunatha on 2024-04-04 16:32:07 UTC --- Latest mustgather and sosreport from storage node 0 in below supportshell path: /cases/03783266 cluster: id: 18c9800f-7f91-4994-ad32-2a8a330babd6 health: HEALTH_WARN 1 filesystem is degraded 1 MDSs report slow metadata IOs 7 osds down 2 hosts (10 osds) down 2 racks (10 osds) down Reduced data availability: 505 pgs inactive services: mon: 3 daemons, quorum b,f,g (age 4h) mgr: a(active, since 4h) mds: 1/1 daemons up, 1 standby osd: 15 osds: 4 up (since 23h), 11 in (since 23h); 5 remapped pgs data: volumes: 0/1 healthy, 1 recovering pools: 12 pools, 505 pgs objects: 0 objects, 0 B usage: 0 B used, 0 B / 0 B avail pgs: 100.000% pgs unknown 505 unknown ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 26.19896 root default -4 8.73299 rack rack0 -3 8.73299 host storage-00-dev-intranet-01-wob-ocp-vwgroup-com 0 ssd 1.74660 osd.0 down 1.00000 1.00000 1 ssd 1.74660 osd.1 down 1.00000 1.00000 2 ssd 1.74660 osd.2 down 1.00000 1.00000 3 ssd 1.74660 osd.3 down 1.00000 1.00000 4 ssd 1.74660 osd.4 down 1.00000 1.00000 -12 8.73299 rack rack1 -11 8.73299 host storage-01-dev-intranet-01-wob-ocp-vwgroup-com 6 ssd 1.74660 osd.6 up 1.00000 1.00000 7 ssd 1.74660 osd.7 up 1.00000 1.00000 8 ssd 1.74660 osd.8 up 1.00000 1.00000 9 ssd 1.74660 osd.9 up 1.00000 1.00000 10 ssd 1.74660 osd.10 down 0 1.00000 -8 8.73299 rack rack2 -7 8.73299 host storage-02-dev-intranet-01-wob-ocp-vwgroup-com 5 ssd 1.74660 osd.5 down 1.00000 1.00000 11 ssd 1.74660 osd.11 down 0 1.00000 12 ssd 1.74660 osd.12 down 0 1.00000 13 ssd 1.74660 osd.13 down 0 1.00000 14 ssd 1.74660 osd.14 down 1.00000 1.00000 oc get pods ---------- csi-rbdplugin-22sd7 3/3 Running 0 3h9m csi-rbdplugin-4jq74 3/3 Running 0 3h8m csi-rbdplugin-5pcb2 3/3 Running 0 3h10m csi-rbdplugin-8bfc6 3/3 Running 0 3h8m csi-rbdplugin-dqk7p 3/3 Running 0 3h9m csi-rbdplugin-fdvn7 3/3 Running 0 3h10m csi-rbdplugin-plpst 3/3 Running 0 3h8m csi-rbdplugin-provisioner-5f9c6986bf-j7p87 6/6 Running 0 3h10m csi-rbdplugin-provisioner-5f9c6986bf-lh28b 6/6 Running 0 3h10m csi-rbdplugin-szt6r 3/3 Running 0 3h9m csi-rbdplugin-v2mbl 3/3 Running 0 3h8m csi-rbdplugin-v2sl2 3/3 Running 0 3h9m noobaa-core-0 1/1 Running 0 4h24m noobaa-db-pg-0 0/1 ContainerCreating 0 4h21m noobaa-endpoint-6b7ffdb8c7-m4sc4 1/1 Running 0 4h24m noobaa-operator-8fbd98874-thsnf 2/2 Running 0 3h10m ocs-metrics-exporter-675445555-57s4l 1/1 Running 0 3h11m ocs-operator-7f94d94cc6-t9grr 1/1 Running 0 3h11m odf-console-57f488895f-9bmn9 1/1 Running 0 3h11m odf-operator-controller-manager-5696cbdd96-9bd8s 2/2 Running 0 3h11m rook-ceph-crashcollector-99efedd2c34d02d8f63821262323e8cf-g7ktw 1/1 Running 0 4h9m rook-ceph-crashcollector-ba2f7f929e41f5b369d230c9d1f57030-hvpx7 1/1 Running 0 4h36m rook-ceph-crashcollector-e268748b9d65a9160da738c1921524fc-bp2xh 1/1 Running 0 4h24m rook-ceph-exporter-99efedd2c34d02d8f63821262323e8cf-cf55b8cdff7 1/1 Running 0 3h10m rook-ceph-exporter-ba2f7f929e41f5b369d230c9d1f57030-868576jrwtl 1/1 Running 0 3h10m rook-ceph-exporter-e268748b9d65a9160da738c1921524fc-cfd99fznfcm 1/1 Running 0 3h10m rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-5555d4ddxlcbp 2/2 Running 4 (4h10m ago) 4h24m rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-7c6fbf87kqwvq 2/2 Running 7 (4h11m ago) 4h36m rook-ceph-mgr-a-648fcd4f7-rlpk8 2/2 Running 0 4h36m rook-ceph-mon-b-f75967674-vbh6v 2/2 Running 0 4h24m rook-ceph-mon-f-85459b595-zsmtc 2/2 Running 0 4h36m rook-ceph-mon-g-55dbcc6687-p6jtf 2/2 Running 0 4h44m rook-ceph-operator-5cc55456b5-hk8db 1/1 Running 0 
3h11m rook-ceph-osd-0-6764d4c675-f9w2m 0/2 Init:CrashLoopBackOff 41 (4m53s ago) 3h9m rook-ceph-osd-1-6db9cfc7c9-294jn 0/2 Init:CrashLoopBackOff 41 (4m15s ago) 3h9m rook-ceph-osd-10-5cb968ffcc-svnlf 0/2 Init:CrashLoopBackOff 41 (3m52s ago) 3h9m rook-ceph-osd-11-6c9b679dd-gdxfz 0/2 Init:CrashLoopBackOff 41 (4m54s ago) 3h9m rook-ceph-osd-12-569499c577-k5ssz 0/2 Init:CrashLoopBackOff 41 (4m36s ago) 3h9m rook-ceph-osd-13-f6db445dc-cgwc2 0/2 Init:CrashLoopBackOff 41 (3m18s ago) 3h9m rook-ceph-osd-14-74d8c98998-mm6bk 0/2 Init:CrashLoopBackOff 41 (4m9s ago) 3h9m rook-ceph-osd-2-6c5f9b84d5-njc9v 0/2 Init:CrashLoopBackOff 41 (4m31s ago) 3h9m rook-ceph-osd-3-76984bf75-rtqvb 0/2 Init:CrashLoopBackOff 41 (3m56s ago) 3h9m rook-ceph-osd-4-b696776bf-8z9mx 0/2 Init:CrashLoopBackOff 13 (4m4s ago) 45m rook-ceph-osd-5-6684cc7f47-64pq4 0/2 Init:CrashLoopBackOff 14 (2m49s ago) 49m rook-ceph-osd-6-5dcff784bc-gk7st 0/2 Init:CrashLoopBackOff 41 (4m9s ago) 3h8m rook-ceph-osd-7-f66f9d586-rrkjm 0/2 Init:CrashLoopBackOff 41 (4m33s ago) 3h8m rook-ceph-osd-8-7dd767d7f4-g9s6b 0/2 Init:CrashLoopBackOff 41 (3m45s ago) 3h8m rook-ceph-osd-9-76f75fbc55-jq7lh 0/2 Init:CrashLoopBackOff 41 (3m57s ago) 3h8m rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-78f98f5f6c4n 1/2 CrashLoopBackOff 59 (2m38s ago) 4h24m rook-ceph-tools-759496b8f8-4klr9 1/1 Running 0 3h10m ux-backend-server-5fbf8b985-zpjph 2/2 Running 0 3h11m Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning BackOff 4m38s (x856 over 3h9m) kubelet Back-off restarting failed container expand-bluefs in pod rook-ceph-osd-0-6764d4c675-f9w2m_openshift-storage(2ae21d6a-2aa5-43e8-8c0d-7ecb250656a2) --- Additional comment from on 2024-04-04 18:29:13 UTC --- Seems like the backing device was removed or moved?: ````` PVC /mnt/ocs-deviceset-0-1-78k4w already exists and has the same major and minor as /ocs-deviceset-0-1-78k4w: 8e0 + exit 0 inferring bluefs devices from bluestore path unable to read label for /var/lib/ceph/osd/ceph-1: (2) No such file or directory ````` // confirm the device for osd-1 $ omc get pods rook-ceph-osd-1-6db9cfc7c9-294jn -o yaml|grep device ceph.rook.io/DeviceSet: ocs-deviceset-0 ceph.rook.io/pvc: ocs-deviceset-0-1-78k4w device-class: ssd name: devices name: ocs-deviceset-0-1-78k4w-bridge - "\nset -xe\n\nPVC_SOURCE=/ocs-deviceset-0-1-78k4w\nPVC_DEST=/mnt/ocs-deviceset-0-1-78k4w\nCP_ARGS=(--archive - devicePath: /ocs-deviceset-0-1-78k4w name: ocs-deviceset-0-1-78k4w name: ocs-deviceset-0-1-78k4w-bridge name: ocs-deviceset-0-1-78k4w-bridge name: devices name: ocs-deviceset-0-1-78k4w-bridge name: devices - name: ocs-deviceset-0-1-78k4w claimName: ocs-deviceset-0-1-78k4w path: /var/lib/rook/openshift-storage/ocs-deviceset-0-1-78k4w name: ocs-deviceset-0-1-78k4w-bridge $ omc get pvc ocs-deviceset-0-1-78k4w NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE ocs-deviceset-0-1-78k4w Bound local-pv-32532e89 1788Gi RWO ocs-localblock 3y $ omc get pv local-pv-32532e89 -o yaml apiVersion: v1 kind: PersistentVolume metadata: annotations: pv.kubernetes.io/bound-by-controller: "yes" pv.kubernetes.io/provisioned-by: local-volume-provisioner-storage-00.dev-intranet-01-wob.ocp.vwgroup.com-da4c2721-f73c-4626-8c98-7ff9f07f3212 creationTimestamp: "2020-09-09T14:52:54Z" finalizers: - kubernetes.io/pv-protection labels: storage.openshift.com/local-volume-owner-name: ocs-blkvol-storage-00 storage.openshift.com/local-volume-owner-namespace: local-storage name: local-pv-32532e89 resourceVersion: "194139688" uid: fdcb6fab-0a53-49ca-bdb1-6e807e969eb7 
spec: accessModes: - ReadWriteOnce capacity: storage: 1788Gi claimRef: apiVersion: v1 kind: PersistentVolumeClaim name: ocs-deviceset-0-1-78k4w namespace: openshift-storage resourceVersion: "194139403" uid: 8acac407-b475-46ed-9e49-29b377b80137 local: path: /mnt/local-storage/ocs-localblock/sdr nodeAffinity: required: nodeSelectorTerms: - matchExpressions: - key: kubernetes.io/hostname operator: In values: - storage-00.dev-intranet-01-wob.ocp.vwgroup.com persistentVolumeReclaimPolicy: Delete storageClassName: ocs-localblock volumeMode: Block status: phase: Bound For this osd at least, they are using device paths and should be using devices by-id/uuid so the device names never change ~~~ local: path: /mnt/local-storage/ocs-localblock/sdr ~~~ Asking for some more data in the case like: ~~~ $ ls -l /mnt/local-storage/ocs-localblock/ I'd also like to gather the following from LSO: // namespace might be local-storage $ oc get localvolume -o yaml -n openshift-local-storage $ oc get localvolumeset -o yaml -n openshift-local-storage $ oc get localvolumediscovery -o yaml -n openshift-local-storage ~~~ I have a very strong suspicion the kernel picked up the devices in another order and the osds cannot find their backing device. This was caused by the EUS upgrade of OCP, as it does a rollout of MCPs that will reboot the nodes. --- Additional comment from on 2024-04-04 21:45:31 UTC --- Hi, My suspicion was wrong // sym links for devices on storage-00 from lso: [acmdy78@bastion ~]$ oc debug -q node/storage-00.dev-intranet-01-wob.ocp.vwgroup.com -- chroot /host ls -l /mnt/local-storage/ocs-localblock/ total 0 lrwxrwxrwx. 1 root root 50 Sep 9 2020 sdp -> /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02507 lrwxrwxrwx. 1 root root 93 Apr 4 14:34 sdq -> /dev/ceph-e936c994-328c-4f59-8f1d-3a5573a7c64b/osd-block-aaced0de-8884-4551-a5ae-dd86ee436f23 lrwxrwxrwx. 1 root root 50 Sep 9 2020 sdr -> /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02490 lrwxrwxrwx. 1 root root 50 Sep 9 2020 sds -> /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02497 lrwxrwxrwx. 1 root root 50 Sep 9 2020 sdt -> /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02489 They have some weird LSO config where each node has its own spec section with its devices listed. Anyways, they have the proper device defined in their LSO configs for the node ~~~ spec: logLevel: Normal managementState: Managed nodeSelector: nodeSelectorTerms: - matchExpressions: - key: kubernetes.io/hostname operator: In values: - storage-00.dev-intranet-01-wob.ocp.vwgroup.com storageClassDevices: - devicePaths: - /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02489 - /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02490 - /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02497 - /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02504 - /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02507 storageClassName: ocs-localblock ~~~ and the symlink that lso knows about for sdr (ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02490) points to the correct device. Seems the device names didn't change, and they are using by-id. Taking a step back... 
since the failure is with the expand-bluefs container:
~~~
- containerID: cri-o://c4163c5dbd33cab921c113b80350ec20a3af48a2865f7ea43c68f4cdd61afc19
  image: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9dbd051cfcdb334aad33a536cc115ae1954edaea5f8cb5943ad615f1b41b0226
  imageID: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:3d7144d5fe515acf3bf4bbf6456ab8877a4f7cd553c933ca6fd4d891add53038
  lastState:
    terminated:
      containerID: cri-o://c4163c5dbd33cab921c113b80350ec20a3af48a2865f7ea43c68f4cdd61afc19
      exitCode: 1
      finishedAt: "2024-04-04T15:10:29Z"
      reason: Error
      startedAt: "2024-04-04T15:10:29Z"
  name: expand-bluefs
  ready: false
  restartCount: 37
  state:
    waiting:
      message: back-off 5m0s restarting failed container=expand-bluefs pod=rook-ceph-osd-1-6db9cfc7c9-294jn_openshift-storage(4cab01f5-438d-4ffc-a133-cd427bb1cda5)
      reason: CrashLoopBackOff
~~~
because it cannot find its block device:
~~~
$ omc logs rook-ceph-osd-1-6db9cfc7c9-294jn -c expand-bluefs
2024-04-04T15:10:29.457626768Z inferring bluefs devices from bluestore path
2024-04-04T15:10:29.457728034Z unable to read label for /var/lib/ceph/osd/ceph-1: (2) No such file or directory
2024-04-04T15:10:29.457728034Z 2024-04-04T15:10:29.456+0000 7fdba942e900 -1 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-1/block: (2) No such file or directory
~~~
What if we removed this expand-bluefs container? Maybe the OSD will start? If not, is the only option to replace the OSD(s)? We do have one node up, so we should still hopefully have some good copies. What about setting replica 1 on the pools?

--- Additional comment from Andreas Bleischwitz on 2024-04-05 07:58:58 UTC ---

Hi,

can we have at least an update that this issue is being investigated by engineering? The customer is now suffering from that outage which affects basically their complete development environment (they are developers, and therefore this cluster is their production environment) since about one day. We currently do not have any idea how to re-enable the OSDs so that they would be able to work again.

Customer: Volkswagen AG (#556879)

@muagarwa, @gsternag are you able to assist here?

Best regards,
/Andreas

--- Additional comment from Bipin Kunal on 2024-04-05 11:36:28 UTC ---

(In reply to Andreas Bleischwitz from comment #13)
> Hi,
>
> can we have at least an update that this issue is being investigated by
> engineering? The customer is now suffering from that outage which affects
> basically their complete development environment (they are developers, and
> therefore this cluster is their production environment) since about one day.
> We currently do not have any idea how to re-enable the OSDs so that they
> would be able to work again.
>
> Customer:
> Volkswagen AG (#556879)
>
>
> @muagarwa, @gsternag are you able to assist here?
>
> Best regards,
> /Andreas

Hi Andreas,
Thanks for reaching out to me. I am trying to reach out to the engineering team. Meanwhile, it would be good to have a prio-list email if this is really urgent.
Removing the needinfo on Mudit.

--- Additional comment from Radoslaw Zarzynski on 2024-04-05 11:58:00 UTC ---

On it.
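Before experimenting with pool sizes, a read-only sanity check of replication settings and PG state can be done from the toolbox pod. A sketch only; it assumes the standard `app=rook-ceph-tools` label on the toolbox deployment seen in the pod listing above:

```
# Inspect pool size/min_size and PG availability without changing anything.
NS=openshift-storage
TOOLS=$(oc -n "$NS" get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')
oc -n "$NS" rsh "$TOOLS" ceph osd pool ls detail   # size/min_size per pool
oc -n "$NS" rsh "$TOOLS" ceph pg stat              # how many PGs are inactive/unknown
oc -n "$NS" rsh "$TOOLS" ceph osd tree             # which hosts/racks still have OSDs up
```

Dropping pools to replica 1 risks permanent data loss if the remaining copies sit on the failing OSDs, so it is usually a last resort.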
--- Additional comment from Radoslaw Zarzynski on 2024-04-05 13:15:51 UTC ---

```
[supportshell-1.sush-001.prod.us-west-2.aws.redhat.com] [13:03:07+0000] [rzarzyns@supportshell-1 03783266]$ cat ./0010-rook-ceph-osd-0-5d664bf845-mf956-expand-bluefs.log
inferring bluefs devices from bluestore path
unable to read label for /var/lib/ceph/osd/ceph-0: (2) No such file or directory
2024-04-04T09:23:13.685+0000 7f9911e4c900 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (2) No such file or directory
```

Let's read through the pacific's `ceph-bluestore-tool` for the above failure of the `bluefs-bdev-expand` command:

```
else if (action == "bluefs-bdev-expand") {
    BlueStore bluestore(cct.get(), path);
    auto r = bluestore.expand_devices(cout);
    if (r < 0) {
      cerr << "failed to expand bluestore devices: "
           << cpp_strerror(r) << std::endl;
      exit(EXIT_FAILURE);
    }
}
```

```
int BlueStore::expand_devices(ostream& out)
{
  // ...
  if (_set_bdev_label_size(p, size) >= 0) {
    out << devid << " : size label updated to " << size << std::endl;
  }
```

```
int BlueStore::_set_bdev_label_size(const string& path, uint64_t size)
{
  bluestore_bdev_label_t label;
  int r = _read_bdev_label(cct, path, &label);
  if (r < 0) {
    derr << "unable to read label for " << path << ": "
         << cpp_strerror(r) << dendl;
  } else {
```

```
int BlueStore::_read_bdev_label(CephContext* cct, string path,
                                bluestore_bdev_label_t *label)
{
  dout(10) << __func__ << dendl;
  int fd = TEMP_FAILURE_RETRY(::open(path.c_str(), O_RDONLY|O_CLOEXEC));
  if (fd < 0) {
    fd = -errno;
    derr << __func__ << " failed to open " << path << ": " << cpp_strerror(fd)
         << dendl;
    return fd;
  }
  // ...
```

We can see that the direct underlying problem is the failure of the `open()` syscall called on `/var/lib/ceph/osd/ceph-0/block`. Whatever the Rook container gets inside, COT is unable to open it. Therefore it looks more like an orchestrator failure than a Ceph one. It seems a good idea to run a shell within the container's environment and check, with standard unix tools, the presence of the `block`. If it's a symlink, the target must be possible to `open()` as well.
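A sketch of the check suggested above, run against the host since the OSD pods never get past their init containers. The node name and the deviceset paths are the ones that appear later in the case; treat them as placeholders:

```
# For each legacy OSD dir under /var/lib/rook, verify that the ceph-<id>/block
# symlink exists, resolves, and can actually be opened for reading
# (a single 4 KiB read is enough to exercise open()).
oc debug -q node/storage-00.dev-intranet-01-wob.ocp.vwgroup.com -- chroot /host bash -c '
  for d in /var/lib/rook/openshift-storage/ocs-deviceset-*; do
    echo "== $d"
    ls -l "$d"
    for blk in "$d"/ceph-*/block; do
      [ -e "$blk" ] || { echo "  missing: $blk"; continue; }
      tgt=$(readlink -f "$blk")
      echo "  $blk -> $tgt"
      dd if="$tgt" of=/dev/null bs=4096 count=1 status=none && echo "  open()/read OK"
    done
  done'
```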
Another thing that struck me is this error: ``` [supportshell-1.sush-001.prod.us-west-2.aws.redhat.com] [13:10:18+0000] [rzarzyns@supportshell-1 03783266]$ cat ./0020-must-gather-odf.tar.gz/must-gather.local.8213025456072446876/inspect.local.6887047306785235156/namespaces/openshift-storage/pods/rook-ceph-osd-0-5d664bf845-mf956/config-init/config-init/logs/current.log 2024-04-04T09:02:08.139048570Z Error: ceph-username is required for osd 2024-04-04T09:02:08.139350828Z Usage: 2024-04-04T09:02:08.139350828Z rook ceph osd init [flags] 2024-04-04T09:02:08.139350828Z 2024-04-04T09:02:08.139350828Z Flags: 2024-04-04T09:02:08.139350828Z --cluster-id string the UID of the cluster CR that owns this cluster 2024-04-04T09:02:08.139350828Z --cluster-name string the name of the cluster CR that owns this cluster 2024-04-04T09:02:08.139350828Z --encrypted-device whether to encrypt the OSD with dmcrypt 2024-04-04T09:02:08.139350828Z -h, --help help for init 2024-04-04T09:02:08.139350828Z --is-device whether the osd is a device 2024-04-04T09:02:08.139350828Z --location string location of this node for CRUSH placement 2024-04-04T09:02:08.139350828Z --node-name string the host name of the node (default "rook-ceph-osd-0-5d664bf845-mf956") 2024-04-04T09:02:08.139350828Z --osd-crush-device-class string The device class for all OSDs configured on this node 2024-04-04T09:02:08.139350828Z --osd-crush-initial-weight string The initial weight of OSD in TiB units 2024-04-04T09:02:08.139350828Z --osd-database-size int default size (MB) for OSD database (bluestore) 2024-04-04T09:02:08.139350828Z --osd-id int osd id for which to generate config (default -1) 2024-04-04T09:02:08.139350828Z --osd-store-type string the osd store type such as bluestore (default "bluestore") 2024-04-04T09:02:08.139350828Z --osd-wal-size int default size (MB) for OSD write ahead log (WAL) (bluestore) (default 576) 2024-04-04T09:02:08.139350828Z --osds-per-device int the number of OSDs per device (default 1) 2024-04-04T09:02:08.139350828Z 2024-04-04T09:02:08.139350828Z Global Flags: 2024-04-04T09:02:08.139350828Z --log-level string logging level for logging/tracing output (valid values: ERROR,WARNING,INFO,DEBUG) (default "INFO") 2024-04-04T09:02:08.139350828Z 2024-04-04T09:02:08.139365263Z rook error: ceph-username is required for osd ``` I'm not a Rook expert but this looks weird especially taken into consideration the upgrade. Is `ceph osd init` failing early because of unspecified user parameter, which leaves the container's `block` uninitialized? Just speculating. Best regards, Radek --- Additional comment from Bipin Kunal on 2024-04-05 13:23:30 UTC --- Thanks Radek for checking. I will check if someone from rook can have a look as well. --- Additional comment from on 2024-04-05 13:52:56 UTC --- Hello, @bkunal found the KCS https://access.redhat.com/solutions/7026462 I'm going to confirm this is the same for this case. Will post findings when I have any. --- Additional comment from Bipin Kunal on 2024-04-05 13:57:53 UTC --- (In reply to kelwhite from comment #18) > Hello, > > @bkunal found the KCS https://access.redhat.com/solutions/7026462 I'm going > to confirm this is the same for this case. Will post findings when I have > any. Actually Shubham from the Rook team found it and gave it to me. 
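Regarding the config-init error quoted above: whether that container is actually the one failing (as opposed to expand-bluefs) can be read straight from the pod status. A sketch, using one of the pod names from this case:

```
# Print name, exit code and reason of every init container in the failing OSD pod.
oc -n openshift-storage get pod rook-ceph-osd-1-6db9cfc7c9-294jn \
  -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{" exit="}{.lastState.terminated.exitCode}{" reason="}{.lastState.terminated.reason}{"\n"}{end}'
```

In this case the events and restart counts point at expand-bluefs as the failing init container, which matches the later observation that the config-init error message does not, by itself, abort the pod.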
--- Additional comment from on 2024-04-05 15:41:15 UTC --- From the customer: // for osd-1: ~~~~ osd-1 is not using /dev/sdr, but i figured it out see this path: [acmdy78@bastion ~]$ oc get -n openshift-storage -o yaml deployment rook-ceph-osd-1 | grep ceph.rook.io/pvc ceph.rook.io/pvc: ocs-deviceset-0-1-78k4w ceph.rook.io/pvc: ocs-deviceset-0-1-78k4w - key: ceph.rook.io/pvc [acmdy78@bastion ~]$ oc get pvc ocs-deviceset-0-1-78k4w NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE ocs-deviceset-0-1-78k4w Bound local-pv-32532e89 1788Gi RWO ocs-localblock 3y208d [acmdy78@bastion ~]$ oc get pv local-pv-32532e89 -o custom-columns=NAME:.metadata.name,PATH:.spec.local.path,NODE:.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values NAME PATH NODE local-pv-32532e89 /mnt/local-storage/ocs-localblock/sdr [storage-00.dev-intranet-01-wob.ocp.vwgroup.com] [acmdy78@bastion ~]$ oc debug -q node/storage-00.dev-intranet-01-wob.ocp.vwgroup.com sh-4.4# chroot /host sh-5.1# ls -lah /mnt/local-storage/ocs-localblock/sdr lrwxrwxrwx. 1 root root 50 Sep 9 2020 /mnt/local-storage/ocs-localblock/sdr -> /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02490 sh-5.1# ls -lah /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02490 lrwxrwxrwx. 1 root root 9 Apr 5 14:58 /dev/disk/by-id/ata-MZ7KM1T9HMJP0D3_S3BRNX0KA02490 -> ../../sdo sh-5.1# ls -lah /dev/sdo brw-rw----. 1 root disk 8, 224 Apr 5 14:58 /dev/sdo sh-5.1# lsblk /dev/sdo NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sdo 8:224 0 1.7T 0 disk `-ceph--f557e476--7bd4--41a0--9323--d6061a4318b3-osd--block--7f80e2ac--e21f--4aa6--8886--ec94d0387196 253:5 0 1.7T 0 lvm sh-5.1# head --bytes=60 /dev/sdo sh-5.1# ls -lah /dev/ceph- ceph-309180b2-697b-473a-a19c-d00cec94427a/ ceph-b100dea8-0b24-4d9b-97c8-ed6dba1bd10d/ ceph-f557e476-7bd4-41a0-9323-d6061a4318b3/ ceph-3c187f41-d43e-4bb2-9421-97f78db94d28/ ceph-e936c994-328c-4f59-8f1d-3a5573a7c64b/ sh-5.1# ls -lah /dev/ceph-f557e476-7bd4-41a0-9323-d6061a4318b3/osd-block-7f80e2ac-e21f-4aa6-8886-ec94d0387196 lrwxrwxrwx. 1 root root 7 Apr 4 10:54 /dev/ceph-f557e476-7bd4-41a0-9323-d6061a4318b3/osd-block-7f80e2ac-e21f-4aa6-8886-ec94d0387196 -> ../dm-5 sh-5.1# ls -lah /dev/dm-5 brw-rw----. 1 root disk 253, 5 Apr 4 10:54 /dev/dm-5 sh-5.1# head --bytes=60 /dev/dm-5 bluestore block device 7f80e2ac-e21f-4aa6-8886-ec94d0387196 the bluestore block device id ov the logical volume is the same that you get for osd-1 in the ceph osd dump command. Could the Problem be caused by this lvm layer? On clusters we installed later ODF i don't see that lvm is used. This cluster started with OCS 4.3 or 4.4 ~~~~ --- Additional comment from on 2024-04-05 17:12:25 UTC --- Hi, Update... We've found the block devices dont exist on the nodes (this is from storage-00): ~~~ /var/lib/rook/openshift-storage/ocs-deviceset-0-1-78k4w: total 0 drwxr-xr-x. 2 root root 6 Apr 3 14:50 ceph-1 brw-rw-rw-. 1 root disk 8, 224 Apr 4 10:57 ocs-deviceset-0-1-78k4w /var/lib/rook/openshift-storage/ocs-deviceset-0-1-78k4w/ceph-1: total 0 /var/lib/rook/openshift-storage/ocs-deviceset-0-2-2p2fc: total 0 drwxr-xr-x. 2 root root 6 Apr 3 14:50 ceph-2 brw-rw-rw-. 1 root disk 65, 0 Apr 4 10:57 ocs-deviceset-0-2-2p2fc /var/lib/rook/openshift-storage/ocs-deviceset-0-2-2p2fc/ceph-2: total 0 /var/lib/rook/openshift-storage/ocs-deviceset-0-3-lh2tq: total 0 drwxr-xr-x. 2 root root 6 Apr 3 14:50 ceph-3 brw-rw-rw-. 
1 root disk 65, 16 Apr 4 10:57 ocs-deviceset-0-3-lh2tq

/var/lib/rook/openshift-storage/ocs-deviceset-0-3-lh2tq/ceph-3:
total 0

/var/lib/rook/openshift-storage/ocs-deviceset-0-4-wfm22:
total 0
drwxr-xr-x. 2 root root 10 Apr 3 14:50 ceph-4
brw-rw-rw-. 1 root disk 253, 4 Apr 4 14:49 ocs-deviceset-0-4-wfm22

/var/lib/rook/openshift-storage/ocs-deviceset-0-4-wfm22/ceph-4:
total 0
~~~
We need to confirm why these are gone. The current ask from engineering is why these devices vanished. Would rook do anything with this? Can we find anything that will help? We're confirming that the devices are gone on the other nodes and starting the OSD replacement processes via a remote call.

--- Additional comment from on 2024-04-05 19:50:49 UTC ---

Hello All,

On a remote with the customer. We've confirmed no data loss, phew. The issue seems to be with ceph-volume: it's not activating the device. We tried to do this manually via the steps below and got osd-9 up and running:
~~~
- Creating a backup of the osd-9 deployment, we're going to remove the liveness probe
- scaled down the rook-ceph and ocs-operators
- oc edit the osd-9 deployment, searched for the expand-bluefs section and removed the container
- oc get pods to see if osd-9 came up (still 1/2) and rsh'd into the container
- ceph-volume lvm list
- ceph-volume lvm activate --no-systemd -- 9 79021ece-c52a-46d1-8e99-69640a926822 // this is the osd fsid from ceph-volume lvm list
- The osd was activated and when we viewed the osd data dir, the block device was listed:
- ls -l '/var/lib/ceph/osd/ceph-{id}
~~~
We're looking to get some ceph-volume logs to determine what's going on... Might need to create another BZ for ceph-volume, but we will know more once we review the fresh odf must-gather.

--- Additional comment from Travis Nielsen on 2024-04-05 20:56:28 UTC ---

Great to see the OSDs can be brought back up with the workaround and there is no data loss. These old LVM-based OSDs (IIRC created only in 4.2 and 4.3) are going to be a problem to maintain. We simply don't have tests that upgrade from OSDs created 10+ releases ago. For this configuration that has not been supported for so long, the way to keep supporting such an old cluster will be to replace each of the OSDs. By purging each OSD one at a time and bringing up a new one, the OSDs can be in a current configuration. It would not surprise me that in 4.14 there could have been an update to ceph-volume that caused this issue, because we just haven't tested this configuration for so long.

Guillaume, agreed that old LVM-based OSDs should be replaced?

--- Additional comment from Prashant Dhange on 2024-04-05 21:18:53 UTC ---

Additional details for completeness:

(In reply to kelwhite from comment #22)
> Hello All,
>
> On a remote with the customer. We've confirmed no data loss, phew. Seems the
> issue is with ceph-volume, it's not activating the device. We tried to do
> this manually via the below and got osd-9 up and running:
>
> ~~~
> - Creating a backup of the osd-9 deployment, we're going to remove the
> liveness probe
> - scaled down the rook-ceph and ocs-operators
> - oc edit the osd-9 deployment and searched for the expand-bluefs section
> and removed the container
> - oc get pods to see if osd-9 came up (still 1/2) and rshed info the
> container
> - ceph-volume lvm list
All LVs associated with the ceph cluster are listed here, and lsblk/lvs recognize these LVs.
> - ceph-volume lvm active --no-systemd -- 9
> 79021ece-c52a-46d1-8e99-69640a926822 // this is the osd fsid from
> ceph-volume lvm list
> - The osd was activated and when we viewed the osd data dir, the block
> device was listed:
> - ls -l '/var/lib/ceph/osd/ceph-{id}

- Start osd.9
  # ceph-osd --id 9 --fsid 18c9800f-7f91-4994-ad32-2a8a330babd6 --setuser ceph --setgroup ceph --crush-location="root=default host=storage-01-dev-intranet-01-wob-ocp-vwgroup-com rack=rack1" --log-to-stderr=true --err-to-stderr=true --mon-cluster-log-to-stderr=true --log-stderr-prefix=debug --default-log-to-file=false --default-mon-cluster-log-to-file=false --ms-learn-addr-from-peer=false

  NOTE: The OSD daemon will run in the background and it's safe to exit the container here.

--- Additional comment from Prashant Dhange on 2024-04-05 22:14:32 UTC ---

The latest provided must-gather logs and ceph logs do not shed any light on the failure of OSD directory priming or of ceph-volume activating the OSD device.

The next action plan:
- Apply the workaround for every OSD on the cluster, refer comment#24
- Get all OSDs up/in and all PGs active+clean
- Re-deploy all OSDs one by one.

For other clusters which might experience similar issues, the recommendation is to re-deploy all the OSDs first and only then upgrade the cluster from 4.12.47 to 4.14.16.

Let me know if you need any help recovering this cluster.

--- Additional comment from Prashant Dhange on 2024-04-05 22:58:31 UTC ---

(In reply to Prashant Dhange from comment #24)
...
> - Start osd.9
> # ceph-osd --id 9 --fsid 18c9800f-7f91-4994-ad32-2a8a330babd6
> --setuser ceph --setgroup ceph --crush-location="root=default
> host=storage-01-dev-intranet-01-wob-ocp-vwgroup-com rack=rack1"
> --log-to-stderr=true --err-to-stderr=true
> --mon-cluster-log-to-stderr=true --log-stderr-prefix=debug
> --default-log-to-file=false --default-mon-cluster-log-to-file=false
> --ms-learn-addr-from-peer=false
> NOTE : The OSD daemon will run in background and it's safe to exist the
> container here.

In the ceph-osd run command, change the crush-location according to the `ceph osd tree` output, or copy it from the osd deployment config (under the spec.containers section). Do not forget to add double quotes around the crush-location value. e.g.

# oc get deployment rook-ceph-osd-9 -o yaml
spec:
  affinity:
...
      containers:
      - args:
        - ceph
        - osd
        - start
        - --
        - --foreground
        - --id
        - "9"
        - --fsid
        - 18c9800f-7f91-4994-ad32-2a8a330babd6
        - --setuser
        - ceph
        - --setgroup
        - ceph
        - --crush-location=root=default host=storage-01-dev-intranet-01-wob-ocp-vwgroup-com rack=rack1

--- Additional comment from Rafrojas on 2024-04-06 07:19:41 UTC ---

Hi Prashant

I joined the call with the customer and we applied the workaround: we edited the deployment of each OSD and removed the expand-bluefs init container from it. We have a backup of all the deployments if required. After that ceph started the recovery and finished after some time. A new must-gather was collected and is available on the case. There's a WARN on ceph:

health: HEALTH_WARN
    15 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats

I also requested collection of the /var/log/ceph ceph-volume logs for the RCA; Donny will collect them during the day. We agreed to wait until the new data is checked before continuing with the next steps. We cannot confirm that the application is working fine, because developer shifts are MON-FRI, but the cluster looks in better shape, with all operators running.
Regards Rafa --- Additional comment from Rafrojas on 2024-04-06 08:59:04 UTC --- Hi Prashant Ceph logs collected and attached to the case, waiting for your instructions for next steps Regards Rafa --- Additional comment from Rafrojas on 2024-04-06 12:12:03 UTC --- Hi Prashant CU is waiting for some feedback, they are running this cluster in abnormal state, NA will join the shift soon I'll add the handover on the case from last call and status, please let us know next steps to share with CU ASAP. Regards Rafa --- Additional comment from Prashant Dhange on 2024-04-07 02:55:59 UTC --- Hi Rafa, (In reply to Rafrojas from comment #27) > Hi Prashant > > I joined the call with customer and we aplied the Workaround, we edited > the deployment of each OSD and removed the expand-bluefs args from that, we > have a backup of all the deployments if required. Good to know that all OSDs are up and running after applying the workaround. There is a quick way to patch the OSD deployment to remove bluefs-expand init container using oc patch command : # oc patch deployment rook-ceph-osd-<osdid> --type=json -p='[{"op": "remove", "path": "/spec/template/spec/initContainers/3"}]' > After that ceph started > the recovery and finished after some time, a new must-gather is collected > and available on the case, there's a WARN on ceph: > > health: HEALTH_WARN > 15 OSD(s) reporting legacy (not per-pool) BlueStore omap usage > stats This warning is because OSDs were created pre-octopus release. This warning will be addressed as we are re-deploying the OSDs. If we were not planning to re-deploy the OSDs then you need to set `ceph config rm osd bluestore_fsck_quick_fix_on_mount` and restart the OSDs. Refer KCS solution https://access.redhat.com/solutions/7041554 for more details. > > I also requested to collect the /var/log/ceph ceph volume logs for the > RCA, Donny will collect along the day. The latest logs have been analyzed and Guillaume able to find the RCA for the issue. The RCA has been provided in BZ-2273724#c3 comment. --- Additional comment from Bob Emerson on 2024-04-07 17:25:09 UTC --- Customer has posted an update regarding his notes and status in case 03783266 STATUS update: Hi, also the migration from the last osds from lvm to raw is now completed and i have reset the min_size of the Pools back to 2. 
bash-5.1$ ceph osd unset nobackfill nobackfill is unset bash-5.1$ ceph -s cluster: id: 18c9800f-7f91-4994-ad32-2a8a330babd6 health: HEALTH_WARN Degraded data redundancy: 1358206/4099128 objects degraded (33.134%), 295 pgs degraded, 295 pgs undersized 1 daemons have recently crashed services: mon: 3 daemons, quorum b,f,g (age 5m) mgr: a(active, since 14m) mds: 1/1 daemons up, 1 hot standby osd: 15 osds: 15 up (since 2m), 15 in (since 2m); 295 remapped pgs rgw: 1 daemon active (1 hosts, 1 zones) data: volumes: 1/1 healthy pools: 12 pools, 313 pgs objects: 1.37M objects, 955 GiB usage: 1.7 TiB used, 24 TiB / 26 TiB avail pgs: 1358206/4099128 objects degraded (33.134%) 14864/4099128 objects misplaced (0.363%) 290 active+undersized+degraded+remapped+backfill_wait 18 active+clean 5 active+undersized+degraded+remapped+backfilling io: client: 20 KiB/s rd, 136 KiB/s wr, 3 op/s rd, 9 op/s wr recovery: 2.0 MiB/s, 4.79k keys/s, 817 objects/s bash-5.1$ ceph osd df tree ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME -1 26.19896 - 26 TiB 1.7 TiB 1.7 TiB 1.6 GiB 11 GiB 24 TiB 6.60 1.00 - root default -4 8.73299 - 8.7 TiB 886 GiB 879 GiB 853 MiB 6.0 GiB 7.9 TiB 9.90 1.50 - rack rack0 -3 8.73299 - 8.7 TiB 886 GiB 879 GiB 853 MiB 6.0 GiB 7.9 TiB 9.90 1.50 - host storage-00-dev-intranet-01-wob-ocp-vwgroup-com 0 ssd 1.74660 1.00000 1.7 TiB 177 GiB 176 GiB 165 MiB 1.0 GiB 1.6 TiB 9.91 1.50 58 up osd.0 1 ssd 1.74660 1.00000 1.7 TiB 199 GiB 197 GiB 261 MiB 1.4 GiB 1.6 TiB 11.13 1.69 65 up osd.1 2 ssd 1.74660 1.00000 1.7 TiB 172 GiB 170 GiB 94 MiB 1.4 GiB 1.6 TiB 9.62 1.46 58 up osd.2 3 ssd 1.74660 1.00000 1.7 TiB 171 GiB 169 GiB 144 MiB 1.0 GiB 1.6 TiB 9.54 1.45 64 up osd.3 4 ssd 1.74660 1.00000 1.7 TiB 167 GiB 165 GiB 189 MiB 1.1 GiB 1.6 TiB 9.32 1.41 68 up osd.4 -12 8.73299 - 8.7 TiB 884 GiB 879 GiB 798 MiB 4.3 GiB 7.9 TiB 9.88 1.50 - rack rack1 -11 8.73299 - 8.7 TiB 884 GiB 879 GiB 798 MiB 4.3 GiB 7.9 TiB 9.88 1.50 - host storage-01-dev-intranet-01-wob-ocp-vwgroup-com 6 ssd 1.74660 1.00000 1.7 TiB 185 GiB 184 GiB 128 MiB 940 MiB 1.6 TiB 10.32 1.56 62 up osd.6 7 ssd 1.74660 1.00000 1.7 TiB 200 GiB 199 GiB 207 MiB 1.0 GiB 1.6 TiB 11.18 1.69 71 up osd.7 8 ssd 1.74660 1.00000 1.7 TiB 161 GiB 160 GiB 173 MiB 939 MiB 1.6 TiB 8.98 1.36 63 up osd.8 9 ssd 1.74660 1.00000 1.7 TiB 181 GiB 180 GiB 137 MiB 848 MiB 1.6 TiB 10.11 1.53 64 up osd.9 10 ssd 1.74660 1.00000 1.7 TiB 158 GiB 157 GiB 153 MiB 629 MiB 1.6 TiB 8.83 1.34 53 up osd.10 -8 8.73299 - 8.7 TiB 693 MiB 121 MiB 0 B 572 MiB 8.7 TiB 0.01 0.00 - rack rack2 -7 8.73299 - 8.7 TiB 693 MiB 121 MiB 0 B 572 MiB 8.7 TiB 0.01 0.00 - host storage-02-dev-intranet-01-wob-ocp-vwgroup-com 5 ssd 1.74660 1.00000 1.7 TiB 117 MiB 19 MiB 0 B 97 MiB 1.7 TiB 0.01 0 3 up osd.5 11 ssd 1.74660 1.00000 1.7 TiB 144 MiB 27 MiB 0 B 117 MiB 1.7 TiB 0.01 0.00 5 up osd.11 12 ssd 1.74660 1.00000 1.7 TiB 148 MiB 27 MiB 0 B 121 MiB 1.7 TiB 0.01 0.00 7 up osd.12 13 ssd 1.74660 1.00000 1.7 TiB 143 MiB 23 MiB 0 B 119 MiB 1.7 TiB 0.01 0.00 2 up osd.13 14 ssd 1.74660 1.00000 1.7 TiB 141 MiB 23 MiB 0 B 117 MiB 1.7 TiB 0.01 0.00 6 up osd.14 TOTAL 26 TiB 1.7 TiB 1.7 TiB 1.6 GiB 11 GiB 24 TiB 6.60 MIN/MAX VAR: 0/1.69 STDDEV: 4.70 bash-5.1$ for i in .rgw.root ocs-storagecluster-cephblockpool ocs-storagecluster-cephfilesystem-metadata ocs-storagecluster-cephobjectstore.rgw.control ocs-storagecluster-cephfilesystem-data0 ocs-storagecluster-cephobjectstore.rgw.meta ocs-storagecluster-cephobjectstore.rgw.log ocs-storagecluster-cephobjectstore.rgw.buckets.index 
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec ocs-storagecluster-cephobjectstore.rgw.buckets.data .mgr ocs-storagecluster-cephobjectstore.rgw.otp ; do ceph osd pool set $i min_size 2 ; done
set pool 1 min_size to 2
set pool 2 min_size to 2
set pool 3 min_size to 2
set pool 4 min_size to 2
set pool 5 min_size to 2
set pool 6 min_size to 2
set pool 7 min_size to 2
set pool 8 min_size to 2
set pool 9 min_size to 2
set pool 10 min_size to 2
set pool 11 min_size to 2
set pool 12 min_size to 2

I am now waiting for the backfill to finish.

Regards
Donny

--------------------------------------------------------------------------------------------------------------------------------------------------

NOTES update (attached files to BZ)

Hi,

I will upload the latest version of my notes and a detailed output of the commands I ran to manually remove the local-storage LVM volumes, for your reference. I have seen in the diskmaker-manager pods (local-storage operator) that it had problems removing the LVM disks, which made it necessary to remove the VolumeGroups and LogicalVolumes manually. I include the logs from the diskmaker-manager so you can have a look at whether there is a bug in the local-storage operator around deleting LVM ocs-localblock volumes.

Regards
Donny

--- Additional comment from Prashant Dhange on 2024-04-09 19:53:14 UTC ---

(In reply to Prashant Dhange from comment #30)
> Hi Rafa,
...
> > After that ceph started
> > the recovery and finished after some time, a new must-gather is collected
> > and available on the case, there's a WARN on ceph:
> >
> > health: HEALTH_WARN
> > 15 OSD(s) reporting legacy (not per-pool) BlueStore omap usage
> > stats
> This warning is because OSDs were created pre-octopus release. This warning
> will be addressed as we are re-deploying the OSDs. If we were not planning
> to re-deploy the OSDs then you need to set `ceph config rm osd
> bluestore_fsck_quick_fix_on_mount` and restart the OSDs.
Correction. Meant to say:
This warning is because the OSDs were created pre-Octopus. The warning will be addressed as we re-deploy the OSDs. If we were not planning to re-deploy the OSDs, then you would need to set `ceph config set osd bluestore_fsck_quick_fix_on_mount true`, restart the OSDs, and then run `ceph config rm osd bluestore_fsck_quick_fix_on_mount`.
> Refer KCS solution
> https://access.redhat.com/solutions/7041554 for more details.

--- Additional comment from Prashant Dhange on 2024-04-09 21:11:36 UTC ---

We are still getting more details about the ODF upgrade history from the customer. Based on the available data, here are the steps to reproduce this issue:
- Deploy a 4.3.18 cluster with LVM-based OSDs
- Upgrade to ODF 4.4 and then to every major release up to 4.13.7, e.g. from 4.4 to 4.5 to 4.6 and so on
- Verify that the ODF cluster is healthy and we are not observing any daemon crash (specifically OSDs)
- Upgrade from 4.13.7 to 4.14.16
- Observe the OSDs are stuck in CLBO state

--- Additional comment from Prashant Dhange on 2024-04-09 23:11:46 UTC ---

Okay. The issue is not related to ceph-volume at all. The problem is that the OSDs were deployed on an OCS 4.3 cluster, so the deployment config has different initContainers compared to later ODF versions (probably 4.9 or later).
Init containers sequence for 4.3 deployment config (refer point [2] below) : Container-1 : ## Init Container 1 : rook ceph osd init Container-2 : ## Init Container 2 : Copy rook command to OSD pod Container-3 : ## Init Container 3 : expand-bluefs Container-4 : ## Init Container 4 : chown ceph directories then the actual osd container starts, which executes the "ceph osd start" script which internally calls ceph-volume lvm activate then ceph-osd command. Container-5: ceph osd start (refer points [1] and [3] below) When the customer upgraded to 4.14.16, the "rook ceph osd init" container failed to mount the osd data directory. Due to this expand-bluefs container failed to start and exited with "_read_bdev_label failed to open /var/lib/ceph/osd/ceph-<osdid>/block: (2) No such file or directory" error. When we removed expand-bluefs init container as a workaround, the ceph osd started successfully as Container-5 (ceph osd start) was able to execute the lvm activate and start ceph-osd daemon. When I was on the remote session for the first time, we were able to start (after removing expand-bluefs init container) osd.9 manually after executing the lvm activate command then the ceph-osd command. [3] ceph osd start logs 2024-04-06T05:46:41.593349071Z + set -o nounset 2024-04-06T05:46:41.593349071Z + child_pid= 2024-04-06T05:46:41.593427396Z + sigterm_received=false 2024-04-06T05:46:41.593427396Z + trap sigterm SIGTERM 2024-04-06T05:46:41.593576845Z + child_pid=52 2024-04-06T05:46:41.593589922Z + wait 52 2024-04-06T05:46:41.593726159Z + /rook/rook ceph osd start -- --foreground --id 1 --fsid 18c9800f-7f91-4994-ad32-2a8a330babd6 --setuser ceph --setgroup ceph '--crush-location=root=default host=storage-00-dev-intranet-01-wob-ocp-vwgroup-com rack=rack0' --osd-op-num-threads-per-shard=2 --osd-op-num-shards=8 --osd-recovery-sleep=0 --osd-snap-trim-sleep=0 --osd-delete-sleep=0 --bluestore-min-alloc-size=4096 --bluestore-prefer-deferred-size=0 --bluestore-compression-min-blob-size=8192 --bluestore-compression-max-blob-size=65536 --bluestore-max-blob-size=65536 --bluestore-cache-size=3221225472 --bluestore-throttle-cost-per-io=4000 --bluestore-deferred-batch-ops=16 --default-log-to-stderr=true --default-err-to-stderr=true --default-mon-cluster-log-to-stderr=true '--default-log-stderr-prefix=debug ' --default-log-to-file=false --default-mon-cluster-log-to-file=false --ms-learn-addr-from-peer=false 2024-04-06T05:46:41.626980032Z 2024-04-06 05:46:41.626898 I | rookcmd: starting Rook v4.14.6-0.7522dc8ddafd09860f2314db3965ef97671cd138 with arguments '/rook/rook ceph osd start -- --foreground --id 1 --fsid 18c9800f-7f91-4994-ad32-2a8a330babd6 --setuser ceph --setgroup ceph --crush-location=root=default host=storage-00-dev-intranet-01-wob-ocp-vwgroup-com rack=rack0 --osd-op-num-threads-per-shard=2 --osd-op-num-shards=8 --osd-recovery-sleep=0 --osd-snap-trim-sleep=0 --osd-delete-sleep=0 --bluestore-min-alloc-size=4096 --bluestore-prefer-deferred-size=0 --bluestore-compression-min-blob-size=8192 --bluestore-compression-max-blob-size=65536 --bluestore-max-blob-size=65536 --bluestore-cache-size=3221225472 --bluestore-throttle-cost-per-io=4000 --bluestore-deferred-batch-ops=16 --default-log-to-stderr=true --default-err-to-stderr=true --default-mon-cluster-log-to-stderr=true --default-log-stderr-prefix=debug --default-log-to-file=false --default-mon-cluster-log-to-file=false --ms-learn-addr-from-peer=false' 2024-04-06T05:46:41.626980032Z 2024-04-06 05:46:41.626956 I | rookcmd: flag values: 
--block-path=/dev/ceph-f557e476-7bd4-41a0-9323-d6061a4318b3/osd-block-7f80e2ac-e21f-4aa6-8886-ec94d0387196, --help=false, --log-level=INFO, --lv-backed-pv=true, --osd-id=1, --osd-store-type=, --osd-uuid=7f80e2ac-e21f-4aa6-8886-ec94d0387196, --pvc-backed-osd=true 2024-04-06T05:46:41.626980032Z 2024-04-06 05:46:41.626960 I | ceph-spec: parsing mon endpoints: g=100.69.195.205:3300,f=100.70.70.134:6789,b=100.70.78.99:6789 2024-04-06T05:46:41.628815634Z 2024-04-06 05:46:41.628788 I | cephosd: Successfully updated lvm config file "/etc/lvm/lvm.conf" 2024-04-06T05:46:41.925092800Z 2024-04-06 05:46:41.925022 I | exec: Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1 2024-04-06T05:46:41.928518615Z 2024-04-06 05:46:41.928499 I | exec: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1 2024-04-06T05:46:41.931919054Z 2024-04-06 05:46:41.931906 I | exec: Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-f557e476-7bd4-41a0-9323-d6061a4318b3/osd-block-7f80e2ac-e21f-4aa6-8886-ec94d0387196 --path /var/lib/ceph/osd/ceph-1 --no-mon-config 2024-04-06T05:46:41.954830230Z 2024-04-06 05:46:41.954808 I | exec: Running command: /usr/bin/ln -snf /dev/ceph-f557e476-7bd4-41a0-9323-d6061a4318b3/osd-block-7f80e2ac-e21f-4aa6-8886-ec94d0387196 /var/lib/ceph/osd/ceph-1/block 2024-04-06T05:46:41.957864812Z 2024-04-06 05:46:41.957851 I | exec: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block 2024-04-06T05:46:41.961270909Z 2024-04-06 05:46:41.961255 I | exec: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-5 2024-04-06T05:46:41.964681164Z 2024-04-06 05:46:41.964667 I | exec: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1 2024-04-06T05:46:41.967586406Z 2024-04-06 05:46:41.967574 I | exec: --> ceph-volume lvm activate successful for osd ID: 1 2024-04-06T05:46:42.029385070Z 2024-04-06 05:46:42.028473 I | exec: debug 2024-04-06T05:46:42.027+0000 7fa35830c5c0 0 set uid:gid to 167:167 (ceph:ceph) 2024-04-06T05:46:42.029462802Z 2024-04-06 05:46:42.029394 I | exec: debug 2024-04-06T05:46:42.027+0000 7fa35830c5c0 0 ceph version 17.2.6-196.el9cp (cbbf2cfb549196ca18c0c9caff9124d83ed681a4) quincy (stable), process ceph-osd, pid 133 2024-04-06T05:46:42.029462802Z 2024-04-06 05:46:42.029437 I | exec: debug 2024-04-06T05:46:42.027+0000 7fa35830c5c0 0 pidfile_write: ignore empty --pid-file 2024-04-06T05:46:42.029899768Z 2024-04-06 05:46:42.029860 I | exec: debug 2024-04-06T05:46:42.029+0000 7fa35830c5c0 1 bdev(0x55a4d1b87c00 /var/lib/ceph/osd/ceph-1/block) open path /var/lib/ceph/osd/ceph-1/block 2024-04-06T05:46:42.029959756Z 2024-04-06 05:46:42.029947 I | exec: debug 2024-04-06T05:46:42.029+0000 7fa35830c5c0 0 bdev(0x55a4d1b87c00 /var/lib/ceph/osd/ceph-1/block) ioctl(F_SET_FILE_RW_HINT) on /var/lib/ceph/osd/ceph-1/block failed: (22) Invalid argument 2024-04-06T05:46:42.030424427Z 2024-04-06 05:46:42.030409 I | exec: debug 2024-04-06T05:46:42.029+0000 7fa35830c5c0 1 bdev(0x55a4d1b87c00 /var/lib/ceph/osd/ceph-1/block) open size 1920378863616 (0x1bf1f800000, 1.7 TiB) block_size 4096 (4 KiB) non-rotational discard supported 2024-04-06T05:46:42.030649989Z 2024-04-06 05:46:42.030627 I | exec: debug 2024-04-06T05:46:42.029+0000 7fa35830c5c0 1 bluestore(/var/lib/ceph/osd/ceph-1) _set_cache_sizes cache_size 3221225472 meta 0.45 kv 0.45 data 0.06 2024-04-06T05:46:42.030665356Z 2024-04-06 05:46:42.030652 I | exec: debug 2024-04-06T05:46:42.030+0000 7fa35830c5c0 1 bdev(0x55a4d1b87400 /var/lib/ceph/osd/ceph-1/block) 
open path /var/lib/ceph/osd/ceph-1/block 2024-04-06T05:46:42.030775141Z 2024-04-06 05:46:42.030763 I | exec: debug 2024-04-06T05:46:42.030+0000 7fa35830c5c0 0 bdev(0x55a4d1b87400 /var/lib/ceph/osd/ceph-1/block) ioctl(F_SET_FILE_RW_HINT) on /var/lib/ceph/osd/ceph-1/block failed: (22) Invalid argument

So we need to find out why "rook ceph osd init" was failing to mount the OSD data dir.

@Travis Any thoughts on the "rook ceph osd init" failure?

--- Additional comment from Prashant Dhange on 2024-04-09 23:29:39 UTC ---

(In reply to Prashant Dhange from comment #34)
...
> So we need to find out why "rook ceph osd init" was failing to mount the OSD
> data dir.
>
> @Travis Any thoughts on "rook ceph osd init" failure ?
The https://github.com/rook/rook/commit/33e824a323291de1a261b70e9bd255d5049ee02b commit likely caused this issue, as it removed the fsid and username configs from the env vars.

--- Additional comment from Travis Nielsen on 2024-04-10 02:13:31 UTC ---

(In reply to Prashant Dhange from comment #35)
> (In reply to Prashant Dhange from comment #34)
> ...
> >
> > So we need to find out why "rook ceph osd init" was failing to mount the OSD
> > data dir.
> >
> > @Travis Any thoughts on "rook ceph osd init" failure ?
> https://github.com/rook/rook/commit/33e824a323291de1a261b70e9bd255d5049ee02b
> commit likely caused this issue as we have removed the fsid, username
> configs from the env vars.

That commit was also backported all the way to 4.10 [1], so this change was not new in 4.14. The error about the missing ceph-username parameter must be being ignored, despite the error in the init container. It would be really helpful if we could repro this, first looking at the OSD spec and logs in 4.13, and then upgrading to 4.14 to see what changed in the OSD spec. I suspect that if the "osd init" container fails, the ceph.conf would not be present, causing the bluefs expand container to fail. But I am confused why the "osd init" container failure did not abort starting the OSD in the first place. Init containers are not supposed to continue to the next one if they fail. I still need to dig more, but in the meantime the repro would help.

[1] https://github.com/red-hat-storage/rook/commit/673c331a072a9de41ab2aac5405600104bd44ef2

--- Additional comment from Travis Nielsen on 2024-04-19 22:21:59 UTC ---

Thus far we have not been able to repro:
1. QE is not able to install OCS 4.3, given how much the QE infrastructure has changed since that release ~4 years ago.
2. OCS/ODF only created these types of affected OSDs in 4.3. Since then, all OSDs are created in a different mode that is unaffected (raw mode).
3. Rook upstream has not had an option to create these affected types of OSDs for a long time either.

The lowest-risk approach to avoid other customers hitting this issue, if they have also upgraded since OCS 4.3, is to remove this expand init container from these types of OSDs. The only downside is that these legacy OSDs won't be resizable. Anyway, that's a rare operation, and it can be remedied by wiping and replacing these legacy OSDs. Removing the resize container in this case is very simple.

Moving to POST with this fix. I recommend backporting this to 4.14. We also need to consider raising an alert for users to replace these OSDs when detected in the cluster.
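For other clusters that have been upgraded since OCS 4.3, a rough way to spot these legacy ceph-volume/LVM-mode OSDs is to look for an LVM block path (the `/dev/ceph-<vg>/osd-block-<uuid>` form seen in the ceph-volume activate logs above) in the OSD deployments. A sketch only; the grep pattern is inferred from this case rather than a documented interface:

```
# Flag OSD deployments whose backing device is an LVM logical volume (legacy 4.3-era OSDs).
NS=openshift-storage
for d in $(oc -n "$NS" get deploy -l app=rook-ceph-osd -o jsonpath='{.items[*].metadata.name}'); do
  if oc -n "$NS" get deploy "$d" -o yaml | grep -qE '/dev/ceph-[0-9a-f-]+/osd-block-[0-9a-f-]+'; then
    echo "$d appears to be a legacy LVM-mode OSD (candidate for purge + re-deploy)"
  fi
done
```

If the LV path does not show up in the deployment spec, running `ceph-volume lvm list` on the storage node (as was done on the remote call) is an alternative way to confirm.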
Please backport the fix to ODF-4.14 and update the RDT flag/text appropriately.
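Until a fixed build is available, the interim workaround used in this case was to patch the expand-bluefs init container out of the legacy OSD deployments (comment#30 shows the single-deployment form). A sketch that loops over all OSD deployments and verifies the container name before removing the hard-coded index:

```
# Interim workaround: drop the expand-bluefs init container from each OSD deployment.
# Assumes, as in the per-OSD patch above, that expand-bluefs is init container index 3.
NS=openshift-storage
for d in $(oc -n "$NS" get deploy -l app=rook-ceph-osd -o jsonpath='{.items[*].metadata.name}'); do
  name=$(oc -n "$NS" get deploy "$d" -o jsonpath='{.spec.template.spec.initContainers[3].name}')
  if [ "$name" = "expand-bluefs" ]; then
    oc -n "$NS" patch deploy "$d" --type=json \
      -p='[{"op": "remove", "path": "/spec/template/spec/initContainers/3"}]'
  else
    echo "skipping $d: init container 3 is '$name', not expand-bluefs"
  fi
done
```

As on the remote call, the rook-ceph and ocs operators need to be scaled down first; otherwise the operator reconciles the deployments back to their original spec.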
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.14.10 Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2024:6398