Description of problem (please be as detailed as possible and provide log snippets):
Unable to perform scale up using a LocalVolumeSet on the IBM Power platform.

Version of all relevant components (if applicable):
OCP: 4.12.0
ODF: 4.12.0

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy ODF 4.12.0 on OCP 4.12.0 using a LocalVolumeSet.
2. Add a new disk on each of the 3 worker nodes in the OCP cluster.
3. Perform scale up on the Storage System by adding capacity.

Actual results:
The new OSD pods didn't come up.

Expected results:
3 new OSD pods should be created.

Additional info:
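For reference, a minimal CLI sketch of step 3, assuming the scale-up is done by raising the count of the existing ocs-deviceset-localblock device set from 3 to 6 (the post-scale-up count visible in the StorageCluster output below); the same change can also be made from the console UI:

# Hypothetical CLI equivalent of "add capacity" for a LocalVolumeSet-backed cluster:
# bump the storageDeviceSets count so ocs-operator requests 3 more PVCs/OSDs.
oc patch storagecluster ocs-storagecluster -n openshift-storage --type json \
  -p '[{"op": "replace", "path": "/spec/storageDeviceSets/0/count", "value": 6}]'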
must-gather logs: https://drive.google.com/file/d/1DTqiBCkGEinXfaqgB63WPLPcUD6SCSgx/view?usp=sharing
[root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc get pods NAME READY STATUS RESTARTS AGE csi-addons-controller-manager-5c875c5f5f-fq44d 2/2 Running 0 3h15m csi-cephfsplugin-552w6 2/2 Running 0 3h13m csi-cephfsplugin-bglct 2/2 Running 0 3h13m csi-cephfsplugin-dw8rx 2/2 Running 0 3h13m csi-cephfsplugin-provisioner-fb588677b-gjd2w 5/5 Running 0 3h13m csi-cephfsplugin-provisioner-fb588677b-lbzz4 5/5 Running 0 3h13m csi-rbdplugin-p46m9 3/3 Running 0 3h13m csi-rbdplugin-provisioner-858fd9c5c7-78xpl 6/6 Running 0 3h13m csi-rbdplugin-provisioner-858fd9c5c7-g2nc8 6/6 Running 0 3h13m csi-rbdplugin-vzgz4 3/3 Running 0 3h13m csi-rbdplugin-zc5bk 3/3 Running 0 3h13m noobaa-core-0 1/1 Running 0 3h10m noobaa-db-pg-0 1/1 Running 0 3h10m noobaa-endpoint-fcbf85d48-4wgd4 1/1 Running 0 3h8m noobaa-operator-665d987554-hwwpz 1/1 Running 0 3h15m ocs-metrics-exporter-8585b9bd9b-897l4 1/1 Running 0 3h15m ocs-operator-64c4b7bb44-jtszn 1/1 Running 0 3h15m odf-console-669bd79499-xbxml 1/1 Running 0 3h15m odf-operator-controller-manager-bd47859c-bpl5c 2/2 Running 0 3h15m rook-ceph-crashcollector-worker-0-568c85f64d-hpt2l 1/1 Running 0 3h11m rook-ceph-crashcollector-worker-1-5448fdc6b5-f49t9 1/1 Running 0 3h12m rook-ceph-crashcollector-worker-2-59d789995d-jg47h 1/1 Running 0 3h11m rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-794fd4f8zx87r 2/2 Running 0 3h11m rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5546cfc5sqjwj 2/2 Running 0 3h11m rook-ceph-mgr-a-596c5865bf-vsdpl 2/2 Running 0 3h12m rook-ceph-mon-a-8ccb4848c-dk4ht 2/2 Running 0 3h13m rook-ceph-mon-b-7c6fff665-2z5km 2/2 Running 0 3h12m rook-ceph-mon-c-55b9c49b6c-k25sb 2/2 Running 0 3h12m rook-ceph-operator-7896488fc-nprlt 1/1 Running 0 3h15m rook-ceph-osd-0-79f9bfd78d-glt96 2/2 Running 0 3h11m rook-ceph-osd-1-d66768c5b-mb8lh 2/2 Running 0 3h11m rook-ceph-osd-2-6848569fcb-x4p8g 2/2 Running 0 3h11m rook-ceph-osd-prepare-45d780aaf8458e624ae84f1341ac8bcf-f7222 0/1 Completed 0 7m44s rook-ceph-osd-prepare-4d974aae61e20c6b2523d76584c59451-vpkdr 0/1 Completed 0 3h11m rook-ceph-osd-prepare-509e05e67fd495f59c66466cf91f3e92-dgcrb 0/1 Completed 0 3h11m rook-ceph-osd-prepare-7adc6fdb0aabc7fd2ebed78ca909e7db-5ldmb 0/1 Completed 0 3h11m rook-ceph-osd-prepare-c4b3ca14f4584de2d088d590fd0f4345-8cn5z 0/1 Completed 0 7m50s rook-ceph-osd-prepare-c8af26daa15306c2084654732615b22c-kzv6f 0/1 Completed 0 7m47s rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-6cb859bxs56r 2/2 Running 0 3h11m rook-ceph-tools-8b6fbc449-6k48c 1/1 Running 0 3h10m [root@rdr-cicd-odf-0d4b-bastion-0 ~]# [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc describe pod rook-ceph-osd-prepare-45d780aaf8458e624ae84f1341ac8bcf-f7222 Name: rook-ceph-osd-prepare-45d780aaf8458e624ae84f1341ac8bcf-f7222 Namespace: openshift-storage Priority: 2000001000 Priority Class Name: system-node-critical Service Account: rook-ceph-osd Node: worker-0/9.47.90.209 Start Time: Thu, 10 Nov 2022 03:59:01 -0500 Labels: app=rook-ceph-osd-prepare ceph.rook.io/DeviceSet=ocs-deviceset-localblock-0 ceph.rook.io/pvc=ocs-deviceset-localblock-0-data-542m54 controller-uid=c9c3ab5c-d597-48f3-9989-2d2b8cf2162a job-name=rook-ceph-osd-prepare-45d780aaf8458e624ae84f1341ac8bcf rook_cluster=openshift-storage Annotations: k8s.v1.cni.cncf.io/network-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.131.0.38" ], "default": true, "dns": {} }] k8s.v1.cni.cncf.io/networks-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.131.0.38" ], "default": true, "dns": {} }] openshift.io/scc: rook-ceph Status: Succeeded IP: 10.131.0.38 IPs: IP: 
10.131.0.38 Controlled By: Job/rook-ceph-osd-prepare-45d780aaf8458e624ae84f1341ac8bcf Init Containers: copy-bins: Container ID: cri-o://eb8b9c79b7b196e5d510738d15e7272df6ff6d6176a9cc31614a1febfa8510b2 Image: quay.io/rhceph-dev/odf4-rook-ceph-rhel8-operator@sha256:58f3a10e38232e24a408b08e9a6babfa7ccf7e9d06dcd207f28dfe2301d82374 Image ID: quay.io/rhceph-dev/odf4-rook-ceph-rhel8-operator@sha256:58f3a10e38232e24a408b08e9a6babfa7ccf7e9d06dcd207f28dfe2301d82374 Port: <none> Host Port: <none> Args: copy-binaries --copy-to-dir /rook State: Terminated Reason: Completed Exit Code: 0 Started: Thu, 10 Nov 2022 03:59:03 -0500 Finished: Thu, 10 Nov 2022 03:59:03 -0500 Ready: True Restart Count: 0 Environment: <none> Mounts: /rook from rook-binaries (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qqg7b (ro) blkdevmapper: Container ID: cri-o://45a7fc393ad23e6159d92f0cd12b371570829138c79492c328b043461f917100 Image: quay.io/rhceph-dev/rhceph@sha256:9b9d1dffa2254ee04f6d7628daa244e805637cf03420bad89545495fadb491d7 Image ID: quay.io/rhceph-dev/rhceph@sha256:8e8a2a243ceb4275a9ab383025714b62e8b03b6ac0f98f89c55a91086c287192 Port: <none> Host Port: <none> Command: /bin/bash -c set -xe PVC_SOURCE=/ocs-deviceset-localblock-0-data-542m54 PVC_DEST=/mnt/ocs-deviceset-localblock-0-data-542m54 CP_ARGS=(--archive --dereference --verbose) if [ -b "$PVC_DEST" ]; then PVC_SOURCE_MAJ_MIN=$(stat --format '%t%T' $PVC_SOURCE) PVC_DEST_MAJ_MIN=$(stat --format '%t%T' $PVC_DEST) if [[ "$PVC_SOURCE_MAJ_MIN" == "$PVC_DEST_MAJ_MIN" ]]; then echo "PVC $PVC_DEST already exists and has the same major and minor as $PVC_SOURCE: "$PVC_SOURCE_MAJ_MIN"" exit 0 else echo "PVC's source major/minor numbers changed" CP_ARGS+=(--remove-destination) fi fi cp "${CP_ARGS[@]}" "$PVC_SOURCE" "$PVC_DEST" State: Terminated Reason: Completed Exit Code: 0 Started: Thu, 10 Nov 2022 03:59:05 -0500 Finished: Thu, 10 Nov 2022 03:59:05 -0500 Ready: True Restart Count: 0 Environment: <none> Mounts: /mnt from ocs-deviceset-localblock-0-data-542m54-bridge (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qqg7b (ro) Devices: /ocs-deviceset-localblock-0-data-542m54 from ocs-deviceset-localblock-0-data-542m54 Containers: provision: Container ID: cri-o://d8500656c63cca5fb4fd9f2eea76c488f9f45230d86bdfd397584da2c8c26715 Image: quay.io/rhceph-dev/rhceph@sha256:9b9d1dffa2254ee04f6d7628daa244e805637cf03420bad89545495fadb491d7 Image ID: quay.io/rhceph-dev/rhceph@sha256:8e8a2a243ceb4275a9ab383025714b62e8b03b6ac0f98f89c55a91086c287192 Port: <none> Host Port: <none> Command: /rook/rook Args: ceph osd provision State: Terminated Reason: Completed Exit Code: 0 Started: Thu, 10 Nov 2022 03:59:06 -0500 Finished: Thu, 10 Nov 2022 03:59:08 -0500 Ready: False Restart Count: 0 Environment Variables from: rook-ceph-osd-env-override ConfigMap Optional: true Environment: ROOK_NODE_NAME: ocs-deviceset-localblock-0-data-542m54 ROOK_CLUSTER_ID: 29755a55-9b45-49e3-8851-0366ce52ca04 ROOK_CLUSTER_NAME: ocs-storagecluster-cephcluster ROOK_PRIVATE_IP: (v1:status.podIP) ROOK_PUBLIC_IP: (v1:status.podIP) POD_NAMESPACE: openshift-storage ROOK_MON_ENDPOINTS: <set to the key 'data' of config map 'rook-ceph-mon-endpoints'> Optional: false ROOK_MON_SECRET: <set to the key 'mon-secret' in secret 'rook-ceph-mon'> Optional: false ROOK_CEPH_USERNAME: <set to the key 'ceph-username' in secret 'rook-ceph-mon'> Optional: false ROOK_CEPH_SECRET: <set to the key 'ceph-secret' in secret 'rook-ceph-mon'> Optional: false ROOK_CONFIG_DIR: /var/lib/rook 
ROOK_CEPH_CONFIG_OVERRIDE: /etc/rook/config/override.conf ROOK_FSID: <set to the key 'fsid' in secret 'rook-ceph-mon'> Optional: false NODE_NAME: (v1:spec.nodeName) ROOK_CRUSHMAP_ROOT: default ROOK_CRUSHMAP_HOSTNAME: CEPH_VOLUME_DEBUG: 1 CEPH_VOLUME_SKIP_RESTORECON: 1 DM_DISABLE_UDEV: 1 ROOK_LOG_LEVEL: DEBUG ROOK_CEPH_VERSION: ceph version 16.2.10-50 pacific ROOK_OSD_CRUSH_DEVICE_CLASS: ROOK_OSD_CRUSH_INITIAL_WEIGHT: ROOK_DATA_DEVICES: [{"id":"/mnt/ocs-deviceset-localblock-0-data-542m54","storeConfig":{"osdsPerDevice":1}}] ROOK_PVC_BACKED_OSD: true ROOK_ENCRYPTED_DEVICE: false ROOK_PVC_NAME: ocs-deviceset-localblock-0-data-542m54 Mounts: /dev from devices (rw) /etc/ceph from ceph-conf-emptydir (rw) /mnt from ocs-deviceset-localblock-0-data-542m54-bridge (rw) /rook from rook-binaries (rw) /run/udev from udev (rw) /var/lib/ceph/crash from rook-ceph-crash (rw) /var/lib/rook from rook-data (rw) /var/log/ceph from rook-ceph-log (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qqg7b (ro) Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: rook-data: Type: HostPath (bare host directory volume) Path: /var/lib/rook HostPathType: ceph-conf-emptydir: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: <unset> rook-ceph-log: Type: HostPath (bare host directory volume) Path: /var/lib/rook/openshift-storage/log HostPathType: rook-ceph-crash: Type: HostPath (bare host directory volume) Path: /var/lib/rook/openshift-storage/crash HostPathType: rook-binaries: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: <unset> devices: Type: HostPath (bare host directory volume) Path: /dev HostPathType: udev: Type: HostPath (bare host directory volume) Path: /run/udev HostPathType: ocs-deviceset-localblock-0-data-542m54: Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace) ClaimName: ocs-deviceset-localblock-0-data-542m54 ReadOnly: false ocs-deviceset-localblock-0-data-542m54-bridge: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: Memory SizeLimit: <unset> kube-api-access-qqg7b: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: <nil> DownwardAPI: true ConfigMapName: openshift-service-ca.crt ConfigMapOptional: <nil> QoS Class: BestEffort Node-Selectors: <none> Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s node.ocs.openshift.io/storage=true:NoSchedule node.ocs.openshift.io/storage=true:NoSchedule Topology Spread Constraints: kubernetes.io/hostname:ScheduleAnyway when max skew 1 is exceeded for selector ceph.rook.io/pvc Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 8m12s default-scheduler Successfully assigned openshift-storage/rook-ceph-osd-prepare-45d780aaf8458e624ae84f1341ac8bcf-f7222 to worker-0 Normal SuccessfulMountVolume 8m12s kubelet MapVolume.MapPodDevice succeeded for volume "local-pv-c988e86f" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-c988e86f" Normal SuccessfulMountVolume 8m12s kubelet MapVolume.MapPodDevice succeeded for volume "local-pv-c988e86f" volumeMapPath "/var/lib/kubelet/pods/8ebc64fb-ea27-4382-a078-0bdf21fb912c/volumeDevices/kubernetes.io~local-volume" Normal AddedInterface 8m11s multus Add eth0 [10.131.0.38/23] from 
openshift-sdn Normal Pulled 8m10s kubelet Container image "quay.io/rhceph-dev/odf4-rook-ceph-rhel8-operator@sha256:58f3a10e38232e24a408b08e9a6babfa7ccf7e9d06dcd207f28dfe2301d82374" already present on machine Normal Created 8m10s kubelet Created container copy-bins Normal Started 8m10s kubelet Started container copy-bins Normal Pulled 8m9s kubelet Container image "quay.io/rhceph-dev/rhceph@sha256:9b9d1dffa2254ee04f6d7628daa244e805637cf03420bad89545495fadb491d7" already present on machine Normal Created 8m8s kubelet Created container blkdevmapper Normal Started 8m8s kubelet Started container blkdevmapper Normal Pulled 8m8s kubelet Container image "quay.io/rhceph-dev/rhceph@sha256:9b9d1dffa2254ee04f6d7628daa244e805637cf03420bad89545495fadb491d7" already present on machine Normal Created 8m7s kubelet Created container provision Normal Started 8m7s kubelet Started container provision [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc logs -f pod/rook-ceph-osd-prepare-45d780aaf8458e624ae84f1341ac8bcf-f7222 Defaulted container "provision" out of: provision, copy-bins (init), blkdevmapper (init) 2022-11-10 08:59:06.220369 I | cephcmd: desired devices to configure osds: [{Name:/mnt/ocs-deviceset-localblock-0-data-542m54 OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:false IsDevicePathFilter:false}] 2022-11-10 08:59:06.221113 I | rookcmd: starting Rook v4.12.0-0.6d9b42e8640617ef19eb68feb636bb23c283ab00 with arguments '/rook/rook ceph osd provision' 2022-11-10 08:59:06.221121 I | rookcmd: flag values: --cluster-id=29755a55-9b45-49e3-8851-0366ce52ca04, --cluster-name=ocs-storagecluster-cephcluster, --data-device-filter=, --data-device-path-filter=, --data-devices=[{"id":"/mnt/ocs-deviceset-localblock-0-data-542m54","storeConfig":{"osdsPerDevice":1}}], --encrypted-device=false, --force-format=false, --help=false, --location=, --log-level=DEBUG, --metadata-device=, --node-name=ocs-deviceset-localblock-0-data-542m54, --operator-image=, --osd-crush-device-class=, --osd-crush-initial-weight=, --osd-database-size=0, --osd-wal-size=576, --osds-per-device=1, --pvc-backed-osd=true, --service-account= 2022-11-10 08:59:06.221126 I | op-mon: parsing mon endpoints: b=172.30.252.124:6789,c=172.30.39.41:6789,a=172.30.215.72:6789 2022-11-10 08:59:06.231515 I | op-osd: CRUSH location=root=default host=worker-0 2022-11-10 08:59:06.231532 I | cephcmd: crush location of osd: root=default host=worker-0 2022-11-10 08:59:06.231547 D | exec: Running command: dmsetup version 2022-11-10 08:59:06.235093 I | cephosd: Library version: 1.02.181-RHEL8 (2021-10-20) Driver version: 4.43.0 2022-11-10 08:59:06.246361 I | cephclient: writing config file /var/lib/rook/openshift-storage/openshift-storage.config 2022-11-10 08:59:06.246523 I | cephclient: generated admin config in /var/lib/rook/openshift-storage 2022-11-10 08:59:06.246713 D | cephclient: config file @ /etc/ceph/ceph.conf: [global] fsid = 3b5897f8-8705-48d3-b59d-5c023ec65c7e mon initial members = b c a mon host = [v2:172.30.252.124:3300,v1:172.30.252.124:6789],[v2:172.30.39.41:3300,v1:172.30.39.41:6789],[v2:172.30.215.72:3300,v1:172.30.215.72:6789] bdev_flock_retry = 20 mon_osd_full_ratio = .85 mon_osd_backfillfull_ratio = .8 mon_osd_nearfull_ratio = .75 mon_max_pg_per_osd = 600 mon_pg_warn_max_object_skew = 0 mon_data_avail_warn = 15 [osd] osd_memory_target_cgroup_limit_ratio = 0.8 [client.admin] keyring = /var/lib/rook/openshift-storage/client.admin.keyring 2022-11-10 08:59:06.246722 I | cephosd: discovering hardware 2022-11-10 08:59:06.246734 
D | exec: Running command: lsblk /mnt/ocs-deviceset-localblock-0-data-542m54 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE 2022-11-10 08:59:06.250587 D | sys: lsblk output: "SIZE=\"536870912000\" ROTA=\"1\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sdd\" KNAME=\"/dev/sdd\" MOUNTPOINT=\"\" FSTYPE=\"\"" 2022-11-10 08:59:06.250658 D | exec: Running command: sgdisk --print /mnt/ocs-deviceset-localblock-0-data-542m54 2022-11-10 08:59:06.253226 D | exec: Running command: udevadm info --query=property /dev/sdd 2022-11-10 08:59:06.259736 D | sys: udevadm info output: "DEVLINKS=/dev/disk/by-id/wwn-0x6005076d0281005ef000000000028d24 /dev/disk/by-id/scsi-36005076d0281005ef000000000028d24 /dev/disk/by-id/scsi-SAIX_VDASD_332136005076D0281005EF000000000028D2404214503IBMfcp\nDEVNAME=/dev/sdd\nDEVPATH=/devices/vio/30000002/host0/target0:0:5/0:0:5:0/block/sdd\nDEVTYPE=disk\nID_BUS=scsi\nID_MODEL=VDASD\nID_MODEL_ENC=VDASD\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\nID_REVISION=0001\nID_SCSI=1\nID_SCSI_INQUIRY=1\nID_SCSI_SERIAL=332136005076D0281005EF000000000028D2404214503IBMfcp\nID_SERIAL=36005076d0281005ef000000000028d24\nID_SERIAL_SHORT=6005076d0281005ef000000000028d24\nID_TARGET_PORT=0\nID_TYPE=disk\nID_VENDOR=AIX\nID_VENDOR_ENC=AIX\\x20\\x20\\x20\\x20\\x20\nID_WWN=0x6005076d0281005e\nID_WWN_VENDOR_EXTENSION=0xf000000000028d24\nID_WWN_WITH_EXTENSION=0x6005076d0281005ef000000000028d24\nMAJOR=8\nMINOR=48\nSCSI_IDENT_LUN_NAA_REGEXT=6005076d0281005ef000000000028d24\nSCSI_IDENT_PORT_RELATIVE=2177\nSCSI_IDENT_PORT_TARGET_PORT_GROUP=0x0\nSCSI_IDENT_SERIAL=332136005076D0281005EF000000000028D2404214503IBMfcp\nSCSI_MODEL=VDASD\nSCSI_MODEL_ENC=VDASD\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\nSCSI_REVISION=0001\nSCSI_TPGS=0\nSCSI_TYPE=disk\nSCSI_VENDOR=AIX\nSCSI_VENDOR_ENC=AIX\\x20\\x20\\x20\\x20\\x20\nSUBSYSTEM=block\nTAGS=:systemd:\nUSEC_INITIALIZED=13435165299" 2022-11-10 08:59:06.259784 I | cephosd: creating and starting the osds 2022-11-10 08:59:06.259809 D | cephosd: desiredDevices are [{Name:/mnt/ocs-deviceset-localblock-0-data-542m54 OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:false IsDevicePathFilter:false}] 2022-11-10 08:59:06.259814 D | cephosd: context.Devices are: 2022-11-10 08:59:06.259850 D | cephosd: &{Name:/mnt/ocs-deviceset-localblock-0-data-542m54 Parent: HasChildren:false DevLinks:/dev/disk/by-id/wwn-0x6005076d0281005ef000000000028d24 /dev/disk/by-id/scsi-36005076d0281005ef000000000028d24 /dev/disk/by-id/scsi-SAIX_VDASD_332136005076D0281005EF000000000028D2404214503IBMfcp Size:536870912000 UUID:0d93e4da-5df7-4f73-b33b-18dafeefe024 Serial:36005076d0281005ef000000000028d24 Type:data Rotational:true Readonly:false Partitions:[] Filesystem: Mountpoint: Vendor:AIX Model:VDASD WWN:0x6005076d0281005e WWNVendorExtension:0x6005076d0281005ef000000000028d24 Empty:false CephVolumeData: RealPath:/dev/sdd KernelName:sdd Encrypted:false} 2022-11-10 08:59:06.259860 I | cephosd: old lsblk can't detect bluestore signature, so try to detect here 2022-11-10 08:59:06.259903 I | cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-542m54", detected an existing OSD. UUID=1bc2aa4b-b3a5-4c85-b663-b1b363d73264 2022-11-10 08:59:06.267719 I | cephosd: configuring osd devices: {"Entries":{}} 2022-11-10 08:59:06.267746 I | cephosd: no new devices to configure. returning devices already configured with ceph-volume. 
2022-11-10 08:59:06.267762 D | exec: Running command: pvdisplay -C -o lvpath --noheadings /mnt/ocs-deviceset-localblock-0-data-542m54 2022-11-10 08:59:06.844247 W | cephosd: failed to retrieve logical volume path for "/mnt/ocs-deviceset-localblock-0-data-542m54". exit status 5 2022-11-10 08:59:06.844332 D | exec: Running command: lsblk /mnt/ocs-deviceset-localblock-0-data-542m54 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE 2022-11-10 08:59:06.848115 D | sys: lsblk output: "SIZE=\"536870912000\" ROTA=\"1\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sdd\" KNAME=\"/dev/sdd\" MOUNTPOINT=\"\" FSTYPE=\"\"" 2022-11-10 08:59:06.848390 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list --format json 2022-11-10 08:59:07.641820 D | cephosd: {} 2022-11-10 08:59:07.641891 I | cephosd: 0 ceph-volume lvm osd devices configured on this node 2022-11-10 08:59:07.641906 D | exec: Running command: cryptsetup luksDump /mnt/ocs-deviceset-localblock-0-data-542m54 2022-11-10 08:59:07.654420 E | cephosd: failed to determine if the encrypted block "/mnt/ocs-deviceset-localblock-0-data-542m54" is from our cluster. failed to dump LUKS header for disk "/mnt/ocs-deviceset-localblock-0-data-542m54". Device /mnt/ocs-deviceset-localblock-0-data-542m54 is not a valid LUKS device.: exit status 1 2022-11-10 08:59:07.654457 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list /mnt/ocs-deviceset-localblock-0-data-542m54 --format json 2022-11-10 08:59:07.994847 D | cephosd: {} 2022-11-10 08:59:07.994910 I | cephosd: 0 ceph-volume raw osd devices configured on this node 2022-11-10 08:59:07.994925 W | cephosd: skipping OSD configuration as no devices matched the storage settings for this node "ocs-deviceset-localblock-0-data-542m54" [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc get pv NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE local-pv-183393b6 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-3kjw2h localblock 28m local-pv-3babaca9 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-4zkhcz localblock 28m local-pv-bec7af9c 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-26s9vd localblock 3h23m local-pv-c988e86f 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-542m54 localblock 29m local-pv-d0b1dff6 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-1tpfs6 localblock 3h23m local-pv-f6a955b1 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-0x6z2b localblock 3h23m pvc-155f125c-8ca8-4742-b5a3-e79760898a50 40Gi RWO Delete Bound openshift-monitoring/my-prometheus-claim-prometheus-k8s-0 ocs-storagecluster-ceph-rbd 3h16m pvc-5f47b390-5788-42f0-927f-a07b7f5fe4d2 50Gi RWO Delete Bound openshift-storage/db-noobaa-db-pg-0 ocs-storagecluster-ceph-rbd 3h17m pvc-6f6bf1ea-e61d-4d45-bab8-79f957a959fe 40Gi RWO Delete Bound openshift-monitoring/my-alertmanager-claim-alertmanager-main-1 ocs-storagecluster-ceph-rbd 3h16m pvc-8ed35488-1029-403e-8f4e-f0573d38b524 40Gi RWO Delete Bound openshift-monitoring/my-alertmanager-claim-alertmanager-main-0 ocs-storagecluster-ceph-rbd 3h16m pvc-95f979ff-18c2-4044-9b01-3a5c11a6dbf8 100Gi RWX Delete Bound openshift-image-registry/registry-cephfs-rwx-pvc ocs-storagecluster-cephfs 3h16m pvc-d7d2e0fc-0fa7-410f-9223-0001209c6ee7 40Gi RWO Delete Bound openshift-monitoring/my-prometheus-claim-prometheus-k8s-1 ocs-storagecluster-ceph-rbd 
3h16m [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc get pvc NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE db-noobaa-db-pg-0 Bound pvc-5f47b390-5788-42f0-927f-a07b7f5fe4d2 50Gi RWO ocs-storagecluster-ceph-rbd 3h17m ocs-deviceset-localblock-0-data-0x6z2b Bound local-pv-f6a955b1 500Gi RWO localblock 3h18m ocs-deviceset-localblock-0-data-1tpfs6 Bound local-pv-d0b1dff6 500Gi RWO localblock 3h18m ocs-deviceset-localblock-0-data-26s9vd Bound local-pv-bec7af9c 500Gi RWO localblock 3h18m ocs-deviceset-localblock-0-data-3kjw2h Bound local-pv-183393b6 500Gi RWO localblock 15m ocs-deviceset-localblock-0-data-4zkhcz Bound local-pv-3babaca9 500Gi RWO localblock 15m ocs-deviceset-localblock-0-data-542m54 Bound local-pv-c988e86f 500Gi RWO localblock 15m
[root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc get nodes NAME STATUS ROLES AGE VERSION master-0 Ready control-plane,master 4d4h v1.25.2+93b33ea master-1 Ready control-plane,master 4d4h v1.25.2+93b33ea master-2 Ready control-plane,master 4d4h v1.25.2+93b33ea worker-0 Ready worker 4d4h v1.25.2+93b33ea worker-1 Ready worker 4d4h v1.25.2+93b33ea worker-2 Ready worker 4d4h v1.25.2+93b33ea [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc debug node/worker-0 Warning: would violate PodSecurity "baseline:latest": host namespaces (hostNetwork=true, hostPID=true, hostIPC=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true) Starting pod/worker-0-debug ... To use host binaries, run `chroot /host` Pod IP: 9.47.90.209 If you don't see a command prompt, try pressing enter. sh-4.4# chroot /host sh-4.4# sh-4.4# lsblk |grep 500 loop1 7:1 0 500G 0 loop sdb 8:16 0 500G 0 disk sdd 8:48 0 500G 0 disk [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc debug node/worker-1 Warning: would violate PodSecurity "baseline:latest": host namespaces (hostNetwork=true, hostPID=true, hostIPC=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true) Starting pod/worker-1-debug ... To use host binaries, run `chroot /host` Pod IP: 9.47.90.190 If you don't see a command prompt, try pressing enter. sh-4.4# sh-4.4# chroot /host sh-4.4# sh-4.4# lsblk |grep 500 loop1 7:1 0 500G 0 loop sdb 8:16 0 500G 0 disk sdd 8:48 0 500G 0 disk sh-4.4# exit sh-4.4# exit Removing debug pod ... [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc debug node/worker-2 Warning: would violate PodSecurity "baseline:latest": host namespaces (hostNetwork=true, hostPID=true, hostIPC=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true) Starting pod/worker-2-debug ... To use host binaries, run `chroot /host` Pod IP: 9.47.90.165 If you don't see a command prompt, try pressing enter. sh-4.4# chroot /host sh-4.4# lsblk |grep 500 loop1 7:1 0 500G 0 loop sdb 8:16 0 500G 0 disk sdd 8:48 0 500G 0 disk sh-4.4# exit exit sh-4.4# exit Removing debug pod ... 
[root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc get localvolumeset -n openshift-local-storage NAME STORAGECLASS PROVISIONED AGE localblock localblock 6 4d3h [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc describe localvolumeset localblock -n openshift-local-storage Name: localblock Namespace: openshift-local-storage Labels: <none> Annotations: <none> API Version: local.storage.openshift.io/v1alpha1 Kind: LocalVolumeSet Metadata: Creation Timestamp: 2022-11-10T05:49:06Z Finalizers: storage.openshift.com/local-volume-protection Generation: 1 Managed Fields: API Version: local.storage.openshift.io/v1alpha1 Fields Type: FieldsV1 fieldsV1: f:spec: .: f:deviceInclusionSpec: .: f:deviceMechanicalProperties: f:deviceTypes: f:minSize: f:nodeSelector: .: f:nodeSelectorTerms: f:storageClassName: f:volumeMode: Manager: kubectl-create Operation: Update Time: 2022-11-10T05:49:06Z API Version: local.storage.openshift.io/v1alpha1 Fields Type: FieldsV1 fieldsV1: f:metadata: f:finalizers: .: v:"storage.openshift.com/local-volume-protection": Manager: local-storage-operator Operation: Update Time: 2022-11-10T05:49:06Z API Version: local.storage.openshift.io/v1alpha1 Fields Type: FieldsV1 fieldsV1: f:status: .: f:conditions: f:observedGeneration: f:totalProvisionedDeviceCount: Manager: local-storage-operator Operation: Update Subresource: status Time: 2022-11-10T08:45:05Z Resource Version: 261137 UID: 16584bf9-0e8b-40b1-9665-2dd083b1ac11 Spec: Device Inclusion Spec: Device Mechanical Properties: NonRotational Rotational Device Types: disk part Min Size: 100Gi Node Selector: Node Selector Terms: Match Expressions: Key: kubernetes.io/hostname Operator: In Values: worker-0 worker-1 worker-2 Storage Class Name: localblock Volume Mode: Block Status: Conditions: Last Transition Time: 2022-11-10T05:49:06Z Message: DiskMaker: Available Status: True Type: DaemonSetsAvailable Last Transition Time: 2022-11-10T05:49:06Z Message: Operator reconciled successfully. 
Status: True Type: Available Observed Generation: 1 Total Provisioned Device Count: 6 Events: <none> [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc describe storagecluster ocs-storagecluster Name: ocs-storagecluster Namespace: openshift-storage Labels: <none> Annotations: cluster.ocs.openshift.io/local-devices: true uninstall.ocs.openshift.io/cleanup-policy: delete uninstall.ocs.openshift.io/mode: graceful API Version: ocs.openshift.io/v1 Kind: StorageCluster Metadata: Creation Timestamp: 2022-11-10T05:52:53Z Finalizers: storagecluster.ocs.openshift.io Generation: 3 Managed Fields: API Version: ocs.openshift.io/v1 Fields Type: FieldsV1 fieldsV1: f:metadata: f:annotations: .: f:cluster.ocs.openshift.io/local-devices: f:spec: .: f:flexibleScaling: f:monDataDirHostPath: Manager: kubectl-create Operation: Update Time: 2022-11-10T05:52:53Z API Version: ocs.openshift.io/v1 Fields Type: FieldsV1 fieldsV1: f:metadata: f:ownerReferences: .: k:{"uid":"ef253c10-2986-4cd4-a76f-35514c225041"}: Manager: manager Operation: Update Time: 2022-11-10T05:52:54Z API Version: ocs.openshift.io/v1 Fields Type: FieldsV1 fieldsV1: f:metadata: f:annotations: f:uninstall.ocs.openshift.io/cleanup-policy: f:uninstall.ocs.openshift.io/mode: f:finalizers: .: v:"storagecluster.ocs.openshift.io": f:spec: f:arbiter: f:encryption: .: f:kms: f:externalStorage: f:managedResources: .: f:cephBlockPools: f:cephCluster: f:cephConfig: f:cephDashboard: f:cephFilesystems: f:cephNonResilientPools: f:cephObjectStoreUsers: f:cephObjectStores: f:cephToolbox: f:mirroring: Manager: ocs-operator Operation: Update Time: 2022-11-10T05:52:54Z API Version: ocs.openshift.io/v1 Fields Type: FieldsV1 fieldsV1: f:spec: f:storageDeviceSets: Manager: kubectl-edit Operation: Update Time: 2022-11-10T08:57:47Z API Version: ocs.openshift.io/v1 Fields Type: FieldsV1 fieldsV1: f:status: .: f:conditions: f:externalStorage: .: f:grantedCapacity: f:failureDomain: f:failureDomainKey: f:failureDomainValues: f:images: .: f:ceph: .: f:actualImage: f:desiredImage: f:noobaaCore: .: f:actualImage: f:desiredImage: f:noobaaDB: .: f:actualImage: f:desiredImage: f:kmsServerConnection: f:nodeTopologies: .: f:labels: .: f:kubernetes.io/hostname: f:phase: f:relatedObjects: f:version: Manager: ocs-operator Operation: Update Subresource: status Time: 2022-11-14T09:10:30Z Owner References: API Version: odf.openshift.io/v1alpha1 Kind: StorageSystem Name: ocs-storagecluster-storagesystem UID: ef253c10-2986-4cd4-a76f-35514c225041 Resource Version: 7088995 UID: b06ab659-eb9f-40b3-ae11-48edbb01c95f Spec: Arbiter: Encryption: Kms: External Storage: Flexible Scaling: true Managed Resources: Ceph Block Pools: Ceph Cluster: Ceph Config: Ceph Dashboard: Ceph Filesystems: Ceph Non Resilient Pools: Ceph Object Store Users: Ceph Object Stores: Ceph Toolbox: Mirroring: Mon Data Dir Host Path: /var/lib/rook Storage Device Sets: Config: Count: 6 Data PVC Template: Metadata: Spec: Access Modes: ReadWriteOnce Resources: Requests: Storage: 100Gi Storage Class Name: localblock Volume Mode: Block Status: Name: ocs-deviceset-localblock Placement: Prepare Placement: Replica: 1 Resources: Status: Conditions: Last Heartbeat Time: 2022-11-14T09:10:30Z Last Transition Time: 2022-11-14T06:38:36Z Message: Reconcile completed successfully Reason: ReconcileCompleted Status: True Type: ReconcileComplete Last Heartbeat Time: 2022-11-14T09:10:30Z Last Transition Time: 2022-11-10T08:58:53Z Message: Reconcile completed successfully Reason: ReconcileCompleted Status: True Type: Available Last Heartbeat Time: 
2022-11-14T09:10:30Z Last Transition Time: 2022-11-10T08:59:09Z Message: Reconcile completed successfully Reason: ReconcileCompleted Status: False Type: Progressing Last Heartbeat Time: 2022-11-14T09:10:30Z Last Transition Time: 2022-11-10T08:58:53Z Message: Reconcile completed successfully Reason: ReconcileCompleted Status: False Type: Degraded Last Heartbeat Time: 2022-11-14T09:10:30Z Last Transition Time: 2022-11-10T08:59:09Z Message: Reconcile completed successfully Reason: ReconcileCompleted Status: True Type: Upgradeable External Storage: Granted Capacity: 0 Failure Domain: host Failure Domain Key: kubernetes.io/hostname Failure Domain Values: worker-0 worker-1 worker-2 Images: Ceph: Actual Image: quay.io/rhceph-dev/rhceph@sha256:9b9d1dffa2254ee04f6d7628daa244e805637cf03420bad89545495fadb491d7 Desired Image: quay.io/rhceph-dev/rhceph@sha256:9b9d1dffa2254ee04f6d7628daa244e805637cf03420bad89545495fadb491d7 Noobaa Core: Actual Image: quay.io/rhceph-dev/odf4-mcg-core-rhel8@sha256:ee1bc56dc3cf3b7f0136184668700caca835712f3252bb79c6c745e772850e25 Desired Image: quay.io/rhceph-dev/odf4-mcg-core-rhel8@sha256:ee1bc56dc3cf3b7f0136184668700caca835712f3252bb79c6c745e772850e25 Noobaa DB: Actual Image: quay.io/rhceph-dev/rhel8-postgresql-12@sha256:aa65868b9684f7715214f5f3fac3139245c212019cc17742f237965a7508222d Desired Image: quay.io/rhceph-dev/rhel8-postgresql-12@sha256:aa65868b9684f7715214f5f3fac3139245c212019cc17742f237965a7508222d Kms Server Connection: Node Topologies: Labels: kubernetes.io/hostname: worker-0 worker-1 worker-2 Phase: Ready Related Objects: API Version: ceph.rook.io/v1 Kind: CephCluster Name: ocs-storagecluster-cephcluster Namespace: openshift-storage Resource Version: 7088190 UID: 29755a55-9b45-49e3-8851-0366ce52ca04 API Version: noobaa.io/v1alpha1 Kind: NooBaa Name: noobaa Namespace: openshift-storage Resource Version: 7088990 UID: 4964df62-a9d1-4c6d-89bf-f28e79a525db Version: 4.12.0 Events: <none>
Could there have been a previous install on these disks? The error in the OSD prepare job shows the device already had another OSD on it:

2022-11-10 08:59:06.259903 I | cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-542m54", detected an existing OSD. UUID=1bc2aa4b-b3a5-4c85-b663-b1b363d73264
No, it is a fresh installation. We tried on a different cluster as well, and the same thing happened there too.
Do all of the OSD prepare job logs that are created for the new OSDs have the same message about the device already being configured for an OSD?

2022-11-10 08:59:06.259903 I | cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-542m54", detected an existing OSD. UUID=1bc2aa4b-b3a5-4c85-b663-b1b363d73264

So the question is why Rook detects that these devices already have an OSD configured. Can you check in the toolbox, with a ceph osd command, which existing OSD that UUID corresponds to?
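For example (a hedged sketch, using the UUID from the prepare log above): each OSD line in `ceph osd dump` ends with that OSD's UUID, so grepping for the reported UUID from the toolbox shows whether it belongs to this cluster at all.

oc rsh rook-ceph-tools-8b6fbc449-6k48c
# no output means no OSD in this cluster has that UUID
sh-4.4$ ceph osd dump | grep -i 1bc2aa4b-b3a5-4c85-b663-b1b363d73264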
> Do all of the osd prepare job logs that are created for the new OSDs have the same message about the device already being configured for an OSD?

Yes, the other 2 OSD prepare pods also have the same message.

pod/rook-ceph-osd-prepare-c4b3ca14f4584de2d088d590fd0f4345-8cn5z:
2022-11-10 08:58:59.757457 I | cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-3kjw2h", detected an existing OSD. UUID=dcbd2132-30bf-413d-86a8-ea3e5612265d

pod/rook-ceph-osd-prepare-c8af26daa15306c2084654732615b22c-kzv6f:
2022-11-10 08:59:03.518164 I | cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-4zkhcz", detected an existing OSD. UUID=44e28a1b-63e4-4eb8-9ce3-b7f9838b3212

[root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc rsh rook-ceph-tools-8b6fbc449-6k48c
sh-4.4$ ceph osd ls
0
1
2
sh-4.4$ ceph osd info
osd.0 up in weight 1 up_from 9 up_thru 25 down_at 0 last_clean_interval [0,0) [v2:10.129.2.29:6800/405,v1:10.129.2.29:6801/405] [v2:10.129.2.29:6802/405,v1:10.129.2.29:6803/405] exists,up 1b29c065-6906-4b52-b07c-8bbfdf0f4abb
osd.1 up in weight 1 up_from 10 up_thru 25 down_at 0 last_clean_interval [0,0) [v2:10.131.0.30:6800/406,v1:10.131.0.30:6801/406] [v2:10.131.0.30:6802/406,v1:10.131.0.30:6803/406] exists,up 24b59909-e757-4f23-990f-43bd7806f7ae
osd.2 up in weight 1 up_from 10 up_thru 25 down_at 0 last_clean_interval [0,0) [v2:10.128.2.25:6800/406,v1:10.128.2.25:6801/406] [v2:10.128.2.25:6802/406,v1:10.128.2.25:6803/406] exists,up 71792079-5156-499c-8712-e3a6001b0758
Output of ceph osd tree:

sh-4.4$ ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1         1.46489  root default
-7         0.48830      host worker-0
 1    hdd  0.48830          osd.1          up   1.00000  1.00000
-5         0.48830      host worker-1
 2    hdd  0.48830          osd.2          up   1.00000  1.00000
-3         0.48830      host worker-2
 0    hdd  0.48830          osd.0          up   1.00000  1.00000
I just don't see how these disks could be clean. The disk has a Ceph bluestore label [1], and the OSD UUID belongs to some OSD that is not part of this cluster.

[1] https://github.com/red-hat-storage/rook/blob/release-4.12/pkg/daemon/ceph/osd/daemon.go#L294-L336
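As a hedged way to double-check that on the host (a sketch, not taken from this report): the bluestore label that Rook looks for sits in the first few KiB of the device and contains the OSD UUID in plain text, so it can be inspected from a node debug shell.

# From `oc debug node/worker-0` followed by `chroot /host` -- dump the start of the
# device backing the skipped PVC (per the prepare log above this is /dev/sdd on
# worker-0) and look for the "bluestore block device" magic and the embedded UUID.
dd if=/dev/sdd bs=4K count=1 2>/dev/null | strings | grep -A1 -i "bluestore block device"
# If the ceph tools are available on the host or in a privileged pod, the same label
# can be printed more readably with: ceph-bluestore-tool show-label --dev /dev/sdd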
So, what can be done now so that OSD pods get created for the 3 new osd-prepare pods?
To ensure the disks are clean, please try wiping the disks and running the install again. Here are some steps recommended for cleaning the disks: https://rook.io/docs/rook/latest/Getting-Started/ceph-teardown/#zapping-devices
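For convenience, a sketch of the kind of cleanup that doc describes (the device name is an example; verify it with lsblk on each worker before wiping anything):

DISK="/dev/sdd"                       # example only -- confirm the device first
sgdisk --zap-all "$DISK"              # clear GPT/MBR partition structures
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync   # overwrite the start of the disk, where the bluestore label lives
blkdiscard "$DISK" || true            # may not be supported on all device types
ls /dev/mapper/ceph-* 2>/dev/null | xargs -I% -- dmsetup remove %   # remove leftover ceph-volume device-mapper entries, if any
rm -rf /dev/ceph-*
partprobe "$DISK"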
I tried wiping the disks on a different cluster that has the same issue, using the same procedure you shared, and ran the installation again, but it didn't help.
Can you use 'sgdisk' to zap the disks? I see a similar issue targeted here: https://bugzilla.redhat.com/show_bug.cgi?id=2123077#c31. Ashish, can you give more insights? Also, if this method helps, we can add it to the re-installation documentation.
Changing the previous comment to public. Aaruni, were you able to get the disks cleaned?
Yes, I cleaned the disks by following the instructions here: https://rook.io/docs/rook/latest/Getting-Started/ceph-teardown/#zapping-devices. I also tried with new disks in a new cluster, and the same thing is happening there as well.
@aaaggarw It's hard to determine what is happening at the host/disk level that causes ceph-volume to detect that the disk is already associated with a different Ceph cluster.

First, double- or triple-check that (1) the correct disks and/or partitions are being cleaned, and (2) that they are wiped using the `dd` command from the cleanup doc you linked.

If the disks/partitions are for sure cleaned using `dd`, then we need to see what ceph-volume is doing. Let's do this in 2 ways:

1. Put the Rook operator into debug mode by setting `ROOK_LOG_LEVEL: DEBUG` in the rook-ceph-operator-config configmap (you may need to stop the ocs-operator for this), and get new provision logs.

2. Run `ceph-volume --log-level debug raw list` on the host and get the output. I think the best way to do that on OpenShift is to create a k8s Job that has access to `/dev` on the host and runs that command.

Attach both of those logs for us, please.

Thanks,
Blaine
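A hedged sketch of both steps (the Job manifest, service account, and node name are assumptions pieced together from this report, not a tested recipe; the image is the rhceph image already used by the prepare pods above):

# 1. Keep ocs-operator from reverting the setting, then turn on Rook debug logging:
oc -n openshift-storage scale deployment ocs-operator --replicas=0
oc -n openshift-storage patch configmap rook-ceph-operator-config \
  --type merge -p '{"data":{"ROOK_LOG_LEVEL":"DEBUG"}}'

# 2. Run `ceph-volume --log-level debug raw list` on a worker via a Job with host /dev
#    access (the chosen service account must be allowed a privileged SCC):
oc apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: ceph-volume-raw-list
  namespace: openshift-storage
spec:
  template:
    spec:
      serviceAccountName: rook-ceph-osd   # assumption: reuse the SA the OSD/prepare pods already run with
      nodeName: worker-0                  # example: the node whose new disk is being skipped
      restartPolicy: Never
      containers:
      - name: ceph-volume
        image: quay.io/rhceph-dev/rhceph@sha256:9b9d1dffa2254ee04f6d7628daa244e805637cf03420bad89545495fadb491d7
        command: ["ceph-volume", "--log-level", "debug", "raw", "list"]
        securityContext:
          privileged: true                # needed to read the host block devices
        volumeMounts:
        - { name: dev, mountPath: /dev }
        - { name: udev, mountPath: /run/udev }
      volumes:
      - { name: dev, hostPath: { path: /dev } }
      - { name: udev, hostPath: { path: /run/udev } }
EOF
oc -n openshift-storage logs -f job/ceph-volume-raw-list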
Sure, Blaine, I will provide the required logs once I get a new cluster; I don't have the older cluster with me now.
Please reopen when you get the required logs.