Description of problem (please be as detailed as possible and provide log snippets):
Unable to perform scale up using a LocalVolumeSet on the IBM Power platform.

Version of all relevant components (if applicable):
OCP: 4.12.0
ODF: 4.12.0

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy ODF 4.12.0 on OCP 4.12.0 using a LocalVolumeSet.
2. Add a new disk on each of the 3 worker nodes in the OCP cluster.
3. Perform scale up on the Storage System by adding capacity.

Actual results:
The new OSD pods didn't come up.

Expected results:
3 new OSD pods should be created.

Additional info:
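For reference, a minimal CLI sketch of step 3, assuming the scale-up is done by raising the count of the existing ocs-deviceset-localblock device set from 3 to 6 (the post-scale-up count visible in the StorageCluster output below); the same change can also be made from the console UI:

# Hypothetical CLI equivalent of "add capacity" for a LocalVolumeSet-backed cluster:
# bump the storageDeviceSets count so ocs-operator requests 3 more PVCs/OSDs.
oc patch storagecluster ocs-storagecluster -n openshift-storage --type json \
  -p '[{"op": "replace", "path": "/spec/storageDeviceSets/0/count", "value": 6}]'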
must-gather logs: https://drive.google.com/file/d/1DTqiBCkGEinXfaqgB63WPLPcUD6SCSgx/view?usp=sharing
[root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc get pods NAME READY STATUS RESTARTS AGE csi-addons-controller-manager-5c875c5f5f-fq44d 2/2 Running 0 3h15m csi-cephfsplugin-552w6 2/2 Running 0 3h13m csi-cephfsplugin-bglct 2/2 Running 0 3h13m csi-cephfsplugin-dw8rx 2/2 Running 0 3h13m csi-cephfsplugin-provisioner-fb588677b-gjd2w 5/5 Running 0 3h13m csi-cephfsplugin-provisioner-fb588677b-lbzz4 5/5 Running 0 3h13m csi-rbdplugin-p46m9 3/3 Running 0 3h13m csi-rbdplugin-provisioner-858fd9c5c7-78xpl 6/6 Running 0 3h13m csi-rbdplugin-provisioner-858fd9c5c7-g2nc8 6/6 Running 0 3h13m csi-rbdplugin-vzgz4 3/3 Running 0 3h13m csi-rbdplugin-zc5bk 3/3 Running 0 3h13m noobaa-core-0 1/1 Running 0 3h10m noobaa-db-pg-0 1/1 Running 0 3h10m noobaa-endpoint-fcbf85d48-4wgd4 1/1 Running 0 3h8m noobaa-operator-665d987554-hwwpz 1/1 Running 0 3h15m ocs-metrics-exporter-8585b9bd9b-897l4 1/1 Running 0 3h15m ocs-operator-64c4b7bb44-jtszn 1/1 Running 0 3h15m odf-console-669bd79499-xbxml 1/1 Running 0 3h15m odf-operator-controller-manager-bd47859c-bpl5c 2/2 Running 0 3h15m rook-ceph-crashcollector-worker-0-568c85f64d-hpt2l 1/1 Running 0 3h11m rook-ceph-crashcollector-worker-1-5448fdc6b5-f49t9 1/1 Running 0 3h12m rook-ceph-crashcollector-worker-2-59d789995d-jg47h 1/1 Running 0 3h11m rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-794fd4f8zx87r 2/2 Running 0 3h11m rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5546cfc5sqjwj 2/2 Running 0 3h11m rook-ceph-mgr-a-596c5865bf-vsdpl 2/2 Running 0 3h12m rook-ceph-mon-a-8ccb4848c-dk4ht 2/2 Running 0 3h13m rook-ceph-mon-b-7c6fff665-2z5km 2/2 Running 0 3h12m rook-ceph-mon-c-55b9c49b6c-k25sb 2/2 Running 0 3h12m rook-ceph-operator-7896488fc-nprlt 1/1 Running 0 3h15m rook-ceph-osd-0-79f9bfd78d-glt96 2/2 Running 0 3h11m rook-ceph-osd-1-d66768c5b-mb8lh 2/2 Running 0 3h11m rook-ceph-osd-2-6848569fcb-x4p8g 2/2 Running 0 3h11m rook-ceph-osd-prepare-45d780aaf8458e624ae84f1341ac8bcf-f7222 0/1 Completed 0 7m44s rook-ceph-osd-prepare-4d974aae61e20c6b2523d76584c59451-vpkdr 0/1 Completed 0 3h11m rook-ceph-osd-prepare-509e05e67fd495f59c66466cf91f3e92-dgcrb 0/1 Completed 0 3h11m rook-ceph-osd-prepare-7adc6fdb0aabc7fd2ebed78ca909e7db-5ldmb 0/1 Completed 0 3h11m rook-ceph-osd-prepare-c4b3ca14f4584de2d088d590fd0f4345-8cn5z 0/1 Completed 0 7m50s rook-ceph-osd-prepare-c8af26daa15306c2084654732615b22c-kzv6f 0/1 Completed 0 7m47s rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-6cb859bxs56r 2/2 Running 0 3h11m rook-ceph-tools-8b6fbc449-6k48c 1/1 Running 0 3h10m [root@rdr-cicd-odf-0d4b-bastion-0 ~]# [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc describe pod rook-ceph-osd-prepare-45d780aaf8458e624ae84f1341ac8bcf-f7222 Name: rook-ceph-osd-prepare-45d780aaf8458e624ae84f1341ac8bcf-f7222 Namespace: openshift-storage Priority: 2000001000 Priority Class Name: system-node-critical Service Account: rook-ceph-osd Node: worker-0/9.47.90.209 Start Time: Thu, 10 Nov 2022 03:59:01 -0500 Labels: app=rook-ceph-osd-prepare ceph.rook.io/DeviceSet=ocs-deviceset-localblock-0 ceph.rook.io/pvc=ocs-deviceset-localblock-0-data-542m54 controller-uid=c9c3ab5c-d597-48f3-9989-2d2b8cf2162a job-name=rook-ceph-osd-prepare-45d780aaf8458e624ae84f1341ac8bcf rook_cluster=openshift-storage Annotations: k8s.v1.cni.cncf.io/network-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.131.0.38" ], "default": true, "dns": {} }] k8s.v1.cni.cncf.io/networks-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.131.0.38" ], "default": true, "dns": {} }] openshift.io/scc: rook-ceph Status: Succeeded IP: 10.131.0.38 IPs: IP: 
10.131.0.38 Controlled By: Job/rook-ceph-osd-prepare-45d780aaf8458e624ae84f1341ac8bcf Init Containers: copy-bins: Container ID: cri-o://eb8b9c79b7b196e5d510738d15e7272df6ff6d6176a9cc31614a1febfa8510b2 Image: quay.io/rhceph-dev/odf4-rook-ceph-rhel8-operator@sha256:58f3a10e38232e24a408b08e9a6babfa7ccf7e9d06dcd207f28dfe2301d82374 Image ID: quay.io/rhceph-dev/odf4-rook-ceph-rhel8-operator@sha256:58f3a10e38232e24a408b08e9a6babfa7ccf7e9d06dcd207f28dfe2301d82374 Port: <none> Host Port: <none> Args: copy-binaries --copy-to-dir /rook State: Terminated Reason: Completed Exit Code: 0 Started: Thu, 10 Nov 2022 03:59:03 -0500 Finished: Thu, 10 Nov 2022 03:59:03 -0500 Ready: True Restart Count: 0 Environment: <none> Mounts: /rook from rook-binaries (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qqg7b (ro) blkdevmapper: Container ID: cri-o://45a7fc393ad23e6159d92f0cd12b371570829138c79492c328b043461f917100 Image: quay.io/rhceph-dev/rhceph@sha256:9b9d1dffa2254ee04f6d7628daa244e805637cf03420bad89545495fadb491d7 Image ID: quay.io/rhceph-dev/rhceph@sha256:8e8a2a243ceb4275a9ab383025714b62e8b03b6ac0f98f89c55a91086c287192 Port: <none> Host Port: <none> Command: /bin/bash -c set -xe PVC_SOURCE=/ocs-deviceset-localblock-0-data-542m54 PVC_DEST=/mnt/ocs-deviceset-localblock-0-data-542m54 CP_ARGS=(--archive --dereference --verbose) if [ -b "$PVC_DEST" ]; then PVC_SOURCE_MAJ_MIN=$(stat --format '%t%T' $PVC_SOURCE) PVC_DEST_MAJ_MIN=$(stat --format '%t%T' $PVC_DEST) if [[ "$PVC_SOURCE_MAJ_MIN" == "$PVC_DEST_MAJ_MIN" ]]; then echo "PVC $PVC_DEST already exists and has the same major and minor as $PVC_SOURCE: "$PVC_SOURCE_MAJ_MIN"" exit 0 else echo "PVC's source major/minor numbers changed" CP_ARGS+=(--remove-destination) fi fi cp "${CP_ARGS[@]}" "$PVC_SOURCE" "$PVC_DEST" State: Terminated Reason: Completed Exit Code: 0 Started: Thu, 10 Nov 2022 03:59:05 -0500 Finished: Thu, 10 Nov 2022 03:59:05 -0500 Ready: True Restart Count: 0 Environment: <none> Mounts: /mnt from ocs-deviceset-localblock-0-data-542m54-bridge (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qqg7b (ro) Devices: /ocs-deviceset-localblock-0-data-542m54 from ocs-deviceset-localblock-0-data-542m54 Containers: provision: Container ID: cri-o://d8500656c63cca5fb4fd9f2eea76c488f9f45230d86bdfd397584da2c8c26715 Image: quay.io/rhceph-dev/rhceph@sha256:9b9d1dffa2254ee04f6d7628daa244e805637cf03420bad89545495fadb491d7 Image ID: quay.io/rhceph-dev/rhceph@sha256:8e8a2a243ceb4275a9ab383025714b62e8b03b6ac0f98f89c55a91086c287192 Port: <none> Host Port: <none> Command: /rook/rook Args: ceph osd provision State: Terminated Reason: Completed Exit Code: 0 Started: Thu, 10 Nov 2022 03:59:06 -0500 Finished: Thu, 10 Nov 2022 03:59:08 -0500 Ready: False Restart Count: 0 Environment Variables from: rook-ceph-osd-env-override ConfigMap Optional: true Environment: ROOK_NODE_NAME: ocs-deviceset-localblock-0-data-542m54 ROOK_CLUSTER_ID: 29755a55-9b45-49e3-8851-0366ce52ca04 ROOK_CLUSTER_NAME: ocs-storagecluster-cephcluster ROOK_PRIVATE_IP: (v1:status.podIP) ROOK_PUBLIC_IP: (v1:status.podIP) POD_NAMESPACE: openshift-storage ROOK_MON_ENDPOINTS: <set to the key 'data' of config map 'rook-ceph-mon-endpoints'> Optional: false ROOK_MON_SECRET: <set to the key 'mon-secret' in secret 'rook-ceph-mon'> Optional: false ROOK_CEPH_USERNAME: <set to the key 'ceph-username' in secret 'rook-ceph-mon'> Optional: false ROOK_CEPH_SECRET: <set to the key 'ceph-secret' in secret 'rook-ceph-mon'> Optional: false ROOK_CONFIG_DIR: /var/lib/rook 
ROOK_CEPH_CONFIG_OVERRIDE: /etc/rook/config/override.conf ROOK_FSID: <set to the key 'fsid' in secret 'rook-ceph-mon'> Optional: false NODE_NAME: (v1:spec.nodeName) ROOK_CRUSHMAP_ROOT: default ROOK_CRUSHMAP_HOSTNAME: CEPH_VOLUME_DEBUG: 1 CEPH_VOLUME_SKIP_RESTORECON: 1 DM_DISABLE_UDEV: 1 ROOK_LOG_LEVEL: DEBUG ROOK_CEPH_VERSION: ceph version 16.2.10-50 pacific ROOK_OSD_CRUSH_DEVICE_CLASS: ROOK_OSD_CRUSH_INITIAL_WEIGHT: ROOK_DATA_DEVICES: [{"id":"/mnt/ocs-deviceset-localblock-0-data-542m54","storeConfig":{"osdsPerDevice":1}}] ROOK_PVC_BACKED_OSD: true ROOK_ENCRYPTED_DEVICE: false ROOK_PVC_NAME: ocs-deviceset-localblock-0-data-542m54 Mounts: /dev from devices (rw) /etc/ceph from ceph-conf-emptydir (rw) /mnt from ocs-deviceset-localblock-0-data-542m54-bridge (rw) /rook from rook-binaries (rw) /run/udev from udev (rw) /var/lib/ceph/crash from rook-ceph-crash (rw) /var/lib/rook from rook-data (rw) /var/log/ceph from rook-ceph-log (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qqg7b (ro) Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: rook-data: Type: HostPath (bare host directory volume) Path: /var/lib/rook HostPathType: ceph-conf-emptydir: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: <unset> rook-ceph-log: Type: HostPath (bare host directory volume) Path: /var/lib/rook/openshift-storage/log HostPathType: rook-ceph-crash: Type: HostPath (bare host directory volume) Path: /var/lib/rook/openshift-storage/crash HostPathType: rook-binaries: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: <unset> devices: Type: HostPath (bare host directory volume) Path: /dev HostPathType: udev: Type: HostPath (bare host directory volume) Path: /run/udev HostPathType: ocs-deviceset-localblock-0-data-542m54: Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace) ClaimName: ocs-deviceset-localblock-0-data-542m54 ReadOnly: false ocs-deviceset-localblock-0-data-542m54-bridge: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: Memory SizeLimit: <unset> kube-api-access-qqg7b: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: <nil> DownwardAPI: true ConfigMapName: openshift-service-ca.crt ConfigMapOptional: <nil> QoS Class: BestEffort Node-Selectors: <none> Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s node.ocs.openshift.io/storage=true:NoSchedule node.ocs.openshift.io/storage=true:NoSchedule Topology Spread Constraints: kubernetes.io/hostname:ScheduleAnyway when max skew 1 is exceeded for selector ceph.rook.io/pvc Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 8m12s default-scheduler Successfully assigned openshift-storage/rook-ceph-osd-prepare-45d780aaf8458e624ae84f1341ac8bcf-f7222 to worker-0 Normal SuccessfulMountVolume 8m12s kubelet MapVolume.MapPodDevice succeeded for volume "local-pv-c988e86f" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io~local-volume/volumeDevices/local-pv-c988e86f" Normal SuccessfulMountVolume 8m12s kubelet MapVolume.MapPodDevice succeeded for volume "local-pv-c988e86f" volumeMapPath "/var/lib/kubelet/pods/8ebc64fb-ea27-4382-a078-0bdf21fb912c/volumeDevices/kubernetes.io~local-volume" Normal AddedInterface 8m11s multus Add eth0 [10.131.0.38/23] from 
openshift-sdn Normal Pulled 8m10s kubelet Container image "quay.io/rhceph-dev/odf4-rook-ceph-rhel8-operator@sha256:58f3a10e38232e24a408b08e9a6babfa7ccf7e9d06dcd207f28dfe2301d82374" already present on machine Normal Created 8m10s kubelet Created container copy-bins Normal Started 8m10s kubelet Started container copy-bins Normal Pulled 8m9s kubelet Container image "quay.io/rhceph-dev/rhceph@sha256:9b9d1dffa2254ee04f6d7628daa244e805637cf03420bad89545495fadb491d7" already present on machine Normal Created 8m8s kubelet Created container blkdevmapper Normal Started 8m8s kubelet Started container blkdevmapper Normal Pulled 8m8s kubelet Container image "quay.io/rhceph-dev/rhceph@sha256:9b9d1dffa2254ee04f6d7628daa244e805637cf03420bad89545495fadb491d7" already present on machine Normal Created 8m7s kubelet Created container provision Normal Started 8m7s kubelet Started container provision [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc logs -f pod/rook-ceph-osd-prepare-45d780aaf8458e624ae84f1341ac8bcf-f7222 Defaulted container "provision" out of: provision, copy-bins (init), blkdevmapper (init) 2022-11-10 08:59:06.220369 I | cephcmd: desired devices to configure osds: [{Name:/mnt/ocs-deviceset-localblock-0-data-542m54 OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:false IsDevicePathFilter:false}] 2022-11-10 08:59:06.221113 I | rookcmd: starting Rook v4.12.0-0.6d9b42e8640617ef19eb68feb636bb23c283ab00 with arguments '/rook/rook ceph osd provision' 2022-11-10 08:59:06.221121 I | rookcmd: flag values: --cluster-id=29755a55-9b45-49e3-8851-0366ce52ca04, --cluster-name=ocs-storagecluster-cephcluster, --data-device-filter=, --data-device-path-filter=, --data-devices=[{"id":"/mnt/ocs-deviceset-localblock-0-data-542m54","storeConfig":{"osdsPerDevice":1}}], --encrypted-device=false, --force-format=false, --help=false, --location=, --log-level=DEBUG, --metadata-device=, --node-name=ocs-deviceset-localblock-0-data-542m54, --operator-image=, --osd-crush-device-class=, --osd-crush-initial-weight=, --osd-database-size=0, --osd-wal-size=576, --osds-per-device=1, --pvc-backed-osd=true, --service-account= 2022-11-10 08:59:06.221126 I | op-mon: parsing mon endpoints: b=172.30.252.124:6789,c=172.30.39.41:6789,a=172.30.215.72:6789 2022-11-10 08:59:06.231515 I | op-osd: CRUSH location=root=default host=worker-0 2022-11-10 08:59:06.231532 I | cephcmd: crush location of osd: root=default host=worker-0 2022-11-10 08:59:06.231547 D | exec: Running command: dmsetup version 2022-11-10 08:59:06.235093 I | cephosd: Library version: 1.02.181-RHEL8 (2021-10-20) Driver version: 4.43.0 2022-11-10 08:59:06.246361 I | cephclient: writing config file /var/lib/rook/openshift-storage/openshift-storage.config 2022-11-10 08:59:06.246523 I | cephclient: generated admin config in /var/lib/rook/openshift-storage 2022-11-10 08:59:06.246713 D | cephclient: config file @ /etc/ceph/ceph.conf: [global] fsid = 3b5897f8-8705-48d3-b59d-5c023ec65c7e mon initial members = b c a mon host = [v2:172.30.252.124:3300,v1:172.30.252.124:6789],[v2:172.30.39.41:3300,v1:172.30.39.41:6789],[v2:172.30.215.72:3300,v1:172.30.215.72:6789] bdev_flock_retry = 20 mon_osd_full_ratio = .85 mon_osd_backfillfull_ratio = .8 mon_osd_nearfull_ratio = .75 mon_max_pg_per_osd = 600 mon_pg_warn_max_object_skew = 0 mon_data_avail_warn = 15 [osd] osd_memory_target_cgroup_limit_ratio = 0.8 [client.admin] keyring = /var/lib/rook/openshift-storage/client.admin.keyring 2022-11-10 08:59:06.246722 I | cephosd: discovering hardware 2022-11-10 08:59:06.246734 
D | exec: Running command: lsblk /mnt/ocs-deviceset-localblock-0-data-542m54 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE 2022-11-10 08:59:06.250587 D | sys: lsblk output: "SIZE=\"536870912000\" ROTA=\"1\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sdd\" KNAME=\"/dev/sdd\" MOUNTPOINT=\"\" FSTYPE=\"\"" 2022-11-10 08:59:06.250658 D | exec: Running command: sgdisk --print /mnt/ocs-deviceset-localblock-0-data-542m54 2022-11-10 08:59:06.253226 D | exec: Running command: udevadm info --query=property /dev/sdd 2022-11-10 08:59:06.259736 D | sys: udevadm info output: "DEVLINKS=/dev/disk/by-id/wwn-0x6005076d0281005ef000000000028d24 /dev/disk/by-id/scsi-36005076d0281005ef000000000028d24 /dev/disk/by-id/scsi-SAIX_VDASD_332136005076D0281005EF000000000028D2404214503IBMfcp\nDEVNAME=/dev/sdd\nDEVPATH=/devices/vio/30000002/host0/target0:0:5/0:0:5:0/block/sdd\nDEVTYPE=disk\nID_BUS=scsi\nID_MODEL=VDASD\nID_MODEL_ENC=VDASD\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\nID_REVISION=0001\nID_SCSI=1\nID_SCSI_INQUIRY=1\nID_SCSI_SERIAL=332136005076D0281005EF000000000028D2404214503IBMfcp\nID_SERIAL=36005076d0281005ef000000000028d24\nID_SERIAL_SHORT=6005076d0281005ef000000000028d24\nID_TARGET_PORT=0\nID_TYPE=disk\nID_VENDOR=AIX\nID_VENDOR_ENC=AIX\\x20\\x20\\x20\\x20\\x20\nID_WWN=0x6005076d0281005e\nID_WWN_VENDOR_EXTENSION=0xf000000000028d24\nID_WWN_WITH_EXTENSION=0x6005076d0281005ef000000000028d24\nMAJOR=8\nMINOR=48\nSCSI_IDENT_LUN_NAA_REGEXT=6005076d0281005ef000000000028d24\nSCSI_IDENT_PORT_RELATIVE=2177\nSCSI_IDENT_PORT_TARGET_PORT_GROUP=0x0\nSCSI_IDENT_SERIAL=332136005076D0281005EF000000000028D2404214503IBMfcp\nSCSI_MODEL=VDASD\nSCSI_MODEL_ENC=VDASD\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\nSCSI_REVISION=0001\nSCSI_TPGS=0\nSCSI_TYPE=disk\nSCSI_VENDOR=AIX\nSCSI_VENDOR_ENC=AIX\\x20\\x20\\x20\\x20\\x20\nSUBSYSTEM=block\nTAGS=:systemd:\nUSEC_INITIALIZED=13435165299" 2022-11-10 08:59:06.259784 I | cephosd: creating and starting the osds 2022-11-10 08:59:06.259809 D | cephosd: desiredDevices are [{Name:/mnt/ocs-deviceset-localblock-0-data-542m54 OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:false IsDevicePathFilter:false}] 2022-11-10 08:59:06.259814 D | cephosd: context.Devices are: 2022-11-10 08:59:06.259850 D | cephosd: &{Name:/mnt/ocs-deviceset-localblock-0-data-542m54 Parent: HasChildren:false DevLinks:/dev/disk/by-id/wwn-0x6005076d0281005ef000000000028d24 /dev/disk/by-id/scsi-36005076d0281005ef000000000028d24 /dev/disk/by-id/scsi-SAIX_VDASD_332136005076D0281005EF000000000028D2404214503IBMfcp Size:536870912000 UUID:0d93e4da-5df7-4f73-b33b-18dafeefe024 Serial:36005076d0281005ef000000000028d24 Type:data Rotational:true Readonly:false Partitions:[] Filesystem: Mountpoint: Vendor:AIX Model:VDASD WWN:0x6005076d0281005e WWNVendorExtension:0x6005076d0281005ef000000000028d24 Empty:false CephVolumeData: RealPath:/dev/sdd KernelName:sdd Encrypted:false} 2022-11-10 08:59:06.259860 I | cephosd: old lsblk can't detect bluestore signature, so try to detect here 2022-11-10 08:59:06.259903 I | cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-542m54", detected an existing OSD. UUID=1bc2aa4b-b3a5-4c85-b663-b1b363d73264 2022-11-10 08:59:06.267719 I | cephosd: configuring osd devices: {"Entries":{}} 2022-11-10 08:59:06.267746 I | cephosd: no new devices to configure. returning devices already configured with ceph-volume. 
2022-11-10 08:59:06.267762 D | exec: Running command: pvdisplay -C -o lvpath --noheadings /mnt/ocs-deviceset-localblock-0-data-542m54 2022-11-10 08:59:06.844247 W | cephosd: failed to retrieve logical volume path for "/mnt/ocs-deviceset-localblock-0-data-542m54". exit status 5 2022-11-10 08:59:06.844332 D | exec: Running command: lsblk /mnt/ocs-deviceset-localblock-0-data-542m54 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE 2022-11-10 08:59:06.848115 D | sys: lsblk output: "SIZE=\"536870912000\" ROTA=\"1\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sdd\" KNAME=\"/dev/sdd\" MOUNTPOINT=\"\" FSTYPE=\"\"" 2022-11-10 08:59:06.848390 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list --format json 2022-11-10 08:59:07.641820 D | cephosd: {} 2022-11-10 08:59:07.641891 I | cephosd: 0 ceph-volume lvm osd devices configured on this node 2022-11-10 08:59:07.641906 D | exec: Running command: cryptsetup luksDump /mnt/ocs-deviceset-localblock-0-data-542m54 2022-11-10 08:59:07.654420 E | cephosd: failed to determine if the encrypted block "/mnt/ocs-deviceset-localblock-0-data-542m54" is from our cluster. failed to dump LUKS header for disk "/mnt/ocs-deviceset-localblock-0-data-542m54". Device /mnt/ocs-deviceset-localblock-0-data-542m54 is not a valid LUKS device.: exit status 1 2022-11-10 08:59:07.654457 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list /mnt/ocs-deviceset-localblock-0-data-542m54 --format json 2022-11-10 08:59:07.994847 D | cephosd: {} 2022-11-10 08:59:07.994910 I | cephosd: 0 ceph-volume raw osd devices configured on this node 2022-11-10 08:59:07.994925 W | cephosd: skipping OSD configuration as no devices matched the storage settings for this node "ocs-deviceset-localblock-0-data-542m54" [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc get pv NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE local-pv-183393b6 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-3kjw2h localblock 28m local-pv-3babaca9 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-4zkhcz localblock 28m local-pv-bec7af9c 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-26s9vd localblock 3h23m local-pv-c988e86f 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-542m54 localblock 29m local-pv-d0b1dff6 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-1tpfs6 localblock 3h23m local-pv-f6a955b1 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-0x6z2b localblock 3h23m pvc-155f125c-8ca8-4742-b5a3-e79760898a50 40Gi RWO Delete Bound openshift-monitoring/my-prometheus-claim-prometheus-k8s-0 ocs-storagecluster-ceph-rbd 3h16m pvc-5f47b390-5788-42f0-927f-a07b7f5fe4d2 50Gi RWO Delete Bound openshift-storage/db-noobaa-db-pg-0 ocs-storagecluster-ceph-rbd 3h17m pvc-6f6bf1ea-e61d-4d45-bab8-79f957a959fe 40Gi RWO Delete Bound openshift-monitoring/my-alertmanager-claim-alertmanager-main-1 ocs-storagecluster-ceph-rbd 3h16m pvc-8ed35488-1029-403e-8f4e-f0573d38b524 40Gi RWO Delete Bound openshift-monitoring/my-alertmanager-claim-alertmanager-main-0 ocs-storagecluster-ceph-rbd 3h16m pvc-95f979ff-18c2-4044-9b01-3a5c11a6dbf8 100Gi RWX Delete Bound openshift-image-registry/registry-cephfs-rwx-pvc ocs-storagecluster-cephfs 3h16m pvc-d7d2e0fc-0fa7-410f-9223-0001209c6ee7 40Gi RWO Delete Bound openshift-monitoring/my-prometheus-claim-prometheus-k8s-1 ocs-storagecluster-ceph-rbd 
3h16m [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc get pvc NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE db-noobaa-db-pg-0 Bound pvc-5f47b390-5788-42f0-927f-a07b7f5fe4d2 50Gi RWO ocs-storagecluster-ceph-rbd 3h17m ocs-deviceset-localblock-0-data-0x6z2b Bound local-pv-f6a955b1 500Gi RWO localblock 3h18m ocs-deviceset-localblock-0-data-1tpfs6 Bound local-pv-d0b1dff6 500Gi RWO localblock 3h18m ocs-deviceset-localblock-0-data-26s9vd Bound local-pv-bec7af9c 500Gi RWO localblock 3h18m ocs-deviceset-localblock-0-data-3kjw2h Bound local-pv-183393b6 500Gi RWO localblock 15m ocs-deviceset-localblock-0-data-4zkhcz Bound local-pv-3babaca9 500Gi RWO localblock 15m ocs-deviceset-localblock-0-data-542m54 Bound local-pv-c988e86f 500Gi RWO localblock 15m
[root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc get nodes NAME STATUS ROLES AGE VERSION master-0 Ready control-plane,master 4d4h v1.25.2+93b33ea master-1 Ready control-plane,master 4d4h v1.25.2+93b33ea master-2 Ready control-plane,master 4d4h v1.25.2+93b33ea worker-0 Ready worker 4d4h v1.25.2+93b33ea worker-1 Ready worker 4d4h v1.25.2+93b33ea worker-2 Ready worker 4d4h v1.25.2+93b33ea [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc debug node/worker-0 Warning: would violate PodSecurity "baseline:latest": host namespaces (hostNetwork=true, hostPID=true, hostIPC=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true) Starting pod/worker-0-debug ... To use host binaries, run `chroot /host` Pod IP: 9.47.90.209 If you don't see a command prompt, try pressing enter. sh-4.4# chroot /host sh-4.4# sh-4.4# lsblk |grep 500 loop1 7:1 0 500G 0 loop sdb 8:16 0 500G 0 disk sdd 8:48 0 500G 0 disk [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc debug node/worker-1 Warning: would violate PodSecurity "baseline:latest": host namespaces (hostNetwork=true, hostPID=true, hostIPC=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true) Starting pod/worker-1-debug ... To use host binaries, run `chroot /host` Pod IP: 9.47.90.190 If you don't see a command prompt, try pressing enter. sh-4.4# sh-4.4# chroot /host sh-4.4# sh-4.4# lsblk |grep 500 loop1 7:1 0 500G 0 loop sdb 8:16 0 500G 0 disk sdd 8:48 0 500G 0 disk sh-4.4# exit sh-4.4# exit Removing debug pod ... [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc debug node/worker-2 Warning: would violate PodSecurity "baseline:latest": host namespaces (hostNetwork=true, hostPID=true, hostIPC=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true) Starting pod/worker-2-debug ... To use host binaries, run `chroot /host` Pod IP: 9.47.90.165 If you don't see a command prompt, try pressing enter. sh-4.4# chroot /host sh-4.4# lsblk |grep 500 loop1 7:1 0 500G 0 loop sdb 8:16 0 500G 0 disk sdd 8:48 0 500G 0 disk sh-4.4# exit exit sh-4.4# exit Removing debug pod ... 
[root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc get localvolumeset -n openshift-local-storage NAME STORAGECLASS PROVISIONED AGE localblock localblock 6 4d3h [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc describe localvolumeset localblock -n openshift-local-storage Name: localblock Namespace: openshift-local-storage Labels: <none> Annotations: <none> API Version: local.storage.openshift.io/v1alpha1 Kind: LocalVolumeSet Metadata: Creation Timestamp: 2022-11-10T05:49:06Z Finalizers: storage.openshift.com/local-volume-protection Generation: 1 Managed Fields: API Version: local.storage.openshift.io/v1alpha1 Fields Type: FieldsV1 fieldsV1: f:spec: .: f:deviceInclusionSpec: .: f:deviceMechanicalProperties: f:deviceTypes: f:minSize: f:nodeSelector: .: f:nodeSelectorTerms: f:storageClassName: f:volumeMode: Manager: kubectl-create Operation: Update Time: 2022-11-10T05:49:06Z API Version: local.storage.openshift.io/v1alpha1 Fields Type: FieldsV1 fieldsV1: f:metadata: f:finalizers: .: v:"storage.openshift.com/local-volume-protection": Manager: local-storage-operator Operation: Update Time: 2022-11-10T05:49:06Z API Version: local.storage.openshift.io/v1alpha1 Fields Type: FieldsV1 fieldsV1: f:status: .: f:conditions: f:observedGeneration: f:totalProvisionedDeviceCount: Manager: local-storage-operator Operation: Update Subresource: status Time: 2022-11-10T08:45:05Z Resource Version: 261137 UID: 16584bf9-0e8b-40b1-9665-2dd083b1ac11 Spec: Device Inclusion Spec: Device Mechanical Properties: NonRotational Rotational Device Types: disk part Min Size: 100Gi Node Selector: Node Selector Terms: Match Expressions: Key: kubernetes.io/hostname Operator: In Values: worker-0 worker-1 worker-2 Storage Class Name: localblock Volume Mode: Block Status: Conditions: Last Transition Time: 2022-11-10T05:49:06Z Message: DiskMaker: Available Status: True Type: DaemonSetsAvailable Last Transition Time: 2022-11-10T05:49:06Z Message: Operator reconciled successfully. 
Status: True Type: Available Observed Generation: 1 Total Provisioned Device Count: 6 Events: <none> [root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc describe storagecluster ocs-storagecluster Name: ocs-storagecluster Namespace: openshift-storage Labels: <none> Annotations: cluster.ocs.openshift.io/local-devices: true uninstall.ocs.openshift.io/cleanup-policy: delete uninstall.ocs.openshift.io/mode: graceful API Version: ocs.openshift.io/v1 Kind: StorageCluster Metadata: Creation Timestamp: 2022-11-10T05:52:53Z Finalizers: storagecluster.ocs.openshift.io Generation: 3 Managed Fields: API Version: ocs.openshift.io/v1 Fields Type: FieldsV1 fieldsV1: f:metadata: f:annotations: .: f:cluster.ocs.openshift.io/local-devices: f:spec: .: f:flexibleScaling: f:monDataDirHostPath: Manager: kubectl-create Operation: Update Time: 2022-11-10T05:52:53Z API Version: ocs.openshift.io/v1 Fields Type: FieldsV1 fieldsV1: f:metadata: f:ownerReferences: .: k:{"uid":"ef253c10-2986-4cd4-a76f-35514c225041"}: Manager: manager Operation: Update Time: 2022-11-10T05:52:54Z API Version: ocs.openshift.io/v1 Fields Type: FieldsV1 fieldsV1: f:metadata: f:annotations: f:uninstall.ocs.openshift.io/cleanup-policy: f:uninstall.ocs.openshift.io/mode: f:finalizers: .: v:"storagecluster.ocs.openshift.io": f:spec: f:arbiter: f:encryption: .: f:kms: f:externalStorage: f:managedResources: .: f:cephBlockPools: f:cephCluster: f:cephConfig: f:cephDashboard: f:cephFilesystems: f:cephNonResilientPools: f:cephObjectStoreUsers: f:cephObjectStores: f:cephToolbox: f:mirroring: Manager: ocs-operator Operation: Update Time: 2022-11-10T05:52:54Z API Version: ocs.openshift.io/v1 Fields Type: FieldsV1 fieldsV1: f:spec: f:storageDeviceSets: Manager: kubectl-edit Operation: Update Time: 2022-11-10T08:57:47Z API Version: ocs.openshift.io/v1 Fields Type: FieldsV1 fieldsV1: f:status: .: f:conditions: f:externalStorage: .: f:grantedCapacity: f:failureDomain: f:failureDomainKey: f:failureDomainValues: f:images: .: f:ceph: .: f:actualImage: f:desiredImage: f:noobaaCore: .: f:actualImage: f:desiredImage: f:noobaaDB: .: f:actualImage: f:desiredImage: f:kmsServerConnection: f:nodeTopologies: .: f:labels: .: f:kubernetes.io/hostname: f:phase: f:relatedObjects: f:version: Manager: ocs-operator Operation: Update Subresource: status Time: 2022-11-14T09:10:30Z Owner References: API Version: odf.openshift.io/v1alpha1 Kind: StorageSystem Name: ocs-storagecluster-storagesystem UID: ef253c10-2986-4cd4-a76f-35514c225041 Resource Version: 7088995 UID: b06ab659-eb9f-40b3-ae11-48edbb01c95f Spec: Arbiter: Encryption: Kms: External Storage: Flexible Scaling: true Managed Resources: Ceph Block Pools: Ceph Cluster: Ceph Config: Ceph Dashboard: Ceph Filesystems: Ceph Non Resilient Pools: Ceph Object Store Users: Ceph Object Stores: Ceph Toolbox: Mirroring: Mon Data Dir Host Path: /var/lib/rook Storage Device Sets: Config: Count: 6 Data PVC Template: Metadata: Spec: Access Modes: ReadWriteOnce Resources: Requests: Storage: 100Gi Storage Class Name: localblock Volume Mode: Block Status: Name: ocs-deviceset-localblock Placement: Prepare Placement: Replica: 1 Resources: Status: Conditions: Last Heartbeat Time: 2022-11-14T09:10:30Z Last Transition Time: 2022-11-14T06:38:36Z Message: Reconcile completed successfully Reason: ReconcileCompleted Status: True Type: ReconcileComplete Last Heartbeat Time: 2022-11-14T09:10:30Z Last Transition Time: 2022-11-10T08:58:53Z Message: Reconcile completed successfully Reason: ReconcileCompleted Status: True Type: Available Last Heartbeat Time: 
2022-11-14T09:10:30Z Last Transition Time: 2022-11-10T08:59:09Z Message: Reconcile completed successfully Reason: ReconcileCompleted Status: False Type: Progressing Last Heartbeat Time: 2022-11-14T09:10:30Z Last Transition Time: 2022-11-10T08:58:53Z Message: Reconcile completed successfully Reason: ReconcileCompleted Status: False Type: Degraded Last Heartbeat Time: 2022-11-14T09:10:30Z Last Transition Time: 2022-11-10T08:59:09Z Message: Reconcile completed successfully Reason: ReconcileCompleted Status: True Type: Upgradeable External Storage: Granted Capacity: 0 Failure Domain: host Failure Domain Key: kubernetes.io/hostname Failure Domain Values: worker-0 worker-1 worker-2 Images: Ceph: Actual Image: quay.io/rhceph-dev/rhceph@sha256:9b9d1dffa2254ee04f6d7628daa244e805637cf03420bad89545495fadb491d7 Desired Image: quay.io/rhceph-dev/rhceph@sha256:9b9d1dffa2254ee04f6d7628daa244e805637cf03420bad89545495fadb491d7 Noobaa Core: Actual Image: quay.io/rhceph-dev/odf4-mcg-core-rhel8@sha256:ee1bc56dc3cf3b7f0136184668700caca835712f3252bb79c6c745e772850e25 Desired Image: quay.io/rhceph-dev/odf4-mcg-core-rhel8@sha256:ee1bc56dc3cf3b7f0136184668700caca835712f3252bb79c6c745e772850e25 Noobaa DB: Actual Image: quay.io/rhceph-dev/rhel8-postgresql-12@sha256:aa65868b9684f7715214f5f3fac3139245c212019cc17742f237965a7508222d Desired Image: quay.io/rhceph-dev/rhel8-postgresql-12@sha256:aa65868b9684f7715214f5f3fac3139245c212019cc17742f237965a7508222d Kms Server Connection: Node Topologies: Labels: kubernetes.io/hostname: worker-0 worker-1 worker-2 Phase: Ready Related Objects: API Version: ceph.rook.io/v1 Kind: CephCluster Name: ocs-storagecluster-cephcluster Namespace: openshift-storage Resource Version: 7088190 UID: 29755a55-9b45-49e3-8851-0366ce52ca04 API Version: noobaa.io/v1alpha1 Kind: NooBaa Name: noobaa Namespace: openshift-storage Resource Version: 7088990 UID: 4964df62-a9d1-4c6d-89bf-f28e79a525db Version: 4.12.0 Events: <none>
Could there have been a previous install on these disks? The error in the OSD prepare job shows the device already had another OSD on it:

2022-11-10 08:59:06.259903 I | cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-542m54", detected an existing OSD. UUID=1bc2aa4b-b3a5-4c85-b663-b1b363d73264
No, it is a fresh installation. We tried on a different cluster as well, and the same thing happened there too.
Do all of the OSD prepare job logs that are created for the new OSDs have the same message about the device already being configured for an OSD?

2022-11-10 08:59:06.259903 I | cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-542m54", detected an existing OSD. UUID=1bc2aa4b-b3a5-4c85-b663-b1b363d73264

So the question is why Rook detects that these devices already have an OSD configured. Can you check in the toolbox, with a ceph osd command, which existing OSD that UUID corresponds to?
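For example (a hedged sketch, using the UUID from the prepare log above): each OSD line in `ceph osd dump` ends with that OSD's UUID, so grepping for the reported UUID from the toolbox shows whether it belongs to this cluster at all.

oc rsh rook-ceph-tools-8b6fbc449-6k48c
# no output means no OSD in this cluster has that UUID
sh-4.4$ ceph osd dump | grep -i 1bc2aa4b-b3a5-4c85-b663-b1b363d73264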
> Do all of the osd prepare job logs that are created for the new OSDs have the same message about the device already being configured for an OSD?

Yes, the other 2 OSD prepare pods also have the same message.

pod/rook-ceph-osd-prepare-c4b3ca14f4584de2d088d590fd0f4345-8cn5z:
2022-11-10 08:58:59.757457 I | cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-3kjw2h", detected an existing OSD. UUID=dcbd2132-30bf-413d-86a8-ea3e5612265d

pod/rook-ceph-osd-prepare-c8af26daa15306c2084654732615b22c-kzv6f:
2022-11-10 08:59:03.518164 I | cephosd: skipping device "/mnt/ocs-deviceset-localblock-0-data-4zkhcz", detected an existing OSD. UUID=44e28a1b-63e4-4eb8-9ce3-b7f9838b3212

[root@rdr-cicd-odf-0d4b-bastion-0 ~]# oc rsh rook-ceph-tools-8b6fbc449-6k48c
sh-4.4$ ceph osd ls
0
1
2
sh-4.4$ ceph osd info
osd.0 up in weight 1 up_from 9 up_thru 25 down_at 0 last_clean_interval [0,0) [v2:10.129.2.29:6800/405,v1:10.129.2.29:6801/405] [v2:10.129.2.29:6802/405,v1:10.129.2.29:6803/405] exists,up 1b29c065-6906-4b52-b07c-8bbfdf0f4abb
osd.1 up in weight 1 up_from 10 up_thru 25 down_at 0 last_clean_interval [0,0) [v2:10.131.0.30:6800/406,v1:10.131.0.30:6801/406] [v2:10.131.0.30:6802/406,v1:10.131.0.30:6803/406] exists,up 24b59909-e757-4f23-990f-43bd7806f7ae
osd.2 up in weight 1 up_from 10 up_thru 25 down_at 0 last_clean_interval [0,0) [v2:10.128.2.25:6800/406,v1:10.128.2.25:6801/406] [v2:10.128.2.25:6802/406,v1:10.128.2.25:6803/406] exists,up 71792079-5156-499c-8712-e3a6001b0758
Output of ceph osd tree:

sh-4.4$ ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1         1.46489  root default
-7         0.48830      host worker-0
 1    hdd  0.48830          osd.1          up   1.00000  1.00000
-5         0.48830      host worker-1
 2    hdd  0.48830          osd.2          up   1.00000  1.00000
-3         0.48830      host worker-2
 0    hdd  0.48830          osd.0          up   1.00000  1.00000
I just don't see how these disks could be clean. The disk has a Ceph bluestore label [1], and the OSD UUID belongs to some OSD that is not part of this cluster.

[1] https://github.com/red-hat-storage/rook/blob/release-4.12/pkg/daemon/ceph/osd/daemon.go#L294-L336
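As a hedged way to double-check that on the host (a sketch, not taken from this report): the bluestore label that Rook looks for sits in the first few KiB of the device and contains the OSD UUID in plain text, so it can be inspected from a node debug shell.

# From `oc debug node/worker-0` followed by `chroot /host` -- dump the start of the
# device backing the skipped PVC (per the prepare log above this is /dev/sdd on
# worker-0) and look for the "bluestore block device" magic and the embedded UUID.
dd if=/dev/sdd bs=4K count=1 2>/dev/null | strings | grep -A1 -i "bluestore block device"
# If the ceph tools are available on the host or in a privileged pod, the same label
# can be printed more readably with: ceph-bluestore-tool show-label --dev /dev/sdd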
So, what can be done now so that OSD pods get created for the 3 new osd-prepare pods?
To ensure the disks are clean, please try wiping the disks and running the install again. Here are some steps recommended for cleaning the disks: https://rook.io/docs/rook/latest/Getting-Started/ceph-teardown/#zapping-devices
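For convenience, a sketch of the kind of cleanup that doc describes (the device name is an example; verify it with lsblk on each worker before wiping anything):

DISK="/dev/sdd"                       # example only -- confirm the device first
sgdisk --zap-all "$DISK"              # clear GPT/MBR partition structures
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync   # overwrite the start of the disk, where the bluestore label lives
blkdiscard "$DISK" || true            # may not be supported on all device types
ls /dev/mapper/ceph-* 2>/dev/null | xargs -I% -- dmsetup remove %   # remove leftover ceph-volume device-mapper entries, if any
rm -rf /dev/ceph-*
partprobe "$DISK"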
I tried wiping the disks on a different cluster that has the same issue, using the same procedure you shared, and ran the installation again, but it didn't help.
Can you use 'sgdisk' to zap the disks? I see a similar issue targeted here: https://bugzilla.redhat.com/show_bug.cgi?id=2123077#c31. Ashish, can you give more insights? Also, if this method helps, we can add it to the re-installation documentation.
Changing the previous comment to public. Aaruni, were you able to get the disks cleaned?
Yes, I cleaned the disks by following the instructions here: https://rook.io/docs/rook/latest/Getting-Started/ceph-teardown/#zapping-devices. I also tried with new disks in a new cluster, and the same thing is happening there as well.
@aaaggarw It's hard to determine what is happening at the host/disk level that causes ceph-volume to detect that the disk is already associated with a different Ceph cluster.

First, double- or triple-check that (1) the correct disks and/or partitions are being cleaned, and (2) that they are wiped using the `dd` command from the cleanup doc you linked.

If the disks/partitions are for sure cleaned using `dd`, then we need to see what ceph-volume is doing. Let's do this in 2 ways:

1. Put the Rook operator into debug mode by setting `ROOK_LOG_LEVEL: DEBUG` in the rook-ceph-operator-config configmap (you may need to stop the ocs-operator for this), and get new provision logs.

2. Run `ceph-volume --log-level debug raw list` on the host and get the output. I think the best way to do that on OpenShift is to create a k8s Job that has access to `/dev` on the host and runs that command.

Attach both of those logs for us, please.

Thanks,
Blaine
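A hedged sketch of both steps (the Job manifest, service account, and node name are assumptions pieced together from this report, not a tested recipe; the image is the rhceph image already used by the prepare pods above):

# 1. Keep ocs-operator from reverting the setting, then turn on Rook debug logging:
oc -n openshift-storage scale deployment ocs-operator --replicas=0
oc -n openshift-storage patch configmap rook-ceph-operator-config \
  --type merge -p '{"data":{"ROOK_LOG_LEVEL":"DEBUG"}}'

# 2. Run `ceph-volume --log-level debug raw list` on a worker via a Job with host /dev
#    access (the chosen service account must be allowed a privileged SCC):
oc apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: ceph-volume-raw-list
  namespace: openshift-storage
spec:
  template:
    spec:
      serviceAccountName: rook-ceph-osd   # assumption: reuse the SA the OSD/prepare pods already run with
      nodeName: worker-0                  # example: the node whose new disk is being skipped
      restartPolicy: Never
      containers:
      - name: ceph-volume
        image: quay.io/rhceph-dev/rhceph@sha256:9b9d1dffa2254ee04f6d7628daa244e805637cf03420bad89545495fadb491d7
        command: ["ceph-volume", "--log-level", "debug", "raw", "list"]
        securityContext:
          privileged: true                # needed to read the host block devices
        volumeMounts:
        - { name: dev, mountPath: /dev }
        - { name: udev, mountPath: /run/udev }
      volumes:
      - { name: dev, hostPath: { path: /dev } }
      - { name: udev, hostPath: { path: /run/udev } }
EOF
oc -n openshift-storage logs -f job/ceph-volume-raw-list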
Sure, Blaine, I will provide the required logs once I get a new cluster; I don't have the older cluster with me now.
Please reopen when you get the required logs.