Description of problem:
Panel "Data Resiliency" in the OCP web interface shows "Rebuilding data resiliency 99%" for a very long time.

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.16    True        False         14d     Cluster version is 4.7.16

$ oc get storagecluster -n openshift-storage
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   14d   Ready              2021-07-06T08:56:17Z   4.7.0

How reproducible:
Always (2 of 2 tries).

Steps to Reproduce:
1. Write 50 TB of data to the Ceph storage backend (available storage is 100 TB, so storage is not full). We believe this is visible with a smaller data set too.
2. Replace one of the nodes in the cluster (delete the old node, create a new node) via "ibmcloud ks worker replace < worker id >" from the command line.
3. Monitor Data Resiliency in the console - it has been stuck on 99% for over 3 hours now - please check the graph.

Actual results:
Data Resiliency in the console has been stuck on 99% for over 3 hours - please check the graph.

Expected results:
Data Resiliency finishes faster.

Additional info:
The ODF/OCP cluster was OK and usable. It is unclear why the web console was reporting that status for so long even though the ODF cluster was up and HEALTHY for hours before and after the test. The Ceph cluster was HEALTHY while the OCP web console was reporting the above status.

$ ceph -s
  cluster:
    id:     5506601c-7254-498c-aeea-d9331b4be16e
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,d (age 23h)
    mgr: a(active, since 21h)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-b=up:active} 1 up:standby-replay
    osd: 24 osds: 24 up (since 16h), 24 in (since 16h)
    rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)

  data:
    pools:   10 pools, 656 pgs
    objects: 12.59M objects, 48 TiB
    usage:   144 TiB used, 231 TiB / 375 TiB avail
    pgs:     655 active+clean
             1   active+clean+scrubbing+deep+repair

  io:
    client:   938 B/s rd, 12 KiB/s wr, 2 op/s rd, 1 op/s wr
What is the impact of this taking so long? E.g., if other nodes went down during this period, do we risk any data/availability loss?
From the UI perspective, the message looks correct. Your PGs are:

  pgs: 655 active+clean
       1   active+clean+scrubbing+deep+repair

Resiliency = active+clean / total = 655/656.

We should move this BZ to operator/ceph; they can investigate it better.
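For reference, that ratio can be pulled straight out of Ceph as well. A minimal sketch from the toolbox, assuming the usual pgmap fields in the JSON status output and that jq is available (field names may differ slightly between releases):

  # count PGs that are exactly active+clean vs. the total number of PGs
  ceph -s -f json | jq '{clean: ([.pgmap.pgs_by_state[] | select(.state_name=="active+clean") | .count] | add), total: .pgmap.num_pgs}'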
Since one PG is still not active+clean, the recovery is still in progress:

  pgs: 655 active+clean
       1   active+clean+scrubbing+deep+repair

Can you collect a must-gather on the cluster? Ceph would need more details to troubleshoot.
(In reply to Travis Nielsen from comment #6)
> Since one PG is still not active+clean, the recovery is still in progress:
>
>   pgs: 655 active+clean
>        1   active+clean+scrubbing+deep+repair

You should be able to run I/O against a PG that has a state of active+*something*. In this case, the PG is in active+clean+scrubbing+deep+repair, which means one of the following:

1. during a deep-scrub, issues were found that needed a repair (hence "+repair")
2. we are seeing the cosmetic issue tracked in https://tracker.ceph.com/issues/50446, which puts a PG in the active+clean+scrubbing+deep+repair state instead of active+clean+scrubbing+deep when a deep-scrub is run on a PG

You can set nodeep-scrub in the cluster during the tests to rule out either of the above.

> Can you collect a must-gather on the cluster? Ceph would need more details
> to troubleshoot.
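For reference, these flags can be set and cleared from the toolbox with the standard Ceph commands (remember to unset them once the test is done):

  ceph osd set noscrub
  ceph osd set nodeep-scrub
  # ... run the node-replacement test ...
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub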
Thank you Travis and Neha! Setting noscrub and nodeep-scrub (ceph osd set noscrub; ceph osd set nodeep-scrub) helps; "Rebuilding data resiliency" reaches 100% faster.

After additional tests we figured out the following:

1. The node is replaced, e.g. with:

   ibmcloud ks worker replace --worker kube-c3hidk8d0cktobbskgr0-dmodff1-default-00000f73 --cluster dm-odff1

2. OSDs go down:

   cluster:
     id:     5506601c-7254-498c-aeea-d9331b4be16e
     health: HEALTH_WARN
             noscrub,nodeep-scrub flag(s) set
             3 osds down
             1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
             3 hosts (3 osds) down
             Degraded data redundancy: 5126099/40375602 objects degraded (12.696%), 275 pgs degraded, 262 pgs undersized

   services:
     mon: 3 daemons, quorum a,b,d (age 5h)
     mgr: a(active, since 10m)
     mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-b=up:active} 1 up:standby-replay
     osd: 24 osds: 21 up (since 6m), 24 in (since 14h); 9 remapped pgs
          flags noscrub,nodeep-scrub
     rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)

   data:
     pools:   10 pools, 656 pgs
     objects: 13.46M objects, 51 TiB
     usage:   134 TiB used, 194 TiB / 328 TiB avail
     pgs:     5126099/40375602 objects degraded (12.696%)
              364 active+clean
              237 active+undersized+degraded
              28  active+recovery_wait+degraded
              16  active+undersized
              8   active+recovery_wait+undersized+degraded+remapped
              1   active+recovering+degraded
              1   active+recovering+undersized+degraded+remapped
              1   active+recovery_wait

   io:
     client:   29 KiB/s rd, 9.2 MiB/s wr, 5 op/s rd, 13 op/s wr
     recovery: 47 MiB/s, 11 objects/s

3. After some time, the new node starts and the OSDs come back up:

   cluster:
     id:     5506601c-7254-498c-aeea-d9331b4be16e
     health: HEALTH_WARN
             noscrub,nodeep-scrub flag(s) set
             1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
             Degraded data redundancy: 35856/40380021 objects degraded (0.089%), 203 pgs degraded

   services:
     mon: 3 daemons, quorum a,b,d (age 5h)
     mgr: a(active, since 23m)
     mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-b=up:active} 1 up:standby-replay
     osd: 24 osds: 24 up (since 9s), 24 in (since 15h); 197 remapped pgs
          flags noscrub,nodeep-scrub
     rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)

   data:
     pools:   10 pools, 656 pgs
     objects: 13.46M objects, 51 TiB
     usage:   154 TiB used, 221 TiB / 375 TiB avail
     pgs:     35856/40380021 objects degraded (0.089%)
              450 active+clean
              197 active+recovery_wait+undersized+degraded+remapped
              5   active+recovery_wait+degraded
              2   active+recovery_wait
              1   active+recovering
              1   active+recovering+degraded

   io:
     client:   6.7 MiB/s wr, 0 op/s rd, 7 op/s wr
     recovery: 134 B/s, 10 objects/s

   (Wed-2021-07-28-10-52-06 ceph -s status)

4. It then takes time until all of these "objects degraded" are recovered; with noscrub and nodeep-scrub set this is faster.
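A simple way to tell from the CLI when step 4 has finished (run inside the rook-ceph toolbox pod, assuming it is enabled on this cluster) is to poll the PG summary until nothing is reported as degraded:

  # one-line PG summary; recovery is complete once all PGs are active+clean
  ceph pg stat
  # or pull just the relevant lines out of the full status
  ceph -s | grep -E 'degraded|recovery'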
We found that after the node-replace test, the OSDs are not spread equally across nodes:

$ oc get pods -n openshift-storage -o wide | grep osd
rook-ceph-osd-0-fd95fb959-9dqwx      2/2   Running   1   5d21h   172.17.99.137    10.240.64.10    <none>   <none>
rook-ceph-osd-1-d45fbd654-tt4dk      2/2   Running   1   5d21h   172.17.99.140    10.240.64.10    <none>   <none>
rook-ceph-osd-12-5d9cbdc748-8f846    2/2   Running   1   5d22h   172.17.99.138    10.240.64.10    <none>   <none>
rook-ceph-osd-8-78c8d778f9-tcjvp     2/2   Running   1   5d21h   172.17.99.139    10.240.64.10    <none>   <none>
rook-ceph-osd-10-78647675cf-cwh4h    2/2   Running   0   5d19h   172.17.89.109    10.240.128.12   <none>   <none>
rook-ceph-osd-11-7f88545866-72s2x    2/2   Running   0   5d21h   172.17.89.74     10.240.128.12   <none>   <none>
rook-ceph-osd-19-5675f5d695-d7g4m    2/2   Running   0   5d21h   172.17.89.73     10.240.128.12   <none>   <none>
rook-ceph-osd-3-5656f8598f-jph8k     2/2   Running   0   15h     172.17.89.76     10.240.128.12   <none>   <none>
rook-ceph-osd-4-547ff5697f-bc2xm     2/2   Running   0   5d21h   172.17.89.75     10.240.128.12   <none>   <none>
rook-ceph-osd-9-5fbb9c9d9c-lkwgs     2/2   Running   0   15h     172.17.89.111    10.240.128.12   <none>   <none>
rook-ceph-osd-13-7574d8f5c-qjfj9     2/2   Running   3   5d19h   172.17.104.9     10.240.0.19     <none>   <none>
rook-ceph-osd-14-7d6858967f-x9jsq    2/2   Running   0   5d19h   172.17.104.12    10.240.0.19     <none>   <none>
rook-ceph-osd-21-6474d6c979-xc6p2    2/2   Running   1   5d19h   172.17.104.11    10.240.0.19     <none>   <none>
rook-ceph-osd-22-57b89578f6-gdvz4    2/2   Running   3   5d19h   172.17.104.10    10.240.0.19     <none>   <none>
rook-ceph-osd-15-76fbdc74f4-62qqc    2/2   Running   0   5d21h   172.17.88.144    10.240.64.9     <none>   <none>
rook-ceph-osd-17-7998f86fc6-rsmmb    2/2   Running   1   5d21h   172.17.88.142    10.240.64.9     <none>   <none>
rook-ceph-osd-6-6647966dd6-d6bcz     2/2   Running   2   5d21h   172.17.88.140    10.240.64.9     <none>   <none>
rook-ceph-osd-7-7b758d7bb9-x9xgn     2/2   Running   0   5d21h   172.17.88.143    10.240.64.9     <none>   <none>
rook-ceph-osd-16-6f5c8f7548-5ssps    2/2   Running   3   6d22h   172.17.88.79     10.240.0.17     <none>   <none>
rook-ceph-osd-18-7b87f6c669-24mlb    2/2   Running   3   6d22h   172.17.88.78     10.240.0.17     <none>   <none>
rook-ceph-osd-20-6957574bc9-79mdm    2/2   Running   1   6d22h   172.17.88.80     10.240.0.17     <none>   <none>
rook-ceph-osd-23-85f6c89956-kvsh4    2/2   Running   2   5d19h   172.17.88.94     10.240.0.17     <none>   <none>
rook-ceph-osd-2-8fcdd546f-4qqjn      2/2   Running   0   15h     172.17.113.137   10.240.128.14   <none>   <none>
rook-ceph-osd-5-756f9cd4ff-nvctd     2/2   Running   0   15h     172.17.113.136   10.240.128.14   <none>   <none>

You can see that node 10.240.128.12 got 2 extra OSDs (if we delete the newest ones on 10.240.128.12, they move to 10.240.128.14).
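A quick way to get the per-node OSD count from a listing like the one above (the label selector is the one rook normally puts on OSD pods; adjust it if this build differs):

  # NODE is column 7 of "oc get pods -o wide"
  oc get pods -n openshift-storage -l app=rook-ceph-osd -o wide --no-headers | awk '{print $7}' | sort | uniq -c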
In the second attempt the output was:

rook-ceph-osd-0-fd95fb959-28hdw      2/2   Running   0   85m     172.17.88.155    10.240.64.9     <none>   <none>
rook-ceph-osd-15-76fbdc74f4-62qqc    2/2   Running   0   6d1h    172.17.88.144    10.240.64.9     <none>   <none>
rook-ceph-osd-17-7998f86fc6-rsmmb    2/2   Running   1   6d1h    172.17.88.142    10.240.64.9     <none>   <none>
rook-ceph-osd-6-6647966dd6-d6bcz     2/2   Running   3   6d1h    172.17.88.140    10.240.64.9     <none>   <none>
rook-ceph-osd-7-7b758d7bb9-x9xgn     2/2   Running   1   6d1h    172.17.88.143    10.240.64.9     <none>   <none>
rook-ceph-osd-1-d45fbd654-ll4qx      2/2   Running   0   85m     172.17.103.73    10.240.64.11    <none>   <none>
rook-ceph-osd-12-5d9cbdc748-lmzhw    2/2   Running   0   85m     172.17.103.74    10.240.64.11    <none>   <none>
rook-ceph-osd-8-78c8d778f9-xs97w     2/2   Running   1   85m     172.17.103.72    10.240.64.11    <none>   <none>
rook-ceph-osd-10-78647675cf-cwh4h    2/2   Running   0   5d23h   172.17.89.109    10.240.128.12   <none>   <none>
rook-ceph-osd-11-7f88545866-72s2x    2/2   Running   0   6d      172.17.89.74     10.240.128.12   <none>   <none>
rook-ceph-osd-19-5675f5d695-d7g4m    2/2   Running   0   6d      172.17.89.73     10.240.128.12   <none>   <none>
rook-ceph-osd-4-547ff5697f-bc2xm     2/2   Running   0   6d      172.17.89.75     10.240.128.12   <none>   <none>
rook-ceph-osd-13-7574d8f5c-qjfj9     2/2   Running   3   5d23h   172.17.104.9     10.240.0.19     <none>   <none>
rook-ceph-osd-14-7d6858967f-x9jsq    2/2   Running   0   5d23h   172.17.104.12    10.240.0.19     <none>   <none>
rook-ceph-osd-21-6474d6c979-xc6p2    2/2   Running   1   5d23h   172.17.104.11    10.240.0.19     <none>   <none>
rook-ceph-osd-22-57b89578f6-gdvz4    2/2   Running   3   5d23h   172.17.104.10    10.240.0.19     <none>   <none>
rook-ceph-osd-16-6f5c8f7548-5ssps    2/2   Running   4   7d1h    172.17.88.79     10.240.0.17     <none>   <none>
rook-ceph-osd-18-7b87f6c669-24mlb    2/2   Running   4   7d1h    172.17.88.78     10.240.0.17     <none>   <none>
rook-ceph-osd-20-6957574bc9-79mdm    2/2   Running   1   7d1h    172.17.88.80     10.240.0.17     <none>   <none>
rook-ceph-osd-23-85f6c89956-kvsh4    2/2   Running   2   5d23h   172.17.88.94     10.240.0.17     <none>   <none>
rook-ceph-osd-2-8fcdd546f-4qqjn      2/2   Running   0   18h     172.17.113.137   10.240.128.14   <none>   <none>
rook-ceph-osd-3-5656f8598f-zmx9k     2/2   Running   0   95m     172.17.113.145   10.240.128.14   <none>   <none>
rook-ceph-osd-5-756f9cd4ff-nvctd     2/2   Running   0   18h     172.17.113.136   10.240.128.14   <none>   <none>
rook-ceph-osd-9-5fbb9c9d9c-r8rs9     2/2   Running   0   95m     172.17.113.143   10.240.128.14   <none>   <none>

Is it realistic to expect that after replacing a node (ibmcloud ks worker replace --worker kube-c3hidk8d0cktobbskgr0-dmodff1-default-00000f73 --cluster dm-odff1) the new nodes get the same number of OSDs and that the OSDs are balanced equally across nodes?
Not a 4.8 blocker
A few questions:

1. Please show the output of "ceph osd tree" from the toolbox, to show whether there is any other hierarchy than the nodes.
2. You followed the docs to purge the old OSDs from Ceph, right? For example, ceph osd tree should not still show the OSDs from the bad node.
3. Please share the CephCluster CR. This will show the topology spread constraints and other settings used to create the new OSDs.

This might be related to an OCS bug that was fixed recently for spreading OSDs evenly across racks, though I'll have to look for that BZ another day...
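(For reference, assuming the toolbox is deployed under its default name, the CRUSH hierarchy can be collected with:

  oc -n openshift-storage rsh deploy/rook-ceph-tools ceph osd tree
)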
1. See [1].
2. Only "ibmcloud ks worker replace < worker id >" is executed, from a machine (laptop) that has rights to run commands against the cluster on IBM Cloud; it then works in the background and replaces the machine in the OCP cluster.
3. See [2].

[1]
# ceph osd tree
ID   CLASS  WEIGHT     TYPE NAME                                 STATUS  REWEIGHT  PRI-AFF
 -1         375.00000  root default
 -5         375.00000      region us-south
-30         125.00000          zone us-south-1
-55          15.62500              host ocs-deviceset-0-data-0mfz2p
 23    hdd   15.62500                  osd.23                        up   1.00000  1.00000
-57          15.62500              host ocs-deviceset-0-data-4tb4cl
 22    hdd   15.62500                  osd.22                        up   1.00000  1.00000
-29          15.62500              host ocs-deviceset-0-data-6mbqx6
 13    hdd   15.62500                  osd.13                        up   1.00000  1.00000
-47          15.62500              host ocs-deviceset-1-data-39j6rg
 18    hdd   15.62500                  osd.18                        up   1.00000  1.00000
-41          15.62500              host ocs-deviceset-1-data-4265dj
 16    hdd   15.62500                  osd.16                        up   1.00000  1.00000
-45          15.62500              host ocs-deviceset-2-data-1ljbq8
 21    hdd   15.62500                  osd.21                        up   1.00000  1.00000
-37          15.62500              host ocs-deviceset-2-data-2gnr96
 14    hdd   15.62500                  osd.14                        up   1.00000  1.00000
-49          15.62500              host ocs-deviceset-2-data-7gbks9
 20    hdd   15.62500                  osd.20                        up   1.00000  1.00000
 -4         125.00000          zone us-south-2
 -3          15.62500              host ocs-deviceset-0-data-26kq8b
  0    hdd   15.62500                  osd.0                         up   1.00000  1.00000
 -9          15.62500              host ocs-deviceset-0-data-5bxw5z
  1    hdd   15.62500                  osd.1                         up   1.00000  1.00000
-39          15.62500              host ocs-deviceset-1-data-0dkqxr
 15    hdd   15.62500                  osd.15                        up   1.00000  1.00000
-51          15.62500              host ocs-deviceset-1-data-2hwzcm
  8    hdd   15.62500                  osd.8                         up   1.00000  1.00000
-27          15.62500              host ocs-deviceset-1-data-65q279
  7    hdd   15.62500                  osd.7                         up   1.00000  1.00000
-17          15.62500              host ocs-deviceset-2-data-0vqtjj
  6    hdd   15.62500                  osd.6                         up   1.00000  1.00000
-35          15.62500              host ocs-deviceset-2-data-4d2sp6
 12    hdd   15.62500                  osd.12                        up   1.00000  1.00000
-53          15.62500              host ocs-deviceset-2-data-6x467v
 17    hdd   15.62500                  osd.17                        up   1.00000  1.00000
-12         125.00000          zone us-south-3
-43          15.62500              host ocs-deviceset-0-data-1bxjfw
 19    hdd   15.62500                  osd.19                        up   1.00000  1.00000
-19          15.62500              host ocs-deviceset-0-data-3gsspv
  2    hdd   15.62500                  osd.2                         up   1.00000  1.00000
-33          15.62500              host ocs-deviceset-0-data-7cljwc
 11    hdd   15.62500                  osd.11                        up   1.00000  1.00000
-25          15.62500              host ocs-deviceset-1-data-12dvjw
  3    hdd   15.62500                  osd.3                         up   1.00000  1.00000
-23          15.62500              host ocs-deviceset-1-data-575th8
  9    hdd   15.62500                  osd.9                         up   1.00000  1.00000
-11          15.62500              host ocs-deviceset-1-data-7jvrkg
  4    hdd   15.62500                  osd.4                         up   1.00000  1.00000
-21          15.62500              host ocs-deviceset-2-data-3ddvnb
 10    hdd   15.62500                  osd.10                        up   1.00000  1.00000
-15          15.62500              host ocs-deviceset-2-data-59qvrt
  5    hdd   15.62500                  osd.5                         up   1.00000  1.0000

[2]
$ oc get storagecluster -n openshift-storage -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1
  kind: StorageCluster
  metadata:
    annotations:
      uninstall.ocs.openshift.io/cleanup-policy: delete
      uninstall.ocs.openshift.io/mode: graceful
    creationTimestamp: "2021-07-06T08:56:17Z"
    finalizers:
    - storagecluster.ocs.openshift.io
    generation: 3
    managedFields:
    - apiVersion: ocs.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:encryption: {}
          f:externalStorage: {}
          f:monPVCTemplate:
            .: {}
            f:metadata: {}
            f:spec:
              .: {}
              f:accessModes: {}
              f:resources:
                .: {}
                f:requests: {}
              f:storageClassName: {}
              f:volumeMode: {}
            f:status: {}
      manager: manager
      operation: Update
      time: "2021-07-06T08:56:17Z"
    - apiVersion: ocs.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:monPVCTemplate:
            f:spec:
              f:resources:
                f:requests:
                  f:storage: {}
      manager: oc
      operation: Update
      time: "2021-07-06T09:32:58Z"
    - apiVersion: ocs.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:uninstall.ocs.openshift.io/cleanup-policy: {}
            f:uninstall.ocs.openshift.io/mode: {}
          f:finalizers: {}
        f:spec:
          f:arbiter: {}
          f:encryption:
            f:kms: {}
          f:managedResources:
            .: {}
            f:cephBlockPools: {}
            f:cephConfig: {}
            f:cephFilesystems: {}
            f:cephObjectStoreUsers: {}
            f:cephObjectStores: {}
          f:storageDeviceSets: {}
          f:version: {}
        f:status:
          .: {}
          f:conditions: {}
          f:failureDomain: {}
          f:failureDomainKey: {}
          f:failureDomainValues: {}
          f:images:
            .: {}
            f:ceph:
              .: {}
              f:actualImage: {}
              f:desiredImage: {}
            f:noobaaCore:
              .: {}
              f:actualImage: {}
              f:desiredImage: {}
            f:noobaaDB:
              .: {}
              f:actualImage: {}
              f:desiredImage: {}
          f:nodeTopologies:
            .: {}
            f:labels:
              .: {}
              f:kubernetes.io/hostname: {}
              f:topology.kubernetes.io/region: {}
              f:topology.kubernetes.io/zone: {}
          f:phase: {}
          f:relatedObjects: {}
      manager: ocs-operator
      operation: Update
      time: "2021-07-06T09:59:57Z"
    name: ocs-storagecluster
    namespace: openshift-storage
    resourceVersion: "21454329"
    selfLink: /apis/ocs.openshift.io/v1/namespaces/openshift-storage/storageclusters/ocs-storagecluster
    uid: 143ed9e8-cd7a-4242-9108-c95bdda70106
  spec:
    arbiter: {}
    encryption:
      kms: {}
    externalStorage: {}
    managedResources:
      cephBlockPools: {}
      cephConfig: {}
      cephFilesystems: {}
      cephObjectStoreUsers: {}
      cephObjectStores: {}
    monPVCTemplate:
      metadata: {}
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 25Gi
        storageClassName: ibmc-vpc-block-metro-10iops-tier
        volumeMode: Filesystem
      status: {}
    storageDeviceSets:
    - config: {}
      count: 8
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 16000Gi
          storageClassName: ibmc-vpc-block-metro-3iops-tier
          volumeMode: Block
        status: {}
      name: ocs-deviceset
      placement: {}
      portable: true
      preparePlacement: {}
      replica: 3
      resources: {}
    version: 4.7.0
  status:
    conditions:
    - lastHeartbeatTime: "2021-08-04T19:11:24Z"
      lastTransitionTime: "2021-08-04T16:11:30Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: ReconcileComplete
    - lastHeartbeatTime: "2021-08-04T19:11:24Z"
      lastTransitionTime: "2021-07-30T20:04:18Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: Available
    - lastHeartbeatTime: "2021-08-04T19:11:24Z"
      lastTransitionTime: "2021-07-30T20:04:18Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "False"
      type: Progressing
    - lastHeartbeatTime: "2021-08-04T19:11:24Z"
      lastTransitionTime: "2021-07-30T20:04:18Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "False"
      type: Degraded
    - lastHeartbeatTime: "2021-08-04T19:11:24Z"
      lastTransitionTime: "2021-07-30T20:04:18Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: Upgradeable
    failureDomain: zone
    failureDomainKey: topology.kubernetes.io/zone
    failureDomainValues:
    - us-south-3
    - us-south-2
    - us-south-1
    images:
      ceph:
        actualImage: registry.redhat.io/rhceph/rhceph-4-rhel8@sha256:725f93133acc0fb1ca845bd12e77f20d8629cad0e22d46457b2736578698eb6c
        desiredImage: registry.redhat.io/rhceph/rhceph-4-rhel8@sha256:725f93133acc0fb1ca845bd12e77f20d8629cad0e22d46457b2736578698eb6c
      noobaaCore:
        actualImage: registry.redhat.io/ocs4/mcg-core-rhel8@sha256:6ff8645efdde95fa97d496084d3555b7680895f0b79c147f2a880b43742af3a4
        desiredImage: registry.redhat.io/ocs4/mcg-core-rhel8@sha256:6ff8645efdde95fa97d496084d3555b7680895f0b79c147f2a880b43742af3a4
      noobaaDB:
        actualImage: registry.redhat.io/rhel8/postgresql-12@sha256:f486bbe07f1ddef166bab5a2a6bdcd0e63e6e14d15b42d2425762f83627747bf
        desiredImage: registry.redhat.io/rhel8/postgresql-12@sha256:f486bbe07f1ddef166bab5a2a6bdcd0e63e6e14d15b42d2425762f83627747bf
    nodeTopologies:
      labels:
        kubernetes.io/hostname:
        - 10.240.128.7
        - 10.240.64.4
        - 10.240.64.5
        - 10.240.0.6
        - 10.240.0.7
        - 10.240.128.6
        - 10.240.0.12
        - 10.240.0.13
        - 10.240.128.10
        - 10.240.0.14
        - 10.240.128.11
        - 10.240.0.15
        - 10.240.0.17
        - 10.240.64.10
        - 10.240.64.9
        - 10.240.128.12
        - 10.240.128.13
        - 10.240.0.19
        - 10.240.128.14
        - 10.240.64.11
        - 10.240.64.12
        topology.kubernetes.io/region:
        - us-south
        topology.kubernetes.io/zone:
        - us-south-3
        - us-south-2
        - us-south-1
    phase: Ready
    relatedObjects:
    - apiVersion: ceph.rook.io/v1
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      namespace: openshift-storage
      resourceVersion: "21453982"
      uid: 83035032-0845-40f7-8e40-a3caa0de138d
    - apiVersion: noobaa.io/v1alpha1
      kind: NooBaa
      name: noobaa
      namespace: openshift-storage
      resourceVersion: "21454327"
      uid: bc2a2fce-e507-41f4-8543-233169a292b0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
Could you also share the CephCluster CR in addition to the StorageCluster CR? The CephCluster would have the topology spread constraints to show exactly how the OSDs are expected to be spread across hosts. Or if the TSCs are only specified for zones, then this explains why the hosts are not evenly spread. At least the OSDs are spread evenly across zones as expected.
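(If it helps, and if topology spread constraints are being applied at the pod level, they should also show up directly on an OSD deployment; the deployment name below is just an example:

  oc -n openshift-storage get deploy rook-ceph-osd-0 -o jsonpath='{.spec.template.spec.topologySpreadConstraints}'
)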
(In reply to Travis Nielsen from comment #12)
> Could you also share the CephCluster CR in addition to the StorageCluster

This is a ROKS installation - would you mind writing out the commands you want me to run on this cluster? I do not have the CephCluster CR handy, as the cluster is installed via the addon.

> CR? The CephCluster would have the topology spread constraints to show
> exactly how the OSDs are expected to be spread across hosts. Or if the TSCs
> are only specified for zones, then this explains why the hosts are not
> evenly spread. At least the OSDs are spread evenly across zones as expected.

Also, from comment #11 we see the list below [1]. The cluster has 6 nodes, but the StorageCluster configuration preserves the old node names once --replace is issued and a new node is created. From [1] one could conclude that we have many nodes (there are actually only 6); all the others were cluster members at some point in time but are not any more.

---
[1]
kubernetes.io/hostname:
- 10.240.128.7
- 10.240.64.4
- 10.240.64.5
- 10.240.0.6
- 10.240.0.7
- 10.240.128.6
- 10.240.0.12
- 10.240.0.13
- 10.240.128.10
- 10.240.0.14
- 10.240.128.11
- 10.240.0.15
- 10.240.0.17
- 10.240.64.10
- 10.240.64.9
- 10.240.128.12
- 10.240.128.13
- 10.240.0.19
- 10.240.128.14
- 10.240.64.11
- 10.240.64.12
---
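(A quick way to compare that stale list with the nodes Kubernetes currently knows about; the second jsonpath assumes the status layout shown in comment #11:

  oc get nodes -o jsonpath='{.items[*].metadata.labels.kubernetes\.io/hostname}{"\n"}'
  oc -n openshift-storage get storagecluster ocs-storagecluster -o jsonpath='{.status.nodeTopologies.labels.kubernetes\.io/hostname}{"\n"}'
)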
The CephCluster CR can be retrieved with this: oc -n openshift-storage get cephcluster -o yaml
Please reopen once we have all the details.