Description of problem (please be as detailed as possible and provide log snippets):

I'm deploying OCS + OCP in a pipeline and experimenting with the deployed OCP afterwards (lab environment). I'm currently struggling to understand what makes OCS or its operator select an OSD from a node.

Version of all relevant components (if applicable):

$ oc get csv -A
NAMESPACE                              NAME                                           DISPLAY                       VERSION                 REPLACES                                  PHASE
openshift-cnv                          kubevirt-hyperconverged-operator.v2.5.2        OpenShift Virtualization     2.5.2                   kubevirt-hyperconverged-operator.v2.5.1   Succeeded
openshift-local-storage                local-storage-operator.4.6.0-202012161211.p0   Local Storage                 4.6.0-202012161211.p0                                             Succeeded
openshift-operator-lifecycle-manager   packageserver                                  Package Server                0.16.1                                                            Succeeded
openshift-storage                      ocs-operator.v4.6.0                            OpenShift Container Storage   4.6.0                                                             Succeeded

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3 (neither easy nor complex)

Is this issue reproducible?
Yes, the 'balancing' of OSDs is messed up every time I select an 'odd' number for counts.

Can this issue be reproduced from the UI?
No idea.

If this is a regression, please provide more details to justify this:
No idea

Steps to Reproduce:
Here's an idea of what's going on. I'm deploying OCP 4.6.8 with 3 masters, 3 workers and 6 infra nodes to use for Ceph:

$ oc get nodes
NAME                         STATUS   ROLES          AGE    VERSION
ocp4d-gcvpn-infra-0-4w4sh    Ready    infra,worker   47m    v1.19.0+7070803
ocp4d-gcvpn-infra-0-gzmvx    Ready    infra,worker   48m    v1.19.0+7070803
ocp4d-gcvpn-infra-0-kzlnn    Ready    infra,worker   47m    v1.19.0+7070803
ocp4d-gcvpn-infra-0-vncgl    Ready    infra,worker   47m    v1.19.0+7070803
ocp4d-gcvpn-infra-0-zc2sm    Ready    infra,worker   47m    v1.19.0+7070803
ocp4d-gcvpn-infra-0-zjfdm    Ready    infra,worker   47m    v1.19.0+7070803
ocp4d-gcvpn-master-0         Ready    master         105m   v1.19.0+7070803
ocp4d-gcvpn-master-1         Ready    master         105m   v1.19.0+7070803
ocp4d-gcvpn-master-2         Ready    master         105m   v1.19.0+7070803
ocp4d-gcvpn-worker-0-64w6n   Ready    worker         100m   v1.19.0+7070803
ocp4d-gcvpn-worker-0-7242l   Ready    worker         100m   v1.19.0+7070803
ocp4d-gcvpn-worker-0-dqrjj   Ready    worker         100m   v1.19.0+7070803

Each of the 'infra' nodes has a boot disk and 8 x 8 TB virtio-SCSI disks, each with a unique WWN. The Local Storage Operator YAML is quite simple:

apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: localstorage-ocs-osd
  namespace: openshift-local-storage
  labels:
    app: ocs-storagecluster
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: cluster.ocs.openshift.io/openshift-storage
        operator: In
        values:
        - ""
  storageClassDevices:
  - storageClassName: localstorage-ocs-osd-sc
    volumeMode: Block
    devicePaths:
    - /dev/disk/by-id/wwn-0x5000c50015ea71aa
    - /dev/disk/by-id/wwn-0x5000c50015ea71ab
    - /dev/disk/by-id/wwn-0x5000c50015ea71ac
    - /dev/disk/by-id/wwn-0x5000c50015ea71ad
    - /dev/disk/by-id/wwn-0x5000c50015ea71ae
    - /dev/disk/by-id/wwn-0x5000c50015ea71b0
    - /dev/disk/by-id/wwn-0x5000c50015ea71b1
    - /dev/disk/by-id/wwn-0x5000c50015ea71b2
    [....long list of computed WWNs.....]
    - /dev/disk/by-id/wwn-0x5000c50015ea721d
    - /dev/disk/by-id/wwn-0x5000c50015ea721e
    - /dev/disk/by-id/wwn-0x5000c50015ea7220

For the OCS storage cluster, I have this (plus some resource limitations):

apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  manageNodes: false
  monDataDirHostPath: /var/lib/rook
  storageDeviceSets:
  - count: 5  # <-- modify count to the desired value
    dataPVCTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1
        storageClassName: localstorage-ocs-osd-sc
        volumeMode: Block
# https://red-hat-storage.github.io/ocs-training/training/ocs4/ocs4-install-no-ui.html#_create_cluster
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  manageNodes: false
  resources:
    mds:
      limits:
        cpu: 500m
        memory: "4Gi"
      requests:
        cpu: 500m
        memory: "4Gi"
    rgw:
      limits:
        cpu: 500m
        memory: "4Gi"
      requests:
        cpu: 500m
        memory: "4Gi"
    mon:
      limits:
        cpu: 500m
        memory: "2Gi"
      requests:
        cpu: 500m
        memory: "2Gi"
    osd:
      limits:
        cpu: 500m
        memory: "4Gi"
      requests:
        cpu: 500m
        memory: "4Gi"
    mgr:
      limits:
        cpu: 500m
        memory: "2Gi"
      requests:
        cpu: 500m
        memory: "2Gi"
    noobaa-core:
      limits:
        cpu: 500m
        memory: "2Gi"
      requests:
        cpu: 500m
        memory: "2Gi"
    noobaa-db:
      limits:
        cpu: 500m
        memory: "2Gi"
      requests:
        cpu: 500m
        memory: "2Gi"
  monDataDirHostPath: /var/lib/rook
  storageDeviceSets:
  - count: 5  # <-- modify count to the desired value
    dataPVCTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1
        storageClassName: localstorage-ocs-osd-sc
        volumeMode: Block
    name: ocs-deviceset
    placement: {}
    portable: false
    replica: 3
    resources:
      limits:
        cpu: 500m
        memory: "4Gi"
      requests:
        cpu: 500m
        memory: "4Gi"

So far so good: each node has 8 x 8 TB disks and, with 'count: 5', I'm only requesting 5 OSDs from each rack.
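For the scale-up steps in the following comments, the only change is bumping storageDeviceSets[0].count on this CR. A minimal sketch of how that can be done from the CLI (assuming the deviceset above is the first entry in the list; editing and re-applying the YAML works just as well):

# Hypothetical scale-up step: raise the deviceset count from 5 to 6.
# With replica: 3, each +1 on count should add 3 new OSDs (one per rack).
oc -n openshift-storage patch storagecluster ocs-storagecluster --type json \
  -p '[{"op": "replace", "path": "/spec/storageDeviceSets/0/count", "value": 6}]'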
On a fresh deploy with 'count: 5', here's what I get (6 infra nodes):

$ oc rsh rook-ceph-tools-8589699f6c-l9x57 ceph osd tree
ID  CLASS WEIGHT    TYPE NAME                              STATUS REWEIGHT PRI-AFF
 -1       120.00000 root default
 -4        40.00000     rack rack0
 -3        24.00000         host ocp4d-gcvpn-infra-0-4w4sh
  0   ssd   8.00000             osd.0                          up  1.00000 1.00000
  1   ssd   8.00000             osd.1                          up  1.00000 1.00000
  2   ssd   8.00000             osd.2                          up  1.00000 1.00000
-11        16.00000         host ocp4d-gcvpn-infra-0-vncgl
  7   ssd   8.00000             osd.7                          up  1.00000 1.00000
  8   ssd   8.00000             osd.8                          up  1.00000 1.00000
 -8        40.00000     rack rack1
 -7        32.00000         host ocp4d-gcvpn-infra-0-gzmvx
  3   ssd   8.00000             osd.3                          up  1.00000 1.00000
  4   ssd   8.00000             osd.4                          up  1.00000 1.00000
  5   ssd   8.00000             osd.5                          up  1.00000 1.00000
  6   ssd   8.00000             osd.6                          up  1.00000 1.00000
-13         8.00000         host ocp4d-gcvpn-infra-0-zc2sm
 14   ssd   8.00000             osd.14                         up  1.00000 1.00000
-16        40.00000     rack rack2
-15        40.00000         host ocp4d-gcvpn-infra-0-zjfdm
  9   ssd   8.00000             osd.9                          up  1.00000 1.00000
 10   ssd   8.00000             osd.10                         up  1.00000 1.00000
 11   ssd   8.00000             osd.11                         up  1.00000 1.00000
 12   ssd   8.00000             osd.12                         up  1.00000 1.00000
 13   ssd   8.00000             osd.13                         up  1.00000 1.00000

So each rack has 5 OSDs, which is fine, but the OSDs aren't evenly balanced across the nodes within each rack.
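An illustrative way to see the per-host OSD counts at a glance, instead of eyeballing the tree (not part of the original report; assumes jq is available where the command runs, and uses the toolbox pod name from the output above):

# Count OSDs per host bucket in the CRUSH tree.
oc -n openshift-storage exec rook-ceph-tools-8589699f6c-l9x57 -- ceph osd tree -f json \
  | jq -r '.nodes[] | select(.type == "host") | "\(.name): \(.children | length) OSDs"'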
Increasing 'count' to 6 (a total of 3 x 6 = 18 OSDs, 6 per rack) yields the following; the new OSDs (15, 16 and 17) are properly scaled and spread across nodes:

$ oc rsh rook-ceph-tools-8589699f6c-l9x57 ceph osd tree
ID  CLASS WEIGHT    TYPE NAME                              STATUS REWEIGHT PRI-AFF
 -1       144.00000 root default
 -4        48.00000     rack rack0
 -3        24.00000         host ocp4d-gcvpn-infra-0-4w4sh
  0   ssd   8.00000             osd.0                          up  1.00000 1.00000
  1   ssd   8.00000             osd.1                          up  1.00000 1.00000
  2   ssd   8.00000             osd.2                          up  1.00000 1.00000
-11        24.00000         host ocp4d-gcvpn-infra-0-vncgl
  7   ssd   8.00000             osd.7                          up  1.00000 1.00000
  8   ssd   8.00000             osd.8                          up  1.00000 1.00000
 15   ssd   8.00000             osd.15                         up  1.00000 1.00000
 -8        48.00000     rack rack1
 -7        40.00000         host ocp4d-gcvpn-infra-0-gzmvx
  3   ssd   8.00000             osd.3                          up  1.00000 1.00000
  4   ssd   8.00000             osd.4                          up  1.00000 1.00000
  5   ssd   8.00000             osd.5                          up  1.00000 1.00000
  6   ssd   8.00000             osd.6                          up  1.00000 1.00000
 16   ssd   8.00000             osd.16                         up  1.00000 1.00000
-13         8.00000         host ocp4d-gcvpn-infra-0-zc2sm
 14   ssd   8.00000             osd.14                         up  1.00000 1.00000
-16        48.00000     rack rack2
-19         8.00000         host ocp4d-gcvpn-infra-0-kzlnn
 17   ssd   8.00000             osd.17                         up  1.00000 1.00000
-15        40.00000         host ocp4d-gcvpn-infra-0-zjfdm
  9   ssd   8.00000             osd.9                          up  1.00000 1.00000
 10   ssd   8.00000             osd.10                         up  1.00000 1.00000
 11   ssd   8.00000             osd.11                         up  1.00000 1.00000
 12   ssd   8.00000             osd.12                         up  1.00000 1.00000
 13   ssd   8.00000             osd.13                         up  1.00000 1.00000

(The 6th OCS node, 'ocp4d-gcvpn-infra-0-kzlnn', showed up with osd.17.)
Increasing 'count' to 7 (OSDs 18, 19 and 20 come up) yields this:

$ oc rsh rook-ceph-tools-8589699f6c-l9x57 ceph osd tree
ID  CLASS WEIGHT    TYPE NAME                              STATUS REWEIGHT PRI-AFF
 -1       168.00000 root default
 -4        56.00000     rack rack0
 -3        24.00000         host ocp4d-gcvpn-infra-0-4w4sh
  0   ssd   8.00000             osd.0                          up  1.00000 1.00000
  1   ssd   8.00000             osd.1                          up  1.00000 1.00000
  2   ssd   8.00000             osd.2                          up  1.00000 1.00000
-11        32.00000         host ocp4d-gcvpn-infra-0-vncgl
  7   ssd   8.00000             osd.7                          up  1.00000 1.00000
  8   ssd   8.00000             osd.8                          up  1.00000 1.00000
 15   ssd   8.00000             osd.15                         up  1.00000 1.00000
 18   ssd   8.00000             osd.18                         up  1.00000 1.00000
 -8        56.00000     rack rack1
 -7        48.00000         host ocp4d-gcvpn-infra-0-gzmvx
  3   ssd   8.00000             osd.3                          up  1.00000 1.00000
  4   ssd   8.00000             osd.4                          up  1.00000 1.00000
  5   ssd   8.00000             osd.5                          up  1.00000 1.00000
  6   ssd   8.00000             osd.6                          up  1.00000 1.00000
 16   ssd   8.00000             osd.16                         up  1.00000 1.00000
 19   ssd   8.00000             osd.19                         up  1.00000 1.00000
-13         8.00000         host ocp4d-gcvpn-infra-0-zc2sm
 14   ssd   8.00000             osd.14                         up  1.00000 1.00000
-16        56.00000     rack rack2
-19        16.00000         host ocp4d-gcvpn-infra-0-kzlnn
 17   ssd   8.00000             osd.17                         up  1.00000 1.00000
 20   ssd   8.00000             osd.20                         up  1.00000 1.00000
-15        40.00000         host ocp4d-gcvpn-infra-0-zjfdm
  9   ssd   8.00000             osd.9                          up  1.00000 1.00000
 10   ssd   8.00000             osd.10                         up  1.00000 1.00000
 11   ssd   8.00000             osd.11                         up  1.00000 1.00000
 12   ssd   8.00000             osd.12                         up  1.00000 1.00000
 13   ssd   8.00000             osd.13                         up  1.00000 1.00000

IMHO, osd.19 should have gone to node 'ocp4d-gcvpn-infra-0-zc2sm' (it only has 1 OSD) and not to node 'ocp4d-gcvpn-infra-0-gzmvx' (bringing its total to 6 OSDs). osd.20 went to node 'ocp4d-gcvpn-infra-0-kzlnn', which is fine because it only had 1 OSD prior to the scale-up.
Increasing 'count' to 8 (OSDs 21, 22 and 23 come up) yields these results:

$ oc rsh rook-ceph-tools-8589699f6c-l9x57 ceph osd tree
ID  CLASS WEIGHT    TYPE NAME                              STATUS REWEIGHT PRI-AFF
 -1       192.00000 root default
 -4        64.00000     rack rack0
 -3        24.00000         host ocp4d-gcvpn-infra-0-4w4sh
  0   ssd   8.00000             osd.0                          up  1.00000 1.00000
  1   ssd   8.00000             osd.1                          up  1.00000 1.00000
  2   ssd   8.00000             osd.2                          up  1.00000 1.00000
-11        40.00000         host ocp4d-gcvpn-infra-0-vncgl
  7   ssd   8.00000             osd.7                          up  1.00000 1.00000
  8   ssd   8.00000             osd.8                          up  1.00000 1.00000
 15   ssd   8.00000             osd.15                         up  1.00000 1.00000
 18   ssd   8.00000             osd.18                         up  1.00000 1.00000
 21   ssd   8.00000             osd.21                         up  1.00000 1.00000
 -8        64.00000     rack rack1
 -7        56.00000         host ocp4d-gcvpn-infra-0-gzmvx
  3   ssd   8.00000             osd.3                          up  1.00000 1.00000
  4   ssd   8.00000             osd.4                          up  1.00000 1.00000
  5   ssd   8.00000             osd.5                          up  1.00000 1.00000
  6   ssd   8.00000             osd.6                          up  1.00000 1.00000
 16   ssd   8.00000             osd.16                         up  1.00000 1.00000
 19   ssd   8.00000             osd.19                         up  1.00000 1.00000
 22   ssd   8.00000             osd.22                         up  1.00000 1.00000
-13         8.00000         host ocp4d-gcvpn-infra-0-zc2sm
 14   ssd   8.00000             osd.14                         up  1.00000 1.00000
-16        64.00000     rack rack2
-19        24.00000         host ocp4d-gcvpn-infra-0-kzlnn
 17   ssd   8.00000             osd.17                         up  1.00000 1.00000
 20   ssd   8.00000             osd.20                         up  1.00000 1.00000
 23   ssd   8.00000             osd.23                         up  1.00000 1.00000
-15        40.00000         host ocp4d-gcvpn-infra-0-zjfdm
  9   ssd   8.00000             osd.9                          up  1.00000 1.00000
 10   ssd   8.00000             osd.10                         up  1.00000 1.00000
 11   ssd   8.00000             osd.11                         up  1.00000 1.00000
 12   ssd   8.00000             osd.12                         up  1.00000 1.00000
 13   ssd   8.00000             osd.13                         up  1.00000 1.00000

Here, osd.21 should have gone to 'ocp4d-gcvpn-infra-0-4w4sh' (only 3 OSDs) instead of node 'ocp4d-gcvpn-infra-0-vncgl' (which had 4 OSDs). osd.22 should have gone to node 'ocp4d-gcvpn-infra-0-zc2sm' (only one OSD) instead of node 'ocp4d-gcvpn-infra-0-gzmvx' (which already had 6 OSDs). osd.23 went correctly to the least used node ('ocp4d-gcvpn-infra-0-kzlnn').
Increasing 'count' to 9 (OSDs 24, 25 and 26 come up) yields these results:

$ oc rsh rook-ceph-tools-8589699f6c-l9x57 ceph osd tree
ID  CLASS WEIGHT    TYPE NAME                              STATUS REWEIGHT PRI-AFF
 -1       216.00000 root default
 -4        72.00000     rack rack0
 -3        32.00000         host ocp4d-gcvpn-infra-0-4w4sh
  0   ssd   8.00000             osd.0                          up  1.00000 1.00000
  1   ssd   8.00000             osd.1                          up  1.00000 1.00000
  2   ssd   8.00000             osd.2                          up  1.00000 1.00000
 24   ssd   8.00000             osd.24                         up  1.00000 1.00000
-11        40.00000         host ocp4d-gcvpn-infra-0-vncgl
  7   ssd   8.00000             osd.7                          up  1.00000 1.00000
  8   ssd   8.00000             osd.8                          up  1.00000 1.00000
 15   ssd   8.00000             osd.15                         up  1.00000 1.00000
 18   ssd   8.00000             osd.18                         up  1.00000 1.00000
 21   ssd   8.00000             osd.21                         up  1.00000 1.00000
 -8        72.00000     rack rack1
 -7        56.00000         host ocp4d-gcvpn-infra-0-gzmvx
  3   ssd   8.00000             osd.3                          up  1.00000 1.00000
  4   ssd   8.00000             osd.4                          up  1.00000 1.00000
  5   ssd   8.00000             osd.5                          up  1.00000 1.00000
  6   ssd   8.00000             osd.6                          up  1.00000 1.00000
 16   ssd   8.00000             osd.16                         up  1.00000 1.00000
 19   ssd   8.00000             osd.19                         up  1.00000 1.00000
 22   ssd   8.00000             osd.22                         up  1.00000 1.00000
-13        16.00000         host ocp4d-gcvpn-infra-0-zc2sm
 14   ssd   8.00000             osd.14                         up  1.00000 1.00000
 25   ssd   8.00000             osd.25                         up  1.00000 1.00000
-16        72.00000     rack rack2
-19        24.00000         host ocp4d-gcvpn-infra-0-kzlnn
 17   ssd   8.00000             osd.17                         up  1.00000 1.00000
 20   ssd   8.00000             osd.20                         up  1.00000 1.00000
 23   ssd   8.00000             osd.23                         up  1.00000 1.00000
-15        48.00000         host ocp4d-gcvpn-infra-0-zjfdm
  9   ssd   8.00000             osd.9                          up  1.00000 1.00000
 10   ssd   8.00000             osd.10                         up  1.00000 1.00000
 11   ssd   8.00000             osd.11                         up  1.00000 1.00000
 12   ssd   8.00000             osd.12                         up  1.00000 1.00000
 13   ssd   8.00000             osd.13                         up  1.00000 1.00000
 26   ssd   8.00000             osd.26                         up  1.00000 1.00000

osd.24 and osd.25 both went to the least used node in their rack, but osd.26 went to host 'ocp4d-gcvpn-infra-0-zjfdm' (which already had 5 OSDs) instead of node 'ocp4d-gcvpn-infra-0-kzlnn' (which had only 3 OSDs).
This goes on and on until some nodes in a rack have no more available disks, at which point scale-up operations start filling the remaining free disks on the other nodes within the rack.
This is a known issue and indeed is being resolved by the implementation of TopologySpreadConstraints: https://bugzilla.redhat.com/show_bug.cgi?id=1814681 As such, I think we can safely take this in OCS 4.7 as an additional verification of the feature.
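For context, topologySpreadConstraints is the standard Kubernetes scheduling feature that BZ 1814681 applies to the OSD pods. A rough, illustrative pod-spec stanza (not necessarily the exact spec that rook/OCS generates; the labelSelector shown is an assumption) spreading OSD pods evenly across hosts would look like this:

# Illustrative sketch only, not the actual OCS-generated spec.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: rook-ceph-osd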
Since this should be addressed via BZ 1814681, which is already acked, providing QA ack on the assumption that the QE team will also verify that the topologySpreadConstraints fix works with LSO.
https://bugzilla.redhat.com/show_bug.cgi?id=1814681 is ON_QA
I created a cluster on VMware with LSO, OCP 4.7.0.

# oc get csv -A
NAMESPACE                              NAME                                           DISPLAY                       VERSION                 REPLACES   PHASE
openshift-local-storage                local-storage-operator.4.7.0-202102110027.p0   Local Storage                 4.7.0-202102110027.p0              Succeeded
openshift-operator-lifecycle-manager   packageserver                                  Package Server                0.17.0                             Succeeded
openshift-storage                      ocs-operator.v4.7.0-278.ci                     OpenShift Container Storage   4.7.0-278.ci                       Succeeded

6 worker nodes with 8 disks each:

# oc get nodes
NAME              STATUS   ROLES    AGE    VERSION
compute-0         Ready    worker   172m   v1.20.0+ba45583
compute-1         Ready    worker   172m   v1.20.0+ba45583
compute-2         Ready    worker   172m   v1.20.0+ba45583
compute-3         Ready    worker   172m   v1.20.0+ba45583
compute-4         Ready    worker   172m   v1.20.0+ba45583
compute-5         Ready    worker   172m   v1.20.0+ba45583
control-plane-0   Ready    master   3h1m   v1.20.0+ba45583
control-plane-1   Ready    master   3h1m   v1.20.0+ba45583
control-plane-2   Ready    master   3h1m   v1.20.0+ba45583

After creating the cluster (3 OSDs), the OSD tree looks like:

sh-4.4# ceph osd tree
ID CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
-1       0.29306 root default
-7       0.09769     host compute-3
 2   hdd 0.09769         osd.2          up  1.00000 1.00000
-3       0.09769     host compute-4
 0   hdd 0.09769         osd.0          up  1.00000 1.00000
-5       0.09769     host compute-5
 1   hdd 0.09769         osd.1          up  1.00000 1.00000

We can see that each OSD is on a different host.

After adding capacity, the OSD tree looks like:

sh-4.4# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
 -1       0.58612 root default
-13       0.09769     host compute-0
  5   hdd 0.09769         osd.5          up  1.00000 1.00000
-11       0.09769     host compute-1
  3   hdd 0.09769         osd.3          up  1.00000 1.00000
 -9       0.09769     host compute-2
  4   hdd 0.09769         osd.4          up  1.00000 1.00000
 -7       0.09769     host compute-3
  2   hdd 0.09769         osd.2          up  1.00000 1.00000
 -3       0.09769     host compute-4
  0   hdd 0.09769         osd.0          up  1.00000 1.00000
 -5       0.09769     host compute-5
  1   hdd 0.09769         osd.1          up  1.00000 1.00000

We can see that the OSDs are spread evenly across all workers (one OSD on each worker).

After adding capacity again, the OSD tree looks like:

sh-4.4# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
 -1       0.87918 root default
-13       0.19537     host compute-0
  5   hdd 0.09769         osd.5          up  1.00000 1.00000
  7   hdd 0.09769         osd.7          up  1.00000 1.00000
-11       0.09769     host compute-1
  3   hdd 0.09769         osd.3          up  1.00000 1.00000
 -9       0.19537     host compute-2
  4   hdd 0.09769         osd.4          up  1.00000 1.00000
  6   hdd 0.09769         osd.6          up  1.00000 1.00000
 -7       0.09769     host compute-3
  2   hdd 0.09769         osd.2          up  1.00000 1.00000
 -3       0.09769     host compute-4
  0   hdd 0.09769         osd.0          up  1.00000 1.00000
 -5       0.19537     host compute-5
  1   hdd 0.09769         osd.1          up  1.00000 1.00000
  8   hdd 0.09769         osd.8          up  1.00000 1.00000

After adding capacity again, the OSD tree looks like:

sh-4.4# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
 -1       1.17224 root default
-13       0.19537     host compute-0
  5   hdd 0.09769         osd.5          up  1.00000 1.00000
  7   hdd 0.09769         osd.7          up  1.00000 1.00000
-11       0.19537     host compute-1
  3   hdd 0.09769         osd.3          up  1.00000 1.00000
  9   hdd 0.09769         osd.9          up  1.00000 1.00000
 -9       0.19537     host compute-2
  4   hdd 0.09769         osd.4          up  1.00000 1.00000
  6   hdd 0.09769         osd.6          up  1.00000 1.00000
 -7       0.19537     host compute-3
  2   hdd 0.09769         osd.2          up  1.00000 1.00000
 11   hdd 0.09769         osd.11         up  1.00000 1.00000
 -3       0.19537     host compute-4
  0   hdd 0.09769         osd.0          up  1.00000 1.00000
 10   hdd 0.09769         osd.10         up  1.00000 1.00000
 -5       0.19537     host compute-5
  1   hdd 0.09769         osd.1          up  1.00000 1.00000
  8   hdd 0.09769         osd.8          up  1.00000 1.00000

And again, we see that the OSDs are spread evenly across all workers.

After adding capacity again, the OSD tree looks like:

sh-4.4# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
 -1       1.46530 root default
-13       0.19537     host compute-0
  5   hdd 0.09769         osd.5          up  1.00000 1.00000
  7   hdd 0.09769         osd.7          up  1.00000 1.00000
-11       0.19537     host compute-1
  3   hdd 0.09769         osd.3          up  1.00000 1.00000
  9   hdd 0.09769         osd.9          up  1.00000 1.00000
 -9       0.29306     host compute-2
  4   hdd 0.09769         osd.4          up  1.00000 1.00000
  6   hdd 0.09769         osd.6          up  1.00000 1.00000
 12   hdd 0.09769         osd.12         up  1.00000 1.00000
 -7       0.19537     host compute-3
  2   hdd 0.09769         osd.2          up  1.00000 1.00000
 11   hdd 0.09769         osd.11         up  1.00000 1.00000
 -3       0.29306     host compute-4
  0   hdd 0.09769         osd.0          up  1.00000 1.00000
 10   hdd 0.09769         osd.10         up  1.00000 1.00000
 14   hdd 0.09769         osd.14         up  1.00000 1.00000
 -5       0.29306     host compute-5
  1   hdd 0.09769         osd.1          up  1.00000 1.00000
  8   hdd 0.09769         osd.8          up  1.00000 1.00000
 13   hdd 0.09769         osd.13         up  1.00000 1.00000

After adding capacity again, the OSD tree looks like:

sh-4.4# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
 -1       1.75836 root default
-13       0.29306     host compute-0
  5   hdd 0.09769         osd.5          up  1.00000 1.00000
  7   hdd 0.09769         osd.7          up  1.00000 1.00000
 15   hdd 0.09769         osd.15         up  1.00000 1.00000
-11       0.29306     host compute-1
  3   hdd 0.09769         osd.3          up  1.00000 1.00000
  9   hdd 0.09769         osd.9          up  1.00000 1.00000
 17   hdd 0.09769         osd.17         up  1.00000 1.00000
 -9       0.29306     host compute-2
  4   hdd 0.09769         osd.4          up  1.00000 1.00000
  6   hdd 0.09769         osd.6          up  1.00000 1.00000
 12   hdd 0.09769         osd.12         up  1.00000 1.00000
 -7       0.29306     host compute-3
  2   hdd 0.09769         osd.2          up  1.00000 1.00000
 11   hdd 0.09769         osd.11         up  1.00000 1.00000
 16   hdd 0.09769         osd.16         up  1.00000 1.00000
 -3       0.29306     host compute-4
  0   hdd 0.09769         osd.0          up  1.00000 1.00000
 10   hdd 0.09769         osd.10         up  1.00000 1.00000
 14   hdd 0.09769         osd.14         up  1.00000 1.00000
 -5       0.29306     host compute-5
  1   hdd 0.09769         osd.1          up  1.00000 1.00000
  8   hdd 0.09769         osd.8          up  1.00000 1.00000
 13   hdd 0.09769         osd.13         up  1.00000 1.00000

So, IMO this can be verified.
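Besides the CRUSH tree, the spread can also be cross-checked from the Kubernetes side by listing which node each OSD pod was scheduled on (illustrative command, not part of the verification above; app=rook-ceph-osd is the label rook-ceph typically puts on OSD pods):

# Show OSD pods together with the node they landed on.
oc -n openshift-storage get pods -l app=rook-ceph-osd -o wide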
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2041