Description of problem (please be as detailed as possible and provide log snippets):

Following up on https://bugzilla.redhat.com/show_bug.cgi?id=1790500#c47, I'm trying to run performance tests on a 9-node OCS setup with 27 OSDs, on an AWS IPI OCP setup. However, I end up with a situation where some of the OCS nodes have 6 OSDs and others have none.

OCS is installed and managed via the UI, sticking to our official documentation for initial cluster creation and the subsequent scaling to 9 nodes and 27 OSDs.

Version of all relevant components (if applicable):

# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.1     True        False         46h     Cluster version is 4.3.1

# oc get csv -n openshift-storage
ocs-operator.v4.3.0-376.ci   OpenShift Container Storage   4.3.0-376.ci

Steps to Reproduce:
* create an AWS IPI OCP cluster
* oc apply -f deploy-with-olm.yaml (with image: quay.io/rhceph-dev/ocs-olm-operator:4.3.0-rc1)
* from here on, use the UI
* create the initial OCS cluster following the OCS 4.2 GA documentation, to get 3 OCS nodes with 1 OSD each of 2 TiB gp2
* add 4 TiB of storage capacity, so that we have 3 OCS nodes, each with 3 OSDs of 2 TiB gp2
* scale the setup to 9 nodes using the UI, according to the steps in the documentation (labelling of nodes done via CLI; a sketch of the label command follows the iostat output below)
* add 12 TiB of storage capacity (for a total of 18 TiB) and wait for the cluster to become ready

Actual results:

There are now 27 OSDs, as expected, but they are not equally distributed across the OCS nodes. In my latest attempt, 5 OCS nodes have 3 OSDs each, 2 OCS nodes have 6 OSDs each, and 2 OCS nodes have 0 OSDs.

Expected results:

At least according to the OCS 4.2 guidelines, this is an unsupported configuration. Following our documented steps for cluster scaling should not lead to it.

Additional info:

Stats collected during the performance test show output like this on 2 OCS nodes, with 6 OSDs receiving IO:

03/26/20 10:06:54
Device:   rrqm/s wrqm/s     r/s    w/s     rkB/s   wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1     0.00   0.00    0.00   1.40      0.00   10.60    15.14     0.00  0.50    0.00    0.50  1.00  0.14
dm-0        0.00   0.00    0.00   1.40      0.00   10.60    15.14     0.00  0.43    0.00    0.43  1.00  0.14
loop0       0.00   0.00    0.00   0.00      0.00    0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
loop1       0.00   0.00    0.00   0.00      0.00    0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
loop2       0.00   0.00    0.00   0.00      0.00    0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
loop3       0.00   0.00    0.00   0.00      0.00    0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
loop4       0.00   0.00    0.00   0.00      0.00    0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
nvme2n1     0.00   3.20 2121.10   2.20   8484.40   22.00     8.01     0.01  0.39    0.39    0.64  0.44 92.41
nvme3n1     0.00   0.00 2391.40   0.00   9566.00    0.00     8.00     0.07  0.39    0.39    0.00  0.39 93.08
nvme4n1     0.00   1.30 3001.40   0.70  12005.60   10.80     8.01     0.63  0.58    0.58    0.57  0.32 95.96
nvme5n1     0.00   0.40 3126.10   0.20  12504.40    4.00     8.00     0.01  0.39    0.39    0.50  0.31 96.60
nvme6n1     0.00   0.00 1900.40   0.00   7601.60    0.00     8.00     0.01  0.36    0.36    0.00  0.47 89.07
nvme7n1     0.00   0.00 2257.70   0.00   9030.80    0.00     8.00     0.01  0.35    0.35    0.00  0.41 92.74
ceph--d9a3f7f4--1ba3--496c--ae03--6c890550b28f-osd--block--e3fe9b48--fd74--47c4--bda1--85cef9ae1068  0.00 0.00 3126.40 0.60 12505.60 4.00 8.00 1.20 0.38 0.38 1.00 0.31 96.59
ceph--02cd8c67--07c0--4381--84fd--75668fb2d240-osd--block--97a6a773--6cb6--43d1--a91e--63eeae4228da  0.00 0.00 2121.40 5.40 8485.60 22.00 8.00 0.83 0.39 0.39 0.69 0.43 92.36
ceph--ee11d629--21e4--419d--8895--c498d94d305e-osd--block--bf34dbdf--6bb6--40b8--9e7c--6768f9f325d3  0.00 0.00 1900.40 0.00 7601.60 0.00 8.00 0.69 0.36 0.36 0.00 0.47 89.07
ceph--8282bf71--a988--4446--9037--7f3bf9b73246-osd--block--0dc8cc5e--86bf--4fcb--9404--0fbb1c7ec120  0.00 0.00 3001.00 2.00 12004.00 10.80 8.00 1.73 0.58 0.58 0.85 0.32 95.98
loop5       0.00   0.00    0.00   0.00      0.00    0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
ceph--c8258555--bac0--4beb--a4a8--5b1f05c5024b-osd--block--00ac521e--35d9--48a4--915a--b35b86d5bd69  0.00 0.00 2257.90 0.00 9031.60 0.00 8.00 0.79 0.35 0.35 0.00 0.41 92.71
ceph--05e81cc2--6a4c--4a40--b9b0--436f441377e3-osd--block--382b31de--1038--4a0f--b46a--fb54af7aed9b  0.00 0.00 2391.40 0.00 9566.00 0.00 8.00 0.93 0.39 0.39 0.00 0.39 93.05
nvme1n1     0.00   2.40    0.00   3.60      0.00   30.00    16.67     0.00  0.56    0.00    0.56  0.92  0.33

Whereas on 2 of the OCS nodes I see no OSDs:

03/26/20 10:06:54
Device:   rrqm/s wrqm/s     r/s    w/s     rkB/s   wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1     0.00   0.60    0.20  23.40      6.00  122.30    10.87     0.00  0.58    0.50    0.58  0.28  0.65
dm-0        0.00   0.00    0.20  24.00      6.00  122.30    10.60     0.02  0.66    0.00    0.67  0.26  0.64

Others show 3 OSDs:

03/26/20 10:06:54
Device:   rrqm/s wrqm/s     r/s    w/s     rkB/s   wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1     0.00   0.20    0.00   3.30      0.00   35.90    21.76     0.00  0.82    0.00    0.82  0.52  0.17
dm-0        0.00   0.00    0.00   3.50      0.00   35.90    20.51     0.00  0.71    0.00    0.71  0.49  0.17
loop0       0.00   0.00    0.00   0.00      0.00    0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
loop1       0.00   0.00    0.00   0.00      0.00    0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
loop2       0.00   0.00    0.00   0.00      0.00    0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
loop3       0.00   0.00    0.00   0.00      0.00    0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
nvme2n1     0.00   0.40 3081.10   0.20  12324.40    4.00     8.00     0.28  0.52    0.52    0.50  0.31 97.01
nvme3n1     0.00   0.00 1970.70   0.00   7882.80    0.00     8.00     0.13  0.49    0.49    0.00  0.46 91.36
nvme4n1     0.00   0.00 2233.30   0.00   8933.20    0.00     8.00     0.01  0.41    0.41    0.00  0.42 92.75
ceph--77c3f7bb--4ce8--493f--b3fe--51a8425dbe89-osd--block--504a47c5--105e--4708--9cb0--71af98dfde5e  0.00 0.00 2233.20 0.00 8932.80 0.00 8.00 0.89 0.40 0.40 0.00 0.42 92.72
ceph--63730511--0b48--4b5a--a04f--a747814ff6b0-osd--block--69ff341f--0571--4156--89f5--698cb99bd143  0.00 0.00 1970.70 0.00 7882.80 0.00 8.00 0.94 0.48 0.48 0.00 0.46 91.32
ceph--8fd8ca6b--361a--4faf--a348--de334f07ef40-osd--block--a424782e--f5ac--422e--883f--0281c66b8c70  0.00 0.00 3081.20 0.60 12324.80 4.00 8.00 1.58 0.51 0.51 0.33 0.31 97.03
nvme1n1     0.00   2.40    0.00   3.60      0.00   35.20    19.56     0.00  0.53    0.00    0.53  0.89  0.32
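As referenced in the steps above, labelling of the additional nodes was done via CLI. A minimal sketch of that step, assuming the storage node label documented for OCS 4.x; the node name below is a placeholder:

  # mark an additional worker node as an OCS storage node (placeholder node name)
  oc label node <node-name> cluster.ocs.openshift.io/openshift-storage=''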
ceph_osd_tree:

ID  CLASS WEIGHT   TYPE NAME                                 STATUS REWEIGHT PRI-AFF
 -1       53.97281 root default
 -5       53.97281     region us-east-1
-10       17.99094         zone us-east-1a
 -9        1.99899             host ocs-deviceset-1-0-z444z
  2   ssd  1.99899                 osd.2                         up  1.00000 1.00000
-21        1.99899             host ocs-deviceset-1-1-xc6fz
  4   ssd  1.99899                 osd.4                         up  1.00000 1.00000
-25        1.99899             host ocs-deviceset-1-2-vd52m
  6   ssd  1.99899                 osd.6                         up  1.00000 1.00000
-35        1.99899             host ocs-deviceset-1-3-ms5tm
 18   ssd  1.99899                 osd.18                        up  1.00000 1.00000
-29        1.99899             host ocs-deviceset-1-4-7bwqb
 12   ssd  1.99899                 osd.12                        up  1.00000 1.00000
-33        1.99899             host ocs-deviceset-1-5-9fpqz
 13   ssd  1.99899                 osd.13                        up  1.00000 1.00000
-37        1.99899             host ocs-deviceset-1-6-jbbdn
 15   ssd  1.99899                 osd.15                        up  1.00000 1.00000
-31        1.99899             host ocs-deviceset-1-7-hp8fd
 14   ssd  1.99899                 osd.14                        up  1.00000 1.00000
-63        1.99899             host ocs-deviceset-1-8-whv7w
 26   ssd  1.99899                 osd.26                        up  1.00000 1.00000
-14       17.99094         zone us-east-1b
-13        1.99899             host ocs-deviceset-2-0-rwqzz
  1   ssd  1.99899                 osd.1                         up  1.00000 1.00000
-27        1.99899             host ocs-deviceset-2-1-mgxxz
  7   ssd  1.99899                 osd.7                         up  1.00000 1.00000
-23        1.99899             host ocs-deviceset-2-2-5x2pq
  8   ssd  1.99899                 osd.8                         up  1.00000 1.00000
-43        1.99899             host ocs-deviceset-2-3-dcdbd
 16   ssd  1.99899                 osd.16                        up  1.00000 1.00000
-45        1.99899             host ocs-deviceset-2-4-tnk8p
 22   ssd  1.99899                 osd.22                        up  1.00000 1.00000
-57        1.99899             host ocs-deviceset-2-5-scs28
 23   ssd  1.99899                 osd.23                        up  1.00000 1.00000
-55        1.99899             host ocs-deviceset-2-6-cd4hz
 25   ssd  1.99899                 osd.25                        up  1.00000 1.00000
-39        1.99899             host ocs-deviceset-2-7-vltbt
 24   ssd  1.99899                 osd.24                        up  1.00000 1.00000
-49        1.99899             host ocs-deviceset-2-8-m8fxs
 21   ssd  1.99899                 osd.21                        up  1.00000 1.00000
 -4       17.99094         zone us-east-1c
 -3        1.99899             host ocs-deviceset-0-0-rh5xd
  0   ssd  1.99899                 osd.0                         up  1.00000 1.00000
-17        1.99899             host ocs-deviceset-0-1-mdnj7
  5   ssd  1.99899                 osd.5                         up  1.00000 1.00000
-19        1.99899             host ocs-deviceset-0-2-4zc87
  3   ssd  1.99899                 osd.3                         up  1.00000 1.00000
-41        1.99899             host ocs-deviceset-0-3-kftrr
 10   ssd  1.99899                 osd.10                        up  1.00000 1.00000
-59        1.99899             host ocs-deviceset-0-4-8t6dj
 11   ssd  1.99899                 osd.11                        up  1.00000 1.00000
-47        1.99899             host ocs-deviceset-0-5-6ljc7
 17   ssd  1.99899                 osd.17                        up  1.00000 1.00000
-61        1.99899             host ocs-deviceset-0-6-nml8q
 20   ssd  1.99899                 osd.20                        up  1.00000 1.00000
-51        1.99899             host ocs-deviceset-0-7-78jfn
 19   ssd  1.99899                 osd.19                        up  1.00000 1.00000
-53        1.99899             host ocs-deviceset-0-8-5x8n4
  9   ssd  1.99899                 osd.9                         up  1.00000 1.00000
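For reference, the tree above can be regenerated from inside the Rook toolbox; a minimal sketch, assuming the rook-ceph-tools pod is deployed in the openshift-storage namespace:

  # locate the toolbox pod and run "ceph osd tree" inside it
  TOOLS_POD=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name | head -n 1)
  oc -n openshift-storage exec "$TOOLS_POD" -- ceph osd tree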
@Manoj, within a zone, the scheduling of OSDs across nodes is up to Kubernetes scheduling; we currently have no control over it. There is an upcoming feature in Kubernetes that will lead to an even distribution as a result of scheduling. Currently it's rather the opposite, or at least more random.
This is dependent on an upcoming feature in Kubernetes 1.18, which will be the base for OCP 4.5. Rook-Ceph will also be implementing this for OCS 4.5, so we just need to update the StorageDevice placement in ocs-operator to use this new feature. Moving to OCS 4.5.
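For context, the Kubernetes 1.18 feature referred to here is pod topology spread constraints. A minimal sketch of the kind of stanza it adds to a pod spec; the label selector and topology key below are illustrative assumptions, not necessarily what rook/ocs-operator will end up generating:

  spec:
    topologySpreadConstraints:
    - maxSkew: 1                           # at most 1 more matching pod on any node than on the least-loaded one
      topologyKey: kubernetes.io/hostname  # spread across individual nodes
      whenUnsatisfiable: DoNotSchedule     # hard constraint; ScheduleAnyway would make it best-effort
      labelSelector:
        matchLabels:
          app: rook-ceph-osd               # assumed label on the OSD pods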
*** Bug 1821161 has been marked as a duplicate of this bug. ***
Hi, this bug was already acked for 4.5, as a blocker. Can we please have a discussion before pushing it out? Eran, Michael, please review.
Yeah, this was approved for 4.5, but prematurely. As José said, it's a feature. It is also effectively the same as this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1814681

There are design discussions going on about this. As José said, the basic feature is there in k8s/OCP, but how to properly make use of it in rook/ocs-operator, in a fashion that solves our problem and is upgrade-safe, needs much more discussion than a simple bugfix. Major work will be required in rook.

This will also fix https://bugzilla.redhat.com/show_bug.cgi?id=1776562
So:
1) I agree with José's assessment.
2) I agree that this should not just have been moved, but discussed with you first.
I'm not sure why it's a blocker or why it's a serious scalability issue. I understand that it would be great if we could spread the OSDs evenly, but as long as we get the resources we should be fine; hence we trust the K8s scheduling. I don't see a problem with pushing it to 4.6.
This is still dependent on the "topology spread constraints" feature, which is targeted for 4.7.
The PR for topology spread constraints has merged.
This bug has been fixed in OCS 4.7 by topology spread constraints; the corresponding PR has been merged.

Note: Topology Spread Constraints is supported on OCP 4.6+.

Tested in OCP 4.6 by following the "Steps to Reproduce": the OSDs were distributed uniformly across the OCS nodes of the 9-node AWS setup (3 OSDs each).

Worker nodes:

oc get nodes | grep worker
ip-10-0-131-148.ec2.internal   Ready   worker   69m    v1.19.0-rc.2+99cb93a-dirty
ip-10-0-149-130.ec2.internal   Ready   worker   144m   v1.19.0-rc.2+99cb93a-dirty
ip-10-0-158-252.ec2.internal   Ready   worker   69m    v1.19.0-rc.2+99cb93a-dirty
ip-10-0-161-65.ec2.internal    Ready   worker   69m    v1.19.0-rc.2+99cb93a-dirty
ip-10-0-165-63.ec2.internal    Ready   worker   144m   v1.19.0-rc.2+99cb93a-dirty
ip-10-0-185-7.ec2.internal     Ready   worker   69m    v1.19.0-rc.2+99cb93a-dirty
ip-10-0-193-103.ec2.internal   Ready   worker   69m    v1.19.0-rc.2+99cb93a-dirty
ip-10-0-212-165.ec2.internal   Ready   worker   144m   v1.19.0-rc.2+99cb93a-dirty
ip-10-0-213-134.ec2.internal   Ready   worker   69m    v1.19.0-rc.2+99cb93a-dirty

OSD pod placement:

oc get pods -owide | grep osd
rook-ceph-osd-0-6d57d4dc47-j92nf    1/1   Running   0   94m     10.131.0.32   ip-10-0-149-130.ec2.internal   <none>   <none>
rook-ceph-osd-1-5dc5fd969d-fs7p7    1/1   Running   0   94m     10.129.2.19   ip-10-0-165-63.ec2.internal    <none>   <none>
rook-ceph-osd-10-5d7f855999-qq68x   1/1   Running   0   23m     10.131.4.12   ip-10-0-185-7.ec2.internal     <none>   <none>
rook-ceph-osd-11-64b5854558-chzw8   1/1   Running   0   23m     10.129.4.7    ip-10-0-131-148.ec2.internal   <none>   <none>
rook-ceph-osd-12-6cb8475dd6-lj9qs   1/1   Running   0   19m     10.130.2.8    ip-10-0-193-103.ec2.internal   <none>   <none>
rook-ceph-osd-13-7685644678-ml9vl   1/1   Running   0   19m     10.131.2.11   ip-10-0-161-65.ec2.internal    <none>   <none>
rook-ceph-osd-14-5f846855bf-ghk8s   1/1   Running   0   19m     10.130.4.9    ip-10-0-158-252.ec2.internal   <none>   <none>
rook-ceph-osd-15-75d655b657-tx668   1/1   Running   0   13m     10.131.4.15   ip-10-0-185-7.ec2.internal     <none>   <none>
rook-ceph-osd-16-574d5ddb6d-wsrj9   1/1   Running   0   13m     10.130.4.12   ip-10-0-158-252.ec2.internal   <none>   <none>
rook-ceph-osd-17-b869d8c76-h8n4s    1/1   Running   0   13m     10.128.4.12   ip-10-0-213-134.ec2.internal   <none>   <none>
rook-ceph-osd-18-5985f978cd-8g5nz   1/1   Running   0   9m59s   10.131.2.14   ip-10-0-161-65.ec2.internal    <none>   <none>
rook-ceph-osd-19-58f467fdd8-mhk47   1/1   Running   0   9m58s   10.130.2.10   ip-10-0-193-103.ec2.internal   <none>   <none>
rook-ceph-osd-2-594449cd9d-tvjkn    1/1   Running   0   94m     10.128.2.27   ip-10-0-212-165.ec2.internal   <none>   <none>
rook-ceph-osd-20-cb8f574d-smnkq     1/1   Running   0   9m48s   10.129.4.11   ip-10-0-131-148.ec2.internal   <none>   <none>
rook-ceph-osd-21-75bf5bdc7d-hcq65   1/1   Running   0   5m46s   10.131.2.15   ip-10-0-161-65.ec2.internal    <none>   <none>
rook-ceph-osd-22-5c6d7754f4-zrvnf   1/1   Running   0   5m45s   10.128.4.14   ip-10-0-213-134.ec2.internal   <none>   <none>
rook-ceph-osd-23-7dcc8dcfbd-6q7zw   1/1   Running   0   5m40s   10.129.4.13   ip-10-0-131-148.ec2.internal   <none>   <none>
rook-ceph-osd-24-5f9f94d444-tqmzb   1/1   Running   0   60s     10.130.4.14   ip-10-0-158-252.ec2.internal   <none>   <none>
rook-ceph-osd-25-f9cf7558c-f5zvc    1/1   Running   0   57s     10.130.2.14   ip-10-0-193-103.ec2.internal   <none>   <none>
rook-ceph-osd-26-d46dd8944-dmrmj    1/1   Running   0   56s     10.131.4.18   ip-10-0-185-7.ec2.internal     <none>   <none>
rook-ceph-osd-3-558db7498b-c59x4    1/1   Running   0   83m     10.131.0.36   ip-10-0-149-130.ec2.internal   <none>   <none>
rook-ceph-osd-4-65dcbb54f6-6grp8    1/1   Running   0   82m     10.129.2.27   ip-10-0-165-63.ec2.internal    <none>   <none>
rook-ceph-osd-5-675d4dcc74-wmpjn    1/1   Running   0   82m     10.128.2.31   ip-10-0-212-165.ec2.internal   <none>   <none>
rook-ceph-osd-6-bb48b855c-4k4ck     1/1   Running   0   74m     10.128.2.33   ip-10-0-212-165.ec2.internal   <none>   <none>
rook-ceph-osd-7-86bcf856db-tbhhw    1/1   Running   0   74m     10.129.2.30   ip-10-0-165-63.ec2.internal    <none>   <none>
rook-ceph-osd-8-7864cd7df5-ktqpq    1/1   Running   0   74m     10.131.0.38   ip-10-0-149-130.ec2.internal   <none>   <none>
rook-ceph-osd-9-8d7bf6df6-dprxm     1/1   Running   0   23m     10.128.4.9    ip-10-0-213-134.ec2.internal   <none>   <none>
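A quick way to tally the distribution from the same data; a minimal sketch, assuming the OSD pods carry the usual app=rook-ceph-osd label (NODE is the 7th column of "oc get pods -o wide"):

  # count OSD pods per node; each of the 9 nodes should report 3
  oc -n openshift-storage get pods -l app=rook-ceph-osd -o wide --no-headers | awk '{print $7}' | sort | uniq -c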
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5605
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 500 days.