Description of problem:

After installing the ODF Managed Service addon on a 1 TiB ROSA cluster, 2 of the 3 OSDs are running on the same node, and rack0 is not used.

$ oc get nodes --show-labels
NAME                                         STATUS   ROLES          AGE   VERSION                LABELS
ip-10-0-148-15.us-east-2.compute.internal    Ready    master         92m   v1.22.0-rc.0+a44d0f0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.2xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-148-15.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.2xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
ip-10-0-151-136.us-east-2.compute.internal   Ready    infra,worker   62m   v1.22.0-rc.0+a44d0f0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=r5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-151-136.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/infra=,node-role.kubernetes.io/worker=,node-role.kubernetes.io=infra,node.kubernetes.io/instance-type=r5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
ip-10-0-155-80.us-east-2.compute.internal    Ready    master         93m   v1.22.0-rc.0+a44d0f0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.2xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-155-80.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.2xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
ip-10-0-161-52.us-east-2.compute.internal    Ready    worker         83m   v1.22.0-rc.0+a44d0f0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.2xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-161-52.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.2xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a,topology.rook.io/rack=rack0
ip-10-0-189-113.us-east-2.compute.internal   Ready    worker         82m   v1.22.0-rc.0+a44d0f0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.2xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-189-113.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.2xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a,topology.rook.io/rack=rack1
ip-10-0-195-98.us-east-2.compute.internal    Ready    worker         86m   v1.22.0-rc.0+a44d0f0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.2xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-195-98.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.2xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a,topology.rook.io/rack=rack2
ip-10-0-199-64.us-east-2.compute.internal    Ready    master         93m   v1.22.0-rc.0+a44d0f0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.2xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-199-64.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.2xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
ip-10-0-219-37.us-east-2.compute.internal    Ready    infra,worker   61m   v1.22.0-rc.0+a44d0f0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=r5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-219-37.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/infra=,node-role.kubernetes.io/worker=,node-role.kubernetes.io=infra,node.kubernetes.io/instance-type=r5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a

$ oc get pods -n openshift-storage -o wide|grep ceph-osd
rook-ceph-osd-0-56c8dd8864-zw4sr                       2/2   Running     0   43m   10.129.2.10   ip-10-0-189-113.us-east-2.compute.internal   <none>   <none>
rook-ceph-osd-1-85f94b9957-fk5g8                       2/2   Running     0   44m   10.131.0.34   ip-10-0-195-98.us-east-2.compute.internal    <none>   <none>
rook-ceph-osd-2-75ff66487-nmmfg                        2/2   Running     0   43m   10.129.2.9    ip-10-0-189-113.us-east-2.compute.internal   <none>   <none>
rook-ceph-osd-prepare-default-0-data-098bp7--1-b7n2t   0/1   Completed   0   45m   10.131.0.33   ip-10-0-195-98.us-east-2.compute.internal    <none>   <none>

$ oc rsh -n openshift-storage rook-ceph-tools-798b4968cc-pfx4p ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                                STATUS  REWEIGHT  PRI-AFF
 -1         3.00000  root default
 -6         3.00000      region us-east-2
 -5         3.00000          zone us-east-2a
-12         2.00000              rack rack1
-11         1.00000                  host default-1-data-0wlkbt
  0    ssd  1.00000                      osd.0                up      1.00000   1.00000
-15         1.00000                  host default-2-data-0jqgkn
  2    ssd  1.00000                      osd.2                up      1.00000   1.00000
 -4         1.00000              rack rack2
 -3         1.00000                  host default-0-data-098bp7
  1    ssd  1.00000                      osd.1                up      1.00000   1.00000

Version-Release number of selected component (if applicable):
ocs-operator.v4.8.2
ocs-osd-deployer.v1.1.1

How reproducible:
Not sure
2/3 OSDs on the same node is not expected. Is this a product bug?
Can you attach the StorageCluster CR?
Please attach the CephCluster CR as well.
The likely cause is that one of the nodes went down during the ODF Managed Service addon installation, which led the TopologySpreadConstraints (TSC) to place the OSDs on the two nodes that were still available. TSC currently has no mechanism to check for a minimum number of nodes before scheduling, so StorageCluster creation still proceeds with 3 replicas even when fewer than 3 usable nodes are present. A sketch of the kind of constraint involved is shown below.
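For illustration only, here is a minimal sketch of a rack-based spread constraint of the kind applied to OSD pods; the field values are assumptions for this example and are not copied from the affected cluster or from the operator source:

# Hedged sketch; values are assumptions, not the actual operator-generated spec.
# With only two rack-labeled nodes schedulable at install time (or with
# whenUnsatisfiable: ScheduleAnyway), nothing blocks the third OSD, so two
# OSDs can end up on the same node/rack.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.rook.io/rack
  whenUnsatisfiable: ScheduleAnyway   # assumption; DoNotSchedule would leave the pod Pending instead
  labelSelector:
    matchLabels:
      app: rook-ceph-osd

With DoNotSchedule the third OSD would stay Pending until a third rack became available, which may or may not be the behaviour we want for the managed service.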
Jose, this looks like a regression from introducing TopologySpreadConstraints. Can we solve it in the product?
We observed this problem in scale tests too; here is the BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2004801
I'm not 100% sure this is a regression, but it's certainly a problem we should resolve. Since the requirements on the managed service(s) are changing frequently, I'll leave it to you to prioritize this BZ. As long as there is a fully ACKed OCS/ODF BZ, it can go into any release of ocs-operator.
In a single-zone AWS deployment, flexible scaling should be enabled and rack labels would not be added. In addition, 2 OSDs on the same node means there is no node left to hold the third replica, i.e. the cluster is permanently degraded. That makes this a regression. A sketch of the expected single-zone StorageCluster shape is included below.
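For reference, a minimal sketch of what a flexible-scaling (single-AZ) StorageCluster could look like; the field names and values below are assumptions for illustration and are not taken from this cluster's CR or the addon defaults:

# Hedged sketch of a single-zone StorageCluster; values are assumptions.
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  flexibleScaling: true            # single zone: failure domain becomes host, no rack labels applied
  storageDeviceSets:
  - name: default
    count: 3                       # with flexible scaling, capacity grows by count rather than replica
    replica: 1
    dataPVCTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Ti           # assumed size for the 1 TiB addon offering
        storageClassName: gp2      # assumed storage class name
        volumeMode: Block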
The tracking bug is fixed in the product, so this needs to be verified and closed.
I am moving this back to NEW. BZ 2100713 was closed, but only because it is a duplicate of BZ 2004801, which is still in NEW state.