Bug 2023268

Summary: [Managed Service Tracker] OSDs are not evenly distributed
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Component: odf-managed-service
Version: 4.8
Status: CLOSED WORKSFORME
Severity: high
Priority: high
Reporter: Filip Balák <fbalak>
Assignee: Ohad <omitrani>
QA Contact: Neha Berry <nberry>
Docs Contact:
CC: aeyal, dbindra, ebenahar, mbukatov, mmuench, nibalach, ocs-bugs, odf-bz-bot, owasserm, rperiyas
Keywords: Tracking
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 2100713 (view as bug list)
Environment:
Last Closed: 2023-01-20 09:45:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Embargoed:
Bug Depends On: 2004801    
Bug Blocks:    

Description Filip Balák 2021-11-15 10:43:45 UTC
Description of problem:
After installation of the ODF Managed Service addon on a 1 TiB ROSA cluster, I see that there are 2 OSDs on 1 node; rack0 is not used.

$ oc get nodes --show-labels
NAME                                         STATUS   ROLES          AGE   VERSION                LABELS
ip-10-0-148-15.us-east-2.compute.internal    Ready    master         92m   v1.22.0-rc.0+a44d0f0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.2xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-148-15.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.2xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
ip-10-0-151-136.us-east-2.compute.internal   Ready    infra,worker   62m   v1.22.0-rc.0+a44d0f0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=r5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-151-136.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/infra=,node-role.kubernetes.io/worker=,node-role.kubernetes.io=infra,node.kubernetes.io/instance-type=r5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
ip-10-0-155-80.us-east-2.compute.internal    Ready    master         93m   v1.22.0-rc.0+a44d0f0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.2xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-155-80.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.2xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
ip-10-0-161-52.us-east-2.compute.internal    Ready    worker         83m   v1.22.0-rc.0+a44d0f0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.2xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-161-52.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.2xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a,topology.rook.io/rack=rack0
ip-10-0-189-113.us-east-2.compute.internal   Ready    worker         82m   v1.22.0-rc.0+a44d0f0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.2xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-189-113.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.2xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a,topology.rook.io/rack=rack1
ip-10-0-195-98.us-east-2.compute.internal    Ready    worker         86m   v1.22.0-rc.0+a44d0f0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.2xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-195-98.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.2xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a,topology.rook.io/rack=rack2
ip-10-0-199-64.us-east-2.compute.internal    Ready    master         93m   v1.22.0-rc.0+a44d0f0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.2xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-199-64.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.2xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
ip-10-0-219-37.us-east-2.compute.internal    Ready    infra,worker   61m   v1.22.0-rc.0+a44d0f0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=r5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-219-37.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/infra=,node-role.kubernetes.io/worker=,node-role.kubernetes.io=infra,node.kubernetes.io/instance-type=r5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a

$ oc get pods -n openshift-storage -o wide|grep ceph-osd
rook-ceph-osd-0-56c8dd8864-zw4sr                                  2/2     Running     0             43m   10.129.2.10    ip-10-0-189-113.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-1-85f94b9957-fk5g8                                  2/2     Running     0             44m   10.131.0.34    ip-10-0-195-98.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-2-75ff66487-nmmfg                                   2/2     Running     0             43m   10.129.2.9     ip-10-0-189-113.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-prepare-default-0-data-098bp7--1-b7n2t              0/1     Completed   0             45m   10.131.0.33    ip-10-0-195-98.us-east-2.compute.internal    <none>           <none>
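For reference, a quicker way to list the OSD-to-node mapping (assuming the standard app=rook-ceph-osd pod label that Rook sets on OSD pods):

$ oc get pods -n openshift-storage -l app=rook-ceph-osd \
    -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName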

$ oc rsh -n openshift-storage rook-ceph-tools-798b4968cc-pfx4p ceph osd tree
ID  CLASS WEIGHT  TYPE NAME                                  STATUS REWEIGHT PRI-AFF 
 -1       3.00000 root default                                                       
 -6       3.00000     region us-east-2                                               
 -5       3.00000         zone us-east-2a                                            
-12       2.00000             rack rack1                                             
-11       1.00000                 host default-1-data-0wlkbt                         
  0   ssd 1.00000                     osd.0                      up  1.00000 1.00000 
-15       1.00000                 host default-2-data-0jqgkn                         
  2   ssd 1.00000                     osd.2                      up  1.00000 1.00000 
 -4       1.00000             rack rack2                                             
 -3       1.00000                 host default-0-data-098bp7                         
  1   ssd 1.00000                     osd.1                      up  1.00000 1.00000 

Version-Release number of selected component (if applicable):
ocs-operator.v4.8.2
ocs-osd-deployer.v1.1.1

How reproducible:
Not sure

Comment 2 Sahina Bose 2021-11-16 07:21:08 UTC
Having 2 of the 3 OSDs on the same node is not expected. Is this a product bug?

Comment 3 Sahina Bose 2021-11-16 07:28:48 UTC
Can you attach the StorageCluster CR?

Comment 5 N Balachandran 2021-11-16 09:12:11 UTC
Please attach the CephCluster CR as well.

Comment 7 Kesavan 2021-11-17 09:59:31 UTC
A possible reason is that, during the ODF MS addon installation, one of the nodes went down, which led the OSDs to be scheduled on the two available nodes by the topology spread constraints (TSC).
Currently TSC has no mechanism to check for a minimum number of nodes before scheduling, so StorageCluster creation still proceeds with 3 replicas on a cluster with fewer than 3 nodes.
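For context, a topology spread constraint on the OSD pods would typically look something like the sketch below (illustrative only; the exact constraint generated by the operator may differ). With whenUnsatisfiable: ScheduleAnyway the constraint is only a soft preference, so when one of the three nodes is unavailable the scheduler will still place a second OSD on an already-used rack/node, which matches the placement seen above:

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.rook.io/rack
  whenUnsatisfiable: ScheduleAnyway   # soft constraint: allows co-location when a rack has no schedulable node
  labelSelector:
    matchLabels:
      app: rook-ceph-osd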

Comment 8 Sahina Bose 2021-11-17 10:14:16 UTC
Jose, this looks like a regression from introducing TopologySpreadConstraints. Can we solve it in the product?

Comment 11 Ramakrishnan Periyasamy 2021-11-18 09:59:54 UTC
Observed this problem in scale tests too, here is the bz https://bugzilla.redhat.com/show_bug.cgi?id=2004801

Comment 12 Jose A. Rivera 2022-01-24 15:37:50 UTC
I'm not 100% sure if this is a regression, but it's certainly a problem we should resolve. Since the requirements on the managed service(s) are changing frequently, I'll leave it to you guys to prioritize this BZ. As long as there is a fully ACKed OCS/ODF BZ, it can go into any release of ocs-operator.

Comment 14 Orit Wasserman 2022-06-23 10:42:22 UTC
(In reply to Red Hat Bugzilla from comment #13)
> remove performed by PnT Account Manager <pnt-expunge>

In a single-zone AWS deployment, flexible scaling should be enabled and we would not add rack labels.
In addition, 2 OSDs on the same node means there is no OSD left to store the third replica, i.e. the cluster is always in degraded mode.
This makes this a regression.
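For illustration, flexible scaling is enabled through the StorageCluster CR roughly as below (a sketch only; field names per the ocs-operator CRD, and the resource name used by the managed service may differ). With flexibleScaling set, the failure domain becomes host, so replicas are spread across nodes rather than rack labels:

apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster        # name may differ in the managed-service deployment
  namespace: openshift-storage
spec:
  flexibleScaling: true           # use host as the failure domain; no rack labels needed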

Comment 15 Orit Wasserman 2022-06-23 10:42:40 UTC
(In reply to Red Hat Bugzilla from comment #13)
> remove performed by PnT Account Manager <pnt-expunge>

In a single-zone AWS deployment, flexible scaling should be enabled and we would not add rack labels.
In addition, 2 OSDs on the same node means there is no OSD left to store the third replica, i.e. the cluster is always in degraded mode.
This makes this a regression.

Comment 17 Dhruv Bindra 2022-07-19 05:41:02 UTC
The tracked bug is fixed in the product; this needs to be verified and closed.

Comment 18 Filip Balák 2022-07-19 11:50:47 UTC
I am moving this back to NEW. BZ 2100713 was closed, but only because it is a duplicate of BZ 2004801, which is still in NEW state.