Bug 2167347

Summary: OSDs marked as down, not equally distributed in ceph osd tree output of a size 20 cluster
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Jilju Joy <jijoy>
Component: odf-managed-service
Assignee: Ohad <omitrani>
Status: CLOSED CURRENTRELEASE
QA Contact: Jilju Joy <jijoy>
Severity: high
Priority: unspecified
Version: 4.10
CC: cblum, ocs-bugs, odf-bz-bot, rchikatw, rohgupta
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2023-04-10 09:19:04 UTC
Type: Bug

Description Jilju Joy 2023-02-06 11:15:48 UTC
Description of problem:
osd.9 and osd.10 are marked as down in the "ceph osd tree" output. The rook-ceph-osd-9 and rook-ceph-osd-10 pods do not exist; rook-ceph-osd-15 and rook-ceph-osd-16 pods are running instead.

$ oc get pods -o wide -l osd
NAME                                READY   STATUS    RESTARTS   AGE    IP            NODE                                        NOMINATED NODE   READINESS GATES
rook-ceph-osd-0-5fb844c65b-kvzzg    2/2     Running   0          157m   10.0.17.34    ip-10-0-17-34.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-1-67d56959d8-l49bq    2/2     Running   0          157m   10.0.14.24    ip-10-0-14-24.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-11-698f94f8f4-vqvp7   2/2     Running   0          152m   10.0.19.38    ip-10-0-19-38.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-12-85f55945d6-cpdps   2/2     Running   0          157m   10.0.14.194   ip-10-0-14-194.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-13-7f4c5dc66d-dh7q9   2/2     Running   0          157m   10.0.14.24    ip-10-0-14-24.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-14-7b9ddd9967-qzj25   2/2     Running   0          157m   10.0.14.24    ip-10-0-14-24.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-15-65b76d4467-gctxj   2/2     Running   0          155m   10.0.21.244   ip-10-0-21-244.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-16-5bc8c589df-gszfq   2/2     Running   0          155m   10.0.22.40    ip-10-0-22-40.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-2-76479488bb-vlbz8    2/2     Running   0          157m   10.0.22.40    ip-10-0-22-40.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-3-c5985f568-xs8q7     2/2     Running   0          157m   10.0.14.194   ip-10-0-14-194.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-4-667f88675d-tbjrz    2/2     Running   0          157m   10.0.14.194   ip-10-0-14-194.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-5-5cc6866895-q96s6    2/2     Running   0          157m   10.0.17.34    ip-10-0-17-34.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-6-58b5b97594-m6txx    2/2     Running   0          152m   10.0.19.38    ip-10-0-19-38.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-7-7c688846cb-pmfjv    2/2     Running   0          157m   10.0.22.40    ip-10-0-22-40.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-8-65ff789554-vf6x5    2/2     Running   0          157m   10.0.21.244   ip-10-0-21-244.us-east-2.compute.internal   <none>           <none>
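For reference, a quick way to list which OSD ids actually have pods is to strip the ids out of the pod names. This is only a sketch; the app=rook-ceph-osd label is the standard Rook label for OSD pods and is an assumption here (the listing above was taken with "-l osd"):

$ oc get pods -l app=rook-ceph-osd -o name \
    | sed 's/.*rook-ceph-osd-//' | cut -d- -f1 | sort -n
# With the listing above this prints 0-8 and 11-16; ids 9 and 10 have no pods.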


The CRUSH weight is not distributed evenly across the zones: us-east-2a has 24, us-east-2b has 16, and us-east-2c has 20.

$ oc exec  rook-ceph-tools-7c8c77bd96-g9r2v -- ceph osd tree
ID   CLASS  WEIGHT    TYPE NAME                               STATUS  REWEIGHT  PRI-AFF
 -1         60.00000  root default                                                     
 -5         60.00000      region us-east-2                                             
-10         24.00000          zone us-east-2a                                          
-19          4.00000              host default-0-data-0zb589                           
  1    ssd   4.00000                  osd.1                       up   1.00000  1.00000
-27          4.00000              host default-1-data-0dgqxh                           
 13    ssd   4.00000                  osd.13                      up   1.00000  1.00000
-29          4.00000              host default-1-data-18p4w9                           
 14    ssd   4.00000                  osd.14                      up   1.00000  1.00000
-13          4.00000              host default-1-data-3ccnsk                           
  4    ssd   4.00000                  osd.4                       up   1.00000  1.00000
-31          4.00000              host default-2-data-1crwhx                           
 12    ssd   4.00000                  osd.12                      up   1.00000  1.00000
 -9          4.00000              host default-2-data-4drz6r                           
  3    ssd   4.00000                  osd.3                       up   1.00000  1.00000
 -4         16.00000          zone us-east-2b                                          
 -3          4.00000              host default-0-data-1lmf4s                           
  0    ssd   4.00000                  osd.0                       up   1.00000  1.00000
-21          4.00000              host default-0-data-4c6c6c                           
  6    ssd   4.00000                  osd.6                       up   1.00000  1.00000
-25          4.00000              host default-1-data-4qj4rs                           
  5    ssd   4.00000                  osd.5                       up   1.00000  1.00000
-23          4.00000              host default-2-data-3cwl4r                           
 11    ssd   4.00000                  osd.11                      up   1.00000  1.00000
-16         20.00000          zone us-east-2c                                          
-35          4.00000              host default-0-data-2w7jjk                           
  2    ssd   4.00000                  osd.2                       up   1.00000  1.00000
-37          4.00000              host default-0-data-32lmdw                           
 15    ssd   4.00000                  osd.15                      up   1.00000  1.00000
-33          4.00000              host default-1-data-28l6lr                           
  7    ssd   4.00000                  osd.7                       up   1.00000  1.00000
-39          4.00000              host default-2-data-0b5gpt                           
 16    ssd   4.00000                  osd.16                      up   1.00000  1.00000
-15          4.00000              host default-2-data-26pmj5                           
  8    ssd   4.00000                  osd.8                       up   1.00000  1.00000
  9                0  osd.9                                     down         0  1.00000
 10                0  osd.10                                    down         0  1.00000
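To make the imbalance easier to see, the tree can be summarized per zone. A minimal sketch using the toolbox pod named above; the awk post-processing is illustrative only and not part of the original report:

$ oc exec rook-ceph-tools-7c8c77bd96-g9r2v -- ceph osd tree \
    | awk '/ zone /           {zone=$NF}
           / osd\./ && / up / {up[zone]++; w[zone]+=$3}
           END                {for (z in up) print z": "up[z]" OSDs up, weight "w[z]}'
# With the output above this gives us-east-2a: 6 OSDs up (weight 24),
# us-east-2b: 4 (weight 16), us-east-2c: 5 (weight 20).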



must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-f6-s20-pr/jijoy-f6-s20-pr_20230206T073026/logs/deployment_1675674405/

=================================================================================================================
Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.50   True        False         176m    Cluster version is 4.10.50

$ oc get csv
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.9                      NooBaa Operator               4.10.9            mcg-operator.v4.10.8                      Succeeded
observability-operator.v0.0.20            Observability Operator        0.0.20            observability-operator.v0.0.19            Succeeded
ocs-operator.v4.10.9                      OpenShift Container Storage   4.10.9            ocs-operator.v4.10.8                      Succeeded
ocs-osd-deployer.v2.0.11                  OCS OSD Deployer              2.0.11            ocs-osd-deployer.v2.0.10                  Succeeded
odf-csi-addons-operator.v4.10.9           CSI Addons                    4.10.9            odf-csi-addons-operator.v4.10.8           Succeeded
odf-operator.v4.10.9                      OpenShift Data Foundation     4.10.9            odf-operator.v4.10.8                      Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0            ose-prometheus-operator.4.8.0             Succeeded
route-monitor-operator.v0.1.456-02ea942   Route Monitor Operator        0.1.456-02ea942   route-monitor-operator.v0.1.454-494fffd   Succeeded


=========================================================================================================================

How reproducible:
This is the first occurrence of the issue reported with the revised topology changes.

Steps to Reproduce:
1. Install a Managed Services provider cluster of size 20.
2. Verify the "ceph osd tree" output and the list of OSD pods (a verification sketch follows below).
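A minimal verification sketch for step 2, assuming the default openshift-storage namespace and the standard Rook labels (neither is stated explicitly in this report):

$ oc -n openshift-storage get pods -l app=rook-ceph-osd -o wide    # expect 20 OSD pods, all Running
$ TOOLS=$(oc -n openshift-storage get pod -l app=rook-ceph-tools \
      -o jsonpath='{.items[0].metadata.name}')
$ oc -n openshift-storage exec "$TOOLS" -- ceph osd stat           # expect something like "20 osds: 20 up, 20 in"
$ oc -n openshift-storage exec "$TOOLS" -- ceph osd tree           # every OSD up, weight spread evenly across the three zones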

Actual results:
Some OSDs are marked as down. Incorrect number of OSDs per zone. 

Expected results:
All 20 OSDs should be up in the "ceph osd tree" output, with the OSDs and their weight distributed evenly across the three zones.

Additional info:
Bug #2166915 was also seen in the cluster.
There is a similar bug #2136378 (closed as not a bug after discussion) where Ceph health was not OK; in this case Ceph health is HEALTH_OK.

Comment 1 Chris Blum 2023-02-10 10:25:13 UTC
Is this a duplicate of bug 2167045, or does it have the same root cause?
Did we see this again with the latest image?

Comment 3 Rohan Gupta 2023-03-27 12:32:15 UTC
OSDs are distributed equally after https://github.com/red-hat-storage/ocs-osd-deployer/pull/281 was merged

Comment 4 Jilju Joy 2023-03-28 07:06:52 UTC
Moving this to Verified because the bug was not reproduced during testing after the PR mentioned in comment #c3 was included in the build.

Comment 5 Ritesh Chikatwar 2023-04-10 09:19:04 UTC
Closing this as it has been verified by QE and fixed in v2.0.11.