Bug 2213074 - PGs stuck in incomplete state
Summary: PGs stuck in incomplete state
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.10
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Michael J. Kidd
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-06-07 05:31 UTC by Anjali
Modified: 2023-08-09 16:37 UTC
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-26 21:08:42 UTC
Embargoed:


Attachments:

Description Anjali 2023-06-07 05:31:27 UTC
Description of problem (please be as detailed as possible and provide log snippets):

- The ODF cluster is used as the backend for a 3scale project.

- The case started with the pods below in CrashLoopBackOff (CLBO), and osd.1 and osd.2 were down; a sketch of how the crashing pods can be inspected follows the listing.

rook-ceph-crashcollector-ocp-xq4fg-worker-ocs-dndtv-6d5bbdlrx4n   1/1     Running            0                56m    172.26.2.3     ocp-xq4fg-worker-ocs-dndtv    <none>           <none>
rook-ceph-crashcollector-ocp-xq4fg-worker-ocs-kdmdv-5f8985zd5sv   1/1     Running            0                71m    172.27.2.12    ocp-xq4fg-worker-ocs-kdmdv    <none>           <none>
rook-ceph-crashcollector-ocp-xq4fg-worker-ocs-tb22v-8486f4868t7   1/1     Running            0                84m    172.24.4.18    ocp-xq4fg-worker-ocs-tb22v    <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-58f58df98xjfh   2/2     Running            0                84m    172.24.4.16    ocp-xq4fg-worker-ocs-tb22v    <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-f74549b77bd9j   2/2     Running            0                64m    172.27.2.15    ocp-xq4fg-worker-ocs-kdmdv    <none>           <none>
rook-ceph-mgr-a-5f45f48656-9sqr7                                  2/2     Running            0                84m    172.24.4.17    ocp-xq4fg-worker-ocs-tb22v    <none>           <none>
rook-ceph-mon-c-77b6d94bb5-hhrn2                                  2/2     Running            0                28m    172.26.2.17    ocp-xq4fg-worker-ocs-dndtv    <none>           <none>
rook-ceph-mon-e-6b9d888fc9-pf7gq                                  2/2     Running            3                26d    172.24.4.11    ocp-xq4fg-worker-ocs-tb22v    <none>           <none>
rook-ceph-mon-f-596ff854bb-6s9gh                                  2/2     Running            0                28m    172.27.2.16    ocp-xq4fg-worker-ocs-kdmdv    <none>           <none>
rook-ceph-operator-7fcb865999-srw2t                               1/1     Running            0                44m    172.26.2.13    ocp-xq4fg-worker-ocs-dndtv    <none>           <none>
rook-ceph-osd-0-6957867bc6-dgdv6                                  2/2     Running            9 (19m ago)      42m    172.24.4.22    ocp-xq4fg-worker-ocs-tb22v    <none>           <none>
rook-ceph-osd-1-66fcb9d68c-qf2h2                                  1/2     Running            9 (5m47s ago)    80m    172.27.2.7     ocp-xq4fg-worker-ocs-kdmdv    <none>           <none>
rook-ceph-osd-2-5d8579f7f4-qzw89                                  1/2     CrashLoopBackOff   12 (2m29s ago)   41m    172.26.2.15    ocp-xq4fg-worker-ocs-dndtv    <none>           <none>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-5f655f45fmd2   1/2     CrashLoopBackOff   15 (2m52s ago)   71m    172.27.2.13    ocp-xq4fg-worker-ocs-kdmdv    <none>           <none>
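
For reference, a minimal sketch of how the crashlooping OSD pods could be inspected further, assuming the default openshift-storage namespace (the container name "osd" is typical for Rook OSD pods but may differ in this environment):

# Scheduling and restart events for a crashlooping OSD pod
oc -n openshift-storage describe pod rook-ceph-osd-2-5d8579f7f4-qzw89

# Logs from the previous (crashed) instance of the OSD container
oc -n openshift-storage logs rook-ceph-osd-2-5d8579f7f4-qzw89 -c osd --previous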

[amenon@supportshell-1 must_gather_commands]$ cat ceph_status
  cluster:
    id:     a68b4792-6284-4f1d-9d20-e42ed96b59ef
    health: HEALTH_ERR
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
            132/272233 objects unfound (0.048%)
            2 osds down
            2 hosts (2 osds) down
            2 racks (2 osds) down
            Reduced data availability: 177 pgs inactive, 99 pgs down
            Possible data damage: 29 pgs recovery_unfound
            Degraded data redundancy: 221154/816699 objects degraded (27.079%), 60 pgs degraded
 
  services:
    mon: 3 daemons, quorum c,e,f (age 29m)
    mgr: a(active, since 85m)
    mds: 1/1 daemons up, 1 standby
    osd: 3 osds: 1 up (since 22s), 3 in (since 7M)
 
  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   11 pools, 177 pgs
    objects: 272.23k objects, 525 GiB
    usage:   1.6 TiB used, 1.4 TiB / 3 TiB avail
    pgs:     100.000% pgs not active
             221154/816699 objects degraded (27.079%)
             132/272233 objects unfound (0.048%)
             99 down
             31 undersized+degraded+peered
             29 recovery_unfound+undersized+degraded+peered
             18 undersized+peered
 
- Please note that at this stage, only osd.0 was up.
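
A minimal sketch of commands that would enumerate the affected PGs at this stage, assuming access to a working ceph CLI (for example the rook-ceph-tools pod) or the corresponding must-gather output:

# Per-PG detail behind the inactive/down/unfound warnings above
ceph health detail

# PGs stuck in an inactive state
ceph pg dump_stuck inactive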

[amenon@supportshell-1 must_gather_commands]$ cat ceph_osd_tree
ID   CLASS  WEIGHT   TYPE NAME                                STATUS  REWEIGHT  PRI-AFF
 -1         3.00000  root default                                                      
 -4         1.00000      rack rack0                                                    
 -3         1.00000          host ocp-xq4fg-worker-ocs-dndtv                           
  2    hdd  1.00000              osd.2                          down   1.00000  1.00000
-12         1.00000      rack rack1                                                    
-11         1.00000          host ocp-xq4fg-worker-ocs-kdmdv                           
  1    hdd  1.00000              osd.1                          down   1.00000  1.00000
 -8         1.00000      rack rack2                                                    
 -7         1.00000          host ocp-xq4fg-worker-ocs-tb22v                           
  0    hdd  1.00000              osd.0                            up   1.00000  1.00000

- The customer replaced the OSD nodes for osd.2 and osd.0.

- At this stage, osd.0 was down. It looks like osd.0 and osd.1 were flapping.

[bhull@supportshell-1 must_gather_commands]$ more ceph_osd_tree
ID   CLASS  WEIGHT   TYPE NAME                                STATUS  REWEIGHT  PRI-AFF
 -1         3.00000  root default                                                      
 -4         1.00000      rack rack0                                                    
 -3         1.00000          host ocp-xq4fg-worker-ocs-bnsfg                           
  2    hdd  1.00000              osd.2                            up   1.00000  1.00000
-12         1.00000      rack rack1                                                    
-11         1.00000          host ocp-xq4fg-worker-ocs-kdmdv                           
  1    hdd  1.00000              osd.1                            up   1.00000  1.00000
 -8         1.00000      rack rack2                                                    
 -7         1.00000          host ocp-xq4fg-worker-ocs-tb22v                           
  0    hdd  1.00000              osd.0                          down   1.00000  1.00000

- Currently we have all 3 OSDs up and running.

- All 3 OSD pods are running:

rook-ceph-osd-0-8bdc48b56-k4m6x                                   2/2     Running     0                2m25s   172.26.4.28    ocp-xq4fg-worker-ocs-ffgzx    <none>           <none>
rook-ceph-osd-1-66fcb9d68c-2j2ht                                  2/2     Running     44 (7h56m ago)   10h     172.27.2.33    ocp-xq4fg-worker-ocs-kdmdv    <none>           <none>
rook-ceph-osd-2-7f7bfd7fc5-7cpnm                                  2/2     Running     0                7h30m   172.25.4.29    ocp-xq4fg-worker-ocs-bnsfg    <none>           <none>

- However, mon.e is down (its pod is stuck in Pending):

rook-ceph-mon-e-6b9d888fc9-djkcs                                  0/2     Pending     0                12m     <none>         <none>                        <none>           <none>
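
A Pending mon pod usually points to a scheduling problem (for example, no schedulable node that satisfies the mon's placement after the node replacements). A minimal sketch of how to confirm that, assuming the default openshift-storage namespace:

# Look for FailedScheduling events on the pending mon pod
oc -n openshift-storage describe pod rook-ceph-mon-e-6b9d888fc9-djkcs

# Recent namespace events related to mon-e
oc -n openshift-storage get events --sort-by=.lastTimestamp | grep -i mon-e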

- The cluster is currently backfilling (a sketch for tracking backfill progress follows the status output below), but the main issue is that 46 PGs are incomplete and 1 PG is in recovery_unfound:

[amenon@supportshell-1 must_gather_commands]$ cat ceph_status
  cluster:
    id:     a68b4792-6284-4f1d-9d20-e42ed96b59ef
    health: HEALTH_ERR
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
            1/3 mons down, quorum f,g
            4/223819 objects unfound (0.002%)
            1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
            Reduced data availability: 47 pgs inactive, 46 pgs incomplete
            Possible data damage: 1 pg recovery_unfound
            Degraded data redundancy: 221461/671457 objects degraded (32.982%), 98 pgs degraded, 112 pgs undersized
            29 slow ops, oldest one blocked for 11261 sec, daemons [osd.0,osd.2] have slow ops.
 
  services:
    mon: 3 daemons, quorum f,g (age 6h), out of quorum: e
    mgr: a(active, since 8h)
    mds: 1/1 daemons up, 1 standby
    osd: 3 osds: 3 up (since 3m), 3 in (since 3m); 70 remapped pgs
 
  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   11 pools, 177 pgs
    objects: 223.82k objects, 439 GiB
    usage:   947 GiB used, 2.1 TiB / 3 TiB avail
    pgs:     26.554% pgs not active
             221461/671457 objects degraded (32.982%)
             118865/671457 objects misplaced (17.703%)
             4/223819 objects unfound (0.002%)
             49 active+undersized+degraded+remapped+backfill_wait
             47 active+undersized+degraded
             46 incomplete
             14 active+undersized
             10 active+clean+remapped
             9  active+clean
             1  active+undersized+degraded+remapped+backfilling
             1  recovery_unfound+undersized+degraded+remapped+peered
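
As referenced above, a minimal sketch of how backfill progress could be tracked while recovery continues, assuming a working ceph CLI (illustrative only, not a prescribed procedure):

# Compact summary of PG states as backfill proceeds
ceph pg stat

# Recovery/backfill progress reported by the mgr progress module (enabled by default)
ceph progress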

- Ceph is still in HEALTH_ERR status because of the 46 incomplete PGs and the 1 PG in recovery_unfound.

- The customer confirmed that the old osd.0 is empty. Since there is no data on osd.0, the 46 PGs remain stuck in the incomplete state and we are not able to recover them.
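
For completeness, a minimal sketch of how one could double-check whether the old osd.0 data store still holds any PG shards. This assumes the OSD daemon is stopped and its data path is accessible (the path below is the conventional one and is illustrative; under Rook/ODF it would normally be reached from a debug/maintenance pod for that OSD):

# List the PGs present on the stopped OSD's data store (path is illustrative)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op list-pgs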

- Ceph is concerned about these 46 PGs, whose acting sets show [1,2] and [2,1], so the data is in question at this point (see the query sketch after the excerpt below).

[WRN] PG_AVAILABILITY: Reduced data availability: 46 pgs inactive, 46 pgs incomplete
    pg 1.0 is incomplete, acting [1,2]
    pg 1.1 is incomplete, acting [2,1]
    pg 1.14 is incomplete, acting [2,1]
. . .
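
As noted above, a minimal sketch of how the peering history of one of these incomplete PGs, and the unfound objects, could be examined (pg 1.0 is taken from the excerpt; <pgid> is a placeholder for the recovery_unfound PG, whose id is not shown here):

# Peering/recovery state of an incomplete PG, including past intervals
ceph pg 1.0 query

# Objects Ceph knows about but cannot currently locate, for the recovery_unfound PG
ceph pg <pgid> list_unfound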

- We need help from engineering to determine whether these PGs can be recovered and to get to the RCA.

Version of all relevant components (if applicable):

- odf v4.10.12
- ceph version 16.2.7-126.el8cp

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?
 
- Yes. Business services are currently unavailable. The ODF cluster is used as the backend for their 3scale project. The customer will try to bring up the applications once backfilling is completed.

Is there any workaround available to the best of your knowledge?
 N/A

Additional info:

All must-gathers are available on supportshell under ~/03531113

