Bug 2228072

Summary: FS is not accepting IOs when 1 OSD is full
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Amarnath <amk>
Component: RADOS
Assignee: Radoslaw Zarzynski <rzarzyns>
Status: NEW ---
QA Contact: Pawan <pdhiran>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.1
CC: bhubbard, ceph-eng-bugs, cephqe-warriors, nojha, rzarzyns, vumrao
Target Milestone: ---
Target Release: 7.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Amarnath 2023-08-01 10:15:46 UTC
Description of problem:
FS is not accepting IOs when 1 OSD is full

Test Steps Followed:
1. Created a cluster and started IOs to fill it to 100% (cluster size: 189 TB).
2. After the cluster reached 45% usage (83 TB), the FS stopped accepting IOs and all pools were marked as 100% used:
[root@extensa019 cephfs_io_94zkeqg8cf_1]# ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    186 TiB  102 TiB   84 TiB    84 TiB      45.20
ssd    745 GiB  743 GiB  1.9 GiB   1.9 GiB       0.25
TOTAL  186 TiB  102 TiB   84 TiB    84 TiB      45.02
--- POOLS ---
POOL                     ID  PGS   STORED  OBJECTS     USED   %USED  MAX AVAIL
.mgr                      1    1   18 MiB        6   54 MiB  100.00        0 B
cephfs.cephfs.meta        2   16  396 MiB      142  1.2 GiB  100.00        0 B
cephfs.cephfs.data        3   32  1.2 GiB   50.26k  3.5 GiB  100.00        0 B
.nfs                      4   32  1.5 KiB        0  4.5 KiB  100.00        0 B
cephfs.cephfs_io_1.meta   7   16   15 GiB  933.58k   45 GiB  100.00        0 B
cephfs.cephfs_io_1.data   8  190   28 TiB   28.91M   84 TiB  100.00        0 B

[root@extensa019 cephfs_io_94zkeqg8cf_1]# ceph -s
  cluster:
    id:     16659610-2bb3-11ee-885d-ac1f6bb270c6
    health: HEALTH_ERR
            1 full osd(s)
            4 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 12 pgs backfill_toofull
            6 pool(s) full
  services:
    mon: 5 daemons, quorum extensa001,extensa015,extensa004,extensa014,extensa003 (age 4d)
    mgr: extensa003.rgqsss(active, since 4d), standbys: extensa015.hvzezm
    mds: 3/3 daemons up, 3 standby
    osd: 53 osds: 53 up (since 4d), 53 in (since 4d); 14 remapped pgs
  data:
    volumes: 2/2 healthy
    pools:   6 pools, 287 pgs
    objects: 29.89M objects, 28 TiB
    usage:   84 TiB used, 102 TiB / 186 TiB avail
    pgs:     4918834/89671746 objects misplaced (5.485%)
             271 active+clean
             8   active+remapped+backfill_wait+backfill_toofull
             4   active+remapped+backfill_toofull
             1   active+remapped+backfilling
             1   active+clean+scrubbing+deep
             1   active+clean+scrubbing
             1   active+remapped+backfill_wait
  io:
    recovery: 8.1 MiB/s, 8 objects/s
  progress:
    Global Recovery Event (4d)
      [==========================..] (remaining: 5h)
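The ceph df output above shows every pool at 100% USED with 0 B MAX AVAIL even though raw usage is only ~45%. This is consistent with how Ceph estimates a pool's MAX AVAIL from the *fullest* OSD in the pool's CRUSH subtree: once any one OSD crosses the full ratio, the estimate collapses to zero and the pools are flagged full. A minimal sketch of that behavior (illustrative only, not Ceph's actual code; the function name and the simplified headroom formula are assumptions):

```python
# Illustrative sketch (NOT Ceph's real implementation): a pool's MAX AVAIL
# estimate is bounded by the fullest OSD, so a single full OSD can zero it
# for every pool even when overall raw usage is low.
FULL_RATIO = 0.95  # stock default for the full ratio (assumed here)

def pool_max_avail(osd_used_frac, osd_size_bytes, replicas):
    """Estimate MAX AVAIL from the headroom of the *fullest* OSD,
    scaled across all OSDs and divided by the replication factor."""
    fullest = max(osd_used_frac)
    if fullest >= FULL_RATIO:
        return 0  # cluster marks the pools full; client writes are blocked
    headroom_frac = FULL_RATIO - fullest
    return int(headroom_frac * osd_size_bytes * len(osd_used_frac) / replicas)

# One OSD at 96% while the others sit near 45%:
utilization = [0.45, 0.44, 0.96, 0.46]
print(pool_max_avail(utilization, 4 << 40, 3))  # -> 0
```

This matches the symptom reported here: one full OSD out of 53 is enough to drive MAX AVAIL to 0 B and block FS IOs cluster-wide.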

3. We set the reweight of the full OSD to 0:
    ceph osd reweight osd.43 0
4. After this, the FS started accepting IOs for a while; then another OSD became full and we repeated step 3 for that OSD.
5. After a few more hours of IOs, the cluster is now in HEALTH_WARN:
            1 MDSs report slow metadata IOs
            1 MDSs behind on trimming
IOs are not going into the cluster.
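The health warnings that follow are driven by Ceph's three OSD utilization thresholds. A hedged sketch of the classification, assuming the stock defaults (nearfull 0.85, backfillfull 0.90, full 0.95; the actual values on a cluster should be read from `ceph osd dump`):

```python
# Classify OSD utilization against Ceph's default thresholds (assumed
# defaults; verify with `ceph osd dump | grep ratio` on a real cluster).
NEARFULL, BACKFILLFULL, FULL = 0.85, 0.90, 0.95

def classify(used_frac):
    if used_frac >= FULL:
        return "full"          # client writes to the cluster are blocked
    if used_frac >= BACKFILLFULL:
        return "backfillfull"  # backfill into this OSD refused -> backfill_toofull PGs
    if used_frac >= NEARFULL:
        return "nearfull"      # health warning only
    return "ok"

# Roughly the situation below: osd.24 backfillfull, three OSDs nearfull.
for osd, used in [("osd.24", 0.91), ("osd.14", 0.87), ("osd.2", 0.45)]:
    print(osd, classify(used))
```

This explains the progression in the health output: backfill_toofull PGs pile up once target OSDs pass the backfillfull ratio, which in turn stalls recovery and the MDS metadata IOs behind it.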


[root@extensa013 ~]# ceph health detail
HEALTH_WARN 2 failed cephadm daemon(s); 1 MDSs report slow metadata IOs; 1 MDSs behind on trimming; 1 backfillfull osd(s); 3 nearfull osd(s); Reduced data availability: 3 pgs inactive; Low space hindering backfill (add storage if this doesn't resolve itself): 13 pgs backfill_toofull; Degraded data redundancy: 2637108/101031984 objects degraded (2.610%), 18 pgs degraded, 18 pgs undersized; 10 pool(s) backfillfull
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
    daemon osd.43 on extensa011 is in error state
    daemon osd.41 on extensa015 is in error state
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
    mds.cephfs_io_1.extensa009.uagafl(mds.0): 100+ slow metadata IOs are blocked > 30 secs, oldest blocked for 54870 secs
[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.cephfs_io_1.extensa009.uagafl(mds.0): Behind on trimming (397/128) max_segments: 128, num_segments: 397
[WRN] OSD_BACKFILLFULL: 1 backfillfull osd(s)
    osd.24 is backfill full
[WRN] OSD_NEARFULL: 3 nearfull osd(s)
    osd.14 is near full
    osd.21 is near full
    osd.29 is near full
[WRN] PG_AVAILABILITY: Reduced data availability: 3 pgs inactive
    pg 8.14 is stuck inactive for 15h, current state undersized+degraded+remapped+backfilling+peered, last acting [24]
    pg 8.54 is stuck inactive for 15h, current state undersized+degraded+remapped+backfill_wait+backfill_toofull+peered, last acting [24]
    pg 8.94 is stuck inactive for 15h, current state undersized+degraded+remapped+backfill_wait+backfill_toofull+peered, last acting [24]
[WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 13 pgs backfill_toofull
    pg 8.15 is active+remapped+backfill_toofull, acting [29,26,42]
    pg 8.53 is active+undersized+degraded+remapped+backfill_wait+backfill_toofull, acting [38,42]
    pg 8.54 is undersized+degraded+remapped+backfill_wait+backfill_toofull+peered, acting [24]
    pg 8.5e is active+undersized+degraded+remapped+backfill_wait+backfill_toofull, acting [0,51]
    pg 8.6c is active+undersized+degraded+remapped+backfill_toofull, acting [14,25]
    pg 8.74 is active+remapped+backfill_toofull, acting [48,38,7]
    pg 8.77 is active+remapped+backfill_toofull, acting [7,8,36]
    pg 8.7a is active+remapped+backfill_wait+backfill_toofull, acting [24,32,46]
    pg 8.7f is active+remapped+backfill_toofull, acting [4,10,14]
    pg 8.94 is undersized+degraded+remapped+backfill_wait+backfill_toofull+peered, acting [24]
    pg 8.95 is active+remapped+backfill_toofull, acting [29,26,42]
    pg 8.9e is active+undersized+degraded+remapped+backfill_wait+backfill_toofull, acting [0,51]
    pg 8.ac is active+undersized+degraded+remapped+backfill_toofull, acting [14,25]
[WRN] PG_DEGRADED: Degraded data redundancy: 2637108/101031984 objects degraded (2.610%), 18 pgs degraded, 18 pgs undersized
    pg 8.13 is stuck undersized for 15h, current state active+undersized+degraded+remapped+backfill_wait, last acting [38,42]
    pg 8.14 is stuck undersized for 15h, current state undersized+degraded+remapped+backfilling+peered, last acting [24]
    pg 8.1e is stuck undersized for 15h, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,51]
    pg 8.28 is stuck undersized for 15h, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,6]
    pg 8.42 is stuck undersized for 15h, current state active+undersized+degraded+remapped+backfill_wait, last acting [47,0]
    pg 8.44 is stuck undersized for 15h, current state active+undersized+degraded+remapped+backfilling, last acting [34,32]
    pg 8.4c is stuck undersized for 15h, current state active+undersized+degraded+remapped+backfilling, last acting [28,25]
    pg 8.53 is stuck undersized for 15h, current state active+undersized+degraded+remapped+backfill_wait+backfill_toofull, last acting [38,42]
    pg 8.54 is stuck undersized for 15h, current state undersized+degraded+remapped+backfill_wait+backfill_toofull+peered, last acting [24]
    pg 8.5e is stuck undersized for 15h, current state active+undersized+degraded+remapped+backfill_wait+backfill_toofull, last acting [0,51]
    pg 8.68 is stuck undersized for 15h, current state active+undersized+degraded+remapped+backfilling, last acting [0,6]
    pg 8.6c is stuck undersized for 15h, current state active+undersized+degraded+remapped+backfill_toofull, last acting [14,25]
    pg 8.7e is stuck undersized for 15h, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,51]
    pg 8.93 is stuck undersized for 15h, current state active+undersized+degraded+remapped+backfill_wait, last acting [38,42]
    pg 8.94 is stuck undersized for 15h, current state undersized+degraded+remapped+backfill_wait+backfill_toofull+peered, last acting [24]
    pg 8.9e is stuck undersized for 15h, current state active+undersized+degraded+remapped+backfill_wait+backfill_toofull, last acting [0,51]
    pg 8.ac is stuck undersized for 15h, current state active+undersized+degraded+remapped+backfill_toofull, last acting [14,25]
    pg 8.be is stuck undersized for 15h, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,51]
[WRN] POOL_BACKFILLFULL: 10 pool(s) backfillfull
    pool '.mgr' is backfillfull
    pool 'cephfs.cephfs.meta' is backfillfull
    pool 'cephfs.cephfs.data' is backfillfull
    pool '.nfs' is backfillfull
    pool 'cephfs.cephfs_io_1.meta' is backfillfull
    pool 'cephfs.cephfs_io_1.data' is backfillfull
    pool '.rgw.root' is backfillfull
    pool 'default.rgw.log' is backfillfull
    pool 'default.rgw.control' is backfillfull
    pool 'default.rgw.meta' is backfillfull
[root@extensa013 ~]# 


[root@extensa013 ~]# ceph fs status
cephfs - 9 clients
======
RANK  STATE             MDS                ACTIVITY     DNS    INOS   DIRS   CAPS  
 0    active  cephfs.extensa004.dwgfpl  Reqs:    0 /s  48.2k  48.2k  8211    546   
 1    active  cephfs.extensa003.otrrap  Reqs:    0 /s  12.1k  12.1k  2101     21   
       POOL           TYPE     USED  AVAIL  
cephfs.cephfs.meta  metadata  1271M  3151G  
cephfs.cephfs.data    data    3724M  3151G  
cephfs_io_1 - 4 clients
===========
RANK  STATE                MDS                  ACTIVITY     DNS    INOS   DIRS   CAPS  
 0    active  cephfs_io_1.extensa009.uagafl  Reqs:    0 /s  1344k  1344k  71.2k  4546   
          POOL             TYPE     USED  AVAIL  
cephfs.cephfs_io_1.meta  metadata  57.7G  3151G  
cephfs.cephfs_io_1.data    data    99.5T  3238G  
         STANDBY MDS           
cephfs_io_1.extensa007.ppjqjk  
   cephfs.extensa014.vetifx    
   cephfs.extensa011.ktzine    
MDS version: ceph version 17.2.6-99.el9cp (6869830013a8878a3930e23c75d8b990f6b0c491) quincy (stable)
[root@extensa013 ~]# 



Client Node: extensa013.ceph.redhat.com (root/passwd)


Version-Release number of selected component (if applicable):
[root@extensa013 ~]# ceph versions
{
    "mon": {
        "ceph version 17.2.6-99.el9cp (6869830013a8878a3930e23c75d8b990f6b0c491) quincy (stable)": 5
    },
    "mgr": {
        "ceph version 17.2.6-99.el9cp (6869830013a8878a3930e23c75d8b990f6b0c491) quincy (stable)": 2
    },
    "osd": {
        "ceph version 17.2.6-99.el9cp (6869830013a8878a3930e23c75d8b990f6b0c491) quincy (stable)": 51
    },
    "mds": {
        "ceph version 17.2.6-99.el9cp (6869830013a8878a3930e23c75d8b990f6b0c491) quincy (stable)": 6
    },
    "overall": {
        "ceph version 17.2.6-99.el9cp (6869830013a8878a3930e23c75d8b990f6b0c491) quincy (stable)": 64
    }
}
[root@extensa013 ~]# 


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info: