Bug 2117093

Summary: OSD nodes are hung after initiating random writes to the pool
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: skanta
Component: RADOS
Assignee: Nitzan mordechai <nmordech>
Status: NEW
QA Contact: skanta
Severity: high
Priority: unspecified
Version: 6.0
CC: akupczyk, amathuri, bhubbard, ceph-eng-bugs, cephqe-warriors, choffman, ksirivad, lflores, nojha, pdhange, rfriedma, rzarzyns, sseshasa, vereddy, vumrao
Target Milestone: ---
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Type: Bug

Attachments: Top output on OSD node

Description skanta 2022-08-10 00:54:54 UTC
Created attachment 1904589 [details]
Top output on OSD node

Description of problem:

   OSD nodes hang and the OSDs on them are marked down after performing random write operations on the cluster. CPU utilization on the affected OSD nodes is also noticeably high (see the attached top output).

[ceph: root@ceph-skanta-yi6hpl-node1-installer /]# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                           STATUS  REWEIGHT  PRI-AFF
 -1         0.48798  root default                                                 
 -9         0.09760      host ceph-skanta-yi6hpl-node10                           
 12    hdd  0.02440          osd.12                        down   1.00000  1.00000
 14    hdd  0.02440          osd.14                        down   1.00000  1.00000
 16    hdd  0.02440          osd.16                        down   1.00000  1.00000
 18    hdd  0.02440          osd.18                        down   1.00000  1.00000
 -7         0.09760      host ceph-skanta-yi6hpl-node3                            
  2    hdd  0.02440          osd.2                           up   1.00000  1.00000
  5    hdd  0.02440          osd.5                           up   1.00000  1.00000
  8    hdd  0.02440          osd.8                           up   1.00000  1.00000
 11    hdd  0.02440          osd.11                          up   1.00000  1.00000
 -3         0.09760      host ceph-skanta-yi6hpl-node4                            
  1    hdd  0.02440          osd.1                         down   1.00000  1.00000
  3    hdd  0.02440          osd.3                         down   1.00000  1.00000
  7    hdd  0.02440          osd.7                         down   1.00000  1.00000
 10    hdd  0.02440          osd.10                        down   1.00000  1.00000
 -5         0.09760      host ceph-skanta-yi6hpl-node5                            
  0    hdd  0.02440          osd.0                         down   1.00000  1.00000
  4    hdd  0.02440          osd.4                         down   1.00000  1.00000
  6    hdd  0.02440          osd.6                         down   1.00000  1.00000
  9    hdd  0.02440          osd.9                         down   1.00000  1.00000
-11         0.09760      host ceph-skanta-yi6hpl-node9                            
 13    hdd  0.02440          osd.13                        down         0  1.00000
 15    hdd  0.02440          osd.15                        down         0  1.00000
 17    hdd  0.02440          osd.17                          up   1.00000  1.00000
 19    hdd  0.02440          osd.19                          up   1.00000  1.00000
[ceph: root@ceph-skanta-yi6hpl-node1-installer /]#

 
[ceph: root@ceph-skanta-yi6hpl-node1-installer /]# ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    350 GiB  183 GiB  167 GiB   167 GiB      47.70
TOTAL  350 GiB  183 GiB  167 GiB   167 GiB      47.70
 
--- POOLS ---
POOL                    ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr                     1    1  897 KiB        2  1.8 MiB      0    109 GiB
cephfs.cephfs_Qos.meta   2   16  1.2 MiB       25  2.6 MiB      0    109 GiB
cephfs.cephfs_Qos.data   3  512  102 GiB   17.39k  206 GiB  48.46    108 GiB
scrub_pool               4   32      0 B        0      0 B      0     73 GiB
recovery_pool            5   32      0 B        0      0 B      0     73 GiB
[ceph: root@ceph-skanta-yi6hpl-node1-installer /]#
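
For triage, the following is a minimal sketch of additional diagnostics that could be collected; the <fsid> in the systemd unit names is a placeholder, and osd.12 is used only as an example of one of the down OSDs.

# From the admin node (inside the cephadm shell): overall cluster state and down OSDs
ceph -s
ceph health detail
ceph osd tree down

# On a hung OSD node: daemon state and recent logs for one of the down OSDs
systemctl status ceph-<fsid>@osd.12.service
journalctl -u ceph-<fsid>@osd.12.service --since "1 hour ago"

# Kernel-level signs of hung tasks, memory pressure, or CPU saturation
dmesg -T | tail -n 100
top -b -n 1 | head -n 40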


 

Version-Release number of selected component (if applicable):
[ceph: root@ceph-skanta-yi6hpl-node1-installer /]# ceph -v
ceph version 17.2.2-1.el9cp (27ec6f23923e162bf6e6e48c8b789cf18fee6f31) quincy (stable)
[ceph: root@ceph-skanta-yi6hpl-node1-installer /]#

How reproducible:


Steps to Reproduce:
1. Configure cluster

2. Perform random writes by using fio.
    Command:
    fio --directory=/mnt/cephfs_Qos -direct=1 -iodepth 64 -thread -rw=randwrite \
        --end_fsync=0 -ioengine=libaio -bs=4096 -size=16384M --norandommap \
        -numjobs=1 -runtime=600 --time_based --invalidate=0 -group_reporting \
        -name=ceph_fs_Qos_4M \
        --write_iops_log=/tmp/cephfs/Fio/output.0 \
        --write_bw_log=/tmp/cephfs/Fio/output.0 \
        --write_lat_log=/tmp/cephfs/Fio/output.0 \
        --log_avg_msec=100 \
        --write_hist_log=/tmp/cephfs/Fio/output.0 \
        --output-format=json,normal > /tmp/cephfs/Fio/output.0
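
The fio command above assumes cephfs_Qos is already mounted at /mnt/cephfs_Qos and that the log directory /tmp/cephfs/Fio exists. A minimal preparation sketch is shown below; the monitor address, CephX user, and secret-file path are assumptions and must be adjusted for the actual cluster.

# Create the directory that receives the fio log/output files
mkdir -p /tmp/cephfs/Fio

# Mount the cephfs_Qos file system with the kernel client
# (<mon-host>, the "admin" user, and the secret-file path are placeholders;
#  older kernels use mds_namespace= instead of fs=)
mkdir -p /mnt/cephfs_Qos
mount -t ceph <mon-host>:6789:/ /mnt/cephfs_Qos \
      -o name=admin,secretfile=/etc/ceph/admin.secret,fs=cephfs_Qos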