Created attachment 1904589 [details]
Top output on OSD node

Description of problem:

OSD nodes are hung and OSDs are down after performing random write operations on the cluster. Note that CPU utilization on the OSD nodes is high (see the attached top output).

[ceph: root@ceph-skanta-yi6hpl-node1-installer /]# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                            STATUS  REWEIGHT  PRI-AFF
 -1         0.48798  root default
 -9         0.09760      host ceph-skanta-yi6hpl-node10
 12    hdd  0.02440          osd.12                         down   1.00000  1.00000
 14    hdd  0.02440          osd.14                         down   1.00000  1.00000
 16    hdd  0.02440          osd.16                         down   1.00000  1.00000
 18    hdd  0.02440          osd.18                         down   1.00000  1.00000
 -7         0.09760      host ceph-skanta-yi6hpl-node3
  2    hdd  0.02440          osd.2                            up   1.00000  1.00000
  5    hdd  0.02440          osd.5                            up   1.00000  1.00000
  8    hdd  0.02440          osd.8                            up   1.00000  1.00000
 11    hdd  0.02440          osd.11                           up   1.00000  1.00000
 -3         0.09760      host ceph-skanta-yi6hpl-node4
  1    hdd  0.02440          osd.1                          down   1.00000  1.00000
  3    hdd  0.02440          osd.3                          down   1.00000  1.00000
  7    hdd  0.02440          osd.7                          down   1.00000  1.00000
 10    hdd  0.02440          osd.10                         down   1.00000  1.00000
 -5         0.09760      host ceph-skanta-yi6hpl-node5
  0    hdd  0.02440          osd.0                          down   1.00000  1.00000
  4    hdd  0.02440          osd.4                          down   1.00000  1.00000
  6    hdd  0.02440          osd.6                          down   1.00000  1.00000
  9    hdd  0.02440          osd.9                          down   1.00000  1.00000
-11         0.09760      host ceph-skanta-yi6hpl-node9
 13    hdd  0.02440          osd.13                         down         0  1.00000
 15    hdd  0.02440          osd.15                         down         0  1.00000
 17    hdd  0.02440          osd.17                           up   1.00000  1.00000
 19    hdd  0.02440          osd.19                           up   1.00000  1.00000
[ceph: root@ceph-skanta-yi6hpl-node1-installer /]#

[ceph: root@ceph-skanta-yi6hpl-node1-installer /]# ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    350 GiB  183 GiB  167 GiB   167 GiB      47.70
TOTAL  350 GiB  183 GiB  167 GiB   167 GiB      47.70

--- POOLS ---
POOL                    ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
.mgr                     1    1  897 KiB        2  1.8 MiB      0    109 GiB
cephfs.cephfs_Qos.meta   2   16  1.2 MiB       25  2.6 MiB      0    109 GiB
cephfs.cephfs_Qos.data   3  512  102 GiB   17.39k  206 GiB  48.46    108 GiB
scrub_pool               4   32      0 B        0      0 B      0     73 GiB
recovery_pool            5   32      0 B        0      0 B      0     73 GiB
[ceph: root@ceph-skanta-yi6hpl-node1-installer /]#

Version-Release number of selected component (if applicable):

[ceph: root@ceph-skanta-yi6hpl-node1-installer /]# ceph -v
ceph version 17.2.2-1.el9cp (27ec6f23923e162bf6e6e48c8b789cf18fee6f31) quincy (stable)
[ceph: root@ceph-skanta-yi6hpl-node1-installer /]#

How reproducible:

Steps to Reproduce:
1. Configure the cluster.
2. Perform random writes using fio with the command below (a sketch for monitoring OSD state during the run follows after the steps):

fio --directory=/mnt/cephfs_Qos -direct=1 -iodepth 64 -thread -rw=randwrite \
    --end_fsync=0 -ioengine=libaio -bs=4096 -size=16384M --norandommap \
    -numjobs=1 -runtime=600 --time_based --invalidate=0 -group_reporting \
    -name=ceph_fs_Qos_4M \
    --write_iops_log=/tmp/cephfs/Fio/output.0 \
    --write_bw_log=/tmp/cephfs/Fio/output.0 \
    --write_lat_log=/tmp/cephfs/Fio/output.0 \
    --log_avg_msec=100 \
    --write_hist_log=/tmp/cephfs/Fio/output.0 \
    --output-format=json,normal > /tmp/cephfs/Fio/output.0
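
A minimal monitoring sketch for watching OSD state and OSD-node CPU while the fio job runs. This is only an illustration: it assumes cephadm is available on the admin host, root SSH access to the OSD nodes is configured, and the hostname ceph-skanta-yi6hpl-node10 (taken from the osd tree above) is used as an example.

# Sample cluster state and CPU load every 30 seconds while fio is running.
while sleep 30; do
    date
    # OSDs currently marked down, from the CRUSH tree
    cephadm shell -- ceph osd tree | grep -w down
    # Overall cluster health summary
    cephadm shell -- ceph -s
    # CPU load on one OSD node (illustrative hostname from the osd tree above)
    ssh root@ceph-skanta-yi6hpl-node10 top -bn1 | head -n 15
done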