Summary: OSD nodes are hung after initiating random writes to the pool
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: skanta
Component: RADOS
Assignee: Nitzan mordechai <nmordech>
Status: NEW
QA Contact: skanta
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.0
CC: akupczyk, amathuri, bhubbard, ceph-eng-bugs, cephqe-warriors, choffman, ksirivad, lflores, nojha, pdhange, rfriedma, rzarzyns, sseshasa, vereddy, vumrao
Target Milestone: ---
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Attachments:
Created attachment 1904589 [details]
Top output on OSD node

Description of problem:

OSD nodes are hung and OSDs are down after performing random write operations on the cluster. Note that CPU utilization on the OSD nodes is high.

[ceph: root@ceph-skanta-yi6hpl-node1-installer /]# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                            STATUS  REWEIGHT  PRI-AFF
 -1         0.48798  root default
 -9         0.09760      host ceph-skanta-yi6hpl-node10
 12    hdd  0.02440          osd.12                         down   1.00000  1.00000
 14    hdd  0.02440          osd.14                         down   1.00000  1.00000
 16    hdd  0.02440          osd.16                         down   1.00000  1.00000
 18    hdd  0.02440          osd.18                         down   1.00000  1.00000
 -7         0.09760      host ceph-skanta-yi6hpl-node3
  2    hdd  0.02440          osd.2                            up   1.00000  1.00000
  5    hdd  0.02440          osd.5                            up   1.00000  1.00000
  8    hdd  0.02440          osd.8                            up   1.00000  1.00000
 11    hdd  0.02440          osd.11                           up   1.00000  1.00000
 -3         0.09760      host ceph-skanta-yi6hpl-node4
  1    hdd  0.02440          osd.1                          down   1.00000  1.00000
  3    hdd  0.02440          osd.3                          down   1.00000  1.00000
  7    hdd  0.02440          osd.7                          down   1.00000  1.00000
 10    hdd  0.02440          osd.10                         down   1.00000  1.00000
 -5         0.09760      host ceph-skanta-yi6hpl-node5
  0    hdd  0.02440          osd.0                          down   1.00000  1.00000
  4    hdd  0.02440          osd.4                          down   1.00000  1.00000
  6    hdd  0.02440          osd.6                          down   1.00000  1.00000
  9    hdd  0.02440          osd.9                          down   1.00000  1.00000
-11         0.09760      host ceph-skanta-yi6hpl-node9
 13    hdd  0.02440          osd.13                         down         0  1.00000
 15    hdd  0.02440          osd.15                         down         0  1.00000
 17    hdd  0.02440          osd.17                           up   1.00000  1.00000
 19    hdd  0.02440          osd.19                           up   1.00000  1.00000
[ceph: root@ceph-skanta-yi6hpl-node1-installer /]#

[ceph: root@ceph-skanta-yi6hpl-node1-installer /]# ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    350 GiB  183 GiB  167 GiB   167 GiB      47.70
TOTAL  350 GiB  183 GiB  167 GiB   167 GiB      47.70

--- POOLS ---
POOL                    ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
.mgr                     1    1  897 KiB        2  1.8 MiB      0    109 GiB
cephfs.cephfs_Qos.meta   2   16  1.2 MiB       25  2.6 MiB      0    109 GiB
cephfs.cephfs_Qos.data   3  512  102 GiB   17.39k  206 GiB  48.46    108 GiB
scrub_pool               4   32      0 B        0      0 B      0     73 GiB
recovery_pool            5   32      0 B        0      0 B      0     73 GiB
[ceph: root@ceph-skanta-yi6hpl-node1-installer /]#

Version-Release number of selected component (if applicable):

[ceph: root@ceph-skanta-yi6hpl-node1-installer /]# ceph -v
ceph version 17.2.2-1.el9cp (27ec6f23923e162bf6e6e48c8b789cf18fee6f31) quincy (stable)
[ceph: root@ceph-skanta-yi6hpl-node1-installer /]#

How reproducible:

Steps to Reproduce:
1. Configure the cluster.
2. Perform random writes using fio with the following command (a cluster-monitoring sketch follows below):

fio --directory=/mnt/cephfs_Qos -direct=1 -iodepth 64 -thread -rw=randwrite --end_fsync=0 -ioengine=libaio -bs=4096 -size=16384M --norandommap -numjobs=1 -runtime=600 --time_based --invalidate=0 -group_reporting -name=ceph_fs_Qos_4M --write_iops_log=/tmp/cephfs/Fio/output.0 --write_bw_log=/tmp/cephfs/Fio/output.0 --write_lat_log=/tmp/cephfs/Fio/output.0 --log_avg_msec=100 --write_hist_log=/tmp/cephfs/Fio/output.0 --output-format=json,normal > /tmp/cephfs/Fio/output.0
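
For context while reproducing, the snippet below is a minimal sketch of how the cluster state and per-node CPU load could be sampled while the fio job runs, so the moment the OSDs drop can be correlated with the CPU spike. It assumes it is run from a node with the ceph CLI and admin keyring available (for example inside the cephadm shell); the /tmp/osd_hang_debug output path, the 30-second interval, and the 20-sample count are arbitrary choices and not part of the original report.

#!/bin/bash
# Minimal monitoring sketch (assumptions: ceph CLI + admin keyring on this node;
# output path, interval, and sample count are arbitrary).
OUT=/tmp/osd_hang_debug
mkdir -p "$OUT"
for i in $(seq 1 20); do
    {
        date
        ceph -s              # overall health and up/down OSD counts
        ceph osd tree down   # list only the OSDs currently marked down
    } >> "$OUT/cluster.log"
    top -b -n 1 | head -n 20 >> "$OUT/top.log"   # CPU utilization snapshot
    sleep 30
done

Comparing the top snapshots with the attached "Top output on OSD node" should make it easier to see when CPU utilization climbs relative to when the OSDs are marked down.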