Description of problem:

Quiesce on 11 subvolumes of a quiesce set TIMEDOUT after 10 mins (the configured timeout and expiration values), while IO was in progress on each of the subvolumes.

IO workload details: smallfile, crefi, wget cmds, tar cmds, dd

Error snippet:

2024-04-08 03:55:52,080 (cephci.snapshot_clone.cg_snap_system_test) [INFO] - cephci.ceph.ceph.py:1563 - Running command ceph fs quiesce cephfs "subvolgroup_cg/sv_non_def_1" "subvolgroup_cg/sv_non_def_2" "subvolgroup_cg/sv_non_def_3" "subvolgroup_cg/sv_non_def_4" "subvolgroup_cg/sv_non_def_5" "subvolgroup_cg/sv_non_def_6" "subvolgroup_cg/sv_non_def_7" "subvolgroup_cg/sv_non_def_8" "subvolgroup_cg/sv_non_def_9" "subvolgroup_cg/sv_non_def_10" "subvolgroup_cg/sv_non_def_11" --format json --set-id cg_test1_0ov --await --timeout 600 --expiration 600 on 10.8.128.54 timeout 600
2024-04-08 04:03:02,087 (cephci.snapshot_clone.cg_snap_system_test) [ERROR] - cephci.ceph.ceph.py:1584 - socket.timeout doesn't give an error message
2024-04-08 04:03:02,128 (cephci.snapshot_clone.cg_snap_system_test) [INFO] - cephci.ceph.ceph.py:1532 - Command completed on 2024-04-08 04:03:02.127740
2024-04-08 04:03:02,129 (cephci.snapshot_clone.cg_snap_system_test) [ERROR] - cephci.ceph.ceph.py:1584 - socket.timeout doesn't give an error message

Version-Release number of selected component (if applicable):
18.2.1-116.el9cp

How reproducible:
5/5

Steps to Reproduce:
1. Create 3 quiesce sets of 11 subvolumes each:
   - set1: subvolumes from the default group
   - set2: subvolumes from a non-default group
   - set3: a mix of subvolumes from the default and non-default groups
2. On each set, perform the quiesce lifecycle: run IO, quiesce and wait for the QUIESCED state, create snapshots, then release the quiesce. Repeat 5 times while IO runs in the background (a command sketch of this lifecycle follows the Actual results below).

Actual results:
On set2, the quiesce op TIMEDOUT after 10 mins.
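For reference, a minimal sketch of the per-set quiesce lifecycle described in the steps above, assuming the subvolume, group, and set-id names from the log (only two of the 11 members are shown, and the snapshot name snap_x is hypothetical):

# Quiesce the set members and block until QUIESCED (or TIMEDOUT after 600s)
ceph fs quiesce cephfs "subvolgroup_cg/sv_non_def_1" "subvolgroup_cg/sv_non_def_2" \
    --set-id cg_test1_0ov --await --timeout 600 --expiration 600

# While quiesced, snapshot each member (snapshot name is illustrative)
ceph fs subvolume snapshot create cephfs sv_non_def_1 snap_x --group-name subvolgroup_cg

# Release the set so client IO can resume
ceph fs quiesce cephfs --set-id cg_test1_0ov --release --await

# On a TIMEDOUT result, the set state can be inspected without modifying it
ceph fs quiesce cephfs --set-id cg_test1_0ov --query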
Expected results:
Quiesce should succeed within the given timeout.

Additional info:

Automation run logs: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-MRROTV/cg_snap_system_test_0.log

Ceph fs status before test:
---------------------------
[root@magna046 ~]# ceph status
  cluster:
    id:     26171d20-f1d5-11ee-b6c2-002590fc2a2e
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum magna021,magna023,magna022 (age 4d)
    mgr: magna021.hzinhl(active, since 3d), standbys: magna022.cecfam
    mds: 2/2 daemons up, 3 standby
    osd: 24 osds: 24 up (since 4d), 24 in (since 4d)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 1041 pgs
    objects: 609.55k objects, 1.4 TiB
    usage:   4.1 TiB used, 18 TiB / 22 TiB avail
    pgs:     751 active+clean
             272 active+clean+snaptrim_wait
             18  active+clean+snaptrim

  io:
    client:   570 B/s wr, 0 op/s rd, 3 op/s wr

[root@magna046 ~]# ceph orch ps
NAME                        HOST      PORTS        STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION           IMAGE ID      CONTAINER ID
mds.cephfs.magna022.yuqofl  magna022               running (4d)  2m ago     4d   23.6M    -        18.2.1-116.el9cp  3c15178790f1  f03993f481b0
mds.cephfs.magna023.qwxpru  magna023               running (4d)  2m ago     4d   35.0M    -        18.2.1-116.el9cp  3c15178790f1  778a4392d5c5
mds.cephfs.magna024.lqbedi  magna024               running (4d)  2m ago     4d   24.5M    -        18.2.1-116.el9cp  3c15178790f1  3060a9399b0f
mds.cephfs.magna025.oevoux  magna025               running (4d)  2m ago     4d   1331M    -        18.2.1-116.el9cp  3c15178790f1  d948ceed534a
mds.cephfs.magna026.qbqfcj  magna026               running (4d)  2m ago     4d   23.5M    -        18.2.1-116.el9cp  3c15178790f1  75649cc3efaf
mgr.magna021.hzinhl         magna021  *:9283,8765  running (4d)  2m ago     4d   664M     -        18.2.1-116.el9cp  3c15178790f1  6bbbc49dda27
mgr.magna022.cecfam         magna022  *:8443,8765  running (4d)  2m ago     4d   448M     -        18.2.1-116.el9cp  3c15178790f1  c4c6117b4db8
mon.magna021                magna021               running (4d)  2m ago     4d   762M     2048M    18.2.1-116.el9cp  3c15178790f1  728702ed2a51
mon.magna022                magna022               running (4d)  2m ago     4d   750M     2048M    18.2.1-116.el9cp  3c15178790f1  36f822a6d632
mon.magna023                magna023               running (4d)  2m ago     4d   750M     2048M    18.2.1-116.el9cp  3c15178790f1  504595b0cf0f
osd.0                       magna025               running (4d)  2m ago     4d   2674M    6053M    18.2.1-116.el9cp  3c15178790f1  1b74b2184063
osd.1                       magna023               running (4d)  2m ago     4d   2200M    5712M    18.2.1-116.el9cp  3c15178790f1  3ea2e5dfec1e
osd.2                       magna027               running (4d)  2m ago     4d   2704M    7419M    18.2.1-116.el9cp  3c15178790f1  739eddc97b1d
osd.3                       magna024               running (4d)  2m ago     4d   3012M    6053M    18.2.1-116.el9cp  3c15178790f1  869fb0c1cd8d
osd.4                       magna022               running (4d)  2m ago     4d   2306M    4347M    18.2.1-116.el9cp  3c15178790f1  b91063704613
osd.5                       magna026               running (4d)  2m ago     4d   2454M    6053M    18.2.1-116.el9cp  3c15178790f1  5d0847254871
osd.6                       magna028               running (4d)  2m ago     4d   2527M    7418M    18.2.1-116.el9cp  3c15178790f1  14a53b018094
osd.7                       magna021               running (4d)  2m ago     4d   2677M    5712M    18.2.1-116.el9cp  3c15178790f1  7ab3742eced7
osd.8                       magna025               running (4d)  2m ago     4d   2213M    6053M    18.2.1-116.el9cp  3c15178790f1  c490844960f7
osd.9                       magna027               running (4d)  2m ago     4d   2788M    7419M    18.2.1-116.el9cp  3c15178790f1  bebdde16f310
osd.10                      magna023               running (4d)  2m ago     4d   2284M    5712M    18.2.1-116.el9cp  3c15178790f1  c68ca8e0e1e0
osd.11                      magna024               running (4d)  2m ago     4d   2187M    6053M    18.2.1-116.el9cp  3c15178790f1  bcc9fdb55a56
osd.12                      magna026               running (4d)  2m ago     4d   2536M    6053M    18.2.1-116.el9cp  3c15178790f1  cc07d755ed96
osd.13                      magna022               running (4d)  2m ago     4d   1980M    4347M    18.2.1-116.el9cp  3c15178790f1  a0c929f8830d
osd.14                      magna028               running (4d)  2m ago     4d   2658M    7418M    18.2.1-116.el9cp  3c15178790f1  cfc34ebb0f6d
osd.15                      magna021               running (4d)  2m ago     4d   2140M    5712M    18.2.1-116.el9cp  3c15178790f1  7f493a0a699d
osd.16                      magna025               running (4d)  2m ago     4d   2580M    6053M    18.2.1-116.el9cp  3c15178790f1  0e20420e0f29
osd.17                      magna027               running (4d)  2m ago     4d   2756M    7419M    18.2.1-116.el9cp  3c15178790f1  7e5cc71eb559
osd.18                      magna024               running (4d)  2m ago     4d   2654M    6053M    18.2.1-116.el9cp  3c15178790f1  d4108d7b1b5c
osd.19                      magna023               running (4d)  2m ago     4d   2316M    5712M    18.2.1-116.el9cp  3c15178790f1  175ce4352a0f
osd.20                      magna026               running (4d)  2m ago     4d   2441M    6053M    18.2.1-116.el9cp  3c15178790f1  4d71184eebf9
osd.21                      magna022               running (4d)  2m ago     4d   2051M    4347M    18.2.1-116.el9cp  3c15178790f1  46c9ab26d91d
osd.22                      magna028               running (4d)  2m ago     4d   2973M    7418M    18.2.1-116.el9cp  3c15178790f1  31ab61682602
osd.23                      magna021               running (4d)  2m ago     4d   2631M    5712M    18.2.1-116.el9cp  3c15178790f1  38e5e92ac953

[root@magna046 ~]# ceph fs status
cephfs - 41 clients
======
RANK  STATE   MDS                     ACTIVITY    DNS    INOS   DIRS   CAPS
 0    active  cephfs.magna025.oevoux  Reqs: 0 /s  259k   257k   47.1k  5615
 1    active  cephfs.magna023.qwxpru  Reqs: 0 /s  102    105    62     75
        POOL           TYPE     USED  AVAIL
cephfs.cephfs.meta   metadata  8125M  5649G
cephfs.cephfs.data     data    4163G  5649G
      STANDBY MDS
cephfs.magna024.lqbedi
cephfs.magna022.yuqofl
cephfs.magna026.qbqfcj
MDS version: ceph version 18.2.1-116.el9cp (7709f0c4c90984d791f2f37b5672d6be5a8a6986) reef (stable)

Ceph fs status during test:
---------------------------
[root@magna046 ~]# ceph status
  cluster:
    id:     26171d20-f1d5-11ee-b6c2-002590fc2a2e
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            1 MDSs report slow requests

  services:
    mon: 3 daemons, quorum magna021,magna023,magna022 (age 4d)
    mgr: magna021.hzinhl(active, since 3d), standbys: magna022.cecfam
    mds: 2/2 daemons up, 3 standby
    osd: 24 osds: 24 up (since 4d), 24 in (since 4d)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 1041 pgs
    objects: 630.75k objects, 1.4 TiB
    usage:   4.1 TiB used, 18 TiB / 22 TiB avail
    pgs:     1041 active+clean

  io:
    client:   15 KiB/s wr, 0 op/s rd, 0 op/s wr

[root@magna046 ~]# ceph fs status
cephfs - 53 clients
======
RANK  STATE   MDS                     ACTIVITY    DNS    INOS   DIRS  CAPS
 0    active  cephfs.magna025.oevoux  Reqs: 5 /s  356k   354k   122k  43.8k
 1    active  cephfs.magna023.qwxpru  Reqs: 0 /s  109    112    65    80
        POOL           TYPE     USED  AVAIL
cephfs.cephfs.meta   metadata  1392M  5651G
cephfs.cephfs.data     data    4163G  5651G
      STANDBY MDS
cephfs.magna024.lqbedi
cephfs.magna022.yuqofl
cephfs.magna026.qbqfcj
MDS version: ceph version 18.2.1-116.el9cp (7709f0c4c90984d791f2f37b5672d6be5a8a6986) reef (stable)

[root@magna046 ~]# ceph fs dump
e163
enable_multiple, ever_enabled_multiple: 1,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2,12=quiesce subvolumes}
legacy client fscid: 1

Filesystem 'cephfs' (1)
fs_name cephfs
epoch   163
flags   12 joinable allow_snaps allow_multimds_snaps
created 2024-04-03T16:26:32.792097+0000
modified        2024-04-08T07:50:01.801535+0000
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
required_client_features        {}
last_failure    0
last_failure_osd_epoch  201
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2,12=quiesce subvolumes}
max_mds 2
in      0,1
up      {0=24754,1=24724}
failed
damaged
stopped
data_pools      [3]
metadata_pool   2
inline_data     disabled
balancer
bal_rank_mask   -1
standby_count_wanted    1
qdb_cluster     leader: 24754 members: 24724,24754
[mds.cephfs.magna025.oevoux{0:24754} state up:active seq 100243 join_fscid=1 addr [v2:10.8.128.25:6824/3786581435,v1:10.8.128.25:6825/3786581435] compat {c=[1],r=[1],i=[17ff]}]
[mds.cephfs.magna023.qwxpru{1:24724} state up:active seq 28 join_fscid=1 addr [v2:10.8.128.23:6824/530446013,v1:10.8.128.23:6825/530446013] compat {c=[1],r=[1],i=[17ff]}]

Standby daemons:

[mds.cephfs.magna024.lqbedi{-1:15843} state up:standby seq 1 join_fscid=1 addr [v2:10.8.128.24:6824/198511530,v1:10.8.128.24:6825/198511530] compat {c=[1],r=[1],i=[17ff]}]
[mds.cephfs.magna022.yuqofl{-1:25304} state up:standby seq 1 join_fscid=1 addr [v2:10.8.128.22:6824/900401187,v1:10.8.128.22:6825/900401187] compat {c=[1],r=[1],i=[17ff]}]
[mds.cephfs.magna026.qbqfcj{-1:25451} state up:standby seq 1 join_fscid=1 addr [v2:10.8.128.26:6824/2467846202,v1:10.8.128.26:6825/2467846202] compat {c=[1],r=[1],i=[17ff]}]
dumped fsmap epoch 163

Ceph fs status after test:
--------------------------
[root@magna046 ~]# ceph status
  cluster:
    id:     26171d20-f1d5-11ee-b6c2-002590fc2a2e
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum magna021,magna023,magna022 (age 4d)
    mgr: magna021.hzinhl(active, since 3d), standbys: magna022.cecfam
    mds: 2/2 daemons up, 3 standby
    osd: 24 osds: 24 up (since 4d), 24 in (since 4d)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 1041 pgs
    objects: 807.23k objects, 1.6 TiB
    usage:   4.7 TiB used, 17 TiB / 22 TiB avail
    pgs:     1041 active+clean

  io:
    client:   937 B/s rd, 247 MiB/s wr, 4 op/s rd, 236 op/s wr

[root@magna046 ~]# ceph health
HEALTH_OK

[root@magna046 ~]# ceph fs status
cephfs - 52 clients
======
RANK  STATE   MDS                     ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  cephfs.magna025.oevoux  Reqs: 21 /s  420k   418k   153k   50.6k
 1    active  cephfs.magna023.qwxpru  Reqs: 0 /s   127    130    82     70
        POOL           TYPE     USED  AVAIL
cephfs.cephfs.meta   metadata  1864M  5344G
cephfs.cephfs.data     data    5039G  5344G
      STANDBY MDS
cephfs.magna024.lqbedi
cephfs.magna022.yuqofl
cephfs.magna026.qbqfcj
MDS version: ceph version 18.2.1-116.el9cp (7709f0c4c90984d791f2f37b5672d6be5a8a6986) reef (stable)

[root@magna046 ~]# ceph fs subvolume info cephfs sv_non_def_1 --group-name subvolgroup_cg
{
    "atime": "2024-04-05 10:37:57",
    "bytes_pcent": "66.15",
    "bytes_quota": 64424509440,
    "bytes_used": 42619114820,
    "created_at": "2024-04-05 10:37:57",
    "ctime": "2024-04-08 07:53:02",
    "data_pool": "cephfs.cephfs.data",
    "features": [
        "snapshot-clone",
        "snapshot-autoprotect",
        "snapshot-retention"
    ],
    "flavor": 2,
    "gid": 0,
    "mode": 16877,
    "mon_addrs": [
        "10.8.128.21:6789",
        "10.8.128.23:6789",
        "10.8.128.22:6789"
    ],
    "mtime": "2024-04-08 07:53:02",
    "path": "/volumes/subvolgroup_cg/sv_non_def_1/52ee6889-a1e1-42c4-9a1a-dd17117ee854",
    "pool_namespace": "",
    "state": "complete",
    "type": "subvolume",
    "uid": 0
}
-------------------------
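Given the "MDSs report slow metadata IOs" and "MDSs report slow requests" warnings seen while the set was quiescing, a hedged diagnostic sketch for inspecting the stuck ops (the daemon name is taken from the ceph fs status output above):

# Expand the health warnings to see which MDS is reporting slow requests
ceph health detail

# Dump in-flight and blocked ops on the active rank 0 MDS
ceph tell mds.cephfs.magna025.oevoux dump_ops_in_flight
ceph tell mds.cephfs.magna025.oevoux dump_blocked_ops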
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925