Description of problem:
CG quiesce on a quiesce set of 10 subvolumes from a non-default subvolume group timed out after 10 minutes; the MDS detected a member quiesce timeout on sv_non_def_1 and completed the await with rc 110.

Cmd:
2024-08-17 23:57:50,495 (cephci.snapshot_clone.cg_snap_system_test) [INFO] - cephci.RH.8.0.rhel-9.Weekly.19.1.0-22.cephfs.4.cephci.ceph.ceph.py:1596 - Running command ceph fs quiesce cephfs "subvolgroup_cg/sv_non_def_1" "subvolgroup_cg/sv_non_def_2" "subvolgroup_cg/sv_non_def_3" "subvolgroup_cg/sv_non_def_4" "subvolgroup_cg/sv_non_def_5" "subvolgroup_cg/sv_non_def_6" "subvolgroup_cg/sv_non_def_7" "subvolgroup_cg/sv_non_def_8" "subvolgroup_cg/sv_non_def_9" "subvolgroup_cg/sv_non_def_10" --format json --set-id cg_scale_f6fc --await --timeout 600 --expiration 600 on 10.0.195.112 timeout 600

Response:
  File "/home/jenkins/ceph-builds/openstack/RH/8.0/rhel-9/Weekly/19.1.0-22/cephfs/4/cephci/tests/cephfs/snapshot_clone/cg_snap_system_test.py", line 368, in cg_scale
    cg_snap_util.cg_quiesce(
  File "/home/jenkins/ceph-builds/openstack/RH/8.0/rhel-9/Weekly/19.1.0-22/cephfs/4/cephci/tests/cephfs/snapshot_clone/cg_snap_utils.py", line 150, in cg_quiesce
    out, rc = client.exec_command(
  File "/home/jenkins/ceph-builds/openstack/RH/8.0/rhel-9/Weekly/19.1.0-22/cephfs/4/cephci/ceph/ceph.py", line 2226, in exec_command
    return self.node.exec_command(cmd=cmd, **kw)
  File "/home/jenkins/ceph-builds/openstack/RH/8.0/rhel-9/Weekly/19.1.0-22/cephfs/4/cephci/ceph/ceph.py", line 1619, in exec_command
    raise SocketTimeoutException(sock_err)
ceph.ceph.SocketTimeoutException

MDS debug log snippet:
ceph-mds.cephfs.ceph-weekly-0u6921-lp59i1-node2.jjzygd.log:2024-08-18T03:57:09.618+0000 7f2f33e37640 1 mds.cephfs.ceph-weekly-0u6921-lp59i1-node2.jjzygd asok_command: quiesce db {await=1,expiration=600,format=json,members=[subvolgroup_cg/sv_non_def_1,subvolgroup_cg/sv_non_def_2,subvolgroup_cg/sv_non_def_3,subvolgroup_cg/sv_non_def_4,subvolgroup_cg/sv_non_def_5,subvolgroup_cg/sv_non_def_6,subvolgroup_cg/sv_non_def_7,subvolgroup_cg/sv_non_def_8,subvolgroup_cg/sv_non_def_9,subvolgroup_cg/sv_non_def_10],prefix=quiesce db,roots=[/volumes/subvolgroup_cg/sv_non_def_1/54e6c36f-e555-49da-ab07-2475dc235f0c,/volumes/subvolgroup_cg/sv_non_def_2/c0e146ce-f5d6-433b-8d1c-d3b34073aa4d,/volumes/subvolgroup_cg/sv_non_def_3/f0c14e16-3719-450b-8d63-91916ddf2381,/volumes/subvolgroup_cg/sv_non_def_4/6adbb35d-7a7d-49e1-9b2c-5f73e30ddfc1,/volumes/subvolgroup_cg/sv_non_def_5/4ec557ae-98d2-49b1-bd1a-50dd513c8a94,/volumes/subvolgroup_cg/sv_non_def_6/3c468a22-358b-4c4a-99d6-e021bfc16792,/volumes/subvolgroup_cg/sv_non_def_7/d359a401-a70b-4ecc-942c-58c24f6a11ec,/volumes/subvolgroup_cg/sv_non_def_8/f19885c8-5601-442c-b428-7f67daf81904,/volumes/subvolgroup_cg/sv_non_def_9/d9b65656-a866-47c1-9f38-d5113ba07b79,/volumes/subvolgroup_cg/sv_non_def_10/f75f73dc-dfb6-4cc1-a85c-e7ddf8ab7050],set_id=cg_scale_f6fc,target=[mon-mgr,],timeout=600,vol_name=cephfs} (starting...)
ceph-mds.cephfs.ceph-weekly-0u6921-lp59i1-node2.jjzygd.log:2024-08-18T04:07:09.623+0000 7f2f28e21640 10 quiesce.mgr.24284 <leader_upkeep_set> [cg_scale_f6fc@10,file:/volumes/subvolgroup_cg/sv_non_def_1/54e6c36f-e555-49da-ab07-2475dc235f0c] detected a member quiesce timeout
ceph-mds.cephfs.ceph-weekly-0u6921-lp59i1-node2.jjzygd.log:2024-08-18T04:07:09.623+0000 7f2f28e21640 10 quiesce.mgr.24284 <leader_upkeep_awaits> completing an await for the set 'cg_scale_f6fc' with rc: 110
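For reference, the rc 110 with which the await completes corresponds to the standard Linux errno ETIMEDOUT; a quick illustrative check (not part of the test code):

import errno
import os

# rc 110 from the quiesce manager matches errno.ETIMEDOUT on Linux,
# i.e. the member could not be quiesced within the requested timeout.
print(errno.ETIMEDOUT)               # 110
print(os.strerror(errno.ETIMEDOUT))  # typically "Connection timed out"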
Version-Release number of selected component (if applicable):
19.1.0-22

How reproducible:

Steps to Reproduce:
1. Create a quiesce set of 10 subvolumes.
2. Run IO across all subvolumes from 10 different clients.
3. While IO is in progress, perform a quiesce on the quiesce set with the timeout and expiration set to 600 seconds (a standalone sketch of these steps is included under Additional info below).

Actual results:
Quiesce timed out on a member after 10 minutes.

Expected results:
Quiesce should succeed within the given timeout.

Additional info:
Automation logs: http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/RH/8.0/rhel-9/Weekly/19.1.0-22/cephfs/4/tier-1_cephfs_cg_quiesce_systemic/cg_snap_system_test_0.log
MDS and OSD debug logs at magna002: /ceph/cephci-jenkins/results/openstack/RH/8.0/rhel-9/Weekly/19.1.0-22/cephfs/4/tier-1_cephfs_cg_quiesce_systemic/ceph_logs/
Please let me know if any additional information is required.
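Below is a minimal standalone sketch of the reproduction flow outside the cephci harness, assuming an existing volume named cephfs; the group/subvolume names and set-id mirror the failing run, and the client IO step (step 2) is omitted:

import json
import subprocess

VOL = "cephfs"
GROUP = "subvolgroup_cg"
SUBVOLS = ["sv_non_def_%d" % i for i in range(1, 11)]

# Step 1: create the non-default group and its 10 subvolumes.
subprocess.run(["ceph", "fs", "subvolumegroup", "create", VOL, GROUP], check=True)
for sv in SUBVOLS:
    subprocess.run(
        ["ceph", "fs", "subvolume", "create", VOL, sv, "--group_name", GROUP],
        check=True,
    )

# Step 2 (omitted): mount each subvolume from a separate client and start IO.

# Step 3: quiesce the whole set with timeout/expiration of 600s, as in the failing run.
cmd = (
    ["ceph", "fs", "quiesce", VOL]
    + ["%s/%s" % (GROUP, sv) for sv in SUBVOLS]
    + ["--format", "json", "--set-id", "cg_scale_f6fc",
       "--await", "--timeout", "600", "--expiration", "600"]
)
try:
    # Give the CLI call itself a little longer than the quiesce timeout.
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=660, check=True)
    print(json.dumps(json.loads(result.stdout), indent=2))
except subprocess.TimeoutExpired:
    # Corresponds to the SocketTimeoutException raised by the automation harness.
    print("quiesce --await did not return within the expected window")
except subprocess.CalledProcessError as err:
    # In the failing run the await completed with rc 110 (member quiesce timeout).
    print("quiesce failed with rc %d: %s" % (err.returncode, err.stdout or err.stderr))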
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:10216