Description of problem:

Quiesce on 12 subvolumes with IO in progress fails with error EBADF.

Cmd response:
ceph.ceph.CommandFailed: ceph fs quiesce cephfs "sv_def_1" "sv_def_2" "sv_def_3" "sv_def_4" "sv_def_5" "sv_def_6" "sv_def_7" "sv_def_8" "sv_def_9" "sv_def_10" "sv_def_11" "sv_def_12" --format json --set-id cg_test1_9e3 --await --timeout 600 --expiration 600
Error: Error EBADF: 10.8.128.51

Version-Release number of selected component (if applicable):
18.2.1-76.el9cp

How reproducible:

Steps to Reproduce:
1. Start IO on 12 subvolumes of the default subvolumegroup.
2. IO is run across 12 different clients, one subvolume per client.
3. On each subvolume, start the smallfile, Crefi, and dd IO tools.
4. Verify that quiesce succeeds. Create snapshots across all subvolumes, verify them, and run quiesce release.
5. Re-run quiesce on the same set of subvolumes with a new quiesce-set ID (a command-level sketch of steps 4-5 is given at the end of this report).

Actual results:
Quiesce failed with Error EBADF.

Expected results:
Quiesce should succeed.

Additional info:

Test logs - http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-NVJ88R/cg_snap_system_test_0.log

Error snippet:

2024-03-22 06:58:12,034 (cephci.snapshot_clone.cg_snap_system_test) [INFO] - cephci.tests.cephfs.snapshot_clone.cg_snap_system_test.py:286 - Quiesce the set ['sv_def_1', 'sv_def_2', 'sv_def_3', 'sv_def_4', 'sv_def_5', 'sv_def_6', 'sv_def_7', 'sv_def_8', 'sv_def_9', 'sv_def_10', 'sv_def_11', 'sv_def_12']
2024-03-22 06:58:12,035 (cephci.snapshot_clone.cg_snap_system_test) [INFO] - cephci.ceph.ceph.py:1563 - Running command ceph fs quiesce cephfs "sv_def_1" "sv_def_2" "sv_def_3" "sv_def_4" "sv_def_5" "sv_def_6" "sv_def_7" "sv_def_8" "sv_def_9" "sv_def_10" "sv_def_11" "sv_def_12" --format json --set-id cg_test1_9e3 --await --timeout 600 --expiration 600 on 10.8.128.51 timeout 600
2024-03-22 06:58:23,067 (cephci.snapshot_clone.cg_snap_system_test) [ERROR] - cephci.ceph.ceph.py:1599 - Error 9 during cmd, timeout 600
2024-03-22 06:58:23,076 (cephci.snapshot_clone.cg_snap_system_test) [ERROR] - cephci.ceph.ceph.py:1600 - Error EBADF: Quiesce status:
{
  "epoch": 113,
  "leader": 94113,
  "set_version": 184,
  "sets": {
    "cg_test1_9e3": {
      "version": 184,
      "age_ref": 279.2,
      "state": { "name": "FAILED", "age": 0.0 },
      "timeout": 600.0,
      "expiration": 600.0,
      "members": {
        "file:/volumes/_nogroup/sv_def_11/d978d32f-8198-469e-a2ca-92f29a797c22": { "excluded": false, "state": { "name": "FAILED", "age": 0.0 } },
        "file:/volumes/_nogroup/sv_def_12/04c8e773-7d17-45ea-984b-e1c9aa3c23fb": { "excluded": false, "state": { "name": "QUIESCING", "age": 9.8 } },
        "file:/volumes/_nogroup/sv_def_8/704c9512-fb79-48ca-8116-bbe04df7b8c3": { "excluded": false, "state": { "name": "QUIESCING", "age": 9.8 } },
        "file:/volumes/_nogroup/sv_def_9/4ef3c19e-7126-4249-9bf1-523d8c2cb175": { "excluded": false, "state": { "name": "QUIESCING", "age": 9.8 } },
        "file:/volumes/_nogroup/sv_def_5/a22e7656-8cac-4ae8-b4cc-7d9a12a6be10": { "excluded": false, "state": { "name": "QUIESCING", "age": 9.8 } },
        "file:/volumes/_nogroup/sv_def_2/317da083-7c1c-40ab-8d97-6ab386331220": { "excluded": false, "state": { "name": "QUIESCING", "age": 9.8 } },
        "file:/volumes/_nogroup/sv_def_7/95cc9c67-c7dc-4320-890e-61fe54ac9819": { "excluded": false, "state": { "name": "QUIESCING", "age": 9.8 } },
        "file:/volumes/_nogroup/sv_def_6/1588945c-67f9-45f0-bab1-d64316caefe2": { "excluded": false, "state": { "name": "QUIESCING", "age": 9.8 } },
        "file:/volumes/_nogroup/sv_def_4/17e671a8-c132-489a-ab2a-af7e2cfe962f": { "excluded": false, "state": { "name": "QUIESCING", "age": 9.8 } },
        "file:/volumes/_nogroup/sv_def_3/78d7d202-1ef8-4b5f-a9c0-75c3aff8a52c": { "excluded": false, "state": { "name": "QUIESCING", "age": 9.8 } },
        "file:/volumes/_nogroup/sv_def_10/6a71c93a-ec33-4960-a10c-894fcab36255": { "excluded": false, "state": { "name": "QUIESCING", "age": 9.8 } },
        "file:/volumes/_nogroup/sv_def_1/04ce7492-ec2c-442d-8f0e-a4983c8442a5": { "excluded": false, "state": { "name": "QUIESCING", "age": 9.8 } }
      }
    }
  }
}

MDS log snippet:

2024-03-22T10:58:13.261+0000 7f85e51fb640  1 mds.cephfs.magna024.zrnhql asok_command: quiesce db {await=1,expiration=600,format=json,members=[sv_def_1,sv_def_2,sv_def_3,sv_def_4,sv_def_5,sv_def_6,sv_def_7,sv_def_8,sv_def_9,sv_def_10,sv_def_11,sv_def_12],prefix=quiesce db,roots=[/volumes/_nogroup/sv_def_1/04ce7492-ec2c-442d-8f0e-a4983c8442a5,/volumes/_nogroup/sv_def_2/317da083-7c1c-40ab-8d97-6ab386331220,/volumes/_nogroup/sv_def_3/78d7d202-1ef8-4b5f-a9c0-75c3aff8a52c,/volumes/_nogroup/sv_def_4/17e671a8-c132-489a-ab2a-af7e2cfe962f,/volumes/_nogroup/sv_def_5/a22e7656-8cac-4ae8-b4cc-7d9a12a6be10,/volumes/_nogroup/sv_def_6/1588945c-67f9-45f0-bab1-d64316caefe2,/volumes/_nogroup/sv_def_7/95cc9c67-c7dc-4320-890e-61fe54ac9819,/volumes/_nogroup/sv_def_8/704c9512-fb79-48ca-8116-bbe04df7b8c3,/volumes/_nogroup/sv_def_9/4ef3c19e-7126-4249-9bf1-523d8c2cb175,/volumes/_nogroup/sv_def_10/6a71c93a-ec33-4960-a10c-894fcab36255,/volumes/_nogroup/sv_def_11/d978d32f-8198-469e-a2ca-92f29a797c22,/volumes/_nogroup/sv_def_12/04c8e773-7d17-45ea-984b-e1c9aa3c23fb],set_id=cg_test1_9e3,target=[mon-mgr,],timeout=600,vol_name=cephfs} (starting...)
2024-03-22T10:58:23.043+0000 7f85d99e4640 10 quiesce.mgr.94113 <leader_upkeep_awaits> completing an await for the set 'cg_test1_9e3' with rc: 9

System logs with locks-info will be copied.
No documentation / release notes update is required for this issue
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:3925