Description of problem:
Quiesce may timeout.

How reproducible:
Hard to reproduce, but can be caught with high probability given the right workload. See the linked upstream tickets.
Test Plan:
1. Run functional and systemic regression tests for CG quiesce.
2. On repeat, perform the below ops (a minimal command sketch follows this list):
   > Set auth rules on the subvolume, pin the subvolume's test_dir to an MDS rank, and perform a dir rename.
   > Issue parallel quiesce calls against the same set.
3. Verify that the debug params to the quiesce commands have been removed.
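A minimal sketch of the step-2 workload, assuming a client mount at /mnt/cephfs, an illustrative subvolume sv1 in the default group, a placeholder client qe_client, and a placeholder set ID cg_set1. These names are not taken from the actual test runs; the quiesce flags are the ones shown in the verification output further below.

# Illustrative only: names, paths and set IDs are placeholders.
# Grant a test client access to the subvolume (auth rules), then pin a
# directory inside it to MDS rank 1 and rename it while quiesce runs.
ceph fs subvolume authorize cephfs sv1 qe_client --access_level=rw
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/volumes/_nogroup/sv1/test_dir
mv /mnt/cephfs/volumes/_nogroup/sv1/test_dir /mnt/cephfs/volumes/_nogroup/sv1/test_dir_renamed &

# Issue quiesce calls against the same set in parallel.
ceph fs quiesce cephfs --set-id cg_set1 sv1 sv2 sv3 --timeout 300 --expiration 300 &
ceph fs quiesce cephfs --set-id cg_set1 sv1 sv2 sv3 --timeout 300 --expiration 300 &
wait

# Check the resulting state of the set.
ceph fs quiesce cephfs --query --set-id cg_set1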
Verified fix on ceph build 18.2.1-194.el9cp.

FUNCTIONAL REGRESSION TESTS
---------------------------
http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-BN7JSV/

SYSTEMIC REGRESSION TESTS
-------------------------
SCALE TEST - http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-68WUGA
STRESS TEST: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-1QDVX1/cg_snap_system_test_0.log

PERFORM FS OPS in parallel to Quiesce
-------------------------------------
Set auth rules on the subvolume, pin the subvolume's test_dir to an MDS rank, perform a dir rename:
http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-3XROAN

Parallel quiesce calls to same set
----------------------------------
http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-Z1HB2V/cg_snap_test_0.log

Verify debug param ?q=<secs> no longer works
--------------------------------------------
[root@ceph-sumar-regression-narjcq-node8 ~]# ceph fs quiesce cephfs --set-id cg_dbg_params1 sv1?q=5 sv2?q=5 sv3?q=5 --timeout 300 --expiration 300
{
    "epoch": 290,
    "leader": 44133,
    "set_version": 2234,
    "sets": {
        "cg_dbg_params1": {
            "version": 2234,
            "age_ref": 0.0,
            "state": {
                "name": "QUIESCING",
                "age": 0.0
            },
            "timeout": 300.0,
            "expiration": 300.0,
            "members": {
                "file:/volumes/_nogroup/sv3/02b68c49-4327-4309-9417-cd85f629f8a5?q=5": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCING",
                        "age": 0.0
                    }
                },
                "file:/volumes/_nogroup/sv2/a7cc3735-6a4d-4ebd-9e67-b60bf9b80e10?q=5": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCING",
                        "age": 0.0
                    }
                },
                "file:/volumes/_nogroup/sv1/1936ca82-f30e-4e88-94f6-fed218be72d2?q=5": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCING",
                        "age": 0.0
                    }
                }
            }
        }
    }
}
[root@ceph-sumar-regression-narjcq-node8 ~]# ceph fs quiesce cephfs --query --set-id cg_dbg_params1
{
    "epoch": 290,
    "leader": 44133,
    "set_version": 2237,
    "sets": {
        "cg_dbg_params1": {
            "version": 2237,
            "age_ref": 0.0,
            "state": {
                "name": "QUIESCED",
                "age": 2.5
            },
            "timeout": 300.0,
            "expiration": 300.0,
            "members": {
                "file:/volumes/_nogroup/sv3/02b68c49-4327-4309-9417-cd85f629f8a5?q=5": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCED",
                        "age": 2.5
                    }
                },
                "file:/volumes/_nogroup/sv2/a7cc3735-6a4d-4ebd-9e67-b60bf9b80e10?q=5": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCED",
                        "age": 2.5
                    }
                },
                "file:/volumes/_nogroup/sv1/1936ca82-f30e-4e88-94f6-fed218be72d2?q=5": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCED",
                        "age": 2.5
                    }
                }
            }
        }
    }
}
[root@ceph-sumar-regression-narjcq-node8 ~]#

Marking the BZ as Verified.
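As a follow-up, once verification is complete the quiesced set can be released. A minimal sketch, assuming the --release and --await options of the quiesce interface and reusing the set ID from the output above; this was not part of the recorded test run.

# Release the quiesced set and wait for the release to complete.
ceph fs quiesce cephfs --set-id cg_dbg_params1 --release --await

# Confirm the set has left the QUIESCED state.
ceph fs quiesce cephfs --query --set-id cg_dbg_params1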
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925