Bug 2271096 - [CephFS - Consistency Group] - Quiesce command failed with error EBADF on scaled config
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 7.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 7.1
Assignee: Patrick Donnelly
QA Contact: sumr
Docs Contact: Akash Raj
URL:
Whiteboard:
Depends On:
Blocks: 2267614 2298578 2298579
 
Reported: 2024-03-22 16:31 UTC by sumr
Modified: 2024-07-18 07:59 UTC (History)

Fixed In Version: ceph-18.2.1-108.el8cp
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-06-13 14:30:18 UTC
Embargoed:




Links
System                     ID               Private  Priority  Status  Summary  Last Updated
Ceph Project Bug Tracker   65182            0        None      None    None     2024-03-27 22:46:30 UTC
Red Hat Issue Tracker      RHCEPH-8614      0        None      None    None     2024-03-22 16:34:22 UTC
Red Hat Product Errata     RHSA-2024:3925   0        None      None    None     2024-06-13 14:30:24 UTC

Description sumr 2024-03-22 16:31:53 UTC
Description of problem:

Quiesce on 12 subvolumes with IO in-progress fails with error EBADF.

Cmd response:
ceph.ceph.CommandFailed: ceph fs quiesce cephfs  "sv_def_1"  "sv_def_2"  "sv_def_3"  "sv_def_4"  "sv_def_5"  "sv_def_6"  "sv_def_7"  "sv_def_8"  "sv_def_9"  "sv_def_10"  "sv_def_11"  "sv_def_12"  --format json  --set-id cg_test1_9e3  --await  --timeout 600  --expiration 600 Error:  Error EBADF: 
 10.8.128.51

Version-Release number of selected component (if applicable): 18.2.1-76.el9cp


How reproducible:


Steps to Reproduce:
1. Start IO on 12 subvolumes of the default subvolume group.
2. Run the IO from 12 different clients, one subvolume per client.
3. On each subvolume, start the smallfile, crefi, and dd IO tools.
4. Quiesce the 12 subvolumes and verify that the quiesce succeeds. Create a snapshot on every subvolume, verify the snapshots, and release the quiesce set.
5. Rerun the quiesce on the same set of subvolumes with a new quiesce-set ID.
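
For anyone re-running step 5 by hand, the quiesce invocation can be assembled as in the minimal Python sketch below. The flags are the ones visible in the command response in this report. The helper name build_quiesce_cmd and its defaults are hypothetical and are not part of cephci. The set ID shown is the one from the failing run, so substitute a fresh one when reproducing.

```python
import shlex


def build_quiesce_cmd(fs_name, members, set_id, timeout=600, expiration=600):
    """Assemble the `ceph fs quiesce` argv as it appears in this report.

    Hypothetical helper for illustration only; it builds the argument
    list and does not execute anything.
    """
    if not members:
        raise ValueError("at least one quiesce member is required")
    argv = ["ceph", "fs", "quiesce", fs_name, *members]
    argv += [
        "--format", "json",
        "--set-id", set_id,
        "--await",
        "--timeout", str(timeout),
        "--expiration", str(expiration),
    ]
    return argv


# The 12 subvolumes used in the report: sv_def_1 .. sv_def_12.
members = [f"sv_def_{i}" for i in range(1, 13)]
cmd = build_quiesce_cmd("cephfs", members, "cg_test1_9e3")

# Print a shell-safe rendering of the command for copy and paste.
print(shlex.join(cmd))
```

Returning an argument list rather than a single string keeps the sketch usable with subprocess.run without shell quoting concerns.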

Actual results: Quiesce FAILED.


Expected results: Quiesce should succeed.


Additional info:

Test logs - http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-NVJ88R/cg_snap_system_test_0.log

Error snippet:
2024-03-22 06:58:12,034 (cephci.snapshot_clone.cg_snap_system_test) [INFO] - cephci.tests.cephfs.snapshot_clone.cg_snap_system_test.py:286 - Quiesce the set ['sv_def_1', 'sv_def_2', 'sv_def_3', 'sv_def_4', 'sv_def_5', 'sv_def_6', 'sv_def_7', 'sv_def_8', 'sv_def_9', 'sv_def_10', 'sv_def_11', 'sv_def_12']
2024-03-22 06:58:12,035 (cephci.snapshot_clone.cg_snap_system_test) [INFO] - cephci.ceph.ceph.py:1563 - Running command ceph fs quiesce cephfs  "sv_def_1"  "sv_def_2"  "sv_def_3"  "sv_def_4"  "sv_def_5"  "sv_def_6"  "sv_def_7"  "sv_def_8"  "sv_def_9"  "sv_def_10"  "sv_def_11"  "sv_def_12"  --format json  --set-id cg_test1_9e3  --await  --timeout 600  --expiration 600 on 10.8.128.51 timeout 600
2024-03-22 06:58:23,067 (cephci.snapshot_clone.cg_snap_system_test) [ERROR] - cephci.ceph.ceph.py:1599 - Error 9 during cmd, timeout 600
2024-03-22 06:58:23,076 (cephci.snapshot_clone.cg_snap_system_test) [ERROR] - cephci.ceph.ceph.py:1600 - Error EBADF: 

Quiesce status:
{
    "epoch": 113,
    "leader": 94113,
    "set_version": 184,
    "sets": {
        "cg_test1_9e3": {
            "version": 184,
            "age_ref": 279.2,
            "state": {
                "name": "FAILED",
                "age": 0.0
            },
            "timeout": 600.0,
            "expiration": 600.0,
            "members": {
                "file:/volumes/_nogroup/sv_def_11/d978d32f-8198-469e-a2ca-92f29a797c22": {
                    "excluded": false,
                    "state": {
                        "name": "FAILED",
                        "age": 0.0
                    }
                },
                "file:/volumes/_nogroup/sv_def_12/04c8e773-7d17-45ea-984b-e1c9aa3c23fb": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCING",
                        "age": 9.8
                    }
                },
                "file:/volumes/_nogroup/sv_def_8/704c9512-fb79-48ca-8116-bbe04df7b8c3": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCING",
                        "age": 9.8
                    }
                },
                "file:/volumes/_nogroup/sv_def_9/4ef3c19e-7126-4249-9bf1-523d8c2cb175": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCING",
                        "age": 9.8
                    }
                },
                "file:/volumes/_nogroup/sv_def_5/a22e7656-8cac-4ae8-b4cc-7d9a12a6be10": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCING",
                        "age": 9.8
                    }
                },
                "file:/volumes/_nogroup/sv_def_2/317da083-7c1c-40ab-8d97-6ab386331220": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCING",
                        "age": 9.8
                    }
                },
                "file:/volumes/_nogroup/sv_def_7/95cc9c67-c7dc-4320-890e-61fe54ac9819": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCING",
                        "age": 9.8
                    }
                },
                "file:/volumes/_nogroup/sv_def_6/1588945c-67f9-45f0-bab1-d64316caefe2": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCING",
                        "age": 9.8
                    }
                },
                "file:/volumes/_nogroup/sv_def_4/17e671a8-c132-489a-ab2a-af7e2cfe962f": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCING",
                        "age": 9.8
                    }
                },
                "file:/volumes/_nogroup/sv_def_3/78d7d202-1ef8-4b5f-a9c0-75c3aff8a52c": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCING",
                        "age": 9.8
                    }
                },
                "file:/volumes/_nogroup/sv_def_10/6a71c93a-ec33-4960-a10c-894fcab36255": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCING",
                        "age": 9.8
                    }
                },
                "file:/volumes/_nogroup/sv_def_1/04ce7492-ec2c-442d-8f0e-a4983c8442a5": {
                    "excluded": false,
                    "state": {
                        "name": "QUIESCING",
                        "age": 9.8
                    }
                }
            }
        }
    }
}
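
With 12 members in the set, it is easy to miss that only one of them (sv_def_11) reports FAILED while the rest are still QUIESCING. The following is a rough Python sketch of how the status JSON above could be summarized, assuming the output has already been captured as a string. The function name summarize_quiesce_set is hypothetical, and the sample is a trimmed copy of the status with two of the twelve members.

```python
import json
from collections import Counter


def summarize_quiesce_set(status, set_id):
    """Return the set state, per-state member counts, and failed roots."""
    qset = status["sets"][set_id]
    states = {
        root: member["state"]["name"]
        for root, member in qset["members"].items()
    }
    return {
        "set_state": qset["state"]["name"],
        "counts": Counter(states.values()),
        "failed": sorted(r for r, s in states.items() if s == "FAILED"),
    }


# Trimmed copy of the quiesce status from this report (2 of 12 members).
sample = json.loads("""
{
    "epoch": 113,
    "leader": 94113,
    "set_version": 184,
    "sets": {
        "cg_test1_9e3": {
            "version": 184,
            "state": {"name": "FAILED", "age": 0.0},
            "members": {
                "file:/volumes/_nogroup/sv_def_11/d978d32f-8198-469e-a2ca-92f29a797c22": {
                    "excluded": false,
                    "state": {"name": "FAILED", "age": 0.0}
                },
                "file:/volumes/_nogroup/sv_def_12/04c8e773-7d17-45ea-984b-e1c9aa3c23fb": {
                    "excluded": false,
                    "state": {"name": "QUIESCING", "age": 9.8}
                }
            }
        }
    }
}
""")

summary = summarize_quiesce_set(sample, "cg_test1_9e3")
print(summary["set_state"], dict(summary["counts"]), summary["failed"])
```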

MDS log snippet:
2024-03-22T10:58:13.261+0000 7f85e51fb640  1 mds.cephfs.magna024.zrnhql asok_command: quiesce db {await=1,expiration=600,format=json,members=[sv_def_1,sv_def_2,sv_def_3,sv_def_4,sv_def_5,sv_def_6,sv_def_7,sv_def_8,sv_def_9,sv_def_10,sv_def_11,sv_def_12],prefix=quiesce db,roots=[/volumes/_nogroup/sv_def_1/04ce7492-ec2c-442d-8f0e-a4983c8442a5,/volumes/_nogroup/sv_def_2/317da083-7c1c-40ab-8d97-6ab386331220,/volumes/_nogroup/sv_def_3/78d7d202-1ef8-4b5f-a9c0-75c3aff8a52c,/volumes/_nogroup/sv_def_4/17e671a8-c132-489a-ab2a-af7e2cfe962f,/volumes/_nogroup/sv_def_5/a22e7656-8cac-4ae8-b4cc-7d9a12a6be10,/volumes/_nogroup/sv_def_6/1588945c-67f9-45f0-bab1-d64316caefe2,/volumes/_nogroup/sv_def_7/95cc9c67-c7dc-4320-890e-61fe54ac9819,/volumes/_nogroup/sv_def_8/704c9512-fb79-48ca-8116-bbe04df7b8c3,/volumes/_nogroup/sv_def_9/4ef3c19e-7126-4249-9bf1-523d8c2cb175,/volumes/_nogroup/sv_def_10/6a71c93a-ec33-4960-a10c-894fcab36255,/volumes/_nogroup/sv_def_11/d978d32f-8198-469e-a2ca-92f29a797c22,/volumes/_nogroup/sv_def_12/04c8e773-7d17-45ea-984b-e1c9aa3c23fb],set_id=cg_test1_9e3,target=[mon-mgr,],timeout=600,vol_name=cephfs} (starting...)
2024-03-22T10:58:23.043+0000 7f85d99e4640 10 quiesce.mgr.94113 <leader_upkeep_awaits> completing an await for the set 'cg_test1_9e3' with rc: 9
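
The "rc: 9" in the MDS log and the "Error 9" in the test log are the same errno that the CLI prints as EBADF. A small Python sketch of extracting the return code from such a log line and mapping it to its symbolic name is shown below. The regular expression is an assumption based only on the single line quoted above, and parse_await_rc is a hypothetical helper name.

```python
import errno
import re

# Pattern inferred from the one log line in this report; other Ceph
# versions may word the message differently.
AWAIT_RE = re.compile(
    r"completing an await for the set '(?P<set_id>[^']+)' "
    r"with rc: (?P<rc>\d+)"
)


def parse_await_rc(line):
    """Return (set_id, rc, errno_name), or None if the line does not match."""
    match = AWAIT_RE.search(line)
    if match is None:
        return None
    rc = int(match.group("rc"))
    # errno.errorcode maps numeric codes to names on the local platform.
    return match.group("set_id"), rc, errno.errorcode.get(rc, "UNKNOWN")


line = (
    "2024-03-22T10:58:23.043+0000 7f85d99e4640 10 quiesce.mgr.94113 "
    "<leader_upkeep_awaits> completing an await for the set "
    "'cg_test1_9e3' with rc: 9"
)
result = parse_await_rc(line)
print(result)
```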


System logs with locks-info will be copied.

Comment 23 Leonid Usov 2024-05-09 09:41:50 UTC
No documentation / release notes update is required for this issue

Comment 24 errata-xmlrpc 2024-06-13 14:30:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925

