Bug 2108656
Summary: | mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version()) | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Patrick Donnelly <pdonnell>
Component: | CephFS | Assignee: | Xiubo Li <xiubli>
Status: | CLOSED ERRATA | QA Contact: | Amarnath <amk>
Severity: | high | Docs Contact: |
Priority: | urgent | |
Version: | 5.1 | CC: | akraj, amk, ceph-eng-bugs, cephqe-warriors, tserlin, vereddy, xiubli
Target Milestone: | --- | |
Target Release: | 5.2 | |
Hardware: | All | |
OS: | All | |
Whiteboard: | | |
Fixed In Version: | ceph-16.2.8-80.el8cp | Doc Type: | Bug Fix
Doc Text: | .MDSs no longer crash when fetching unlinked directories. Previously, when fetching unlinked directories, the projected version would be incorrectly initialized, causing MDSs to crash when performing sanity checks. With this fix, the projected version and the inode version are initialized when fetching an unlinked directory, allowing the MDSs to perform sanity checks without crashing. | |
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2022-08-09 17:39:24 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 2071085, 2102272 | |
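As a reading aid for the Doc Text above, here is a minimal, purely illustrative Python model of the invariant behind the failed assertion. It is not Ceph source code; the class, field, and method names are invented for the example:

```python
# Illustrative model only, not the Ceph MDS implementation. It mimics the
# invariant behind ceph_assert(dir->get_projected_version() == dir->get_version()):
# after a directory is fetched from the metadata pool, its projected version
# must equal the version that was just loaded.


class DirFrag:
    def __init__(self):
        self.version = 0            # version loaded from the on-disk metadata
        self.projected_version = 0  # in-memory projected version

    def fetch(self, ondisk_version, apply_fix=True):
        """Model of fetching an (unlinked) directory from the metadata pool."""
        self.version = ondisk_version
        if apply_fix:
            # The behavior described in the Doc Text: initialize the projected
            # version together with the version when the directory is fetched.
            self.projected_version = ondisk_version

    def sanity_check(self):
        # Without the fix, projected_version keeps its stale default and this
        # check fails, which corresponds to the reported MDS crash.
        assert self.projected_version == self.version


if __name__ == "__main__":
    d = DirFrag()
    d.fetch(ondisk_version=42)   # pass apply_fix=False to model the old crash
    d.sanity_check()
```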
Description
Patrick Donnelly
2022-07-19 16:12:30 UTC
Comment 4
Amarnath

Hi Xiubo,

As part of QE verification for this, I am following the test_migrate_unlinked_dir test method that was pushed as part of this MR.

Steps followed:

1. Create the FS and set max_mds to 2:

    [cephuser@ceph-fix-amk-qqjwnv-node7 ~]$ sudo -i
    [root@ceph-fix-amk-qqjwnv-node7 ~]# ceph -s
      cluster:
        id:     8382e3e2-07ec-11ed-ac3e-fa163e005463
        health: HEALTH_OK

      services:
        mon: 3 daemons, quorum ceph-fix-amk-qqjwnv-node1-installer,ceph-fix-amk-qqjwnv-node2,ceph-fix-amk-qqjwnv-node3 (age 30h)
        mgr: ceph-fix-amk-qqjwnv-node1-installer.yuneqm(active, since 30h), standbys: ceph-fix-amk-qqjwnv-node2.ncahej
        mds: 2/2 daemons up, 1 standby
        osd: 12 osds: 12 up (since 30h), 12 in (since 30h)

      data:
        volumes: 1/1 healthy
        pools:   3 pools, 65 pgs
        objects: 42 objects, 108 KiB
        usage:   152 MiB used, 180 GiB / 180 GiB avail
        pgs:     65 active+clean

    [root@ceph-fix-amk-qqjwnv-node7 ~]# ceph fs status
    cephfs - 0 clients
    ======
    RANK  STATE                    MDS                      ACTIVITY     DNS    INOS   DIRS   CAPS
     0    active  cephfs.ceph-fix-amk-qqjwnv-node4.xrhqhd  Reqs:    0 /s    14     13     12      0
     1    active  cephfs.ceph-fix-amk-qqjwnv-node5.lfboep  Reqs:    0 /s    11     15     13      0
           POOL           TYPE     USED  AVAIL
    cephfs.cephfs.meta  metadata   480k  56.9G
    cephfs.cephfs.data    data       0   56.9G
              STANDBY MDS
    cephfs.ceph-fix-amk-qqjwnv-node6.xftvxh
    MDS version: ceph version 16.2.8-77.el8cp (bf5436ca8dca124b6f4b3ddd729d112a54f70e29) pacific (stable)

2. Create a directory under the ceph-fuse mount at /mnt/cephfs_fuse:

    [root@ceph-fix-amk-qqjwnv-node7 ~]# cd /mnt/cephfs_fuse/
    [root@ceph-fix-amk-qqjwnv-node7 cephfs_fuse]#
    [root@ceph-fix-amk-qqjwnv-node7 cephfs_fuse]# ls -lrt
    total 0
    [root@ceph-fix-amk-qqjwnv-node7 cephfs_fuse]# mkdir test
    [root@ceph-fix-amk-qqjwnv-node7 cephfs_fuse]# ls -lrt
    total 1
    drwxr-xr-x. 2 root root 0 Jul 21 08:18 test
    [root@ceph-fix-amk-qqjwnv-node7 cephfs_fuse]# touch test/placeholder

3. Set the ceph.dir.pin extended attribute to rank 1:

    [root@ceph-fix-amk-qqjwnv-node7 cephfs_fuse]# setfattr -n ceph.dir.pin -v 1 /mnt/cephfs_fuse/test/

4. Create a directory /mnt/cephfs_fuse/test/to-be-unlinked inside /mnt/cephfs_fuse/test.

5. Open the directory using the Python program:

    import time
    import os
    os.mkdir("/mnt/cephfs_fuse/test/to-be-unlinked")
    fd = os.open("/mnt/cephfs_fuse/test/to-be-unlinked", os.O_RDONLY)
    while True:
        time.sleep(1)

6. rmdir /mnt/cephfs_fuse/test/to-be-unlinked

7. Check the MDS stray count on rank 1 (before the deletion of the directory it was 0):

    [ceph: root@ceph-fix-amk-qqjwnv-node5 /]# ceph daemon mds.cephfs.ceph-fix-amk-qqjwnv-node5.lfboep perf dump mds_cache num_strays
    {
        "mds_cache": {
            "num_strays": 1
        }
    }

8. Set max_mds to 1 and check that the strays have been moved to rank 0:

    [ceph: root@ceph-fix-amk-qqjwnv-node4 /]# ceph daemon mds.cephfs.ceph-fix-amk-qqjwnv-node4.xrhqhd perf dump mds_cache num_strays
    {
        "mds_cache": {
            "num_strays": 0
        }
    }
    [ceph: root@ceph-fix-amk-qqjwnv-node4 /]# ceph daemon mds.cephfs.ceph-fix-amk-qqjwnv-node4.xrhqhd perf dump mds_cache num_strays
    {
        "mds_cache": {
            "num_strays": 1
        }
    }

Can you please review these steps? Or do we need to perform an upgrade with stray entries present and check for the crash in the MDS?

Regards,
Amarnath
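For reference, the client-side portion of the steps above can be scripted along the following lines. This is an illustrative sketch only, not the upstream test_migrate_unlinked_dir test: the mount path and directory names are taken from the transcript above, the pin is set by shelling out to setfattr, and the num_strays check is still done on the MDS host exactly as in steps 7 and 8.

```python
# Sketch of the client-side reproduction from comment 4: pin a directory to
# MDS rank 1, hold an open fd on a subdirectory, then rmdir it so it becomes
# a stray. Paths below are the examples used in the transcript above.
import os
import subprocess
import time

MOUNT = "/mnt/cephfs_fuse"                     # ceph-fuse mount used above
PINNED = os.path.join(MOUNT, "test")
VICTIM = os.path.join(PINNED, "to-be-unlinked")

os.makedirs(PINNED, exist_ok=True)
open(os.path.join(PINNED, "placeholder"), "w").close()

# Pin the parent directory to MDS rank 1 (same as the setfattr step above).
subprocess.check_call(["setfattr", "-n", "ceph.dir.pin", "-v", "1", PINNED])

os.mkdir(VICTIM)
fd = os.open(VICTIM, os.O_RDONLY)   # keep the directory open ...
os.rmdir(VICTIM)                    # ... so the rmdir turns it into a stray

# On the rank-1 MDS host (inside `cephadm shell`), num_strays should now be 1:
#   ceph daemon mds.<rank1-daemon-name> perf dump mds_cache num_strays
# Then reducing max_mds to 1 should migrate the stray to rank 0.
while True:
    time.sleep(1)                   # hold the fd open, like the original script
```

Holding the fd open is what turns the removed directory into a stray entry on rank 1, which is the state the stray migration and the fix are meant to exercise.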
(In reply to Amarnath from comment #4)

Hi Amarnath,

Yeah, please go ahead, this looks good to me. Thanks!
Xiubo
Verified on ceph version 16.2.8-80.el8cp.

I see strays getting migrated to the remaining active MDS when MDS rank 1 is stopped (max_mds reduced to 1).

Client node commands:

    [root@ceph-fs-dashboard-zl2us3-node9 24e4b482-3307-46b1-8d31-0d10a01e7ed7]# ceph fs status
    cephfs - 1 clients
    ======
    RANK  STATE                      MDS                           ACTIVITY     DNS    INOS   DIRS   CAPS
     0    active  cephfs.ceph-fs-dashboard-zl2us3-node3.mtyixc  Reqs:    0 /s    27     22     17      9
     1    active  cephfs.ceph-fs-dashboard-zl2us3-node5.svpkjw  Reqs:    0 /s    25     22     19      7
           POOL           TYPE     USED  AVAIL
    cephfs.cephfs.meta  metadata  2252k  56.8G
    cephfs.cephfs.data    data    48.0k  56.8G
              STANDBY MDS
    cephfs.ceph-fs-dashboard-zl2us3-node4.vhsgud
    MDS version: ceph version 16.2.8-80.el8cp (22ecb3f5fcb872bb1d7739004f01a8eded7b397c) pacific (stable)
    [root@ceph-fs-dashboard-zl2us3-node9 24e4b482-3307-46b1-8d31-0d10a01e7ed7]# ceph orch host ls
    [root@ceph-fs-dashboard-zl2us3-node9 /]# cd /mnt/ceph-fuse
    [root@ceph-fs-dashboard-zl2us3-node9 ceph-fuse]#
    [root@ceph-fs-dashboard-zl2us3-node9 ceph-fuse]# ls -lrt
    total 1
    drwxr-xr-x. 3 root root 182 Jul 27 12:49 volumes
    [root@ceph-fs-dashboard-zl2us3-node9 ceph-fuse]# cd volumes/
    [root@ceph-fs-dashboard-zl2us3-node9 volumes]# cd subvolgroup_1/
    [root@ceph-fs-dashboard-zl2us3-node9 subvolgroup_1]# cd subvol_1/
    [root@ceph-fs-dashboard-zl2us3-node9 subvol_1]# cd 24e4b482-3307-46b1-8d31-0d10a01e7ed7/
    [root@ceph-fs-dashboard-zl2us3-node9 24e4b482-3307-46b1-8d31-0d10a01e7ed7]# ls -lrt
    total 1
    -rw-r--r--. 1 root root 14 Jul 27 12:53 test.txt
    -rw-r--r--. 1 root root 36 Jul 27 13:03 test_2.txt
    [root@ceph-fs-dashboard-zl2us3-node9 24e4b482-3307-46b1-8d31-0d10a01e7ed7]# mkdir pin_test
    [root@ceph-fs-dashboard-zl2us3-node9 24e4b482-3307-46b1-8d31-0d10a01e7ed7]# touch pin_test/placeholder
    [root@ceph-fs-dashboard-zl2us3-node9 24e4b482-3307-46b1-8d31-0d10a01e7ed7]# pwd
    /mnt/ceph-fuse/volumes/subvolgroup_1/subvol_1/24e4b482-3307-46b1-8d31-0d10a01e7ed7

Setting the pin on the directory to MDS rank 1:

    [root@ceph-fs-dashboard-zl2us3-node9 24e4b482-3307-46b1-8d31-0d10a01e7ed7]# setfattr -n ceph.dir.pin -v 1 /mnt/ceph-fuse/volumes/subvolgroup_1/subvol_1/24e4b482-3307-46b1-8d31-0d10a01e7ed7/pin_test/
    [root@ceph-fs-dashboard-zl2us3-node9 24e4b482-3307-46b1-8d31-0d10a01e7ed7]# mkdir /mnt/ceph-fuse/volumes/subvolgroup_1/subvol_1/24e4b482-3307-46b1-8d31-0d10a01e7ed7/pin_test/to-be-unlinked
    [root@ceph-fs-dashboard-zl2us3-node9 24e4b482-3307-46b1-8d31-0d10a01e7ed7]# cd /mnt/ceph-fuse/open_dir

Run the open-directory script, then remove the directory from another shell:

    [root@ceph-fs-dashboard-zl2us3-node9 24e4b482-3307-46b1-8d31-0d10a01e7ed7]# cat /mnt/ceph-fuse/open_dir.py
    import time
    import os
    os.mkdir("/mnt/ceph-fuse/volumes/subvolgroup_1/subvol_1/24e4b482-3307-46b1-8d31-0d10a01e7ed7/pin_test/to-be-unlinked/1")
    fd = os.open("/mnt/ceph-fuse/volumes/subvolgroup_1/subvol_1/24e4b482-3307-46b1-8d31-0d10a01e7ed7/pin_test/to-be-unlinked/1", os.O_RDONLY)
    while True:
        time.sleep(1)
    [root@ceph-fs-dashboard-zl2us3-node9 24e4b482-3307-46b1-8d31-0d10a01e7ed7]# rmdir /mnt/ceph-fuse/volumes/subvolgroup_1/subvol_1/24e4b482-3307-46b1-8d31-0d10a01e7ed7/pin_test/to-be-unlinked/1

Set max_mds to 1:

    [root@ceph-fs-dashboard-zl2us3-node9 24e4b482-3307-46b1-8d31-0d10a01e7ed7]# ceph fs set max_mds 1

MDS.0:

    [root@ceph-fs-dashboard-zl2us3-node3 ~]# cephadm shell
    Inferring fsid b8423362-0d82-11ed-af08-fa163e7db892
    Inferring config /var/lib/ceph/b8423362-0d82-11ed-af08-fa163e7db892/mon.ceph-fs-dashboard-zl2us3-node3/config
    Using recent ceph image registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:22d12ad3f3fe4dff25ec3c81348ea0018247dc654957c1dcf5e8da88fca43ed2
    [ceph: root@ceph-fs-dashboard-zl2us3-node3 /]# ceph daemon mds.cephfs.ceph-fs-dashboard-zl2us3-node3.mtyixc perf dump mds_cache num_strays
    {
        "mds_cache": {
            "num_strays": 0
        }
    }
    [ceph: root@ceph-fs-dashboard-zl2us3-node3 /]# ceph daemon mds.cephfs.ceph-fs-dashboard-zl2us3-node3.mtyixc perf dump mds_cache num_strays
    {
        "mds_cache": {
            "num_strays": 0
        }
    }

After running rmdir on the open directory:

    [ceph: root@ceph-fs-dashboard-zl2us3-node3 /]# ceph daemon mds.cephfs.ceph-fs-dashboard-zl2us3-node3.mtyixc perf dump mds_cache num_strays
    {
        "mds_cache": {
            "num_strays": 1
        }
    }

MDS.1:

    [root@ceph-fs-dashboard-zl2us3-node5 ~]# cephadm shell
    Inferring fsid b8423362-0d82-11ed-af08-fa163e7db892
    Using recent ceph image registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:22d12ad3f3fe4dff25ec3c81348ea0018247dc654957c1dcf5e8da88fca43ed2
    [ceph: root@ceph-fs-dashboard-zl2us3-node5 /]# ceph daemon mds.cephfs.ceph-fs-dashboard-zl2us3-node5.svpkjw perf dump mds_cache num_strays
    {
        "mds_cache": {
            "num_strays": 0
        }
    }
    [ceph: root@ceph-fs-dashboard-zl2us3-node5 /]# ceph daemon mds.cephfs.ceph-fs-dashboard-zl2us3-node5.svpkjw perf dump mds_cache num_strays
    {
        "mds_cache": {
            "num_strays": 1
        }
    }
    [ceph: root@ceph-fs-dashboard-zl2us3-node5 /]# ceph daemon mds.cephfs.ceph-fs-dashboard-zl2us3-node5.svpkjw perf dump mds_cache num_strays
    {
        "mds_cache": {
            "num_strays": 1
        }
    }
    [ceph: root@ceph-fs-dashboard-zl2us3-node5 /]# ceph daemon mds.cephfs.ceph-fs-dashboard-zl2us3-node5.svpkjw perf dump mds_cache num_strays
    {
        "mds_cache": {
            "num_strays": 1
        }
    }

Stray got migrated to the MDS.0 node:

    [ceph: root@ceph-fs-dashboard-zl2us3-node5 /]# ceph daemon mds.cephfs.ceph-fs-dashboard-zl2us3-node5.svpkjw perf dump mds_cache num_strays
    {
        "mds_cache": {
            "num_strays": 0
        }
    }
    [ceph: root@ceph-fs-dashboard-zl2us3-node5 /]# ceph daemon mds.cephfs.ceph-fs-dashboard-zl2us3-node5.svpkjw perf dump mds_cache num_strays
    {
        "mds_cache": {
            "num_strays": 0
        }
    }

MDS stopped:

    [ceph: root@ceph-fs-dashboard-zl2us3-node5 /]# ceph daemon mds.cephfs.ceph-fs-dashboard-zl2us3-node5.svpkjw perf dump mds_cache num_strays
    {}
    [ceph: root@ceph-fs-dashboard-zl2us3-node5 /]#

No crash observed:

    [root@ceph-fs-dashboard-zl2us3-node9 24e4b482-3307-46b1-8d31-0d10a01e7ed7]# ceph crash ls
    [root@ceph-fs-dashboard-zl2us3-node9 24e4b482-3307-46b1-8d31-0d10a01e7ed7]#

Regards,
Amarnath

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage Security, Bug Fix, and Enhancement Update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5997
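For completeness, the repeated num_strays checks in the verification above can be wrapped in a small polling helper like the one below. This is an illustrative sketch, not part of the verification run: it shells out to the same `ceph daemon ... perf dump mds_cache num_strays` command shown in the transcripts, so it has to run on the host of the MDS being queried (for example inside `cephadm shell`), and the default daemon name is just the example from this cluster.

```python
# Poll num_strays from a local MDS admin socket until it reaches an expected
# value, mirroring the repeated `perf dump mds_cache num_strays` checks above.
import json
import subprocess
import sys
import time


def num_strays(daemon):
    out = subprocess.check_output(
        ["ceph", "daemon", daemon, "perf", "dump", "mds_cache", "num_strays"])
    data = json.loads(out)
    # An empty dump ({}) means the daemon is stopping, as seen in the transcript.
    return data.get("mds_cache", {}).get("num_strays")


def wait_for_strays(daemon, expected, timeout=120):
    """Return True once the local MDS reports the expected stray count."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        count = num_strays(daemon)
        print(f"{daemon}: num_strays={count}")
        if count == expected:
            return True
        time.sleep(5)
    return False


if __name__ == "__main__":
    # Example daemon name from the verification run; replace with your own.
    daemon = sys.argv[1] if len(sys.argv) > 1 else \
        "mds.cephfs.ceph-fs-dashboard-zl2us3-node3.mtyixc"
    ok = wait_for_strays(daemon, expected=1)
    sys.exit(0 if ok else 1)
```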