Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below:
https://access.redhat.com/errata/RHBA-2020:0312
If the solution does not work for you, open a new bug report.
Created attachment 1648739 [details]
mds server log

Description of problem:
Observing a scrub error with an assert failure when a filesystem scrub is initiated.

Version-Release number of selected component (if applicable):
ceph version 14.2.4-85.el8cp

Steps to Reproduce:
1. Have 4 clients and mount the filesystem with both ceph-fuse and kcephfs.
2. Fill data up to 30%.
3. Initiate a filesystem scrub: ceph tell mds.0 scrub start / recursive

[root@plena001 ceph]# ceph -s
  cluster:
    id:     a1a87e45-ccdf-46c9-b6a7-c371bb03c055
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum plena001,plena002,plena003 (age 13d)
    mgr: plena001(active, since 13d)
    mds: vols:2 {0=plena005=up:active,1=plena006=up:active} 2 up:standby
    osd: 20 osds: 20 up (since 13d), 20 in (since 13d)

  data:
    pools:   2 pools, 256 pgs
    objects: 284.31k objects, 1.0 TiB
    usage:   10 TiB used, 14 TiB / 25 TiB avail
    pgs:     256 active+clean

[root@magna114 vol2]# ceph fs status
vols - 4 clients
====
+------+--------+----------+---------------+-------+-------+
| Rank | State  |   MDS    |    Activity   |  dns  |  inos |
+------+--------+----------+---------------+-------+-------+
|  0   | active | plena005 | Reqs:    0 /s | 20.1k | 20.1k |
|  1   | active | plena006 | Reqs:    0 /s | 1129  | 1133  |
+------+--------+----------+---------------+-------+-------+
+------------------+----------+-------+-------+
|       Pool       |   type   |  used | avail |
+------------------+----------+-------+-------+
| cephfs.vols.meta | metadata | 3101M | 3847G |
| cephfs.vols.data |   data   | 3138G | 3847G |
+------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|   plena007  |
|   plena004  |
+-------------+
MDS version: ceph version 14.2.4-85.el8cp (e2de9960d580ef8c3047880ad0e545c06092c5a0) nautilus (stable)

[root@plena001 ceph]# ceph tell mds.0 scrub start / recursive
2019-12-31 07:30:14.238 7f75534a9700  0 client.96807 ms_handle_reset on v2:10.1.172.5:6832/2488735279
2019-12-31 07:30:14.252 7f75544ab700  0 client.96813 ms_handle_reset on v2:10.1.172.5:6832/2488735279
{
    "return_code": 0,
    "scrub_tag": "8d562abb-2993-4b7b-b2e1-95012b7929fd",
    "mode": "asynchronous"
}
[root@plena001 ceph]#

From the log file:
--------------
    -1> 2019-12-31 07:30:15.974 7f7314e6a700 -1 /builddir/build/BUILD/ceph-14.2.4/src/mds/CDir.cc: In function 'void CDir::fetch(MDSContext*, std::string_view, bool)' thread 7f7314e6a700 time 2019-12-31 07:30:15.974519
/builddir/build/BUILD/ceph-14.2.4/src/mds/CDir.cc: 1494: FAILED ceph_assert(is_auth())

 ceph version 14.2.4-85.el8cp (e2de9960d580ef8c3047880ad0e545c06092c5a0) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x156) [0x7f73267066c6]
 2: (()+0x27f8e0) [0x7f73267068e0]
 3: (CDir::fetch(MDSContext*, std::basic_string_view<char, std::char_traits<char> >, bool)+0xa2b) [0x561ab4c9437b]
 4: (CDir::fetch(MDSContext*, bool)+0x3e) [0x561ab4c944ee]
 5: (ScrubStack::get_next_cdir(CInode*, CDir**)+0x343) [0x561ab4cfcc13]
 6: (ScrubStack::scrub_dir_inode(CInode*, bool*, bool*, bool*)+0x72a) [0x561ab4cfe08a]
 7: (ScrubStack::kick_off_scrubs()+0x216) [0x561ab4cffb16]
 8: (MDSContext::complete(int)+0x7f) [0x561ab4d2d44f]
 9: (MDSIOContextBase::complete(int)+0x17f) [0x561ab4d2d6df]
 10: (Finisher::finisher_thread_entry()+0x18d) [0x7f732679428d]
 11: (()+0x82de) [0x7f73245012de]
 12: (clone()+0x43) [0x7f7323094133]

     0> 2019-12-31 07:30:15.976 7f7314e6a700 -1 *** Caught signal (Aborted) **
 in thread 7f7314e6a700 thread_name:fn_anonymous

 ceph version 14.2.4-85.el8cp (e2de9960d580ef8c3047880ad0e545c06092c5a0) nautilus (stable)
 1: (()+0x12dc0) [0x7f732450bdc0]
 2: (gsignal()+0x10f) [0x7f7322fcf8df]
 3: (abort()+0x127) [0x7f7322fb9cf5]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a7) [0x7f7326706717]
 5: (()+0x27f8e0) [0x7f73267068e0]
 6: (CDir::fetch(MDSContext*, std::basic_string_view<char, std::char_traits<char> >, bool)+0xa2b) [0x561ab4c9437b]
 7: (CDir::fetch(MDSContext*, bool)+0x3e) [0x561ab4c944ee]
 8: (ScrubStack::get_next_cdir(CInode*, CDir**)+0x343) [0x561ab4cfcc13]
 9: (ScrubStack::scrub_dir_inode(CInode*, bool*, bool*, bool*)+0x72a) [0x561ab4cfe08a]
 10: (ScrubStack::kick_off_scrubs()+0x216) [0x561ab4cffb16]
 11: (MDSContext::complete(int)+0x7f) [0x561ab4d2d44f]
 12: (MDSIOContextBase::complete(int)+0x17f) [0x561ab4d2d6df]
 13: (Finisher::finisher_thread_entry()+0x18d) [0x7f732679428d]
 14: (()+0x82de) [0x7f73245012de]
 15: (clone()+0x43) [0x7f7323094133]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actual results:
ceph_assert(is_auth()) failure and MDS abort while performing a recursive fs scrub.

Expected results:
The scrub should check filesystem consistency on all inodes without any errors.
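For reference, the scrub-start command above returns a JSON reply before the crash occurs. A minimal sketch of checking that reply programmatically (the `scrub_accepted` helper and the captured reply string are illustrative, not part of Ceph's API; only the JSON fields come from the output in this report):

```python
import json

# The reply printed by `ceph tell mds.0 scrub start / recursive`,
# as captured in this report.
reply = """{
    "return_code": 0,
    "scrub_tag": "8d562abb-2993-4b7b-b2e1-95012b7929fd",
    "mode": "asynchronous"
}"""


def scrub_accepted(raw: str) -> bool:
    """Hypothetical helper: True when the MDS reports the scrub
    was started successfully (return_code 0, asynchronous mode)."""
    info = json.loads(raw)
    return info.get("return_code") == 0 and info.get("mode") == "asynchronous"


print(scrub_accepted(reply))  # → True
```

Note that a return_code of 0 here only means the scrub was accepted and queued; as this report shows, the MDS can still hit the ceph_assert(is_auth()) failure asynchronously while the scrub runs.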