Bug 1787109

Summary: Observing scrub error with assert failure when a filesystem scrub is initiated
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Hemanth Kumar <hyelloji>
Component: CephFS
Assignee: Milind Changire <mchangir>
Status: CLOSED ERRATA
QA Contact: Hemanth Kumar <hyelloji>
Severity: high
Priority: urgent
Version: 4.0
CC: assingh, ceph-eng-bugs, ceph-qe-bugs, dfuller, hgurav, kdreyer, pdonnell, sweil, tchandra, tserlin, ymane
Target Milestone: rc
Target Release: 4.0
Keywords: Regression
Hardware: All
OS: All
Fixed In Version: ceph-14.2.4-124.el8cp, ceph-14.2.4-50.el7cp
Clones: 1787489
Bug Blocks: 1787489
Last Closed: 2020-01-31 12:48:33 UTC
Type: Bug
Attachments:
  mds server log (flags: none)
  ceph mds server log (flags: none)

Description Hemanth Kumar 2019-12-31 08:09:46 UTC
Created attachment 1648739 [details]
mds server log

Description of problem:
Observing scrub error with assert failure when a filesystem scrub is initiated

Version-Release number of selected component (if applicable):
ceph version 14.2.4-85.el8cp 

Steps to Reproduce:
1. Mount the filesystem on 4 clients, using both ceph-fuse and kernel (kcephfs) mounts.
2. Fill the filesystem with data up to 30% of capacity.
3. Initiate a filesystem scrub: ceph tell mds.0 scrub start / recursive

[root@plena001 ceph]# ceph -s
  cluster:
    id:     a1a87e45-ccdf-46c9-b6a7-c371bb03c055
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum plena001,plena002,plena003 (age 13d)
    mgr: plena001(active, since 13d)
    mds: vols:2 {0=plena005=up:active,1=plena006=up:active} 2 up:standby
    osd: 20 osds: 20 up (since 13d), 20 in (since 13d)

  data:
    pools:   2 pools, 256 pgs
    objects: 284.31k objects, 1.0 TiB
    usage:   10 TiB used, 14 TiB / 25 TiB avail
    pgs:     256 active+clean


[root@magna114 vol2]# ceph fs status
vols - 4 clients
====
+------+--------+----------+---------------+-------+-------+
| Rank | State  |   MDS    |    Activity   |  dns  |  inos |
+------+--------+----------+---------------+-------+-------+
|  0   | active | plena005 | Reqs:    0 /s | 20.1k | 20.1k |
|  1   | active | plena006 | Reqs:    0 /s | 1129  | 1133  |
+------+--------+----------+---------------+-------+-------+
+------------------+----------+-------+-------+
|       Pool       |   type   |  used | avail |
+------------------+----------+-------+-------+
| cephfs.vols.meta | metadata | 3101M | 3847G |
| cephfs.vols.data |   data   | 3138G | 3847G |
+------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|   plena007  |
|   plena004  |
+-------------+
MDS version: ceph version 14.2.4-85.el8cp (e2de9960d580ef8c3047880ad0e545c06092c5a0) nautilus (stable)


[root@plena001 ceph]# ceph tell mds.0 scrub start / recursive
2019-12-31 07:30:14.238 7f75534a9700  0 client.96807 ms_handle_reset on v2:10.1.172.5:6832/2488735279
2019-12-31 07:30:14.252 7f75544ab700  0 client.96813 ms_handle_reset on v2:10.1.172.5:6832/2488735279
{   
    "return_code": 0,
    "scrub_tag": "8d562abb-2993-4b7b-b2e1-95012b7929fd",
    "mode": "asynchronous"
}
[root@plena001 ceph]# 


From the MDS log file:
--------------
   -1> 2019-12-31 07:30:15.974 7f7314e6a700 -1 /builddir/build/BUILD/ceph-14.2.4/src/mds/CDir.cc: In function 'void CDir::fetch(MDSContext*, std::string_view, bool)' thread 7f7314e6a700 time 2019-12-31 07:30:15.974519
/builddir/build/BUILD/ceph-14.2.4/src/mds/CDir.cc: 1494: FAILED ceph_assert(is_auth())

 ceph version 14.2.4-85.el8cp (e2de9960d580ef8c3047880ad0e545c06092c5a0) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x156) [0x7f73267066c6]
 2: (()+0x27f8e0) [0x7f73267068e0]
 3: (CDir::fetch(MDSContext*, std::basic_string_view<char, std::char_traits<char> >, bool)+0xa2b) [0x561ab4c9437b]
 4: (CDir::fetch(MDSContext*, bool)+0x3e) [0x561ab4c944ee]
 5: (ScrubStack::get_next_cdir(CInode*, CDir**)+0x343) [0x561ab4cfcc13]
 6: (ScrubStack::scrub_dir_inode(CInode*, bool*, bool*, bool*)+0x72a) [0x561ab4cfe08a]
 7: (ScrubStack::kick_off_scrubs()+0x216) [0x561ab4cffb16]
 8: (MDSContext::complete(int)+0x7f) [0x561ab4d2d44f]
 9: (MDSIOContextBase::complete(int)+0x17f) [0x561ab4d2d6df]
 10: (Finisher::finisher_thread_entry()+0x18d) [0x7f732679428d]
 11: (()+0x82de) [0x7f73245012de]
 12: (clone()+0x43) [0x7f7323094133]

     0> 2019-12-31 07:30:15.976 7f7314e6a700 -1 *** Caught signal (Aborted) **
 in thread 7f7314e6a700 thread_name:fn_anonymous

 ceph version 14.2.4-85.el8cp (e2de9960d580ef8c3047880ad0e545c06092c5a0) nautilus (stable)
 1: (()+0x12dc0) [0x7f732450bdc0]
 2: (gsignal()+0x10f) [0x7f7322fcf8df]
 3: (abort()+0x127) [0x7f7322fb9cf5]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a7) [0x7f7326706717]
 5: (()+0x27f8e0) [0x7f73267068e0]
 6: (CDir::fetch(MDSContext*, std::basic_string_view<char, std::char_traits<char> >, bool)+0xa2b) [0x561ab4c9437b]
 7: (CDir::fetch(MDSContext*, bool)+0x3e) [0x561ab4c944ee]
 8: (ScrubStack::get_next_cdir(CInode*, CDir**)+0x343) [0x561ab4cfcc13]
 9: (ScrubStack::scrub_dir_inode(CInode*, bool*, bool*, bool*)+0x72a) [0x561ab4cfe08a]
 10: (ScrubStack::kick_off_scrubs()+0x216) [0x561ab4cffb16]
 11: (MDSContext::complete(int)+0x7f) [0x561ab4d2d44f]
 12: (MDSIOContextBase::complete(int)+0x17f) [0x561ab4d2d6df]
 13: (Finisher::finisher_thread_entry()+0x18d) [0x7f732679428d]
 14: (()+0x82de) [0x7f73245012de]
 15: (clone()+0x43) [0x7f7323094133]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Actual results:
The MDS hits an assert failure (with a scrub error) and aborts while performing a recursive filesystem scrub.

Expected results:
The scrub should check filesystem consistency across all inodes without any errors or MDS crashes.

Comment 4 Patrick Donnelly 2020-01-06 18:06:59 UTC
*** Bug 1782612 has been marked as a duplicate of this bug. ***

Comment 19 Hemanth Kumar 2020-01-22 09:05:44 UTC
Created attachment 1654496 [details]
ceph mds server log

Comment 31 errata-xmlrpc 2020-01-31 12:48:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0312