Bug 1787109 - Observing scrub error with assert failure when a filesystem scrub is initiated
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: CephFS
Version: 4.0
Hardware: All
OS: All
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: 4.0
Assignee: Milind Changire
QA Contact: Hemanth Kumar
URL:
Whiteboard:
Duplicates: 1782612
Depends On:
Blocks: 1787489
Reported: 2019-12-31 08:09 UTC by Hemanth Kumar
Modified: 2020-01-31 12:48 UTC (History)

Fixed In Version: ceph-14.2.4-124.el8cp, ceph-14.2.4-50.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1787489
Environment:
Last Closed: 2020-01-31 12:48:33 UTC
Target Upstream Version:


Attachments
mds server log (732.60 KB, application/x-xz), 2019-12-31 08:09 UTC, Hemanth Kumar
ceph mds server log (3.82 MB, application/gzip), 2020-01-22 09:05 UTC, Hemanth Kumar


Links
System                   | ID             | Private | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 43483          | 0       | None     | None   | None    | 2020-01-06 15:08:55 UTC
Red Hat Product Errata   | RHBA-2020:0312 | 0       | None     | None   | None    | 2020-01-31 12:48:44 UTC

Description Hemanth Kumar 2019-12-31 08:09:46 UTC
Created attachment 1648739
mds server log

Description of problem:
A scrub error with an assert failure is observed when a filesystem scrub is initiated.

Version-Release number of selected component (if applicable):
ceph version 14.2.4-85.el8cp 

Steps to Reproduce:
1. Mount the filesystem on 4 clients, using both FUSE and kernel (kcephfs) mounts.
2. Fill the filesystem with data up to 30%.
3. Initiate a recursive filesystem scrub: ceph tell mds.0 scrub start / recursive

[root@plena001 ceph]# ceph -s
  cluster:
    id:     a1a87e45-ccdf-46c9-b6a7-c371bb03c055
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum plena001,plena002,plena003 (age 13d)
    mgr: plena001(active, since 13d)
    mds: vols:2 {0=plena005=up:active,1=plena006=up:active} 2 up:standby
    osd: 20 osds: 20 up (since 13d), 20 in (since 13d)

  data:
    pools:   2 pools, 256 pgs
    objects: 284.31k objects, 1.0 TiB
    usage:   10 TiB used, 14 TiB / 25 TiB avail
    pgs:     256 active+clean


[root@magna114 vol2]# ceph fs status
vols - 4 clients
====
+------+--------+----------+---------------+-------+-------+
| Rank | State  |   MDS    |    Activity   |  dns  |  inos |
+------+--------+----------+---------------+-------+-------+
|  0   | active | plena005 | Reqs:    0 /s | 20.1k | 20.1k |
|  1   | active | plena006 | Reqs:    0 /s | 1129  | 1133  |
+------+--------+----------+---------------+-------+-------+
+------------------+----------+-------+-------+
|       Pool       |   type   |  used | avail |
+------------------+----------+-------+-------+
| cephfs.vols.meta | metadata | 3101M | 3847G |
| cephfs.vols.data |   data   | 3138G | 3847G |
+------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|   plena007  |
|   plena004  |
+-------------+
MDS version: ceph version 14.2.4-85.el8cp (e2de9960d580ef8c3047880ad0e545c06092c5a0) nautilus (stable)


[root@plena001 ceph]# ceph tell mds.0 scrub start / recursive
2019-12-31 07:30:14.238 7f75534a9700  0 client.96807 ms_handle_reset on v2:10.1.172.5:6832/2488735279
2019-12-31 07:30:14.252 7f75544ab700  0 client.96813 ms_handle_reset on v2:10.1.172.5:6832/2488735279
{   
    "return_code": 0,
    "scrub_tag": "8d562abb-2993-4b7b-b2e1-95012b7929fd",
    "mode": "asynchronous"
}
[root@plena001 ceph]# 
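
The reply above only confirms that the scrub was queued ("mode": "asynchronous"); the assert shows up in the MDS log about a second later. A minimal sketch, in Python, of pulling the JSON reply out of the mixed command output so the return code and scrub tag can be recorded. The helper name parse_scrub_reply is hypothetical and not part of Ceph; it assumes the output contains exactly one JSON object and that the ms_handle_reset lines contain no braces.

```python
import json


def parse_scrub_reply(output):
    """Extract the JSON object from `ceph tell mds.N scrub start` output.

    Hypothetical helper: assumes a single JSON object in the output and
    that any surrounding log lines contain no curly braces.
    """
    start = output.find("{")
    end = output.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in output")
    return json.loads(output[start:end + 1])


# Output copied from the transcript above (one log line kept for brevity).
SAMPLE = """\
2019-12-31 07:30:14.238 7f75534a9700  0 client.96807 ms_handle_reset on v2:10.1.172.5:6832/2488735279
{
    "return_code": 0,
    "scrub_tag": "8d562abb-2993-4b7b-b2e1-95012b7929fd",
    "mode": "asynchronous"
}
"""

reply = parse_scrub_reply(SAMPLE)
print(reply["return_code"], reply["scrub_tag"], reply["mode"])
```

A return_code of 0 here says nothing about the scrub result itself, so the MDS log still has to be checked after the command returns.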


From the MDS log file:
----------------------
   -1> 2019-12-31 07:30:15.974 7f7314e6a700 -1 /builddir/build/BUILD/ceph-14.2.4/src/mds/CDir.cc: In function 'void CDir::fetch(MDSContext*, std::string_view, bool)' thread 7f7314e6a700 time 2019-12-31 07:30:15.974519
/builddir/build/BUILD/ceph-14.2.4/src/mds/CDir.cc: 1494: FAILED ceph_assert(is_auth())

    -1> 2019-12-31 07:30:15.974 7f7314e6a700 -1 /builddir/build/BUILD/ceph-14.2.4/src/mds/CDir.cc: In function 'void CDir::fetch(MDSContext*, std::string_view, bool)' thread 7f7314e6a700 time 2019-12-31 07:30:15.974519
/builddir/build/BUILD/ceph-14.2.4/src/mds/CDir.cc: 1494: FAILED ceph_assert(is_auth())

 ceph version 14.2.4-85.el8cp (e2de9960d580ef8c3047880ad0e545c06092c5a0) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x156) [0x7f73267066c6]
 2: (()+0x27f8e0) [0x7f73267068e0]
 3: (CDir::fetch(MDSContext*, std::basic_string_view<char, std::char_traits<char> >, bool)+0xa2b) [0x561ab4c9437b]
 4: (CDir::fetch(MDSContext*, bool)+0x3e) [0x561ab4c944ee]
 5: (ScrubStack::get_next_cdir(CInode*, CDir**)+0x343) [0x561ab4cfcc13]
 6: (ScrubStack::scrub_dir_inode(CInode*, bool*, bool*, bool*)+0x72a) [0x561ab4cfe08a]
 7: (ScrubStack::kick_off_scrubs()+0x216) [0x561ab4cffb16]
 8: (MDSContext::complete(int)+0x7f) [0x561ab4d2d44f]
 9: (MDSIOContextBase::complete(int)+0x17f) [0x561ab4d2d6df]
 10: (Finisher::finisher_thread_entry()+0x18d) [0x7f732679428d]
 11: (()+0x82de) [0x7f73245012de]
 12: (clone()+0x43) [0x7f7323094133]

     0> 2019-12-31 07:30:15.976 7f7314e6a700 -1 *** Caught signal (Aborted) **
 in thread 7f7314e6a700 thread_name:fn_anonymous

 ceph version 14.2.4-85.el8cp (e2de9960d580ef8c3047880ad0e545c06092c5a0) nautilus (stable)
 1: (()+0x12dc0) [0x7f732450bdc0]
 2: (gsignal()+0x10f) [0x7f7322fcf8df]
 3: (abort()+0x127) [0x7f7322fb9cf5]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a7) [0x7f7326706717]
 5: (()+0x27f8e0) [0x7f73267068e0]
 6: (CDir::fetch(MDSContext*, std::basic_string_view<char, std::char_traits<char> >, bool)+0xa2b) [0x561ab4c9437b]
 7: (CDir::fetch(MDSContext*, bool)+0x3e) [0x561ab4c944ee]
 8: (ScrubStack::get_next_cdir(CInode*, CDir**)+0x343) [0x561ab4cfcc13]
 9: (ScrubStack::scrub_dir_inode(CInode*, bool*, bool*, bool*)+0x72a) [0x561ab4cfe08a]
 10: (ScrubStack::kick_off_scrubs()+0x216) [0x561ab4cffb16]
 11: (MDSContext::complete(int)+0x7f) [0x561ab4d2d44f]
 12: (MDSIOContextBase::complete(int)+0x17f) [0x561ab4d2d6df]
 13: (Finisher::finisher_thread_entry()+0x18d) [0x7f732679428d]
 14: (()+0x82de) [0x7f73245012de]
 15: (clone()+0x43) [0x7f7323094133]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
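
To make the path to the failed assert easier to read, the frames above can be reduced to bare symbol names. A minimal sketch in Python, assuming each frame has the "N: (symbol+0xOFFSET) [0xADDRESS]" shape seen in this log; the function name call_chain is hypothetical, and unresolved frames such as "(()+0x27f8e0)" are skipped.

```python
import re

# Matches frames such as " 4: (CDir::fetch(MDSContext*, bool)+0x3e) [0x561ab4c944ee]".
FRAME_RE = re.compile(r"^\s*(\d+): \((.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]\s*$")


def call_chain(backtrace):
    """Return symbol names from a backtrace, innermost frame first."""
    chain = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if not match:
            continue  # not a frame line (version banner, NOTE, blank)
        symbol = match.group(2)
        if symbol == "()":
            continue  # unresolved frame with no symbol name
        # Keep only the text before the argument list.
        chain.append(symbol.split("(", 1)[0])
    return chain


# First five frames of the assert backtrace, copied from the log above.
SAMPLE = """\
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x156) [0x7f73267066c6]
 2: (()+0x27f8e0) [0x7f73267068e0]
 3: (CDir::fetch(MDSContext*, std::basic_string_view<char, std::char_traits<char> >, bool)+0xa2b) [0x561ab4c9437b]
 4: (CDir::fetch(MDSContext*, bool)+0x3e) [0x561ab4c944ee]
 5: (ScrubStack::get_next_cdir(CInode*, CDir**)+0x343) [0x561ab4cfcc13]
"""

for name in call_chain(SAMPLE):
    print(name)
```

Read from the bottom up, the chain shows the scrub code (ScrubStack::get_next_cdir) calling CDir::fetch, which is where ceph_assert(is_auth()) fails.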


Actual results:
The MDS fails ceph_assert(is_auth()) in CDir::fetch and aborts while performing a recursive filesystem scrub.

Expected results:
The scrub should check filesystem consistency on all inodes without any errors.

Comment 4 Patrick Donnelly 2020-01-06 18:06:59 UTC
*** Bug 1782612 has been marked as a duplicate of this bug. ***

Comment 19 Hemanth Kumar 2020-01-22 09:05:44 UTC
Created attachment 1654496
ceph mds server log

Comment 31 errata-xmlrpc 2020-01-31 12:48:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0312

