Bug 1642015 - MDS crashed when running scrub_path command in admin-daemon
Summary: MDS crashed when running scrub_path command in admin-daemon
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 3.1
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: rc
Target Release: 4.1
Assignee: Yan, Zheng
QA Contact: Hemanth Kumar
URL:
Whiteboard: NeedsDev
Depends On:
Blocks: 1629656
 
Reported: 2018-10-23 11:46 UTC by Ramakrishnan Periyasamy
Modified: 2020-02-28 00:35 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
.The Ceph Metadata Server might crash during scrub with multiple MDS
This issue is triggered when the `scrub_path` command is run in an environment with multiple Ceph Metadata Servers. There is no workaround at this time.
Clone Of:
Environment:
Last Closed: 2020-02-28 00:35:22 UTC
Embargoed:


Attachments
Failed MDS log (2.34 MB, text/plain), attached 2018-10-23 11:46 UTC by Ramakrishnan Periyasamy


Links
Ceph Project Bug Tracker 36673 (last updated 2018-11-05 22:30:53 UTC)

Description Ramakrishnan Periyasamy 2018-10-23 11:46:58 UTC
Created attachment 1496682 [details]
Failed MDS log

Description of problem:
MDS crashed when running the scrub_path command from the admin daemon socket.

Command and console output:
[root@host083 ceph]# ceph --admin-daemon ceph-mds.magna083.asok scrub_path /kernel2/test/file_dstdir/localhost.localdomain/thrd_25
admin_socket: exception: exception: no data returned from admin socket
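
For reference, the same admin-socket command can also be issued through the ceph daemon shorthand. This is only a sketch: it assumes it is run on the host where mds.magna083 lives and that the default admin socket location under /var/run/ceph/ is in use (daemon name and path are the ones from this report).

# Equivalent invocation via the ceph daemon wrapper around the admin socket.
ceph daemon mds.magna083 scrub_path /kernel2/test/file_dstdir/localhost.localdomain/thrd_25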

0> 2018-10-23 11:34:56.051305 7fce51417700 -1 /builddir/build/BUILD/ceph-12.2.5/src/mds/CDir.cc: In function 'void CDir::fetch(MDSInternalContextBase*, boost::string_view, bool)' thread 7fce51417700 time 2018-10-23 11:34:56.047450
/builddir/build/BUILD/ceph-12.2.5/src/mds/CDir.cc: 1473: FAILED assert(is_auth())

 ceph version 12.2.5-59.el7cp (d4b9f17b56b3348566926849313084dd6efc2ca2) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x562d0af01210]
 2: (CDir::fetch(MDSInternalContextBase*, boost::basic_string_view<char, std::char_traits<char> >, bool)+0x8a8) [0x562d0adc8258]
 3: (CDir::fetch(MDSInternalContextBase*, bool)+0x30) [0x562d0adc8350]
 4: (()+0x4f5666) [0x562d0adf8666]
 5: (()+0x5003be) [0x562d0ae033be]
 6: (Continuation::_continue_function(int, int)+0x1aa) [0x562d0ae11aba]
 7: (Continuation::Callback::finish(int)+0x10) [0x562d0ae11ba0]
 8: (Context::complete(int)+0x9) [0x562d0abc8589]
 9: (MDSIOContextBase::complete(int)+0xa4) [0x562d0ae4b824]
 10: (Finisher::finisher_thread_entry()+0x198) [0x562d0af00188]
 11: (()+0x7dd5) [0x7fce5c259dd5]
 12: (clone()+0x6d) [0x7fce5b336ead]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


2018-10-23 11:34:56.100199 7fce51417700 -1 *** Caught signal (Aborted) **
 in thread 7fce51417700 thread_name:fn_anonymous

 ceph version 12.2.5-59.el7cp (d4b9f17b56b3348566926849313084dd6efc2ca2) luminous (stable)
 1: (()+0x5bd6f1) [0x562d0aec06f1]
 2: (()+0xf5d0) [0x7fce5c2615d0]
 3: (gsignal()+0x37) [0x7fce5b26f207]
 4: (abort()+0x148) [0x7fce5b2708f8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x562d0af01384]
 6: (CDir::fetch(MDSInternalContextBase*, boost::basic_string_view<char, std::char_traits<char> >, bool)+0x8a8) [0x562d0adc8258]
 7: (CDir::fetch(MDSInternalContextBase*, bool)+0x30) [0x562d0adc8350]
 8: (()+0x4f5666) [0x562d0adf8666]
 9: (()+0x5003be) [0x562d0ae033be]
 10: (Continuation::_continue_function(int, int)+0x1aa) [0x562d0ae11aba]
 11: (Continuation::Callback::finish(int)+0x10) [0x562d0ae11ba0]
 12: (Context::complete(int)+0x9) [0x562d0abc8589]
 13: (MDSIOContextBase::complete(int)+0xa4) [0x562d0ae4b824]
 14: (Finisher::finisher_thread_entry()+0x198) [0x562d0af00188]
 15: (()+0x7dd5) [0x7fce5c259dd5]
 16: (clone()+0x6d) [0x7fce5b336ead]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---


Version-Release number of selected component (if applicable):
ceph version 12.2.5-59.el7cp (d4b9f17b56b3348566926849313084dd6efc2ca2) luminous (stable)

How reproducible:
1/1

Steps to Reproduce:
1. Create a cluster with two active MDS daemons (active-active).
2. Get a directory path from the "get subtrees" admin socket command.
3. Run scrub_path on that path (see the command sketch below).
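
A minimal command sketch of steps 2 and 3, assuming a file system that already has two active MDS ranks; the daemon name, socket path, and scrub path are taken from this report, so adjust them for your environment.

# Step 2: dump the subtree map on the MDS and pick a directory path from it.
ceph --admin-daemon /var/run/ceph/ceph-mds.magna083.asok get subtrees | python -m json.tool

# Step 3: scrub one of the paths seen above; in this report the command below
# caused the MDS to hit FAILED assert(is_auth()) in CDir::fetch().
ceph --admin-daemon /var/run/ceph/ceph-mds.magna083.asok scrub_path /kernel2/test/file_dstdir/localhost.localdomain/thrd_25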

Actual results:
The active MDS crashed and a standby MDS became active.
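
A short sketch of how the failover and the crash can be confirmed. The commands are standard Ceph CLI calls, and the log path assumes the default log location with the daemon name from this report.

# Check which MDS daemons are active and confirm a standby took over.
ceph mds stat
ceph fs status    # if available in this release; shows ranks and standbys

# The assertion and backtrace are written to the crashed daemon's log
# (default location shown; see the attached "Failed MDS log").
grep -n "FAILED assert" /var/log/ceph/ceph-mds.magna083.log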

Expected results:
The MDS should not crash.

Additional info:
NA

Comment 3 Yan, Zheng 2018-11-07 13:09:35 UTC
Scrub does not work properly in a multi-MDS setup. I'm working on this issue.
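
For context, a sketch of how to see that the metadata tree is split across the two active ranks, which is the multi-MDS condition this comment refers to. This is only an inspection aid, not a workaround; the first daemon name is from this report and the second is a placeholder for the other active MDS.

# Dump the subtree map on each active MDS; a path whose subtree is owned by
# another rank is presumably what trips assert(is_auth()) in CDir::fetch().
ceph daemon mds.magna083 get subtrees | python -m json.tool
ceph daemon mds.<other-active-mds> get subtrees | python -m json.tool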

Comment 6 Giridhar Ramaraju 2019-08-05 13:11:47 UTC
Updating the QA Contact to Hemant. Hemant will reroute them to the appropriate QE Associate.

Regards,
Giri

Comment 7 Giridhar Ramaraju 2019-08-05 13:12:42 UTC
Updating the QA Contact to Hemant. Hemant will reroute them to the appropriate QE Associate.

Regards,
Giri

Comment 9 Yan, Zheng 2019-08-26 07:59:46 UTC
No. I haven't finished the code.

