
Bug 1642015

Summary: MDS crashed when running scrub_path command in admin-daemon
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: CephFS
Version: 3.1
Status: CLOSED WONTFIX
Severity: high
Priority: high
Target Milestone: rc
Target Release: 4.1
Hardware: All
OS: All
Whiteboard: NeedsDev
Reporter: Ramakrishnan Periyasamy <rperiyas>
Assignee: Yan, Zheng <zyan>
QA Contact: Hemanth Kumar <hyelloji>
CC: ceph-eng-bugs, hnallurv, hyelloji, jbrier, pasik, pdonnell, tserlin, zyan
Doc Type: Known Issue
Doc Text:
The Ceph Metadata Server might crash during scrub with multiple MDS.
This issue is triggered when the `scrub_path` command is run in an environment with multiple Ceph Metadata Servers. There is no workaround at this time.
Story Points: ---
Last Closed: 2020-02-28 00:35:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
oVirt Team: ---
Cloudforms Team: ---
Category: ---
Bug Blocks: 1629656
Attachments:
Failed MDS log (flags: none)

Description Ramakrishnan Periyasamy 2018-10-23 11:46:58 UTC
Created attachment 1496682: Failed MDS log

Description of problem:
The MDS crashed when running the scrub_path command through the admin daemon socket.

Command and console output:
[root@host083 ceph]# ceph --admin-daemon ceph-mds.magna083.asok scrub_path /kernel2/test/file_dstdir/localhost.localdomain/thrd_25
admin_socket: exception: exception: no data returned from admin socket
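
For reference, the same admin socket command can also be issued by daemon name instead of by socket path; a sketch, assuming the MDS id matches the socket file above:

ceph daemon mds.magna083 scrub_path /kernel2/test/file_dstdir/localhost.localdomain/thrd_25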

0> 2018-10-23 11:34:56.051305 7fce51417700 -1 /builddir/build/BUILD/ceph-12.2.5/src/mds/CDir.cc: In function 'void CDir::fetch(MDSInternalContextBase*, boost::string_view, bool)' thread 7fce51417700 time 2018-10-23 11:34:56.047450
/builddir/build/BUILD/ceph-12.2.5/src/mds/CDir.cc: 1473: FAILED assert(is_auth())

 ceph version 12.2.5-59.el7cp (d4b9f17b56b3348566926849313084dd6efc2ca2) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x562d0af01210]
 2: (CDir::fetch(MDSInternalContextBase*, boost::basic_string_view<char, std::char_traits<char> >, bool)+0x8a8) [0x562d0adc8258]
 3: (CDir::fetch(MDSInternalContextBase*, bool)+0x30) [0x562d0adc8350]
 4: (()+0x4f5666) [0x562d0adf8666]
 5: (()+0x5003be) [0x562d0ae033be]
 6: (Continuation::_continue_function(int, int)+0x1aa) [0x562d0ae11aba]
 7: (Continuation::Callback::finish(int)+0x10) [0x562d0ae11ba0]
 8: (Context::complete(int)+0x9) [0x562d0abc8589]
 9: (MDSIOContextBase::complete(int)+0xa4) [0x562d0ae4b824]
 10: (Finisher::finisher_thread_entry()+0x198) [0x562d0af00188]
 11: (()+0x7dd5) [0x7fce5c259dd5]
 12: (clone()+0x6d) [0x7fce5b336ead]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


2018-10-23 11:34:56.100199 7fce51417700 -1 *** Caught signal (Aborted) **
 in thread 7fce51417700 thread_name:fn_anonymous

 ceph version 12.2.5-59.el7cp (d4b9f17b56b3348566926849313084dd6efc2ca2) luminous (stable)
 1: (()+0x5bd6f1) [0x562d0aec06f1]
 2: (()+0xf5d0) [0x7fce5c2615d0]
 3: (gsignal()+0x37) [0x7fce5b26f207]
 4: (abort()+0x148) [0x7fce5b2708f8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x562d0af01384]
 6: (CDir::fetch(MDSInternalContextBase*, boost::basic_string_view<char, std::char_traits<char> >, bool)+0x8a8) [0x562d0adc8258]
 7: (CDir::fetch(MDSInternalContextBase*, bool)+0x30) [0x562d0adc8350]
 8: (()+0x4f5666) [0x562d0adf8666]
 9: (()+0x5003be) [0x562d0ae033be]
 10: (Continuation::_continue_function(int, int)+0x1aa) [0x562d0ae11aba]
 11: (Continuation::Callback::finish(int)+0x10) [0x562d0ae11ba0]
 12: (Context::complete(int)+0x9) [0x562d0abc8589]
 13: (MDSIOContextBase::complete(int)+0xa4) [0x562d0ae4b824]
 14: (Finisher::finisher_thread_entry()+0x198) [0x562d0af00188]
 15: (()+0x7dd5) [0x7fce5c259dd5]
 16: (clone()+0x6d) [0x7fce5b336ead]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---


Version-Release number of selected component (if applicable):
ceph version 12.2.5-59.el7cp (d4b9f17b56b3348566926849313084dd6efc2ca2) luminous (stable)

How reproducible:
1/1

Steps to Reproduce:
1. Create a cluster with multiple active MDS daemons (active-active).
2. Get a directory path from the "get subtrees" admin socket command.
3. Run scrub_path on that path (see the sketch below).
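
A minimal reproduction sketch, assuming the daemon-name form of the admin socket commands, a filesystem named cephfs, and an MDS id of magna083 (names and the scrub path are illustrative, not taken from this report):

ceph fs set cephfs allow_multimds true       # Luminous-era flag; dropped in later releases
ceph fs set cephfs max_mds 2                 # run two active MDS ranks (active-active)
ceph daemon mds.magna083 get subtrees        # pick a directory path from the output
ceph daemon mds.magna083 scrub_path /some/subtree/dir   # command that crashed the MDS here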

Actual results:
The MDS crashed and a standby MDS became active.
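
The failover can be confirmed from the monitors; a sketch (output format varies by release):

ceph mds stat   # shows a standby promoted into the failed rank
ceph -s         # overall cluster status after the MDS restart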

Expected results:
The MDS should not crash; scrub_path should either complete or return a clean error.

Additional info:
NA

Comment 3 Yan, Zheng 2018-11-07 13:09:35 UTC
Scrub does not work properly in a multi-MDS setup. I'm working on this issue.

Comment 6 Giridhar Ramaraju 2019-08-05 13:11:47 UTC
Updating the QA Contact to Hemant. Hemant will reroute these bugs to the appropriate QE Associate.

Regards,
Giri

Comment 7 Giridhar Ramaraju 2019-08-05 13:12:42 UTC
Updating the QA Contact to Hemant. Hemant will reroute these bugs to the appropriate QE Associate.

Regards,
Giri

Comment 9 Yan, Zheng 2019-08-26 07:59:46 UTC
No, I haven't finished the code.