
Bug 1642015

Summary: MDS crashed when running scrub_path command in admin-daemon
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: CephFS
Version: 3.1
Status: CLOSED WONTFIX
Severity: high
Priority: high
Target Milestone: rc
Target Release: 4.1
Hardware: All
OS: All
Whiteboard: NeedsDev
Reporter: Ramakrishnan Periyasamy <rperiyas>
Assignee: Yan, Zheng <zyan>
QA Contact: Hemanth Kumar <hyelloji>
CC: ceph-eng-bugs, hnallurv, hyelloji, jbrier, pasik, pdonnell, tserlin, zyan
Doc Type: Known Issue
Doc Text:
The Ceph Metadata Server might crash during scrub with multiple MDS.
This issue is triggered when the `scrub_path` command is run in an environment with multiple Ceph Metadata Servers. There is no workaround at this time.
Story Points: ---
Last Closed: 2020-02-28 00:35:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
oVirt Team: ---
Cloudforms Team: ---
Category: ---
Bug Blocks: 1629656
Attachments:
Failed MDS log (flags: none)

Description Ramakrishnan Periyasamy 2018-10-23 11:46:58 UTC
Created attachment 1496682: Failed MDS log

Description of problem:
The MDS crashed when running the scrub_path command through the admin daemon socket.

Command and console output:
[root@host083 ceph]# ceph --admin-daemon ceph-mds.magna083.asok scrub_path /kernel2/test/file_dstdir/localhost.localdomain/thrd_25
admin_socket: exception: exception: no data returned from admin socket
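
For reference, the same admin socket command can also be issued by daemon name instead of by socket path; a sketch, assuming the MDS id matches the socket file above:

ceph daemon mds.magna083 scrub_path /kernel2/test/file_dstdir/localhost.localdomain/thrd_25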

0> 2018-10-23 11:34:56.051305 7fce51417700 -1 /builddir/build/BUILD/ceph-12.2.5/src/mds/CDir.cc: In function 'void CDir::fetch(MDSInternalContextBase*, boost::string_view, bool)' thread 7fce51417700 time 2018-10-23 11:34:56.047450
/builddir/build/BUILD/ceph-12.2.5/src/mds/CDir.cc: 1473: FAILED assert(is_auth())

 ceph version 12.2.5-59.el7cp (d4b9f17b56b3348566926849313084dd6efc2ca2) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x562d0af01210]
 2: (CDir::fetch(MDSInternalContextBase*, boost::basic_string_view<char, std::char_traits<char> >, bool)+0x8a8) [0x562d0adc8258]
 3: (CDir::fetch(MDSInternalContextBase*, bool)+0x30) [0x562d0adc8350]
 4: (()+0x4f5666) [0x562d0adf8666]
 5: (()+0x5003be) [0x562d0ae033be]
 6: (Continuation::_continue_function(int, int)+0x1aa) [0x562d0ae11aba]
 7: (Continuation::Callback::finish(int)+0x10) [0x562d0ae11ba0]
 8: (Context::complete(int)+0x9) [0x562d0abc8589]
 9: (MDSIOContextBase::complete(int)+0xa4) [0x562d0ae4b824]
 10: (Finisher::finisher_thread_entry()+0x198) [0x562d0af00188]
 11: (()+0x7dd5) [0x7fce5c259dd5]
 12: (clone()+0x6d) [0x7fce5b336ead]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


2018-10-23 11:34:56.100199 7fce51417700 -1 *** Caught signal (Aborted) **
 in thread 7fce51417700 thread_name:fn_anonymous

 ceph version 12.2.5-59.el7cp (d4b9f17b56b3348566926849313084dd6efc2ca2) luminous (stable)
 1: (()+0x5bd6f1) [0x562d0aec06f1]
 2: (()+0xf5d0) [0x7fce5c2615d0]
 3: (gsignal()+0x37) [0x7fce5b26f207]
 4: (abort()+0x148) [0x7fce5b2708f8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x562d0af01384]
 6: (CDir::fetch(MDSInternalContextBase*, boost::basic_string_view<char, std::char_traits<char> >, bool)+0x8a8) [0x562d0adc8258]
 7: (CDir::fetch(MDSInternalContextBase*, bool)+0x30) [0x562d0adc8350]
 8: (()+0x4f5666) [0x562d0adf8666]
 9: (()+0x5003be) [0x562d0ae033be]
 10: (Continuation::_continue_function(int, int)+0x1aa) [0x562d0ae11aba]
 11: (Continuation::Callback::finish(int)+0x10) [0x562d0ae11ba0]
 12: (Context::complete(int)+0x9) [0x562d0abc8589]
 13: (MDSIOContextBase::complete(int)+0xa4) [0x562d0ae4b824]
 14: (Finisher::finisher_thread_entry()+0x198) [0x562d0af00188]
 15: (()+0x7dd5) [0x7fce5c259dd5]
 16: (clone()+0x6d) [0x7fce5b336ead]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---


Version-Release number of selected component (if applicable):
ceph version 12.2.5-59.el7cp (d4b9f17b56b3348566926849313084dd6efc2ca2) luminous (stable)

How reproducible:
1/1

Steps to Reproduce:
1. Create a cluster with multiple active MDS daemons (active-active).
2. Get a directory path from the "get subtrees" admin socket command.
3. Run scrub_path on that path (see the sketch below).
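
A minimal reproduction sketch, assuming the daemon-name form of the admin socket commands, a filesystem named cephfs, and an MDS id of magna083 (names and the scrub path are illustrative, not taken from this report):

ceph fs set cephfs allow_multimds true       # Luminous-era flag; dropped in later releases
ceph fs set cephfs max_mds 2                 # run two active MDS ranks (active-active)
ceph daemon mds.magna083 get subtrees        # pick a directory path from the output
ceph daemon mds.magna083 scrub_path /some/subtree/dir   # command that crashed the MDS here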

Actual results:
The MDS crashed and a standby MDS became active.
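
The failover can be confirmed from the monitors; a sketch (output format varies by release):

ceph mds stat   # shows a standby promoted into the failed rank
ceph -s         # overall cluster status after the MDS restart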

Expected results:
The MDS should not crash; scrub_path should either complete or return a clean error.

Additional info:
NA

Comment 3 Yan, Zheng 2018-11-07 13:09:35 UTC
Scrub does not work properly in a multi-MDS setup. I'm working on this issue.

Comment 6 Giridhar Ramaraju 2019-08-05 13:11:47 UTC
Updating the QA Contact to Hemant. Hemant will reroute these bugs to the appropriate QE Associate.

Regards,
Giri

Comment 7 Giridhar Ramaraju 2019-08-05 13:12:42 UTC
Updating the QA Contact to Hemant. Hemant will reroute these bugs to the appropriate QE Associate.

Regards,
Giri

Comment 9 Yan, Zheng 2019-08-26 07:59:46 UTC
No, I haven't finished the code.