Bug 2030451

Summary: MDS crash on unsupported metrics from Kernel Client
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Michael J. Kidd <linuxkidd>
Component: CephFS
Assignee: Venky Shankar <vshankar>
Status: CLOSED ERRATA
QA Contact: Yogesh Mane <ymane>
Severity: high
Docs Contact:
Priority: high    
Version: 5.0
CC: amk, ceph-eng-bugs, fkellehe, gjose, hyelloji, lithomas, mmuench, sbaldwin, tpetr, tserlin, vereddy, vshankar, xiubli
Target Milestone: ---   
Target Release: 5.0z4   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ceph-16.2.0-149.el8cp
Doc Type: Bug Fix
Doc Text:
.The MDS daemon no longer crashes when receiving unsupported metrics
Previously, the MDS daemon could not handle new metrics sent by the kernel client and crashed on receiving any metric type it did not support. With this release, the MDS discards unsupported metrics and works as expected.
Last Closed: 2022-02-08 13:01:20 UTC
Type: Bug
Bug Blocks: 1959686    

Description Michael J. Kidd 2021-12-08 19:38:12 UTC
Description of problem:
Ceph MDS repeatedly crashing with:
Dec 08 14:11:45 cephmds1 conmon[3431]: debug     -1> 2021-12-08T13:11:45.526+0000 7f08eb0aa700 -1 /builddir/build/BUILD/ceph-16.2.0/src/include/cephfs/metrics/Types.h: In function 'std::ostream& operator<<(std::ostream&, const ClientMetricType&)' thread 7f08eb0aa700 time 2021-12-08T13:11:45.526200+0000
Dec 08 14:11:45 cephmds1 conmon[3431]: /builddir/build/BUILD/ceph-16.2.0/src/include/cephfs/metrics/Types.h: 56: ceph_abort_msg("abort() called")
Dec 08 14:11:45 cephmds1 conmon[3431]: 
Dec 08 14:11:45 cephmds1 conmon[3431]:  ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)
Dec 08 14:11:45 cephmds1 conmon[3431]:  1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe5) [0x7f08f3d15cd4]
Dec 08 14:11:45 cephmds1 conmon[3431]:  2: (operator<<(std::ostream&, ClientMetricType const&)+0x10e) [0x7f08f3f9a2ce]
Dec 08 14:11:45 cephmds1 conmon[3431]:  3: (MClientMetrics::print(std::ostream&) const+0x1a1) [0x7f08f3f9a4a1]
Dec 08 14:11:45 cephmds1 conmon[3431]:  4: (DispatchQueue::entry()+0x1be2) [0x7f08f3f50312]
Dec 08 14:11:45 cephmds1 conmon[3431]:  5: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f08f3fff9b1]
Dec 08 14:11:45 cephmds1 conmon[3431]:  6: /lib64/libpthread.so.0(+0x814a) [0x7f08f2ab714a]
Dec 08 14:11:45 cephmds1 conmon[3431]:  7: clone()

Version-Release number of selected component (if applicable):
RHCS 5.0

How reproducible:
100% for this env

Actual results:
Ceph MDS crashes repeatedly

Expected results:
Ceph MDS does not crash

Additional info:
Appears to be fixed via upstream:
# Tracker:
  https://tracker.ceph.com/issues/50822
# PR:
  https://github.com/ceph/ceph/pull/41596
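
For readers unfamiliar with the failure mode: the backtrace shows the abort firing inside the stream operator for ClientMetricType while the message is being printed. The following is a minimal standalone sketch of the general pattern described in the Doc Text (print an unknown type instead of aborting, and discard metrics the receiver does not support). It is not the upstream patch. The enum MetricType and the helpers is_supported, discard_unsupported and describe are hypothetical names made up for this illustration.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical metric identifiers for illustration only; the real enum is
// ClientMetricType in src/include/cephfs/metrics/Types.h.
enum class MetricType : uint32_t {
  CAP_INFO = 0,
  READ_LATENCY = 1,
  WRITE_LATENCY = 2,
};

// True only for the types this (older) receiver knows how to handle.
inline bool is_supported(MetricType t) {
  switch (t) {
  case MetricType::CAP_INFO:
  case MetricType::READ_LATENCY:
  case MetricType::WRITE_LATENCY:
    return true;
  }
  return false;  // value sent by a newer peer
}

// Printing never aborts: an out-of-range value is rendered numerically.
inline std::ostream &operator<<(std::ostream &os, MetricType t) {
  switch (t) {
  case MetricType::CAP_INFO:
    return os << "CAP_INFO";
  case MetricType::READ_LATENCY:
    return os << "READ_LATENCY";
  case MetricType::WRITE_LATENCY:
    return os << "WRITE_LATENCY";
  }
  return os << "(UNKNOWN:" << static_cast<uint32_t>(t) << ")";
}

// Keep the supported metrics and silently drop the rest.
inline std::vector<MetricType>
discard_unsupported(const std::vector<MetricType> &in) {
  std::vector<MetricType> out;
  for (MetricType t : in) {
    if (is_supported(t)) {
      out.push_back(t);
    }
  }
  return out;
}

// Convenience wrapper that formats a metric type through operator<<.
inline std::string describe(MetricType t) {
  std::ostringstream ss;
  ss << t;
  return ss.str();
}
```

In this sketch, describe(static_cast<MetricType>(99)) yields "(UNKNOWN:99)" rather than terminating the process, and discard_unsupported drops that value from the list. The point is only that a receiver older than its peer should treat unrecognized type values as data, not as a fatal condition.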

Comment 4 Patrick Donnelly 2021-12-09 16:43:10 UTC
*** Bug 2029832 has been marked as a duplicate of this bug. ***

Comment 8 Patrick Donnelly 2021-12-16 19:27:57 UTC
*** Bug 2033172 has been marked as a duplicate of this bug. ***

Comment 19 errata-xmlrpc 2022-02-08 13:01:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0466

Comment 20 Venky Shankar 2022-02-14 04:21:15 UTC
Clearing NI (nothing is pending afaics).