Bug 2030451 - MDS crash on unsupported metrics from Kernel Client
Summary: MDS crash on unsupported metrics from Kernel Client
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 5.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 5.0z4
Assignee: Venky Shankar
QA Contact: Yogesh Mane
URL:
Whiteboard:
Duplicates: 2029832 2033172
Depends On:
Blocks: 1959686
Reported: 2021-12-08 19:38 UTC by Michael J. Kidd
Modified: 2025-04-04 13:46 UTC

Fixed In Version: ceph-16.2.0-149.el8cp
Doc Type: Bug Fix
Doc Text:
.The MDS daemon no longer crashes when receiving unsupported metrics
Previously, the MDS daemon could not handle new metric types sent by the kernel client, so it crashed on receiving any unsupported metric. With this release, the MDS discards unsupported metrics and continues to work as expected.
Clone Of:
Environment:
Last Closed: 2022-02-08 13:01:20 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 50822 0 None None None 2021-12-08 19:39:00 UTC
Github ceph ceph pull 41596 0 None Merged pacific: mds: do not assert when receiving a unknow metric type 2021-12-08 19:39:00 UTC
Red Hat Issue Tracker RHCEPH-2528 0 None None None 2021-12-08 19:46:20 UTC
Red Hat Knowledge Base (Solution) 6617781 0 None None None 2021-12-30 10:50:05 UTC
Red Hat Product Errata RHBA-2022:0466 0 None None None 2022-02-08 13:01:44 UTC

Description Michael J. Kidd 2021-12-08 19:38:12 UTC
Description of problem:
Ceph MDS repeatedly crashing with:
Dec 08 14:11:45 cephmds1 conmon[3431]: debug     -1> 2021-12-08T13:11:45.526+0000 7f08eb0aa700 -1 /builddir/build/BUILD/ceph-16.2.0/src/include/cephfs/metrics/Types.h: In function 'std::ostream& operator<<(std::ostream&, const ClientMetricType&)' thread 7f08eb0aa700 time 2021-12-08T13:11:45.526200+0000
Dec 08 14:11:45 cephmds1 conmon[3431]: /builddir/build/BUILD/ceph-16.2.0/src/include/cephfs/metrics/Types.h: 56: ceph_abort_msg("abort() called")
Dec 08 14:11:45 cephmds1 conmon[3431]: 
Dec 08 14:11:45 cephmds1 conmon[3431]:  ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)
Dec 08 14:11:45 cephmds1 conmon[3431]:  1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe5) [0x7f08f3d15cd4]
Dec 08 14:11:45 cephmds1 conmon[3431]:  2: (operator<<(std::ostream&, ClientMetricType const&)+0x10e) [0x7f08f3f9a2ce]
Dec 08 14:11:45 cephmds1 conmon[3431]:  3: (MClientMetrics::print(std::ostream&) const+0x1a1) [0x7f08f3f9a4a1]
Dec 08 14:11:45 cephmds1 conmon[3431]:  4: (DispatchQueue::entry()+0x1be2) [0x7f08f3f50312]
Dec 08 14:11:45 cephmds1 conmon[3431]:  5: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f08f3fff9b1]
Dec 08 14:11:45 cephmds1 conmon[3431]:  6: /lib64/libpthread.so.0(+0x814a) [0x7f08f2ab714a]
Dec 08 14:11:45 cephmds1 conmon[3431]:  7: clone()

Version-Release number of selected component (if applicable):
RHCS 5.0

How reproducible:
100% for this env

Actual results:
Ceph MDS crashes repeatedly

Expected results:
Ceph MDS should not crash when it receives unsupported metrics

Additional info:
Appears to be fixed via upstream:
# Tracker:
  https://tracker.ceph.com/issues/50822
# PR:
  https://github.com/ceph/ceph/pull/41596
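
For illustration only, the failure mode in the backtrace above (an abort inside operator<< for an unrecognized ClientMetricType) and the kind of change the PR title describes can be sketched as follows. This is a minimal sketch, not the actual Ceph patch: the MetricType enum, its three values, and the describe() helper are hypothetical stand-ins for the real definitions in src/include/cephfs/metrics/Types.h. The idea is that the default branch prints a placeholder for an unknown value instead of aborting.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical, reduced stand-in for ClientMetricType. The real enum has
// more entries and different names; these are assumptions for the sketch.
enum class MetricType : uint32_t {
  CAP_INFO = 0,
  READ_LATENCY = 1,
  WRITE_LATENCY = 2,
};

// Print a metric type. An unrecognized value (for example one sent by a
// newer client) is printed as a placeholder rather than triggering an abort.
std::ostream& operator<<(std::ostream& os, MetricType type) {
  switch (type) {
  case MetricType::CAP_INFO:
    os << "CAP_INFO";
    break;
  case MetricType::READ_LATENCY:
    os << "READ_LATENCY";
    break;
  case MetricType::WRITE_LATENCY:
    os << "WRITE_LATENCY";
    break;
  default:
    // Previously the equivalent branch aborted the daemon.
    os << "(UNKNOWN:" << static_cast<uint32_t>(type) << ")";
    break;
  }
  return os;
}

// Hypothetical helper: render a metric type to a string for logging.
std::string describe(MetricType type) {
  std::ostringstream ss;
  ss << type;
  return ss.str();
}
```

With this shape, logging a message that carries an unknown metric type (as MClientMetrics::print does in frame 3 of the backtrace) would produce a readable placeholder and let the dispatch thread continue.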

Comment 4 Patrick Donnelly 2021-12-09 16:43:10 UTC
*** Bug 2029832 has been marked as a duplicate of this bug. ***

Comment 8 Patrick Donnelly 2021-12-16 19:27:57 UTC
*** Bug 2033172 has been marked as a duplicate of this bug. ***

Comment 19 errata-xmlrpc 2022-02-08 13:01:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0466

Comment 20 Venky Shankar 2022-02-14 04:21:15 UTC
Clearing NI (nothing is pending afaics).

