Bug 2058524

Summary: MDS crash / rook-ceph-mds-ocs-storagecluster-cephfilesystem-a/b stuck in a CrashLoopBackOff
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: th3gov
Component: ceph
Assignee: Mudit Agarwal <muagarwa>
Status: CLOSED NOTABUG
QA Contact: Elad <ebenahar>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.9
CC: bniver, madam, mmuench, muagarwa, ocs-bugs, odf-bz-bot
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-05-27 10:47:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
rook-ceph-mds-ocs-storagecluster-cephfilesystem-crashlog (Flags: none)

Description th3gov 2022-02-25 08:52:55 UTC
Created attachment 1863281 [details]
rook-ceph-mds-ocs-storagecluster-cephfilesystem-crashlog

Description of problem:
After upgrading from OpenShift Container Storage 4.8.8 to OpenShift Data Foundation 4.9.2, the mds container in the rook-ceph-mds-ocs-storagecluster-cephfilesystem-a/b pods does not start and is stuck in CrashLoopBackOff. I do not see any out-of-memory errors in the Events.

In the logs I found the following error:

debug     -1> 2022-02-24T14:37:50.432+0000 7f9bbe952700 -1 /builddir/build/BUILD/ceph-16.2.0/src/include/cephfs/metrics/Types.h: In function 'std::ostream& operator<<(std::ostream&, const ClientMetricType&)' thread 7f9bbe952700 time 2022-02-24T14:37:50.432534+0000
/builddir/build/BUILD/ceph-16.2.0/src/include/cephfs/metrics/Types.h: 56: ceph_abort_msg("abort() called")

 ceph version 16.2.0-146.el8cp (56f5e9cfe88a08b6899327eca5166ca1c4a392aa) pacific (stable)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe5) [0x7f9bc75c12a0]
 2: (operator<<(std::ostream&, ClientMetricType const&)+0x10e) [0x7f9bc78480ee]
 3: (MClientMetrics::print(std::ostream&) const+0x1a1) [0x7f9bc78482c1]
 4: (DispatchQueue::entry()+0x1be2) [0x7f9bc77fdfa2]
 5: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f9bc78ad8a1]
 6: /lib64/libpthread.so.0(+0x817a) [0x7f9bc636117a]
 7: clone()

debug      0> 2022-02-24T14:37:50.434+0000 7f9bbe952700 -1 *** Caught signal (Aborted) **
 in thread 7f9bbe952700 thread_name:ms_dispatch

 ceph version 16.2.0-146.el8cp (56f5e9cfe88a08b6899327eca5166ca1c4a392aa) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12c20) [0x7f9bc636bc20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b6) [0x7f9bc75c1371]
 5: (operator<<(std::ostream&, ClientMetricType const&)+0x10e) [0x7f9bc78480ee]
 6: (MClientMetrics::print(std::ostream&) const+0x1a1) [0x7f9bc78482c1]
 7: (DispatchQueue::entry()+0x1be2) [0x7f9bc77fdfa2]
 8: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f9bc78ad8a1]
 9: /lib64/libpthread.so.0(+0x817a) [0x7f9bc636117a]
 10: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
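
For context, the backtrace points at the operator<< for ClientMetricType in src/include/cephfs/metrics/Types.h aborting while the ms_dispatch thread prints an incoming MClientMetrics message. The following is a minimal, self-contained sketch of that failure pattern, not the actual Ceph source: the enum names and values, the abort_msg() helper, and the metric id 42 are illustrative assumptions. It only shows how a print operator that aborts on an unrecognized enum value, for example a metric type sent by a client newer than this MDS build, would take down the dispatch thread the way the trace above shows.

// Sketch only; compile with e.g. g++ -std=c++17 sketch.cc && ./a.out
#include <cstdlib>
#include <iostream>

// Illustrative stand-in for the real ClientMetricType enum; the names and
// numeric values here are assumptions, not copied from Ceph.
enum class ClientMetricType {
  CAP_INFO = 0,
  READ_LATENCY = 1,
  WRITE_LATENCY = 2,
  // A newer client can legitimately send ids this build has never heard of.
};

// Stand-in for the ceph_abort_msg("abort() called") seen in the log.
static void abort_msg(const char *msg) {
  std::cerr << msg << std::endl;
  std::abort();
}

std::ostream &operator<<(std::ostream &os, const ClientMetricType &type) {
  switch (type) {
  case ClientMetricType::CAP_INFO:
    os << "CAP_INFO";
    break;
  case ClientMetricType::READ_LATENCY:
    os << "READ_LATENCY";
    break;
  case ClientMetricType::WRITE_LATENCY:
    os << "WRITE_LATENCY";
    break;
  default:
    // An unknown metric type falls into an abort instead of being printed
    // as "unknown", so a single unexpected id kills the whole daemon.
    abort_msg("abort() called");
  }
  return os;
}

int main() {
  // Simulate a metric type id this build does not know about, as a newer
  // client could send over the wire.
  ClientMetricType unknown = static_cast<ClientMetricType>(42);
  std::cout << "printing metric type: " << unknown << std::endl;  // aborts here
  return 0;
}

If something like this is what is happening here, the abort would repeat every time such a client resends its metrics, which would be consistent with the mds pods staying in CrashLoopBackOff rather than failing only once.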


Version of all relevant components (if applicable):
OpenShift Data Foundation 4.9.2


Is there any workaround available to the best of your knowledge?
Maybe https://access.redhat.com/solutions/6617781, but I don't know if it's applicable to ODF.


Is this issue reproducible?
This issue may occur only in combination with Red Hat OpenShift Logging v5.3.4-13 and OpenShift Elasticsearch Operator v5.3.4-13, but I don't know for sure whether it's reproducible.

Comment 2 th3gov 2022-02-28 08:15:34 UTC
It seems I found a workaround:
After I disabled the "Console plugin" of ODF, the mds pods are no longer crashing.

Comment 3 Scott Ostapovicz 2022-03-14 14:53:06 UTC
Not sure which component this would be.