Created attachment 1863281 [details]
rook-ceph-mds-ocs-storagecluster-cephfilesystem-crashlog

Description of problem:

After upgrading from OpenShift Container Storage 4.8.8 to OpenShift Data Foundation 4.9.2, the mds container in the pods rook-ceph-mds-ocs-storagecluster-cephfilesystem-a/b does not start and is stuck in CrashLoopBackOff. I do not see any out-of-memory errors in the Events.

In the logs I found the following error:

debug     -1> 2022-02-24T14:37:50.432+0000 7f9bbe952700 -1 /builddir/build/BUILD/ceph-16.2.0/src/include/cephfs/metrics/Types.h: In function 'std::ostream& operator<<(std::ostream&, const ClientMetricType&)' thread 7f9bbe952700 time 2022-02-24T14:37:50.432534+0000
/builddir/build/BUILD/ceph-16.2.0/src/include/cephfs/metrics/Types.h: 56: ceph_abort_msg("abort() called")

 ceph version 16.2.0-146.el8cp (56f5e9cfe88a08b6899327eca5166ca1c4a392aa) pacific (stable)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe5) [0x7f9bc75c12a0]
 2: (operator<<(std::ostream&, ClientMetricType const&)+0x10e) [0x7f9bc78480ee]
 3: (MClientMetrics::print(std::ostream&) const+0x1a1) [0x7f9bc78482c1]
 4: (DispatchQueue::entry()+0x1be2) [0x7f9bc77fdfa2]
 5: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f9bc78ad8a1]
 6: /lib64/libpthread.so.0(+0x817a) [0x7f9bc636117a]
 7: clone()

debug      0> 2022-02-24T14:37:50.434+0000 7f9bbe952700 -1 *** Caught signal (Aborted) **
 in thread 7f9bbe952700 thread_name:ms_dispatch

 ceph version 16.2.0-146.el8cp (56f5e9cfe88a08b6899327eca5166ca1c4a392aa) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12c20) [0x7f9bc636bc20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b6) [0x7f9bc75c1371]
 5: (operator<<(std::ostream&, ClientMetricType const&)+0x10e) [0x7f9bc78480ee]
 6: (MClientMetrics::print(std::ostream&) const+0x1a1) [0x7f9bc78482c1]
 7: (DispatchQueue::entry()+0x1be2) [0x7f9bc77fdfa2]
 8: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f9bc78ad8a1]
 9: /lib64/libpthread.so.0(+0x817a) [0x7f9bc636117a]
 10: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Version of all relevant components (if applicable):

OpenShift Data Foundation 4.9.2

Is there any workaround available to the best of your knowledge?

Maybe https://access.redhat.com/solutions/6617781, but I don't know if it's applicable to ODF.

Is this issue reproducible?

Possibly. The issue may occur only in combination with Red Hat OpenShift Logging v5.3.4-13 and OpenShift Elasticsearch Operator v5.3.4-13, but I don't know for sure whether it is reproducible.
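
When comparing several mds pod logs, it can help to pull out only the abort site (source file, line, message, aborting function, Ceph version) instead of reading each trace in full. The following is a minimal sketch, not an official tool: the function name `summarize_crash` and the regular expressions are assumptions derived only from the log lines quoted in this report, and other Ceph releases may format these lines differently.

```python
import re

# Patterns derived only from the log excerpt in this report (assumption:
# other Ceph builds may word or lay out these lines differently).
FUNCTION_RE = re.compile(r"In function '([^']+)'")
ABORT_RE = re.compile(r'(\S+): (\d+): ceph_abort_msg\("([^"]*)"\)')
VERSION_RE = re.compile(r"ceph version (\S+)")


def summarize_crash(log_text):
    """Return a dict describing the first abort site in log_text, or None.

    Hypothetical helper for triage; it only extracts text that is already
    present in the log and does not interpret the stack frames.
    """
    abort = ABORT_RE.search(log_text)
    if abort is None:
        return None
    function = FUNCTION_RE.search(log_text)
    version = VERSION_RE.search(log_text)
    return {
        "file": abort.group(1),
        "line": int(abort.group(2)),
        "message": abort.group(3),
        "function": function.group(1) if function else None,
        "ceph_version": version.group(1) if version else None,
    }


if __name__ == "__main__":
    import sys

    # Usage: pipe a saved pod log into the script on stdin.
    print(summarize_crash(sys.stdin.read()))
```

Run against the log in this report, the helper points at line 56 of `Types.h` inside `operator<<` for `ClientMetricType`, that is, the abort happens while an `MClientMetrics` message is being printed.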
It seems I found a workaround: after I disabled the ODF "Console plugin", the mds pods no longer crash.
Not sure which component this would be.