+++ This bug was initially created as a clone of Bug #2224671 +++

Versions:
mcg-operator.v4.13.0-rhodf
odf-operator.v4.13.0-rhodf
ocs-operator.v4.13.0-rhodf
OCP: 4.13.0

The cluster had been running for 51 days with no issues. I was checking the API performance dashboard today, selecting a 2-week period. Apparently this is a resource-intensive operation.

oc get pod -A | grep -v Run | grep -v Comple
NAMESPACE           NAME                              READY   STATUS             RESTARTS         AGE
openshift-storage   rook-ceph-osd-1-88fc6f54d-xxfzt   1/2     CrashLoopBackOff   20 (4m43s ago)   85m

oc logs -n openshift-storage rook-ceph-osd-1-88fc6f54d-xxfzt | grep FAIL
Defaulted container "osd" out of: osd, log-collector, blkdevmapper (init), activate (init), expand-bluefs (init), chown-container-data-dir (init)
/builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())
/builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())
/builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())
/builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())

--- Additional comment from RHEL Program Management on 2023-07-21 23:11:50 UTC ---

This bug previously had no release flag set. The release flag 'odf-4.14.0' has now been set to '?', so the bug is being proposed to be fixed in the ODF 4.14.0 release. Note that the 3 acks (pm_ack, devel_ack, qa_ack), if any were set while the release flag was missing, have now been reset, since acks are to be set against a release flag.

--- Additional comment from Alexander Chuzhoy on 2023-07-21 23:22:59 UTC ---

The must-gather can be retrieved from here: https://file.rdu.redhat.com/~achuzhoy/bugs/OCPBUGS-16664/

--- Additional comment from Alexander Chuzhoy on 2023-07-24 14:27:55 UTC ---

Note: after I rebooted all 3 nodes in this compact cluster (3 controllers, 0 workers), the issue did not reproduce.

--- Additional comment from Blaine Gardner on 2023-07-25 15:28:15 UTC ---

Since the issue has been resolved, I don't think this is urgent. It still seems wise to leave this open until someone is available to look at the must-gather for any clear error indications. It's possible this was a random issue with a memory block becoming corrupt in RAM or on disk.

--- Additional comment from Red Hat Bugzilla on 2023-08-03 08:31:14 UTC ---

Account disabled by LDAP Audit
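
For context on the repeated failure above: the OSD aborts on a ceph_assert() while looking up an object in its missing set, and the aborted process is what drives the pod into CrashLoopBackOff. The sketch below is a minimal C++ illustration of the shape of that check, not the actual Ceph source; the type and function names here are hypothetical stand-ins, and plain assert() stands in for ceph_assert().

    // Illustrative sketch only -- NOT the Ceph implementation.
    #include <cassert>
    #include <map>
    #include <string>

    struct MissingItem { unsigned need = 0; };              // hypothetical stand-in for the per-object entry
    using MissingMap = std::map<std::string, MissingItem>;  // hypothetical stand-in for the OSD's missing set

    MissingItem lookup_missing(const MissingMap& missing, const std::string& oid) {
        auto it = missing.find(oid);
        // Analogous in spirit to: ceph_assert(it != missing.end())
        // The process aborts if an object it expects to be tracked as missing
        // is not found in the map, rather than dereferencing an invalid iterator.
        assert(it != missing.end());
        return it->second;
    }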
As this is not being identified as a blocker for 4.16 (the reason it was reopened), I am retargeting this to 7.1 z1. Please bring this up at the program meeting if this is a problem.