Bug 2224671

Summary: [Tracker for Ceph BZ #2231784] /builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Alexander Chuzhoy <sasha>
Component: ceph
Assignee: Radoslaw Zarzynski <rzarzyns>
ceph sub component: RADOS
QA Contact: Elad <ebenahar>
Status: NEW
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: bniver, brgardne, muagarwa, nojha, odf-bz-bot, pakamble, sheggodu, sostapov
Version: 4.13
Flags: rzarzyns: needinfo? (sasha)
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 2231784
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2231784    
Bug Blocks:    

Description Alexander Chuzhoy 2023-07-21 23:11:41 UTC
Versions:
mcg-operator.v4.13.0-rhodf
odf-operator.v4.13.0-rhodf
ocs-operator.v4.13.0-rhodf
OCP: 4.13.0


The cluster had been running for 51 days with no issues.
Today I was checking the API performance dashboard, selecting a 2-week period...

Apparently this is a resource-intensive operation.



oc get pod -A|grep -v Run|grep -v Comple
NAMESPACE                                          NAME                                                                          READY   STATUS             RESTARTS         AGE
openshift-storage                                  rook-ceph-osd-1-88fc6f54d-xxfzt                                               1/2     CrashLoopBackOff   20 (4m43s ago)   85m


oc logs -n openshift-storage rook-ceph-osd-1-88fc6f54d-xxfzt|grep FAIL
Defaulted container "osd" out of: osd, log-collector, blkdevmapper (init), activate (init), expand-bluefs (init), chown-container-data-dir (init)
/builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())
/builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())
/builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())
/builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())
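
For context on the assertion itself: in the OSD code, the "missing" structure tracks objects a placement group still needs to recover, and ceph_assert(it != missing.end()) fires when a code path looks up an object it expects to be tracked there but the iterator comes back as end(). The sketch below is a minimal, self-contained illustration of that pattern, not the actual Ceph source; the type and function names (missing_item, lookup, obj_a) are hypothetical stand-ins.

#include <cassert>
#include <iostream>
#include <map>
#include <string>

// Hypothetical stand-in for Ceph's per-object missing-version record.
struct missing_item {
    unsigned need = 0;  // version the PG still needs for this object
};

int main() {
    // "missing" tracks objects the placement group still has to recover.
    std::map<std::string, missing_item> missing;
    missing["obj_a"] = {42};

    // Code paths like the one that asserts here look up an object they
    // believe is tracked and assert the lookup succeeded before using it.
    auto lookup = [&](const std::string& oid) -> missing_item& {
        auto it = missing.find(oid);
        assert(it != missing.end());  // analogue of ceph_assert(it != missing.end())
        return it->second;
    };

    std::cout << lookup("obj_a").need << "\n";  // fine: object is tracked
    // lookup("obj_b");  // would abort: bookkeeping out of sync with the lookup
    return 0;
}

When the assertion fires in the real OSD, the daemon aborts because its missing-set bookkeeping is internally inconsistent, which matches the repeated crashes and the CrashLoopBackOff state of the OSD pod above.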

Comment 3 Alexander Chuzhoy 2023-07-24 14:27:55 UTC
Note: After I rebooted all 3 nodes in this compact cluster (only 3 controllers and 0 workers), the issue didn't reproduce.

Comment 4 Blaine Gardner 2023-07-25 15:28:15 UTC
Since the issue has been resolved, I don't think this is urgent. It still seems wise to leave this open until someone is available to look through the must-gather for any clear error indications. It's possible this was a random issue with a memory block becoming corrupted in RAM or on disk.