Versions:
mcg-operator.v4.13.0-rhodf
odf-operator.v4.13.0-rhodf
ocs-operator.v4.13.0-rhodf
OCP: 4.13.0

The cluster had been running for 51 days with no issues. Today I was checking the API performance dashboard with a 2-week period selected... apparently that is a resource-intensive operation.

oc get pod -A|grep -v Run|grep -v Comple
NAMESPACE           NAME                              READY   STATUS             RESTARTS         AGE
openshift-storage   rook-ceph-osd-1-88fc6f54d-xxfzt   1/2     CrashLoopBackOff   20 (4m43s ago)   85m

oc logs -n openshift-storage rook-ceph-osd-1-88fc6f54d-xxfzt|grep FAIL
Defaulted container "osd" out of: osd, log-collector, blkdevmapper (init), activate (init), expand-bluefs (init), chown-container-data-dir (init)
/builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())
/builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())
/builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())
/builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())
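For anyone hitting the same assert before the OSD is restarted, this is roughly how more context could be pulled out of Ceph itself. This is a sketch, not a verified procedure: it assumes the Ceph tools pod can be enabled in openshift-storage (the ocsinitialization patch is the documented ODF approach, but check the docs for your exact version), and <crash-id> is a placeholder for an id taken from "ceph crash ls".

# Enable the Ceph toolbox pod (ODF; verify the patch against your version's docs)
oc patch ocsinitialization ocsinit -n openshift-storage --type json \
  --patch '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'

# Shell into the tools pod
oc rsh -n openshift-storage $(oc get pod -n openshift-storage -l app=rook-ceph-tools -o name | head -n1)

# Inside the tools pod: overall health plus any recorded crash backtraces
ceph status
ceph health detail
ceph crash ls                # recent daemon crashes, should include the OSD assert
ceph crash info <crash-id>   # <crash-id> is a placeholder from "ceph crash ls"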
Note: After I rebooted all 3 nodes in this compact cluster (3 control plane nodes, 0 workers), the issue did not reproduce.
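For anyone verifying the same thing after a reboot, a quick sanity check could look like this (again a sketch, assuming the tools pod mentioned above):

oc get pods -n openshift-storage | grep rook-ceph-osd   # all OSD pods should show 2/2 Running with no new restarts

# Inside the tools pod:
ceph osd tree    # every OSD reported as "up"
ceph status      # expect HEALTH_OK with all PGs active+clean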
Since the issue has been resolved, I don't think this is urgent. It still seems wise to leave this open until someone is available to look at the must-gather for any clear error indications. This may simply have been a random case of a memory block becoming corrupt in RAM or on disk.
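When someone does get to the must-gather, the assert should be straightforward to locate by grepping the collected OSD logs. The paths below are illustrative only; the actual directory layout depends on the must-gather image and version used:

# Illustrative: search the collected logs for the assert seen in the OSD pod
grep -rn "FAILED ceph_assert(it != missing.end())" must-gather.local.*/
grep -rn "osd_types.h: 4882" must-gather.local.*/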