+++ This bug was initially created as a clone of Bug #2224671 +++

Versions:
mcg-operator.v4.13.0-rhodf
odf-operator.v4.13.0-rhodf
ocs-operator.v4.13.0-rhodf
OCP: 4.13.0

The cluster had been running for 51 days with no issues. I was checking the API performance dashboard today, selecting a 2-week period. Apparently this is a resource-intensive operation.

oc get pod -A | grep -v Run | grep -v Comple
NAMESPACE           NAME                              READY   STATUS             RESTARTS         AGE
openshift-storage   rook-ceph-osd-1-88fc6f54d-xxfzt   1/2     CrashLoopBackOff   20 (4m43s ago)   85m

oc logs -n openshift-storage rook-ceph-osd-1-88fc6f54d-xxfzt | grep FAIL
Defaulted container "osd" out of: osd, log-collector, blkdevmapper (init), activate (init), expand-bluefs (init), chown-container-data-dir (init)
/builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())
/builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())
/builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())
/builddir/build/BUILD/ceph-17.2.6/src/osd/osd_types.h: 4882: FAILED ceph_assert(it != missing.end())

--- Additional comment from RHEL Program Management on 2023-07-21 23:11:50 UTC ---

This bug previously had no release flag set. The release flag 'odf-4.14.0' has now been set to '?', so the bug is being proposed to be fixed in the ODF 4.14.0 release. Note that the 3 acks (pm_ack, devel_ack, qa_ack), if any were set while the release flag was missing, have now been reset, since acks are to be set against a release flag.

--- Additional comment from Alexander Chuzhoy on 2023-07-21 23:22:59 UTC ---

The must-gather can be retrieved from here: https://file.rdu.redhat.com/~achuzhoy/bugs/OCPBUGS-16664/

--- Additional comment from Alexander Chuzhoy on 2023-07-24 14:27:55 UTC ---

Note: after I rebooted all 3 nodes in this compact cluster (3 controllers, 0 workers), the issue did not reproduce.

--- Additional comment from Blaine Gardner on 2023-07-25 15:28:15 UTC ---

Since the issue has been resolved, I don't think this is urgent. It still seems wise to leave this open until someone is available to look at the must-gather for any clear error indications. It's possible this was a random issue with a memory block becoming corrupt in RAM or on disk.

--- Additional comment from Red Hat Bugzilla on 2023-08-03 08:31:14 UTC ---

Account disabled by LDAP Audit
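
For context on the repeated failure above: the OSD aborts on a ceph_assert() while looking up an object in its missing set, and the aborted process is what drives the pod into CrashLoopBackOff. The sketch below is a minimal C++ illustration of the shape of that check, not the actual Ceph source; the type and function names here are hypothetical stand-ins, and plain assert() stands in for ceph_assert().

    // Illustrative sketch only -- NOT the Ceph implementation.
    #include <cassert>
    #include <map>
    #include <string>

    struct MissingItem { unsigned need = 0; };              // hypothetical stand-in for the per-object entry
    using MissingMap = std::map<std::string, MissingItem>;  // hypothetical stand-in for the OSD's missing set

    MissingItem lookup_missing(const MissingMap& missing, const std::string& oid) {
        auto it = missing.find(oid);
        // Analogous in spirit to: ceph_assert(it != missing.end())
        // The process aborts if an object it expects to be tracked as missing
        // is not found in the map, rather than dereferencing an invalid iterator.
        assert(it != missing.end());
        return it->second;
    }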
As this is not being identified as a blocker for 4.16 (the reason it was reopened), I am retargeting this to 7.1 z1. Please bring this up at the program meeting if this is a problem.