Bug 1971084

Summary: [Tracker for BZ #1971118] [GSS] Ceph crash - aio lock
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: kelwhite
Component: ceph
Assignee: Scott Ostapovicz <sostapov>
Status: CLOSED NOTABUG
QA Contact: Raz Tamir <ratamir>
Severity: medium
Priority: high
Version: 4.5
CC: bhubbard, bniver, hnallurv, jdurgin, madam, muagarwa, ocs-bugs, odf-bz-bot, pdhange, tim.crockett
Target Milestone: ---
Keywords: Tracking
Target Release: ---
Flags: bhubbard: needinfo-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1971118 (view as bug list)
Environment:
Last Closed: 2021-07-19 19:47:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1971118

Comment 12 Yaniv Kaul 2021-07-06 08:25:52 UTC
There's a customer case attached to this BZ, so it should not be at low severity. I've raised the severity a bit and set the priority as well.
Moved to ASSIGNED on the assumption that Brad is handling it - please fix if I'm wrong here.

Comment 13 Brad Hubbard 2021-07-09 01:57:50 UTC
The ceph-osd crash looks to be a side effect of the issues occurring on the system at the time. The kernel on this machine was effectively hung, leading to significant IO stalls. This was a highly unstable and unpredictable environment in which no application could realistically be expected to continue functioning (reinforced by the fact that systemd was seen crashing at the same time). The osd process appears to have been in pthread code, which is some of the most heavily used and bullet-proof code out there, further suggesting that the environment itself was at fault.

There is no additional data, such as a coredump, that would help us pinpoint the exact nature of the issue, and I don't believe further investigation of this crash is a good use of engineering time: no non-trivial application should be expected to keep functioning under such circumstances (not to mention that we would not support a configuration with all storage on a single vSphere datastore).

If no one objects, I propose closing this Bugzilla NOTABUG.