Bug 1971084 - [Tracker for BZ #1971118] [GSS] Ceph crash - aio lock
Summary: [Tracker for BZ #1971118] [GSS] Ceph crash - aio lock
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Scott Ostapovicz
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks: 1971118
Reported: 2021-06-11 21:59 UTC by kelwhite
Modified: 2023-08-09 16:37 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1971118 (view as bug list)
Environment:
Last Closed: 2021-07-19 19:47:48 UTC
Embargoed:
bhubbard: needinfo-




Links
Ceph Project Bug Tracker 51435 (Last Updated: 2021-06-30 01:53:06 UTC)

Internal Links: 1869372

Comment 12 Yaniv Kaul 2021-07-06 08:25:52 UTC
There's a customer case attached to this BZ, so it should not be at low severity. I've raised the severity a bit and set a priority as well.
Moved to ASSIGNED on the assumption that Brad is handling it - please fix if I'm wrong here.

Comment 13 Brad Hubbard 2021-07-09 01:57:50 UTC
The ceph-osd crash looks to be a side effect of the issues occurring on the system at the time. The kernel on this machine was effectively hung, leading to significant IO stalls. This was a highly unstable and unpredictable environment in which no application could realistically be expected to keep functioning (reinforced by the fact that systemd was seen crashing at the same time). The osd process appears to have been in pthread code, which is some of the most heavily used and battle-tested code out there, so this further points to the environment itself being at fault. There is no additional data, such as a coredump, that would help us pinpoint the exact nature of the issue, but I don't believe further investigation of this crash is a good use of engineering's time: no non-trivial application should be expected to keep functioning under such circumstances (not to mention that we would not support a configuration with all storage on a single vSphere datastore).

If no one objects I propose closing this Bugzilla NOTABUG.
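
As a side note, even when a coredump is unavailable, the mgr crash module usually records per-daemon crash metadata, including a backtrace, which can be pulled with `ceph crash ls` and `ceph crash info <crash-id>`. Below is a minimal sketch of collecting that data, assuming the ceph CLI is reachable (for example from the rook-ceph-tools pod) and the crash module is enabled; the field names used are the crash module's usual metadata keys and may differ slightly between releases.

import json
import subprocess


def ceph_json(*args):
    """Run a ceph CLI command with JSON output and return the parsed result."""
    proc = subprocess.run(
        ["ceph", *args, "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(proc.stdout)


def dump_crash_reports():
    """Print crash reports recorded by the mgr crash module.

    The keys 'crash_id', 'entity_name', 'timestamp', and 'backtrace' follow the
    crash module's metadata format; .get() is used in case a field is absent.
    """
    for entry in ceph_json("crash", "ls"):
        info = ceph_json("crash", "info", entry["crash_id"])
        print(f"{info.get('entity_name')} crashed at {info.get('timestamp')}")
        for frame in info.get("backtrace", []):
            print(f"    {frame}")


if __name__ == "__main__":
    dump_crash_reports()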

