Bug 1971084

Summary: [Tracker for BZ #1971118] [GSS] Ceph crash - aio lock
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: kelwhite
Component: ceph
Assignee: Scott Ostapovicz <sostapov>
Status: CLOSED NOTABUG
QA Contact: Raz Tamir <ratamir>
Severity: medium
Priority: high
Version: 4.5
CC: bhubbard, bniver, hnallurv, jdurgin, madam, muagarwa, ocs-bugs, odf-bz-bot, pdhange, tim.crockett
Target Milestone: ---
Keywords: Tracking
Target Release: ---
Flags: bhubbard: needinfo-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1971118 (view as bug list)
Environment:
Last Closed: 2021-07-19 19:47:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1971118

Comment 12 Yaniv Kaul 2021-07-06 08:25:52 UTC
There's a customer case attached to this BZ, so it should not be at low severity. I've raised the severity a bit and set the priority as well.
Moved to ASSIGNED on the assumption that Brad is handling it - please fix if I'm wrong here.

Comment 13 Brad Hubbard 2021-07-09 01:57:50 UTC
The ceph-osd crash looks to be a side effect of the issues occurring on the system at the time. The kernel on this machine was effectively hung, leading to significant IO stalls. This was a highly unstable and unpredictable environment in which no application could realistically be expected to continue functioning (reinforced by the fact that systemd was seen crashing at the same time). The osd process appears to have been in pthread code, which is some of the most heavily used and bullet-proof code out there, further suggesting that the environment itself was at fault.

There is no additional data, such as a coredump, that would help us pinpoint the exact nature of the issue, and I don't believe further investigation of this crash is a good use of engineering time: no non-trivial application should be expected to keep functioning under such circumstances (not to mention that we would not support a configuration with all storage on a single vSphere datastore).

If no one objects, I propose closing this Bugzilla NOTABUG.