Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1807139

Summary:	Alert on OOMKills on the cluster as a symptom of disruptive workloads or bugs
Product:	OpenShift Container Platform
Reporter:	Clayton Coleman <ccoleman>
Component:	Node
Assignee:	Lili Cosic <lcosic>
Status:	CLOSED ERRATA
QA Contact:	Sunil Choudhary <schoudha>
Severity:	high
Priority:	unspecified
Version:	4.4
CC:	aos-bugs, jokerman
Target Release:	4.5.0
Hardware:	Unspecified
OS:	Unspecified
Doc Type:	No Doc Update
Clones:	1807140
Last Closed:	2020-08-04 18:02:10 UTC
Type:	Bug
Bug Blocks:	1807140

Description Clayton Coleman 2020-02-25 17:09:15 UTC

An OOMKill on a cluster can be disruptive to workloads and infrastructure both immediately and over time (if a component partially fails).  We should alert when a significant number of OOMKills have occurred.

As a starting point, we should pick a rate that we believe is likely to indicate serious problems and tune it down (to catch more issues) after we assess the impact in the field.  The alert should be at 'info' level for now in order to allow time for assessment.
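A minimal Python sketch of the rate-based alert described above, for illustration only. The class name `OOMKillRateAlert`, the threshold and the window length are hypothetical placeholders, not the rule that shipped in 4.5. The sketch assumes OOMKill events arrive as timestamps (in seconds) in non-decreasing order.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional


@dataclass
class OOMKillRateAlert:
    """Hypothetical sliding-window alert: fires when `threshold` or more
    OOMKills are seen within `window_seconds`. Both values are placeholders
    meant to be tuned after assessing the impact in the field."""

    threshold: int
    window_seconds: float
    severity: str = "info"  # start at 'info' while the threshold is assessed
    _events: Deque[float] = field(default_factory=deque, init=False, repr=False)

    def record(self, timestamp: float) -> Optional[str]:
        """Record one OOMKill at `timestamp`; return an alert message if firing."""
        self._events.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while self._events and self._events[0] <= timestamp - self.window_seconds:
            self._events.popleft()
        if len(self._events) >= self.threshold:
            return (
                f"[{self.severity}] {len(self._events)} OOMKills "
                f"in the last {self.window_seconds:.0f}s"
            )
        return None
```

With `OOMKillRateAlert(threshold=3, window_seconds=600)`, recording OOMKills at t=0, 100 and 200 fires an `info` alert on the third event. "Tuning it down" to catch more issues corresponds to lowering `threshold` or widening `window_seconds`.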

This should be backported to 4.3, where OOMKills may have caused significant production issues for a few customers.

Comment 4 Lili Cosic 2020-04-20 10:03:19 UTC

Already in the release notes, no need for extra docs.

Comment 6 errata-xmlrpc 2020-08-04 18:02:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409