1733279 – kubelet PLEG unhealthy, crio 1.14 hangs in ContainerStatus


Summary: kubelet PLEG unhealthy, crio 1.14 hangs in ContainerStatus

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.2.0
Assignee:	Peter Hunt
QA Contact:	Sunil Choudhary
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-07-25 15:19 UTC by Petr Muller
Modified:	2019-10-16 06:33 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-10-16 06:33:28 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:2922	0	None	None	None	2019-10-16 06:33:43 UTC

Description Petr Muller 2019-07-25 15:19:34 UTC

Description of problem:

Saw many tests fail. Investigating further, I found that the node is tainted with a "PLEG is not healthy: pleg was last seen active 19m28.423484988s ago; threshold is 3m0s." condition.

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.2/2435
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.2/2435/artifacts/e2e-aws/nodes.json

Additional info:

Not sure whether this is related to the existing https://bugzilla.redhat.com/show_bug.cgi?id=1636053 and/or https://bugzilla.redhat.com/show_bug.cgi?id=1613808; both were reported against 3.x, while this is 4.2 development.

Comment 1 Clayton Coleman 2019-07-25 16:56:14 UTC

A container runtime going down is high severity or above.

Comment 3 Seth Jennings 2019-07-29 21:38:07 UTC

(partial) mitigation
https://github.com/cri-o/cri-o/pull/2655

potential fix
https://github.com/containers/storage/pull/399

upstream cri-o issue mirroring this bz
https://github.com/cri-o/cri-o/issues/2584

Comment 4 Seth Jennings 2019-08-08 15:52:58 UTC

No occurrences in CI in the last 14 days.  Seems to be fixed.
https://search.svc.ci.openshift.org/?search=pleg+was+last+seen+active&maxAge=336h&context=2&type=all

Comment 7 errata-xmlrpc 2019-10-16 06:33:28 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922