Bug 1733279

Summary: kubelet PLEG unhealthy, crio 1.14 hangs in ContainerStatus
Product: OpenShift Container Platform Reporter: Petr Muller <pmuller>
Component: Node Assignee: Peter Hunt <pehunt>
Status: CLOSED ERRATA QA Contact: Sunil Choudhary <schoudha>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 4.2.0 CC: aos-bugs, ccoleman, jokerman, mmccomas, mpatel, pehunt, sjenning
Target Milestone: ---   
Target Release: 4.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-16 06:33:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Petr Muller 2019-07-25 15:19:34 UTC
Description of problem:

Many tests failed. On further investigation, I found that the node is tainted with a "PLEG is not healthy: pleg was last seen active 19m28.423484988s ago; threshold is 3m0s." condition.

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.2/2435
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.2/2435/artifacts/e2e-aws/nodes.json

Additional info:

Not sure whether this is related to the existing https://bugzilla.redhat.com/show_bug.cgi?id=1636053 and/or https://bugzilla.redhat.com/show_bug.cgi?id=1613808. Both were reported against 3.x, while this is from 4.2 development.

Comment 1 Clayton Coleman 2019-07-25 16:56:14 UTC
A container runtime going down is high severity or above.

Comment 3 Seth Jennings 2019-07-29 21:38:07 UTC
(partial) mitigation
https://github.com/cri-o/cri-o/pull/2655

potential fix
https://github.com/containers/storage/pull/399

upstream cri-o issue mirroring this bz
https://github.com/cri-o/cri-o/issues/2584

Comment 4 Seth Jennings 2019-08-08 15:52:58 UTC
No occurrences in CI in the last 14 days.  Seems to be fixed.
https://search.svc.ci.openshift.org/?search=pleg+was+last+seen+active&maxAge=336h&context=2&type=all

Comment 7 errata-xmlrpc 2019-10-16 06:33:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922