1632430 – KubeletNotReady "pleg was last seen active.." string reports corrupt 'last seen active' timing

Bug 1632430 - KubeletNotReady "pleg was last seen active.." string reports corrupt 'last seen active' timing

Summary: KubeletNotReady "pleg was last seen active.." string reports corrupt 'last se...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	3.11.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	4.1.0
Assignee:	Seth Jennings
QA Contact:	Weinan Liu
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2018-09-24 19:13 UTC by Justin Pierce
Modified:	2019-10-31 16:30 UTC (History)
CC List:	10 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Improves log message when PLEG has not been healthy since the kubelet started
Clone Of:
Environment:
Last Closed:	2019-06-04 10:40:35 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
"impossible" last seen active output (2.27 KB, text/plain) 2018-09-24 19:13 UTC, Justin Pierce	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:0758	0	None	None	None	2019-06-04 10:40:45 UTC

Description Justin Pierce 2018-09-24 19:13:28 UTC

Created attachment 1486508 [details]
"impossible" last seen active output

Description of problem:
Observed the following string `oc describe` Conditions output: container runtime is down,PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago; threshold is 3m0s,network state unknown

2562047h equates to hundreds of years.

Version-Release number of selected component (if applicable):
v3.11.0-0.21.0

How reproducible:
Consistent in this particular environment for the NotReady node in question.

Additional info:
The node is running cri-o-1.11.1-2.rhaos3.11.git1759204.el7.x86_64.

Comment 1 Seth Jennings 2018-09-24 19:25:39 UTC

Yes, that is a known log readability issue.

Basically is it a unit64 underflow due to the last relist time being 0. It indicates that PLEG has not succeeded, ever, since the kubelet process started.  

If you see this when just starting the kubelet, it is not a real error.

If you see this after the kubelet has been up for a while, typically indicates that the runtime is not running or unresponsive.

Comment 2 Seth Jennings 2018-09-24 19:31:33 UTC

Actually, it isn't an underflow.  It is just the current time - 0 as a duration:

https://golang.org/pkg/time/#Duration

Comment 3 Seth Jennings 2018-09-24 20:27:43 UTC

No time like the present to fix this.  Upstream PR:
https://github.com/kubernetes/kubernetes/pull/69008

Comment 5 Seth Jennings 2018-10-31 16:48:07 UTC

Origin PR:
https://github.com/openshift/origin/pull/21397

Comment 9 Ryan Howe 2019-05-31 18:33:46 UTC

I just would like to note again (also see comment #1) this bug fix is purely cosmetic and does not point to an error with the node or container-runtime 

The following log is a result time being 0 since the relist time, when the node process is first started. 

```
PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago
```

Comment 11 errata-xmlrpc 2019-06-04 10:40:35 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

Note You need to log in before you can comment on or make changes to this bug.