Bug 1852964 - Nodes going notReady because of unknown Reason
Summary: Nodes going notReady because of unknown Reason
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-controller-manager
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.6.0
Assignee: Gabe Montero
QA Contact: wewang
Depends On:
Blocks: 1860397
Reported: 2020-07-01 16:37 UTC by Jaspreet Kaur
Modified: 2020-10-27 16:12 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Intermittent availability issues with the apiserver could lead to intermittent failures when the openshift controller manager operator retrieved the deployments it manages.
Consequence: A failure to retrieve a deployment at the wrong moment caused the openshift controller manager operator to dereference a nil pointer, which resulted in a panic in the operator.
Fix: Nil reference checks are now in place to handle this error condition, report it, and retry per expected controller operations.
Result: The openshift controller manager operator properly handles intermittent issues retrieving deployments from the api server.
Clone Of:
Clones: 1860397
Last Closed: 2020-10-27 16:11:46 UTC
Target Upstream Version:

System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-openshift-controller-manager-operator pull 163 | 0 | None | closed | Bug 1852964: account for nil DaemonSet returned from library-go | 2020-10-20 12:35:32 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:12:05 UTC

Description Jaspreet Kaur 2020-07-01 16:37:01 UTC
Description of problem: Nodes were observed going into the NotReady state for an unknown reason. Several different issues were seen at the same point in time.

Processes are being OOM-killed, as seen in dmesg.

./openshift-apiserver/pods/apiserver-vcgm8/openshift-apiserver/openshift-apiserver/logs/previous.log:2020-06-20T13:11:46.585965523Z 	/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522 +0x1b5
./openshift-apiserver/pods/apiserver-vcgm8/openshift-apiserver/openshift-apiserver/logs/previous.log:2020-06-20T13:11:46.585965523Z E0620 13:11:46.585778       1 wrap.go:39] apiserver panic'd on GET /apis/oauth.openshift.io/v1/oauthclients?limit=500&resourceVersion=0
./openshift-etcd/core/pods.yaml:            http: panic serving runtime error: invalid memory address
./openshift-etcd/core/pods.yaml:            +0x139\npanic(0xe5db60, 0x194e150)\n\t/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522

  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Tue, 30 Jun 2020 21:00:03 +0400   Thu, 25 Jun 2020 08:44:01 +0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 30 Jun 2020 21:00:03 +0400   Thu, 25 Jun 2020 08:44:01 +0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 30 Jun 2020 21:00:03 +0400   Thu, 25 Jun 2020 08:44:01 +0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Tue, 30 Jun 2020 21:00:03 +0400   Tue, 30 Jun 2020 17:03:49 +0400   KubeletNotReady              [container runtime is down, PLEG is not healthy: pleg was last seen active 3h58m19.579610535s ago; threshold is 3m0s]

Version-Release number of selected component (if applicable):


How reproducible:

Steps to Reproduce:

Actual results: Nodes go into the NotReady state because of a panic with an unknown cause.

Expected results: Nodes should not hit this recurring issue.

Additional info:

Comment 11 wewang 2020-07-20 09:23:53 UTC
@gabe Checked with the version below: did not find any node in NotReady state, and did not hit the related issue again in the openshift-apiserver, openshift-controller-manager-operator, or kubelet service logs on the node. Verified. If I need to check anything else, feel free to set the bug to ON_QA.

Comment 13 errata-xmlrpc 2020-10-27 16:11:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

