Description of problem: After running the disaster recovery scripts to restore the state of the cluster, there is a large lapse in time before the kubelet retries setting the node status, even though the kubelet is running. More info at https://bugzilla.redhat.com/show_bug.cgi?id=1743190#c8

Version-Release number of selected component (if applicable): 4.1.3

Steps to Reproduce: https://bugzilla.redhat.com/show_bug.cgi?id=1743190#c8

Actual results:

```
$ journalctl -u kubelet.service | grep "kubelet_node_status\.go"
...
Aug 27 07:43:16 control-plane-2 hyperkube[113686]: I0827 07:43:16.160397 113686 kubelet_node_status.go:114] Node control-plane-2 was previously registered
Aug 27 07:43:16 control-plane-2 hyperkube[113686]: I0827 07:43:16.160612 113686 kubelet_node_status.go:75] Successfully registered node control-plane-2
Aug 27 12:12:41 control-plane-2 hyperkube[113686]: E0827 12:12:41.217131 113686 kubelet_node_status.go:380] Error updating node status, will retry: failed to patch status "{\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"MemoryPressure\"},{\"type\":\"DiskPressure\"},{\"type\":\"PIDPressure\"},{\"type\":\"Ready\"}],\"conditions\":[{\"lastHeartbeatTime\":\"2019-08-27T12:12:34Z\",\"type\":\"MemoryPressure\"},{\"lastHeartbeatTime\":\"2019-08-27T12:12:34Z\",\"type\":\"DiskPressure\"},{\"lastHeartbeatTime\":\"2019-08-27T12:12:34Z\",\"type\":\"PIDPressure\"},{\"lastHeartbeatTime\":\"2019-08-27T12:12:34Z\",\"type\":\"Ready\"}]}}" for node "control-plane-2": etcdserver: request timed out
Aug 27 19:22:03 control-plane-2 hyperkube[66796]: I0827 19:22:03.655847 66796 kubelet_node_status.go:278] Setting node annotation to enable volume controller attach/detach
Aug 27 19:22:03 control-plane-2 hyperkube[66796]: I0827 19:22:03.660595 66796 kubelet_node_status.go:446] Recording NodeHasSufficientMemory event message for node control-plane-2
Aug 27 19:22:03 control-plane-2 hyperkube[66796]: I0827 19:22:03.660796 66796 kubelet_node_status.go:446] Recording NodeHasNoDiskPressure event message for node control-plane-2
Aug 27 19:22:03 control-plane-2 hyperkube[66796]: I0827 19:22:03.660885 66796 kubelet_node_status.go:446] Recording NodeHasSufficientPID event message for node control-plane-2
Aug 27 19:22:03 control-plane-2 hyperkube[66796]: I0827 19:22:03.661084 66796 kubelet_node_status.go:72] Attempting to register node control-plane-2
Aug 27 19:22:03 control-plane-2 hyperkube[66796]: I0827 19:22:03.686721 66796 kubelet_node_status.go:114] Node control-plane-2 was previously registered
Aug 27 19:22:03 control-plane-2 hyperkube[66796]: I0827 19:22:03.686747 66796 kubelet_node_status.go:75] Successfully registered node control-plane-2
Aug 27 19:22:03 control-plane-2 hyperkube[66796]: I0827 19:22:03.690073 66796 kubelet_node_status.go:446] Recording NodeNotReady event message for node control-plane-2
Aug 27 19:22:13 control-plane-2 hyperkube[66796]: I0827 19:22:13.718789 66796 kubelet_node_status.go:446] Recording NodeReady event message for node control-plane-2
```

Expected results:

```
$ journalctl -u kubelet.service | grep "kubelet_node_status\.go"
...
Aug 27 07:43:16 control-plane-2 hyperkube[113686]: I0827 07:43:16.160397 113686 kubelet_node_status.go:114] Node control-plane-2 was previously registered
Aug 27 07:43:16 control-plane-2 hyperkube[113686]: I0827 07:43:16.160612 113686 kubelet_node_status.go:75] Successfully registered node control-plane-2
Aug 27 12:12:41 control-plane-2 hyperkube[113686]: E0827 12:12:41.217131 113686 kubelet_node_status.go:380] Error updating node status, will retry: failed to patch status "{\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"MemoryPressure\"},{\"type\":\"DiskPressure\"},{\"type\":\"PIDPressure\"},{\"type\":\"Ready\"}],\"conditions\":[{\"lastHeartbeatTime\":\"2019-08-27T12:12:34Z\",\"type\":\"MemoryPressure\"},{\"lastHeartbeatTime\":\"2019-08-27T12:12:34Z\",\"type\":\"DiskPressure\"},{\"lastHeartbeatTime\":\"2019-08-27T12:12:34Z\",\"type\":\"PIDPressure\"},{\"lastHeartbeatTime\":\"2019-08-27T12:12:34Z\",\"type\":\"Ready\"}]}}" for node "control-plane-2": etcdserver: request timed out
Retrying, try 1
Retrying, try 2
Retrying, try 3
.
.
.
Aug 27 19:22:03 control-plane-2 hyperkube[66796]: I0827 19:22:03.655847 66796 kubelet_node_status.go:278] Setting node annotation to enable volume controller attach/detach
Aug 27 19:22:03 control-plane-2 hyperkube[66796]: I0827 19:22:03.660595 66796 kubelet_node_status.go:446] Recording NodeHasSufficientMemory event message for node control-plane-2
Aug 27 19:22:03 control-plane-2 hyperkube[66796]: I0827 19:22:03.660796 66796 kubelet_node_status.go:446] Recording NodeHasNoDiskPressure event message for node control-plane-2
Aug 27 19:22:03 control-plane-2 hyperkube[66796]: I0827 19:22:03.660885 66796 kubelet_node_status.go:446] Recording NodeHasSufficientPID event message for node control-plane-2
Aug 27 19:22:03 control-plane-2 hyperkube[66796]: I0827 19:22:03.661084 66796 kubelet_node_status.go:72] Attempting to register node control-plane-2
Aug 27 19:22:03 control-plane-2 hyperkube[66796]: I0827 19:22:03.686721 66796 kubelet_node_status.go:114] Node control-plane-2 was previously registered
Aug 27 19:22:03 control-plane-2 hyperkube[66796]: I0827 19:22:03.686747 66796 kubelet_node_status.go:75] Successfully registered node control-plane-2
Aug 27 19:22:03 control-plane-2 hyperkube[66796]: I0827 19:22:03.690073 66796 kubelet_node_status.go:446] Recording NodeNotReady event message for node control-plane-2
Aug 27 19:22:13 control-plane-2 hyperkube[66796]: I0827 19:22:13.718789 66796 kubelet_node_status.go:446] Recording NodeReady event message for node control-plane-2
```

Additional info:

Kubelet conditions:

```
Conditions:
  Type            Status   LastHeartbeatTime                 LastTransitionTime                Reason                   Message
  ----            ------   -----------------                 ------------------                ------                   -------
  MemoryPressure  Unknown  Tue, 27 Aug 2019 14:09:22 -0400   Tue, 27 Aug 2019 03:44:05 -0400   NodeStatusUnknown        Kubelet stopped posting node status.
  DiskPressure    Unknown  Tue, 27 Aug 2019 14:09:22 -0400   Tue, 27 Aug 2019 03:44:05 -0400   NodeStatusUnknown        Kubelet stopped posting node status.
  PIDPressure     Unknown  Tue, 27 Aug 2019 14:09:22 -0400   Tue, 27 Aug 2019 03:44:05 -0400   NodeStatusUnknown        Kubelet stopped posting node status.
  Ready           Unknown  Tue, 27 Aug 2019 14:09:22 -0400   Tue, 27 Aug 2019 03:44:05 -0400   NodeStatusUnknown        Kubelet stopped posting node status.
  OutOfDisk       Unknown  Tue, 27 Aug 2019 02:19:03 -0400   Tue, 27 Aug 2019 03:44:05 -0400   NodeStatusNeverUpdated   Kubelet never posted node status.
```

At 12:12:41 the kubelet tried to post its status and errored; at 14:09:22 it was determined that the status was not being posted, and the node conditions went into an Unknown state. The kubelet was able to get back into a `Ready` state after a restart, as described in https://bugzilla.redhat.com/show_bug.cgi?id=1743190#c8

Kubelet logs before restart: https://drive.google.com/file/d/1zDRdYHh-ss7ulZdWccobMUbCMmyzntjN/view
DR scripts require a kubelet restart. I think we can close this issue, because the restart is documented.
There are a number of errors in the cluster regarding the API endpoint and auth.

```
Aug 27 19:06:47 control-plane-2 hyperkube[113686]: 2019/08/27 06:30:48 auth: error contacting auth provider (retrying in 10s): discovery through endpoint https://172.30.0.1:443/.well-known/oauth-authorization-server failed: 404 Not Found
```

```
Aug 27 19:06:07 control-plane-2 hyperkube[113686]: E0827 07:41:45.448582 13528 openshift-tuned.go:686] Get https://172.30.0.1:443/api/v1/nodes/control-plane-2: unexpected EOF
```

Are we still seeing this issue?