Description of problem: When the kubelet fails to renew its client certificate, it cannot do any work or make forward progress. This should fire an alert, since the condition is anomalous:

Oct 23 19:18:57 origin-ci-ig-m-428p origin-node[4967]: I1023 19:18:57.337406 4967 certificate_manager.go:287] Rotating certificates
Oct 23 19:21:50 origin-ci-ig-m-428p origin-node[4967]: E1023 19:21:50.485640 4967 certificate_manager.go:326] Certificate request was not signed: timed out waiting for the condition
Oct 23 19:23:05 origin-ci-ig-m-428p origin-node[4967]: E1023 19:23:05.337508 4967 reflector.go:253] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to watch *v1.Pod: the server has asked for the client to provide credentials (get pods)
Oct 23 19:23:08 origin-ci-ig-m-428p origin-node[4967]: F1023 19:23:08.425371 4967 transport.go:106] The currently active client certificate has expired and the server is responsive, exiting.
Oct 23 19:23:08 origin-ci-ig-m-428p systemd[1]: origin-node.service: main process exited, code=exited, status=255/n/a
Oct 23 19:23:08 origin-ci-ig-m-428p systemd[1]: Unit origin-node.service entered failed state.
Oct 23 19:23:08 origin-ci-ig-m-428p systemd[1]: origin-node.service failed.
Reassigning to the node team, as each component owns its own monitoring. I agree that there should be metrics and alerts around this.
Potential upstream PR: https://github.com/kubernetes/kubernetes/pull/84614
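For illustration, below is a minimal sketch of a PrometheusRule that would fire on renewal failures. It assumes the kubelet certificate-manager metrics exposed by recent kubelets (e.g. kubelet_certificate_manager_client_expiration_renew_errors); the rule name, namespace, labels, and thresholds are placeholders, not the shipped alert definition.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubelet-cert-rotation          # placeholder name
  namespace: openshift-monitoring      # placeholder namespace
spec:
  groups:
  - name: kubelet-certificates
    rules:
    - alert: KubeletClientCertificateRenewalErrors
      # Fires if any kubelet reported client certificate renewal errors
      # in the last 15 minutes (assumed metric name, see PR above).
      expr: increase(kubelet_certificate_manager_client_expiration_renew_errors{job="kubelet"}[15m]) > 0
      for: 15m
      labels:
        severity: warning
      annotations:
        message: Kubelet on {{ $labels.node }} has failed to renew its client certificate ({{ $value | humanize }} errors in the last 15 minutes).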
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581