Description of problem:
MCO hot-loops creating CSRs after the cluster has been shut down for 25 h and is in the process of recovery.

$ LANG=en date && oc get csr | grep system:serviceaccount:openshift-machine-config-operator:node-bootstrapper | wc -l
Fri Apr 3 10:51:01 CEST 2020
3404
$ LANG=en date && oc get csr | grep system:serviceaccount:openshift-machine-config-operator:node-bootstrapper | wc -l
Fri Apr 3 10:52:17 CEST 2020
3414

Version-Release number of selected component (if applicable):
4.4.0-0.nightly-2020-04-01-141451

How reproducible:

Steps to Reproduce:
1. Shut down the cluster for 25 h, or ping tnozicka (I may still have the broken one).

Actual results:
Thousands of CSRs, with new ones appearing at a rate of about 10 per minute.

Expected results:
Only 1 CSR is created, and it stays Pending until the admin approves it.

Additional info:
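The two manual samples above can be turned into a repeatable rate measurement. This is a minimal shell sketch, assuming `oc` is logged in to the affected cluster; the function names and the default 60-second interval are illustrative and not part of any existing tooling.

```shell
#!/bin/sh
# Sketch: measure how fast CSRs from the node-bootstrapper SA pile up.
# Assumes `oc` is logged in; function names here are hypothetical.

REQUESTOR='system:serviceaccount:openshift-machine-config-operator:node-bootstrapper'

# Count CSR lines mentioning the bootstrapper SA (same filter as the report,
# `grep -cF` being equivalent to `grep | wc -l` for a fixed string).
count_bootstrap_csrs() {
    oc get csr | grep -cF "$REQUESTOR"
}

# rate_per_minute COUNT1 COUNT2 SECONDS
# Prints the number of new CSRs per minute, rounded to one decimal place.
rate_per_minute() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) * 60 / s }'
}

# measure [INTERVAL_SECONDS]
# Takes two samples INTERVAL seconds apart (default 60) and reports the rate.
measure() {
    interval=${1:-60}
    c1=$(count_bootstrap_csrs)
    sleep "$interval"
    c2=$(count_bootstrap_csrs)
    echo "$c1 -> $c2 CSRs, $(rate_per_minute "$c1" "$c2" "$interval") per minute"
}
```

For example, `measure 60` takes two samples a minute apart. Feeding the report's own samples to the arithmetic helper, `rate_per_minute 3404 3414 76` (10:51:01 to 10:52:17 is 76 s) prints 7.9, the same order of magnitude as the roughly 10 per minute quoted.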
Ryan, has something changed here?
Is kubelet or something else using the same SA? The machine-config-operator pod was dead when I checked on the node with crictl.
(In reply to Tomáš Nožička from comment #2)
> is kubelet or something else using the same SA? the machine-config-operator
> pod is dead when I looked on the node with crictl

Can you grab a must-gather in the meantime? It'll help whoever debugs this.
I can't; must-gather requires running pods. Pod logs also don't work without valid certs.
kubelet was restarting because of another bug (being fixed now) and creating a new CSR every time, even though it already had one pending. Given that this behavior comes from upstream and the fatal bug is now being fixed, I am lowering the severity and sending this to the Node team to decide whether they want to pursue it, close it, or convert it to a Jira card.
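If the kubelet files a new CSR on every restart instead of waiting on the one it already has pending, the number of still-pending bootstrapper CSRs keeps climbing. A rough shell sketch for checking that, assuming `oc` access; it relies on `oc get csr -o custom-columns` printing `<none>` for a CSR that has no Approved/Denied condition yet, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: count bootstrapper CSRs that are still Pending (no conditions yet).
# Assumes `oc` access for list_csr_state; helper names are hypothetical.

REQUESTOR='system:serviceaccount:openshift-machine-config-operator:node-bootstrapper'

# Print one "USERNAME CONDITION_TYPES" line per CSR; a missing
# .status.conditions field is rendered as <none> by custom-columns.
list_csr_state() {
    oc get csr --no-headers \
        -o 'custom-columns=USER:.spec.username,COND:.status.conditions[*].type'
}

# Read "USERNAME CONDITION_TYPES" lines on stdin and print how many come
# from REQUESTOR and have no condition yet (i.e. are still Pending).
count_pending() {
    awk -v u="$REQUESTOR" '$1 == u && $2 == "<none>" { n++ } END { print n + 0 }'
}
```

On a live cluster, run `list_csr_state | count_pending`. If the expected behavior from the description holds, this should stay small (one per node waiting for approval); a steadily growing count matches the restart-driven duplicates described above.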
Fixed via https://github.com/openshift/origin/pull/24801 and BZ 1818961.

*** This bug has been marked as a duplicate of bug 1818961 ***