1820508 – MCO creates thousands of CSRs overnight

Keywords:
Status:	CLOSED DUPLICATE of bug 1818961
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.4.0
Assignee:	Ryan Phillips
QA Contact:	Sunil Choudhary
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-04-03 08:57 UTC by Tomáš Nožička
Modified:	2020-04-07 18:48 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-04-07 18:48:32 UTC
Target Upstream Version:
Embargoed:

Description Tomáš Nožička 2020-04-03 08:57:01 UTC

Description of problem:
MCO hot-loops on creating CSRs after the cluster has been shut down for 25 h and is in the process of recovery.

$ LANG=en date && oc get csr | grep system:serviceaccount:openshift-machine-config-operator:node-bootstrapper | wc -l
Fri Apr  3 10:51:01 CEST 2020
3404

$ LANG=en date && oc get csr | grep system:serviceaccount:openshift-machine-config-operator:node-bootstrapper | wc -l
Fri Apr  3 10:52:17 CEST 2020
3414
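
The two samples above give a rough creation rate. A minimal sketch, assuming a hypothetical helper named csr_rate (it is not part of oc) that takes the first count, the second count, and the elapsed seconds between the samples:

```shell
# Hypothetical helper: CSRs created per minute between two samples.
# Arguments: first count, second count, elapsed seconds.
csr_rate() {
  awk -v a="$1" -v b="$2" -v s="$3" \
    'BEGIN { printf "%.1f\n", (b - a) * 60 / s }'
}

# The samples were taken at 10:51:01 and 10:52:17, i.e. 76 seconds apart.
csr_rate 3404 3414 76   # prints 7.9
```

This is in line with the rate of roughly 10 per minute reported under "Actual results".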



Version-Release number of selected component (if applicable):
4.4.0-0.nightly-2020-04-01-141451


How reproducible:

Steps to Reproduce:
1. Shut down the cluster for 25 h, or ping tnozicka (I may still have the broken one).


Actual results:
Thousands of CSRs, with new ones created at a rate of about 10 per minute.


Expected results:
Only 1 CSR is created and it stays Pending until the admin approves it.


Additional info:

Comment 1 Antonio Murdaca 2020-04-03 09:00:14 UTC

Ryan, has something changed here?

Comment 2 Tomáš Nožička 2020-04-03 09:03:14 UTC

Is kubelet or something else using the same SA? The machine-config-operator pod was dead when I looked on the node with crictl.

Comment 3 Antonio Murdaca 2020-04-03 09:11:03 UTC

(In reply to Tomáš Nožička from comment #2)
> is kubelet or something else using the same SA? the machine-config-operator
> pod is dead when I looked on the node with crictl

Can you grab a must-gather in the meantime? It will help whoever debugs this.

Comment 4 Tomáš Nožička 2020-04-03 09:44:33 UTC

I can't; must-gather requires running pods. Also, pod logs do not work without valid certs.

Comment 5 Tomáš Nožička 2020-04-06 15:34:21 UTC

Kubelet was restarting because of another bug (being fixed now) and creating a new CSR every time, even though it already had one pending. Given that this behavior comes from upstream and the fatal bug is now being fixed, I am lowering the severity and sending this to the Node team to decide whether they want to pursue it, close it, or convert it to a Jira card.
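
One way to check for the duplicate pending requests described above is to count Pending CSRs per requestor. A minimal sketch, assuming a hypothetical helper named count_pending_csrs that reads rows in the shape printed by `oc get csr --no-headers` on stdin; it assumes the requestor appears as its own whitespace-separated field and the condition is the last field on each row:

```shell
# Hypothetical helper: count Pending CSRs for one requestor.
# $1: requestor identity to match; CSR rows are read from stdin.
count_pending_csrs() {
  awk -v who="$1" '
    {
      found = 0
      # The requestor column position can differ between versions,
      # so scan every field instead of relying on a fixed index.
      for (i = 1; i <= NF; i++) if ($i == who) found = 1
      # Assumption: the condition (e.g. Pending) is the last field.
      if (found && $NF == "Pending") n++
    }
    END { print n + 0 }
  '
}

# Example usage (requires a logged-in oc session):
#   oc get csr --no-headers | count_pending_csrs \
#     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper
```

A result well above 1 for the same requestor would suggest that new CSRs are being created while earlier ones are still pending.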

Comment 6 Ryan Phillips 2020-04-07 18:48:32 UTC

Fixed via https://github.com/openshift/origin/pull/24801 and BZ 1818961

*** This bug has been marked as a duplicate of bug 1818961 ***
