Bug 1926731

Summary:	machine-config-daemon pod restarts take (number of nodes)*10min when upgrading from 4.7 to 4.7
Product:	OpenShift Container Platform	Reporter:	jima
Component:	Machine Config Operator	Assignee:	Antonio Murdaca <amurdaca>
Status:	CLOSED DUPLICATE	QA Contact:	Michael Nguyen <mnguyen>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	4.7	CC:	kgarriso, yanyang
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-02-10 20:15:49 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description jima 2021-02-09 11:05:04 UTC

Description of problem:
Fresh installed a cluster with 4.7.0-0.nightly-2021-02-05-221250, then upgraded it to 4.7.0-0.nightly-2021-02-06-084550. The machine-config operator took about 1 hour to update. After deeper investigation we found that the machine-config-daemon pods are restarted before the mc is updated, and that the daemon pods restart in sequence, each taking about 10min, reaching the terminationGracePeriodSeconds (=600) defined in the daemonset.

# for i in $(oc get po -o name|grep daemon); do oc logs $i -c machine-config-daemon|head -n1; done
I0208 04:02:11.334575  289968 start.go:108] Version: v4.7.0-202102060108.p0-dirty (0023e696058bbdf6e14504117bfc31f208125c47)
I0208 03:41:47.107426  125554 start.go:108] Version: v4.7.0-202102060108.p0-dirty (0023e696058bbdf6e14504117bfc31f208125c47)
I0208 03:52:01.885962  144491 start.go:108] Version: v4.7.0-202102060108.p0-dirty (0023e696058bbdf6e14504117bfc31f208125c47)
I0208 03:31:35.627618  112590 start.go:108] Version: v4.7.0-202102060108.p0-dirty (0023e696058bbdf6e14504117bfc31f208125c47)
I0208 04:12:21.641158  148829 start.go:108] Version: v4.7.0-202102060108.p0-dirty (0023e696058bbdf6e14504117bfc31f208125c47)

Since we have 3 masters + 2 workers, the total time spent restarting mcd pods is 5*10min.
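
The roughly 10min spacing can be cross-checked from the start timestamps in the log lines above. The following is only a minimal sketch: the timestamps are copied by hand from that output, and the names `starts`, `to_seconds` and `gaps` are illustrative, not part of any tool.

```shell
#!/bin/sh
# Start times (HH:MM:SS) copied from the first log line of each mcd pod above.
starts="04:02:11 03:41:47 03:52:01 03:31:35 04:12:21"

# Convert HH:MM:SS to seconds since midnight.
to_seconds() {
  h=${1%%:*}; rest=${1#*:}; m=${rest%%:*}; s=${rest#*:}
  # Strip one leading zero so values such as "08" are not parsed as octal.
  h=${h#0}; m=${m#0}; s=${s#0}
  echo $(( $h * 3600 + $m * 60 + $s ))
}

# Sort chronologically, then collect the gap between consecutive starts.
gaps=""
prev=""
for t in $(printf '%s\n' $starts | sort); do
  cur=$(to_seconds "$t")
  if [ -n "$prev" ]; then
    gaps="$gaps $(( cur - prev ))"
  fi
  prev=$cur
done
echo "gaps in seconds:$gaps"
```

For the five timestamps above this prints gaps of 612, 614, 610 and 610 seconds, i.e. each restart follows the previous one by slightly more than the 600s grace period.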

Although the upgrade eventually succeeds, it takes more than 100min.

When we continued to upgrade this cluster afterwards, the issue was no longer reproduced: mcd pods restarted quickly, in less than 2min.

We also tried updating the mcd daemonset on a freshly installed cluster with a 4.7 nightly build, so that the mcd pods were restarted, and then upgraded that cluster to another 4.7 nightly build; the issue was not hit.


Version-Release number of selected component (if applicable):

How reproducible:
Always, when a cluster is freshly installed with a 4.7 nightly build and then upgraded to another 4.7 nightly build

Steps to Reproduce:
1. Fresh install a UPI cluster with 4.7.0-0.nightly-2021-02-05-221250
2. Upgrade to 4.7.0-0.nightly-2021-02-06-084550
3. Hit the issue: each mcd pod takes 10min to restart during the upgrade

Actual results:
Each mcd pod takes 10min to restart during the upgrade

Expected results:
mcd pods should restart quickly

Additional info:
The issue is only reproduced on a freshly installed 4.7 cluster that is then upgraded to another 4.7 nightly build.

Comment 2 Kirsten Garrison 2021-02-10 20:15:49 UTC


*** This bug has been marked as a duplicate of bug 1927041 ***