Bug 1858400
| Field | Value |
|---|---|
| Summary | [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at idle |
| Product | OpenShift Container Platform |
| Component | Cloud Compute |
| Cloud Compute sub component | Other Providers |
| Version | 4.5 |
| Target Release | 4.6.0 |
| Target Milestone | --- |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | unspecified |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Clayton Coleman <ccoleman> |
| Assignee | Danil Grigorev <dgrigore> |
| QA Contact | sunzhaohua <zhsun> |
| CC | dgoodwin, kewang, mimccune |
| Doc Type | If docs needed, set a value |
| Story Points | --- |
| Clones | 1858403 (view as bug list) |
| Last Closed | 2020-10-27 16:15:58 UTC |
| Type | Bug |
Please see https://bugzilla.redhat.com/show_bug.cgi?id=1858403. While investigating how to solve this for the cloud cred operator, I think I found that this is more complicated than it looks, and this issue is possibly not fixed for machine-api (unless I've made a mistake in my testing).

Checked the audit log; it seems that there is still a gap between the machine-api components and the machine config controller. Tested on 4.6.0-0.nightly-2020-08-02-091622:

```
$ grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
177
$ grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
996
$ grep -ir "cluster-api-provider-aws-leader" | wc -l
994
$ grep -ir "cluster-api-provider-nodelink-leader" | wc -l
994
```

Clayton has added a comment on https://bugzilla.redhat.com/show_bug.cgi?id=1858403#c5 describing how to fix this properly. I will be pursuing this for the cloud cred operator this week as well.

After some consideration, settling on 120/110/90s for each provider.

Danil: I suspect this may still not be what you want; controller-runtime does not presently expose the correct way to do this, where the lease is released when the leader process stops. As implemented in the PRs here, you have likely added a 90s startup delay, which will be irritating in development and I believe will also impact installation times. The correct method Clayton set us onto can be seen in https://github.com/openshift/cloud-credential-operator/pull/231.

Makes sense, I agree with you. But we don't mind experiencing this issue, as we are currently working through the same problem with our MAO deployment. The values 120/110/90s were agreed upon in a Slack discussion and would be ok for us. I like the implementation, and I'm going to transfer it to controller-runtime later, but you bring up a good point. Just for the sake of closing this bug, I'm hoping to avoid possible friction in implementing this upstream.

Are you confident this does not push the default installation out 90+ seconds, perhaps during the transition from bootstrap to the real control plane? For reference, we tried what you're using here, and Clayton's response is at https://bugzilla.redhat.com/show_bug.cgi?id=1858403#c5.

@Devan, the Machine controllers are only started after the pivot from bootstrap to the real control plane, as far as I'm aware. None of the Machine API components are used in bootstrapping the control plane machines, so we won't be adding any extra delay to installation. Since we haven't had time to fully explore releaseOnCancel and the effects it may have on the system, we were discussing as a team merging these PRs as-is for now, and then creating a new BZ to introduce the releaseOnCancel behaviour once a new release of controller-runtime is cut (the option was merged in overnight). Do you think that would be an acceptable approach here?

Just wanted to drop an update here: we need to add the extended-duration patches to the baremetal, ovirt, and openstack controllers. I am working to propose these changes today. Here are the last patches, which should complete this sequence:

https://github.com/openshift/cluster-api-provider-baremetal/pull/100
https://github.com/openshift/cluster-api-provider-ovirt/pull/66
https://github.com/openshift/cluster-api-provider-openstack/pull/114

I am resetting this BZ to POST and updating the pull requests.

Verified on GCP; checked audit logs on 3 masters.
4.6.0-0.nightly-2020-08-31-194600

```
sh-4.4# cd /var/log/kube-apiserver
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
881
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
102
sh-4.4# grep -ir "cluster-api-provider-gcp-leader" | wc -l
454
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
88
sh-4.4# cd /var/log/kube-apiserver
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
94
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
14
sh-4.4# grep -ir "cluster-api-provider-gcp-leader" | wc -l
55
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
6
# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
461
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
0
sh-4.4# grep -ir "cluster-api-provider-gcp-leader" | wc -l
22
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
22
sh-4.4# cd /var/log/kube-apiserver
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
601
```

Verified on azure

```
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
18
sh-4.4# grep -ir "cluster-api-provider-azure-leader" | wc -l
330
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
44
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
319
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
66
sh-4.4# grep -ir "cluster-api-provider-azure-leader" | wc -l
32
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
50
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
341
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
24
sh-4.4# grep -ir "cluster-api-provider-azure-leader" | wc -l
30
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
18
```

Verified on aws, clusterversion: 4.6.0-0.nightly-2020-09-01-205915

```
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
488
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
20
sh-4.4# grep -ir "cluster-api-provider-aws-leader" | wc -l
68
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
2
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
364
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
40
sh-4.4# grep -ir "cluster-api-provider-aws-leader" | wc -l
171
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
60
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
0
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
12
sh-4.4# grep -ir "cluster-api-provider-aws-leader" | wc -l
56
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
8
```

Verified on osp, clusterversion: 4.6.0-0.nightly-2020-09-05-015624

```
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
918
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
33
sh-4.4# grep -ir "cluster-api-provider-openstack-leader" | wc -l
113
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
35
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
0
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
9
sh-4.4# grep -ir "cluster-api-provider-openstack-leader" | wc -l
1342
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
238
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
2123
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
327
sh-4.4# grep -ir "cluster-api-provider-openstack-leader" | wc -l
54
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
86
```

Verified on vsphere, clusterversion: 4.6.0-0.nightly-2020-09-05-015624

```
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
126
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
76
sh-4.4# grep -ir "cluster-api-provider-vsphere-leader" | wc -l
83
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
85
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
2067
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
84
sh-4.4# grep -ir "cluster-api-provider-vsphere-leader" | wc -l
80
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
84
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
424
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
176
sh-4.4# grep -ir "cluster-api-provider-vsphere-leader" | wc -l
174
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
177
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
The machine-api-controller components are refreshing their lease more than all other components combined (writes to their election config map by client within a window of time):

```
  2 system:serviceaccount:openshift-network-operator:default openshift-multus cni-binary-copy-script
  2 system:serviceaccount:openshift-network-operator:default openshift-network-operator applied-cluster
  2 system:serviceaccount:openshift-network-operator:default openshift-network-operator openshift-service-ca
  2 system:serviceaccount:openshift-network-operator:default openshift-sdn sdn-config
 18 system:serviceaccount:openshift-machine-config-operator:default openshift-machine-config-operator machine-config
 18 system:serviceaccount:openshift-machine-config-operator:machine-config-controller openshift-machine-config-operator machine-config-controller
 27 system:serviceaccount:openshift-machine-api:cluster-autoscaler-operator openshift-machine-api cluster-autoscaler-operator-leader
 53 system:kube-controller-manager openshift-kube-controller-manager cluster-policy-controller
 53 system:serviceaccount:openshift-config-operator:openshift-config-operator openshift-config-operator config-operator-lock
 54 system:serviceaccount:openshift-apiserver-operator:openshift-apiserver-operator openshift-apiserver-operator openshift-apiserver-operator-lock
 54 system:serviceaccount:openshift-controller-manager-operator:openshift-controller-manager-operator openshift-controller-manager-operator openshift-controller-manager-operator-lock
 54 system:serviceaccount:openshift-etcd-operator:etcd-operator openshift-etcd-operator openshift-cluster-etcd-operator-lock
 54 system:serviceaccount:openshift-image-registry:cluster-image-registry-operator openshift-image-registry openshift-master-controllers
 54 system:serviceaccount:openshift-kube-apiserver-operator:kube-apiserver-operator openshift-kube-apiserver-operator kube-apiserver-operator-lock
 54 system:serviceaccount:openshift-kube-apiserver:localhost-recovery-client openshift-kube-apiserver cert-regeneration-controller-lock
 54 system:serviceaccount:openshift-kube-controller-manager-operator:kube-controller-manager-operator openshift-kube-controller-manager-operator kube-controller-manager-operator-lock
 54 system:serviceaccount:openshift-kube-scheduler-operator:openshift-kube-scheduler-operator openshift-kube-scheduler-operator openshift-cluster-kube-scheduler-operator-lock
 54 system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator openshift-kube-storage-version-migrator-operator openshift-kube-storage-version-migrator-operator-lock
 54 system:serviceaccount:openshift-service-ca-operator:service-ca-operator openshift-service-ca-operator service-ca-operator-lock
179 system:kube-controller-manager kube-system kube-controller-manager
268 system:kube-scheduler openshift-kube-scheduler kube-scheduler
268 system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator openshift-cloud-credential-operator cloud-credential-operator-leader
268 system:serviceaccount:openshift-machine-api:machine-api-controllers openshift-machine-api cluster-api-provider-gcp-leader
268 system:serviceaccount:openshift-machine-api:machine-api-controllers openshift-machine-api cluster-api-provider-healthcheck-leader
268 system:serviceaccount:openshift-machine-api:machine-api-controllers openshift-machine-api cluster-api-provider-nodelink-leader
```

The machine-api components should have leader election periods closer to the machine config controller.
For instance, nodelink-leader is set to a 15s leader-election duration in code; it should be closer to 90s. kube-scheduler and controller-manager are explicitly allowed higher refresh rates (shorter leader-election intervals) because they are required to restart failed pods. Since the machine API components already run only a single pod, they are mainly using election to guard against administrator error (force deleting a pod or node) rather than needing rapid failover between multiple replicas. Please ensure the three machine-api components listed here have leader-election intervals of 90s, and that after this change the rate of configmap updates from this client (you can check the audit log on a cluster) is no more frequent than that interval (in case there is a secondary bug).
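To make the requested change concrete, here is a minimal sketch (not the actual code in the machine-api provider repositories) of extending leader-election timings in a controller-runtime based manager. The namespace, lock name, and exact durations are assumptions for illustration, and LeaderElectionReleaseOnCancel is the releaseOnCancel option the team planned to adopt once a controller-runtime release included it.

```go
package main

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Illustrative values: the leader renews its lock roughly once per
	// RetryPeriod at idle, so extending 15s to ~90s cuts the configmap
	// write rate by about a factor of six.
	leaseDuration := 120 * time.Second
	renewDeadline := 110 * time.Second
	retryPeriod := 90 * time.Second

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionNamespace: "openshift-machine-api",
		LeaderElectionID:        "cluster-api-provider-nodelink-leader",
		LeaseDuration:           &leaseDuration,
		RenewDeadline:           &renewDeadline,
		RetryPeriod:             &retryPeriod,
		// Release the lock on clean shutdown so a replacement pod does not
		// have to wait out the remaining lease; requires a controller-runtime
		// release that includes this option.
		LeaderElectionReleaseOnCancel: true,
	})
	if err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```

Since renewals happen roughly once per RetryPeriod, the audit-log counts for these configmaps after the change should drop accordingly, which is what the verification greps above are checking.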