Bug 1858400 - [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at idle
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Danil Grigorev
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-17 20:02 UTC by Clayton Coleman
Modified: 2020-09-07 06:20 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1858403
Environment:
Last Closed:
Target Upstream Version:


Attachments


Links
System ID Priority Status Summary Last Updated
Github openshift cluster-api-provider-aws pull 339 None closed BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at ... 2020-09-25 07:28:59 UTC
Github openshift cluster-api-provider-aws pull 345 None closed BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at ... 2020-09-25 07:28:59 UTC
Github openshift cluster-api-provider-azure pull 152 None closed BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at ... 2020-09-25 07:28:59 UTC
Github openshift cluster-api-provider-azure pull 157 None closed BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at ... 2020-09-25 07:29:03 UTC
Github openshift cluster-api-provider-baremetal pull 100 None closed BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at ... 2020-09-25 07:28:58 UTC
Github openshift cluster-api-provider-baremetal pull 88 None closed BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at ... 2020-09-25 07:28:58 UTC
Github openshift cluster-api-provider-gcp pull 104 None closed BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at ... 2020-09-25 07:28:59 UTC
Github openshift cluster-api-provider-gcp pull 115 None closed BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at ... 2020-09-25 07:29:02 UTC
Github openshift cluster-api-provider-openstack pull 109 None closed BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at ... 2020-09-25 07:28:57 UTC
Github openshift cluster-api-provider-openstack pull 114 None closed BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at ... 2020-09-25 07:28:57 UTC
Github openshift cluster-api-provider-ovirt pull 56 None closed BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at ... 2020-09-25 07:28:57 UTC
Github openshift cluster-api-provider-ovirt pull 66 None closed BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at ... 2020-09-25 07:28:57 UTC
Github openshift machine-api-operator pull 649 None closed BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at ... 2020-09-25 07:28:56 UTC
Github openshift machine-api-operator pull 675 None closed BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at ... 2020-09-25 07:29:00 UTC

Description Clayton Coleman 2020-07-17 20:02:32 UTC
The machine-api-controller components are refreshing their leases more often than all other components combined (counts below are writes to each election configmap, grouped by client, within the same window of time):

      2 system:serviceaccount:openshift-network-operator:default	openshift-multus	cni-binary-copy-script
      2 system:serviceaccount:openshift-network-operator:default	openshift-network-operator	applied-cluster
      2 system:serviceaccount:openshift-network-operator:default	openshift-network-operator	openshift-service-ca
      2 system:serviceaccount:openshift-network-operator:default	openshift-sdn	sdn-config
     18 system:serviceaccount:openshift-machine-config-operator:default	openshift-machine-config-operator	machine-config
     18 system:serviceaccount:openshift-machine-config-operator:machine-config-controller	openshift-machine-config-operator	machine-config-controller
     27 system:serviceaccount:openshift-machine-api:cluster-autoscaler-operator	openshift-machine-api	cluster-autoscaler-operator-leader
     53 system:kube-controller-manager	openshift-kube-controller-manager	cluster-policy-controller
     53 system:serviceaccount:openshift-config-operator:openshift-config-operator	openshift-config-operator	config-operator-lock
     54 system:serviceaccount:openshift-apiserver-operator:openshift-apiserver-operator	openshift-apiserver-operator	openshift-apiserver-operator-lock
     54 system:serviceaccount:openshift-controller-manager-operator:openshift-controller-manager-operator	openshift-controller-manager-operator	openshift-controller-manager-operator-lock
     54 system:serviceaccount:openshift-etcd-operator:etcd-operator	openshift-etcd-operator	openshift-cluster-etcd-operator-lock
     54 system:serviceaccount:openshift-image-registry:cluster-image-registry-operator	openshift-image-registry	openshift-master-controllers
     54 system:serviceaccount:openshift-kube-apiserver-operator:kube-apiserver-operator	openshift-kube-apiserver-operator	kube-apiserver-operator-lock
     54 system:serviceaccount:openshift-kube-apiserver:localhost-recovery-client	openshift-kube-apiserver	cert-regeneration-controller-lock
     54 system:serviceaccount:openshift-kube-controller-manager-operator:kube-controller-manager-operator	openshift-kube-controller-manager-operator	kube-controller-manager-operator-lock
     54 system:serviceaccount:openshift-kube-scheduler-operator:openshift-kube-scheduler-operator	openshift-kube-scheduler-operator	openshift-cluster-kube-scheduler-operator-lock
     54 system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator	openshift-kube-storage-version-migrator-operator	openshift-kube-storage-version-migrator-operator-lock
     54 system:serviceaccount:openshift-service-ca-operator:service-ca-operator	openshift-service-ca-operator	service-ca-operator-lock
    179 system:kube-controller-manager	kube-system	kube-controller-manager
    268 system:kube-scheduler	openshift-kube-scheduler	kube-scheduler
    268 system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator	openshift-cloud-credential-operator	cloud-credential-operator-leader
    268 system:serviceaccount:openshift-machine-api:machine-api-controllers	openshift-machine-api	cluster-api-provider-gcp-leader
    268 system:serviceaccount:openshift-machine-api:machine-api-controllers	openshift-machine-api	cluster-api-provider-healthcheck-leader
    268 system:serviceaccount:openshift-machine-api:machine-api-controllers	openshift-machine-api	cluster-api-provider-nodelink-leader

The machine-api components should have leader election periods closer to the machine config controller's.

For instance, nodelink-leader is set to a 15s leader-election duration in code; it should be closer to 90s.  kube-scheduler and kube-controller-manager are explicitly allowed shorter intervals (more frequent refreshes) because they are required to restart failed pods quickly.  Since the machine API components already run only a single pod, they mainly use election to guard against administrator error (force deleting a pod or node), rather than needing rapid failover between multiple replicas.

Please ensure the three machine-api components listed here have leader election intervals of 90s, and that after this change configmap updates from this client occur no more frequently than that interval (check the audit log on a cluster, in case there is a secondary bug).
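
For context on the mechanics: with client-go leader election, the leader re-writes the lock configmap roughly once per retry period, so the retry period, not the lease duration alone, drives the idle write rate against etcd. A minimal controller-runtime sketch of the requested change (the durations are illustrative assumptions; the Options field names come from sigs.k8s.io/controller-runtime):

package main

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Longer periods mean fewer lock writes at idle; failover after a
	// leader dies can take up to leaseDuration, which is acceptable for
	// single-replica components like the machine-api controllers.
	leaseDuration := 90 * time.Second
	renewDeadline := 60 * time.Second
	retryPeriod := 30 * time.Second

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionNamespace: "openshift-machine-api",
		LeaderElectionID:        "cluster-api-provider-nodelink-leader",
		LeaseDuration:           &leaseDuration,
		RenewDeadline:           &renewDeadline,
		RetryPeriod:             &retryPeriod,
	})
	if err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}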

Comment 1 Devan Goodwin 2020-07-31 18:39:43 UTC
Please see https://bugzilla.redhat.com/show_bug.cgi?id=1858403. While investigating how to solve this for the cloud cred operator, I found this is more complicated than it looks, and this issue is possibly not fixed for machine-api (unless I've made a mistake in my testing).

Comment 2 sunzhaohua 2020-08-04 09:08:46 UTC
Checked the audit log; it seems there is still a gap between the machine-api components and the machine config controller.
Tested on 4.6.0-0.nightly-2020-08-02-091622

$ grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
177

$ grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
996
$ grep -ir "cluster-api-provider-aws-leader" | wc -l
994
$ grep -ir "cluster-api-provider-nodelink-leader" | wc -l
994

Comment 3 Devan Goodwin 2020-08-04 11:39:48 UTC
Clayton has added a comment on https://bugzilla.redhat.com/show_bug.cgi?id=1858403#c5 explaining how to fix this properly. I will be pursuing this for the cloud cred operator this week as well.

Comment 4 Danil Grigorev 2020-08-13 15:12:06 UTC
After some consideration, we are settling on 120/110/90s for each provider.
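
Presumably those three values map onto the standard leader-election knobs in descending order (this mapping is an assumption; the linked PRs are authoritative):

package config

import "time"

// Sketch of the agreed values mapped onto the usual knobs.
var (
	LeaseDuration = 120 * time.Second // lease validity; max takeover delay after a crash
	RenewDeadline = 110 * time.Second // leader must renew within this window or abdicate
	RetryPeriod   = 90 * time.Second  // interval between renew attempts; sets the write rate
)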

Comment 5 Devan Goodwin 2020-08-13 15:19:24 UTC
Danil: I suspect this may still not be what you want; controller-runtime does not presently expose the correct way to do this, where the lease is released when the leader process stops. As implemented in the PRs here, you have likely added a 90s startup delay, which will be irritating in development and I believe will also impact installation times.

The correct method Clayton set us onto can be seen in: https://github.com/openshift/cloud-credential-operator/pull/231
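
That pattern, roughly (a sketch assuming client-go's leaderelection package, not a copy of the PR): elect with client-go directly so that ReleaseOnCancel can release the lock on graceful shutdown, letting a replacement pod take over immediately instead of waiting out the lease.

package main

import (
	"context"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func runWithLeaderElection(ctx context.Context, cfg *rest.Config, startControllers func(context.Context)) {
	client := kubernetes.NewForConfigOrDie(cfg)
	hostname, _ := os.Hostname()

	lock, err := resourcelock.New(
		resourcelock.ConfigMapsResourceLock, // the lock type these components use today
		"openshift-machine-api",             // namespace
		"cluster-api-provider-nodelink-leader",
		client.CoreV1(),
		client.CoordinationV1(),
		resourcelock.ResourceLockConfig{Identity: hostname},
	)
	if err != nil {
		panic(err)
	}

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock: lock,
		// ReleaseOnCancel is the key piece: on graceful shutdown the lease
		// is released instead of left to expire, avoiding a long handover.
		ReleaseOnCancel: true,
		LeaseDuration:   120 * time.Second,
		RenewDeadline:   110 * time.Second,
		RetryPeriod:     90 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: startControllers,
			OnStoppedLeading: func() { os.Exit(0) },
		},
	})
}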

Comment 6 Danil Grigorev 2020-08-13 15:43:59 UTC
Makes sense; I agree with you. But we don't mind experiencing this issue for now, as we are currently working through the same problem with our MAO deployment. The values 120/110/90s were agreed upon in a Slack discussion and would be OK for us. I like the implementation, and I'm going to port it to controller-runtime later, but you raise a good point. For the sake of closing this bug I'd like to keep the current approach, hoping to avoid possible friction while this is implemented upstream.

Comment 7 Devan Goodwin 2020-08-13 17:06:18 UTC
Are you confident this does not push the default installation out 90+ seconds, perhaps during the transition from bootstrap to the real control plane?

Comment 8 Devan Goodwin 2020-08-13 17:14:41 UTC
For reference, we tried what you're using here, and Clayton's response is at https://bugzilla.redhat.com/show_bug.cgi?id=1858403#c5

Comment 9 Joel Speed 2020-08-18 10:00:01 UTC
@Devan, the Machine controllers are only started after the pivot from bootstrap to the real control plane, as far as I'm aware. None of the Machine API components are used in bootstrapping the control plane machines, so we won't be adding any extra delay to installation.

Since we haven't had time to fully explore releaseOnCancel and the effects it may have on the system, we were discussing as a team merging these PRs as-is for now, and then creating a new BZ to introduce the releaseOnCancel behaviour once a new release of controller-runtime is cut (the option was merged upstream overnight). Do you think that would be an acceptable approach here?
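
For reference, the option in question appears to be LeaderElectionReleaseOnCancel on the manager Options (the exact field name is my assumption about what merged upstream); once a release containing it is vendored, enabling it would look roughly like:

package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func newManager() (ctrl.Manager, error) {
	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:   true,
		LeaderElectionID: "cluster-api-provider-nodelink-leader",
		// Release the lease on graceful shutdown instead of letting it
		// expire, so the next pod does not wait out the full lease.
		LeaderElectionReleaseOnCancel: true,
	})
}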

Comment 13 Michael McCune 2020-08-25 14:22:47 UTC
Just wanted to drop an update here: we need to add the extended-duration patches to the baremetal, ovirt, and openstack providers. I am working to propose these changes today.

Comment 14 Michael McCune 2020-08-25 14:46:09 UTC
Here are the last patches, which should complete this sequence:
https://github.com/openshift/cluster-api-provider-baremetal/pull/100
https://github.com/openshift/cluster-api-provider-ovirt/pull/66
https://github.com/openshift/cluster-api-provider-openstack/pull/114

I am resetting this BZ to POST and updating the pull requests.

Comment 16 sunzhaohua 2020-09-01 08:49:48 UTC
Verified on GCP; checked the audit logs on all 3 masters (one block per master below).
4.6.0-0.nightly-2020-08-31-194600
sh-4.4# cd /var/log/kube-apiserver
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
881
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
102
sh-4.4# grep -ir "cluster-api-provider-gcp-leader" | wc -l
454
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
88

sh-4.4# cd /var/log/kube-apiserver
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
94
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
14
sh-4.4# grep -ir "cluster-api-provider-gcp-leader" | wc -l
55
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
6

sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
461
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
0
sh-4.4# grep -ir "cluster-api-provider-gcp-leader" | wc -l
22
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
22

Comment 17 sunzhaohua 2020-09-01 10:02:55 UTC
Verified on Azure
sh-4.4# cd /var/log/kube-apiserver
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
601
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
18
sh-4.4# grep -ir "cluster-api-provider-azure-leader" | wc -l
330
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
44

sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
319
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
66
sh-4.4# grep -ir "cluster-api-provider-azure-leader" | wc -l
32
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
50

sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
341
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
24
sh-4.4# grep -ir "cluster-api-provider-azure-leader" | wc -l
30
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
18

Comment 18 sunzhaohua 2020-09-02 03:26:17 UTC
Verified on AWS
clusterversion: 4.6.0-0.nightly-2020-09-01-205915
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
488
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
20
sh-4.4# grep -ir "cluster-api-provider-aws-leader" | wc -l
68
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
2

sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
364
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
40
sh-4.4# grep -ir "cluster-api-provider-aws-leader" | wc -l
171
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
60

sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
0
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
12
sh-4.4# grep -ir "cluster-api-provider-aws-leader" | wc -l
56
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
8

Comment 19 sunzhaohua 2020-09-07 06:20:27 UTC
Verified on OSP
clusterversion: 4.6.0-0.nightly-2020-09-05-015624

sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
918
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
33
sh-4.4# grep -ir "cluster-api-provider-openstack-leader" | wc -l
113
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
35

sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
0
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
9
sh-4.4# grep -ir "cluster-api-provider-openstack-leader" | wc -l
1342
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
238

sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
2123
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
327
sh-4.4# grep -ir "cluster-api-provider-openstack-leader" | wc -l
54
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
86

Verified on vSphere
clusterversion: 4.6.0-0.nightly-2020-09-05-015624
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
126
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
76
sh-4.4# grep -ir "cluster-api-provider-vsphere-leader" | wc -l
83
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
85

sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
2067
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
84
sh-4.4# grep -ir "cluster-api-provider-vsphere-leader" | wc -l
80
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
84

sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
424
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
176
sh-4.4# grep -ir "cluster-api-provider-vsphere-leader" | wc -l
174
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
177

