Bug 1858400
| Field | Value |
|---|---|
| Summary | [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at idle |
| Product | OpenShift Container Platform |
| Component | Cloud Compute |
| Cloud Compute sub component | Other Providers |
| Version | 4.5 |
| Target Release | 4.6.0 |
| Target Milestone | --- |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | unspecified |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Clayton Coleman <ccoleman> |
| Assignee | Danil Grigorev <dgrigore> |
| QA Contact | sunzhaohua <zhsun> |
| CC | dgoodwin, kewang, mimccune |
| Doc Type | If docs needed, set a value |
| Story Points | --- |
| Clones | 1858403 (view as bug list) |
| Last Closed | 2020-10-27 16:15:58 UTC |
| Type | Bug |
Please see https://bugzilla.redhat.com/show_bug.cgi?id=1858403. While investigating how to solve this for the cloud cred operator, I think I found that this is more complicated than it looks, and this issue is possibly not fixed for machine-api (unless I've made a mistake in my testing).

Checked the audit log; it seems that there is still a gap between the machine-api components and the machine config controller. Tested on 4.6.0-0.nightly-2020-08-02-091622:

```
$ grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
177
$ grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
996
$ grep -ir "cluster-api-provider-aws-leader" | wc -l
994
$ grep -ir "cluster-api-provider-nodelink-leader" | wc -l
994
```

Clayton has added a comment on https://bugzilla.redhat.com/show_bug.cgi?id=1858403#c5 describing how to fix this properly. I will be pursuing this for the cloud cred operator this week as well.

After some consideration, settling on 120/110/90s for each provider.

Danil: I suspect this may still not be what you want; controller-runtime does not presently expose the correct way to do this, where the lease is released when the leader process stops. As implemented in the PRs here, you have likely added a 90s startup delay, which will be irritating in development and I believe will also impact installation times. The correct method Clayton set us onto can be seen in https://github.com/openshift/cloud-credential-operator/pull/231.

Makes sense, I agree with you. But we don't mind experiencing this issue, as we are currently working through the same problem with our MAO deployment. The values 120/110/90s were agreed upon in a Slack discussion and would be ok for us. I like the implementation, and I'm going to transfer it to controller-runtime later, but you bring up a good point. Just for the sake of closing this bug, I'm hoping to avoid possible friction in implementing this upstream.

Are you confident this does not push the default installation out 90+ seconds, perhaps during the transition from bootstrap to the real control plane? For reference, we tried what you're using here, and Clayton's response is at https://bugzilla.redhat.com/show_bug.cgi?id=1858403#c5.

@Devan, the Machine controllers are only started after the pivot from bootstrap to the real control plane, as far as I'm aware. None of the Machine API components are used in bootstrapping the control plane machines, so we won't be adding any extra delay to installation. Since we haven't had time to fully explore releaseOnCancel and the effects it may have on the system, we were discussing as a team merging these PRs as-is for now, and then creating a new BZ to introduce the releaseOnCancel behaviour once a new release of controller-runtime is cut (the option was merged in overnight). Do you think that would be an acceptable approach here?

Just wanted to drop an update here: we need to add the extended-duration patches to the baremetal, ovirt, and openstack controllers. I am working to propose these changes today. Here are the last patches, which should complete this sequence:

https://github.com/openshift/cluster-api-provider-baremetal/pull/100
https://github.com/openshift/cluster-api-provider-ovirt/pull/66
https://github.com/openshift/cluster-api-provider-openstack/pull/114

I am resetting this BZ to POST and updating the pull requests.

Verified on GCP; checked audit logs on 3 masters.
4.6.0-0.nightly-2020-08-31-194600

```
sh-4.4# cd /var/log/kube-apiserver
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
881
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
102
sh-4.4# grep -ir "cluster-api-provider-gcp-leader" | wc -l
454
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
88
sh-4.4# cd /var/log/kube-apiserver
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
94
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
14
sh-4.4# grep -ir "cluster-api-provider-gcp-leader" | wc -l
55
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
6
# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
461
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
0
sh-4.4# grep -ir "cluster-api-provider-gcp-leader" | wc -l
22
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
22
sh-4.4# cd /var/log/kube-apiserver
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
601
```

Verified on azure

```
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
18
sh-4.4# grep -ir "cluster-api-provider-azure-leader" | wc -l
330
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
44
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
319
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
66
sh-4.4# grep -ir "cluster-api-provider-azure-leader" | wc -l
32
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
50
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
341
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
24
sh-4.4# grep -ir "cluster-api-provider-azure-leader" | wc -l
30
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
18
```

Verified on aws, clusterversion: 4.6.0-0.nightly-2020-09-01-205915

```
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
488
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
20
sh-4.4# grep -ir "cluster-api-provider-aws-leader" | wc -l
68
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
2
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
364
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
40
sh-4.4# grep -ir "cluster-api-provider-aws-leader" | wc -l
171
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
60
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
0
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
12
sh-4.4# grep -ir "cluster-api-provider-aws-leader" | wc -l
56
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
8
```

Verified on osp, clusterversion: 4.6.0-0.nightly-2020-09-05-015624

```
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
918
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
33
sh-4.4# grep -ir "cluster-api-provider-openstack-leader" | wc -l
113
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
35
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
0
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
9
sh-4.4# grep -ir "cluster-api-provider-openstack-leader" | wc -l
1342
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
238
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
2123
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
327
sh-4.4# grep -ir "cluster-api-provider-openstack-leader" | wc -l
54
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
86
```

Verified on vsphere, clusterversion: 4.6.0-0.nightly-2020-09-05-015624

```
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
126
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
76
sh-4.4# grep -ir "cluster-api-provider-vsphere-leader" | wc -l
83
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
85
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
2067
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
84
sh-4.4# grep -ir "cluster-api-provider-vsphere-leader" | wc -l
80
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
84
sh-4.4# grep -ir "system:serviceaccount:openshift-machine-config-operator:machine-config-controller" | wc -l
424
sh-4.4# grep -ir "cluster-api-provider-healthcheck-leader" | wc -l
176
sh-4.4# grep -ir "cluster-api-provider-vsphere-leader" | wc -l
174
sh-4.4# grep -ir "cluster-api-provider-nodelink-leader" | wc -l
177
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
The machine-api-controller components are refreshing their lease more than all other components combined (writes to their election config map by client within a window of time):

```
  2 system:serviceaccount:openshift-network-operator:default openshift-multus cni-binary-copy-script
  2 system:serviceaccount:openshift-network-operator:default openshift-network-operator applied-cluster
  2 system:serviceaccount:openshift-network-operator:default openshift-network-operator openshift-service-ca
  2 system:serviceaccount:openshift-network-operator:default openshift-sdn sdn-config
 18 system:serviceaccount:openshift-machine-config-operator:default openshift-machine-config-operator machine-config
 18 system:serviceaccount:openshift-machine-config-operator:machine-config-controller openshift-machine-config-operator machine-config-controller
 27 system:serviceaccount:openshift-machine-api:cluster-autoscaler-operator openshift-machine-api cluster-autoscaler-operator-leader
 53 system:kube-controller-manager openshift-kube-controller-manager cluster-policy-controller
 53 system:serviceaccount:openshift-config-operator:openshift-config-operator openshift-config-operator config-operator-lock
 54 system:serviceaccount:openshift-apiserver-operator:openshift-apiserver-operator openshift-apiserver-operator openshift-apiserver-operator-lock
 54 system:serviceaccount:openshift-controller-manager-operator:openshift-controller-manager-operator openshift-controller-manager-operator openshift-controller-manager-operator-lock
 54 system:serviceaccount:openshift-etcd-operator:etcd-operator openshift-etcd-operator openshift-cluster-etcd-operator-lock
 54 system:serviceaccount:openshift-image-registry:cluster-image-registry-operator openshift-image-registry openshift-master-controllers
 54 system:serviceaccount:openshift-kube-apiserver-operator:kube-apiserver-operator openshift-kube-apiserver-operator kube-apiserver-operator-lock
 54 system:serviceaccount:openshift-kube-apiserver:localhost-recovery-client openshift-kube-apiserver cert-regeneration-controller-lock
 54 system:serviceaccount:openshift-kube-controller-manager-operator:kube-controller-manager-operator openshift-kube-controller-manager-operator kube-controller-manager-operator-lock
 54 system:serviceaccount:openshift-kube-scheduler-operator:openshift-kube-scheduler-operator openshift-kube-scheduler-operator openshift-cluster-kube-scheduler-operator-lock
 54 system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator openshift-kube-storage-version-migrator-operator openshift-kube-storage-version-migrator-operator-lock
 54 system:serviceaccount:openshift-service-ca-operator:service-ca-operator openshift-service-ca-operator service-ca-operator-lock
179 system:kube-controller-manager kube-system kube-controller-manager
268 system:kube-scheduler openshift-kube-scheduler kube-scheduler
268 system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator openshift-cloud-credential-operator cloud-credential-operator-leader
268 system:serviceaccount:openshift-machine-api:machine-api-controllers openshift-machine-api cluster-api-provider-gcp-leader
268 system:serviceaccount:openshift-machine-api:machine-api-controllers openshift-machine-api cluster-api-provider-healthcheck-leader
268 system:serviceaccount:openshift-machine-api:machine-api-controllers openshift-machine-api cluster-api-provider-nodelink-leader
```

The machine-api components should have leader election periods closer to the machine config controller.
For instance, nodelink-leader is set to a 15s leader-election duration in code; it should be closer to 90s. kube-scheduler and controller-manager are explicitly allowed higher refresh rates (shorter leader-election intervals) because they are required to restart failed pods. Since the machine API components already run only a single pod, they are mainly using election to guard against administrator error (force deleting a pod or node) rather than needing rapid failover between multiple replicas. Please ensure the three machine-api components listed here have leader-election intervals of 90s, and that after this change the rate of configmap updates from this client (you can check the audit log on a cluster) is no more frequent than that interval (in case there is a secondary bug).
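To make the requested change concrete, here is a minimal sketch (not the actual code in the machine-api provider repositories) of extending leader-election timings in a controller-runtime based manager. The namespace, lock name, and exact durations are assumptions for illustration, and LeaderElectionReleaseOnCancel is the releaseOnCancel option the team planned to adopt once a controller-runtime release included it.

```go
package main

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Illustrative values: the leader renews its lock roughly once per
	// RetryPeriod at idle, so extending 15s to ~90s cuts the configmap
	// write rate by about a factor of six.
	leaseDuration := 120 * time.Second
	renewDeadline := 110 * time.Second
	retryPeriod := 90 * time.Second

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionNamespace: "openshift-machine-api",
		LeaderElectionID:        "cluster-api-provider-nodelink-leader",
		LeaseDuration:           &leaseDuration,
		RenewDeadline:           &renewDeadline,
		RetryPeriod:             &retryPeriod,
		// Release the lock on clean shutdown so a replacement pod does not
		// have to wait out the remaining lease; requires a controller-runtime
		// release that includes this option.
		LeaderElectionReleaseOnCancel: true,
	})
	if err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```

Since renewals happen roughly once per RetryPeriod, the audit-log counts for these configmaps after the change should drop accordingly, which is what the verification greps above are checking.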