Description of problem:
The kube-controller-manager leader election lease duration is set to 15 seconds, which causes it to go through leader elections and restart during the kube-apiserver rollout. That rollout currently takes around 60 seconds now that shutdown-delay-duration and gracefulTerminationDuration are set to 0 and 15 seconds respectively (https://github.com/openshift/cluster-kube-apiserver-operator/pull/1168 and https://github.com/openshift/library-go/pull/1104). The kube-controller-manager leader election timeout should be set to > 60 seconds to handle the downtime gracefully in SNO.

Recommended lease duration values to consider, as noted in https://github.com/openshift/enhancements/pull/832/files#diff-2e28754e69aa417e5b6d89e99e42f05bfb6330800fa823753383db1d170fbc2fR183: LeaseDuration=137s, RenewDeadline=107s, RetryPeriod=26s. These are the configurable values in k8s.io/client-go based leases, and controller-runtime exposes them (see the client-go sketch at the end of this report). This gives us:
1. clock skew tolerance == 30s
2. kube-apiserver downtime tolerance == 78s
3. worst non-graceful lease reacquisition == 163s
4. worst graceful lease reacquisition == 26s

The leader lease failures are visible in the log: http://dell-r510-01.perf.lab.eng.rdu2.redhat.com/chaos/sno/kube-controller-manager/kube-controller-manager-pod.log and they map to the restarts: http://dell-r510-01.perf.lab.eng.rdu2.redhat.com/chaos/sno/kube-controller-manager/cerberus_cluster_state.log. We could also disable leader election entirely, given that there's no HA.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-01-132055

How reproducible:
Always

Steps to Reproduce:
1. Install an SNO cluster using the latest nightly payload.
2. Trigger a kube-apiserver rollout or outage that lasts for at least 60 seconds (a kube-apiserver rollout on a cluster built from a payload with https://github.com/openshift/cluster-kube-apiserver-operator/pull/1168 should take ~60 seconds):
   $ oc patch kubeapiserver/cluster --type merge -p '{"spec":{"forceRedeploymentReason":"ITERATIONX"}}' where X can be 1,2...n
3. Observe the state of kube-controller-manager.

Actual results:
kube-controller-manager goes through leader election and restarts.

Expected results:
kube-controller-manager should handle the API rollout/outage gracefully.

Additional info:
Logs including must-gather: http://dell-r510-01.perf.lab.eng.rdu2.redhat.com/chaos/sno/kube-controller-manager/
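For reference, here is a minimal sketch of where the recommended values plug into a k8s.io/client-go based lease. This is illustrative only, not the KCM's actual wiring; the lease name, namespace, and holder identity are placeholders.

package main

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Lease object used as the lock; name, namespace, and identity are placeholders.
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Namespace: "kube-system", Name: "example-controller"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: "example-holder"},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 137 * time.Second, // tolerates ~78s of kube-apiserver downtime
		RenewDeadline: 107 * time.Second, // stop renewing after this long without a successful update
		RetryPeriod:   26 * time.Second,  // interval between acquire/renew attempts
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) { /* start controllers */ },
			OnStoppedLeading: func() { klog.Info("leader lease lost") },
		},
	})
}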
Mike, this is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1984608
Any update on this? Thanks.
Hi, yes, I was just talking about this with David today. From his comment on the related PR:

> This one in particular we should not change until graceful release becomes a thing for the KCM.
>
> Without graceful release, the average case of leader changes is directly related to the lease duration. Having a tight loop there is expensive, but the cost of waiting for two minutes for every normal release is too high to be reasonable.
>
> The KCM is special in the sense that it is difficult to gracefully release and having a leader is critical to the smooth running of the cluster.
>
> https://github.com/kubernetes/kubernetes/pull/101379 and https://github.com/kubernetes/kubernetes/pull/101125 are in progress to make the process gracefully release
>
> /hold until the KCM can gracefully release the lease.

So while we have the updated values and it is possible to make the change, this depends on the PRs he mentioned merging upstream to allow graceful lock release in the KCM (ref: https://issues.redhat.com/browse/WRKLDS-261).

If this bug is a critical blocker for this release, we will need to evaluate options such as carrying patches for the upstream changes in order to unblock it.
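For context on what "graceful release" buys us: client-go already exposes a ReleaseOnCancel option that makes the elector give up the lease on a clean shutdown, so a successor only waits roughly one RetryPeriod instead of a full LeaseDuration. The hypothetical helper below (reusing the imports and lock construction from the sketch in the bug description) only illustrates that mechanism; it is not the KCM change itself, which depends on the upstream PRs above.

// runWithGracefulRelease is a hypothetical helper, not KCM code.
func runWithGracefulRelease(ctx context.Context, lock resourcelock.Interface) {
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true, // proactively release the lease when ctx is cancelled (clean shutdown)
		LeaseDuration:   137 * time.Second,
		RenewDeadline:   107 * time.Second,
		RetryPeriod:     26 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) { /* run controllers until ctx is done */ },
			OnStoppedLeading: func() { /* exit; the successor can re-acquire within ~RetryPeriod */ },
		},
	})
}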
This was fixed in https://github.com/openshift/library-go/pull/1242 and picked into kcm-o in https://github.com/openshift/cluster-kube-controller-manager-operator/pull/581
Still able to reproduce the issue:

[root@localhost roottest]# oc get clusterversion
NAME      VERSION                                    AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-arm64-2022-01-29-162645   True        False         37m     Cluster version is 4.10.0-0.nightly-arm64-2022-01-29-162645

kubeapiserver.operator.openshift.io/cluster patched

[root@localhost roottest]# oc get co kube-apiserver -w
NAME             VERSION                                    AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.10.0-0.nightly-arm64-2022-01-29-162645   True        False         False      45m
kube-apiserver   4.10.0-0.nightly-arm64-2022-01-29-162645   True        False         False      45m
kube-apiserver   4.10.0-0.nightly-arm64-2022-01-29-162645   True        False         False      45m
kube-apiserver   4.10.0-0.nightly-arm64-2022-01-29-162645   True        True          False      45m     NodeInstallerProgressing: 1 nodes are at revision 10; 0 nodes have achieved new revision 11

kube-controller-manager log:

I0130 06:00:17.860585 1 garbagecollector.go:580] "Deleting object" object="openshift-operator-lifecycle-manager/collect-profiles-27391995-6dhnb" objectUID=0e42ac53-9bc0-422c-859e-da3b273a94b9 kind="Pod" propagationPolicy=Background
E0130 06:02:30.140259 1 leaderelection.go:330] error retrieving resource lock kube-system/kube-controller-manager: Get "https://api-int.yinzhou-bugr.qe.devcluster.openshift.com:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 10.0.220.57:6443: connect: connection refused
E0130 06:02:33.077150 1 resource_quota_controller.go:413] failed to discover resources: Get "https://api-int.yinzhou-bugr.qe.devcluster.openshift.com:6443/api": dial tcp 10.0.220.57:6443: connect: connection refused
W0130 06:02:33.080104 1 garbagecollector.go:709] failed to discover preferred resources: Get "https://api-int.yinzhou-bugr.qe.devcluster.openshift.com:6443/api": dial tcp 10.0.166.160:6443: connect: connection refused
I0130 06:02:33.080127 1 garbagecollector.go:181] no resources reported by discovery, skipping garbage collector sync
E0130 06:02:33.147759 1 leaderelection.go:330] error retrieving resource lock kube-system/kube-controller-manager: Get "https://api-int.yinzhou-bugr.qe.devcluster.openshift.com:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 10.0.153.179:6443: connect: connection refused
E0130 06:02:36.146760 1 leaderelection.go:330] error retrieving resource lock kube-system/kube-controller-manager: Get "https://api-int.yinzhou-bugr.qe.devcluster.openshift.com:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 10.0.220.57:6443: connect: connection refused
E0130 06:02:39.148401 1 leaderelection.go:330] error retrieving resource lock kube-system/kube-controller-manager: Get "https://api-int.yinzhou-bugr.qe.devcluster.openshift.com:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 10.0.166.160:6443: connect: connection refused
I0130 06:02:40.135068 1 leaderelection.go:283] failed to renew lease kube-system/kube-controller-manager: timed out waiting for the condition
F0130 06:02:40.135155 1 controllermanager.go:330] leaderelection lost

[root@localhost roottest]# oc logs -f po/kube-controller-manager-ip-10-0-153-252.us-east-2.compute.internal
+ timeout 3m /bin/bash -exuo pipefail -c 'while [ -n "$(ss -Htanop \( sport = 10257 \))" ]; do sleep 1; done'
++ ss -Htanop '(' sport = 10257 ')'
......
I0130 06:03:17.790530 1 flags.go:64] FLAG: --add-dir-header="false"
I0130 06:03:17.790622 1 flags.go:64] FLAG: --address="127.0.0.1"
I0130 06:03:17.790629 1 flags.go:64] FLAG: --allocate-node-cidrs="false"
I0130 06:03:17.790634 1 flags.go:64] FLAG: --allow-metric-labels="[]"
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.
If this is still an issue, please report the bug against https://issues.redhat.com/browse/OCPBUGS