Bug 2041554 - use lease for leader election
Summary: use lease for leader election
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Emily Moss
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On: 2037856 2042501
Blocks:
Reported: 2022-01-17 17:10 UTC by Sergiusz Urbaniak
Modified: 2022-05-03 00:25 UTC (History)
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2037856
Environment:
Last Closed: 2022-03-10 16:40:08 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-authentication-operator pull 537 | 0 | None | Merged | bug 2042038: bump library go | 2022-01-21 06:40:47 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:40:32 UTC

Comment 1 Sergiusz Urbaniak 2022-01-21 06:40:23 UTC
The etcd-operator PR was erroneously linked here. It should be https://github.com/openshift/cluster-authentication-operator/pull/537 instead.

Comment 3 Xingxing Xia 2022-01-30 04:58:17 UTC
Although there are too many links above to read, I went through them to understand the change. The main links to read are:
https://github.com/kubernetes/kubernetes/pull/106852
https://github.com/kubernetes/kubernetes/issues/107454
Checked library-go and found the relevant library PR: https://github.com/openshift/library-go/pull/1282 . Reading its code, the only change is that leaderelection.go now creates its lock with ConfigMapsLeasesResourceLock instead of ConfigMapsResourceLock. Then checked the latest nightly, 4.10.0-0.nightly-2022-01-29-015515:
$ oc adm release info --commits registry.ci.openshift.org/ocp/release:4.10.0-0.nightly-2022-01-29-015515 | grep authentication-operator
  cluster-authentication-operator  https://github.com/openshift/cluster-authentication-operator 4770445...
Then checked out that commit in the CAO (cluster-authentication-operator) repo, which contains this bug's PR:
$ cd /path/to/github.com/openshift/cluster-authentication-operator
$ git pull
$ git checkout -b 4.10.0-0.nightly-2022-01-29-015515 477044
$ vi vendor/github.com/openshift/library-go/pkg/config/leaderelection/leaderelection.go
...
        rl, err := resourcelock.New(
                resourcelock.ConfigMapsLeasesResourceLock,
...

This confirms that the PR has landed in 4.10 payloads.

Then checked the definition and use of ConfigMapsLeasesResourceLock, which is in https://github.com/openshift/cluster-authentication-operator/blob/4770445/vendor/k8s.io/client-go/tools/leaderelection/resourcelock/interface.go#L137-L140 :
	case ConfigMapsLeasesResourceLock:
		return &MultiLock{
			Primary:   configmapLock,
			Secondary: leaseLock,
This means 4.10 uses both the old configmap-based election and the new lease-based election, consistent with the developers' plan in https://github.com/kubernetes/kubernetes/issues/107454 for 4.10, i.e. "version x+1". Further checks in the openshift-authentication-operator namespace:
$ oc get cm -n openshift-authentication-operator | grep lock
cluster-authentication-operator-lock   0      25h
$ oc get lease -n openshift-authentication-operator | grep lock
cluster-authentication-operator-lock   authentication-operator-84bd79899c-sh9lf_baf2761e-f0cd-4f1c-a4a5-c67e3788e45d   25h
$ oc get lease -n openshift-authentication-operator cluster-authentication-operator-lock -o yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
...
spec:
  acquireTime: "2022-01-30T04:01:50.000000Z"
  holderIdentity: authentication-operator-84bd79899c-sh9lf_baf2761e-f0cd-4f1c-a4a5-c67e3788e45d
  leaseDurationSeconds: 137
  leaseTransitions: 2
  renewTime: "2022-01-30T04:42:46.623458Z"

There are both configmap and lease locks.

$ oc patch authentication.operator/cluster --type=merge -p="
spec:
  operatorLogLevel: TraceAll
"
Then check the leader election messages in the openshift-authentication-operator pod logs: delete the openshift-authentication-operator pod, wait for the new pod to be created, and check its logs. They contain:
2022-01-30T04:01:51.031563107Z I0130 04:01:51.031115       1 leaderelection.go:258] successfully acquired lease openshift-authentication-operator/cluster-authentication-operator-lock
2022-01-30T04:01:51.039809515Z I0130 04:01:51.033340       1 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"openshift-authentication-operator", Name:"cluster-authentication-operator-lock", UID:"7d02b348-61f8-4410-b1b7-d846493e8526", APIVersion:"v1", ResourceVersion:"542456", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' authentication-operator-84bd79899c-sh9lf_baf2761e-f0cd-4f1c-a4a5-c67e3788e45d became leader
2022-01-30T04:01:51.039809515Z I0130 04:01:51.033406       1 event.go:285] Event(v1.ObjectReference{Kind:"Lease", Namespace:"openshift-authentication-operator", Name:"cluster-authentication-operator-lock", UID:"f41ef5f7-354a-4b68-896a-2acfe531dd30", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"542458", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' authentication-operator-84bd79899c-sh9lf_baf2761e-f0cd-4f1c-a4a5-c67e3788e45d became leader

This means both the configmap-based and the lease-based elections work in 4.10: the new pod became leader on both the ConfigMap and the Lease.
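The pair of "became leader" events, one on the ConfigMap and one on the Lease, is what one would expect from a MultiLock that mirrors each leader record onto both objects. The sketch below is a simplified in-memory model of that idea, not client-go's code: the types `record`, `store` and `multiLock` and the `unknownLeader` placeholder are hypothetical, and only the broad behavior (write the primary, then the secondary; report an unknown leader when the two disagree) is loosely modeled on client-go's MultiLock.

```go
package main

import (
	"errors"
	"fmt"
)

// record is a hypothetical, trimmed-down leader election record.
type record struct {
	HolderIdentity string
}

// store is a hypothetical in-memory stand-in for one lock object.
type store struct {
	kind string // "ConfigMap" or "Lease", for readability only
	rec  *record
}

var errNotFound = errors.New("not found")

// get returns a copy of the stored record, or errNotFound if none exists.
func (s *store) get() (*record, error) {
	if s.rec == nil {
		return nil, errNotFound
	}
	r := *s.rec
	return &r, nil
}

func (s *store) put(r record) { s.rec = &r }

// unknownLeader is a placeholder meaning "the two lock objects disagree".
const unknownLeader = "unknown"

// multiLock writes the primary (ConfigMap) first and then the secondary (Lease).
type multiLock struct {
	primary, secondary *store
}

func (m *multiLock) update(r record) {
	m.primary.put(r)
	m.secondary.put(r)
}

// get reads the primary; if the secondary also exists but names a
// different holder, the leader is reported as unknown.
func (m *multiLock) get() (*record, error) {
	p, err := m.primary.get()
	if err != nil {
		return nil, err
	}
	s, err := m.secondary.get()
	if err != nil {
		// No secondary yet: only an older, ConfigMap-only client has
		// written the lock. Simplification: trust the primary.
		return p, nil
	}
	if p.HolderIdentity != s.HolderIdentity {
		p.HolderIdentity = unknownLeader
	}
	return p, nil
}

func main() {
	cm := &store{kind: "ConfigMap"}
	lease := &store{kind: "Lease"}
	ml := &multiLock{primary: cm, secondary: lease}

	// An old, ConfigMap-only client holds the lock; no Lease exists yet.
	cm.put(record{HolderIdentity: "old-pod"})
	r, _ := ml.get()
	fmt.Println(r.HolderIdentity) // old-pod

	// A new client acquires through the MultiLock: both objects are written.
	ml.update(record{HolderIdentity: "new-pod"})
	r, _ = ml.get()
	fmt.Println(r.HolderIdentity) // new-pod

	// If the objects ever disagree, the holder is reported as unknown.
	cm.put(record{HolderIdentity: "other-pod"})
	r, _ = ml.get()
	fmt.Println(r.HolderIdentity) // unknown
}
```

Because every update touches both objects, a leader-election event is recorded against each of them, which matches the two log lines above.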

By comparison, in 4.9 the openshift-authentication-operator pod logs show only configmap-based election lines and no lease-based election lines. This further verifies that 4.10 behaves as expected with the bug's PR.
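The 4.9 versus 4.10 comparison was done by reading the logs by hand. A small filter like the one below could list which lock kinds emitted a LeaderElection event. It is a sketch that assumes the event line format shown in the 4.10 log excerpt above; the function name `leaderEventKinds` is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// eventRe captures the object kind from an event line carrying the
// 'LeaderElection' reason (format assumed from the log excerpt above).
var eventRe = regexp.MustCompile(`Event\(v1\.ObjectReference\{Kind:"([A-Za-z]+)".*reason: 'LeaderElection'`)

// leaderEventKinds returns the sorted, de-duplicated lock kinds
// (e.g. "ConfigMap", "Lease") that reported a LeaderElection event.
func leaderEventKinds(logs string) []string {
	seen := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		if m := eventRe.FindStringSubmatch(sc.Text()); m != nil {
			seen[m[1]] = true
		}
	}
	kinds := make([]string, 0, len(seen))
	for k := range seen {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds)
	return kinds
}

func main() {
	// Trimmed versions of the two event lines from the 4.10 logs above.
	logs := `I0130 04:01:51.033340 1 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Name:"cluster-authentication-operator-lock"}): type: 'Normal' reason: 'LeaderElection' pod-a became leader
I0130 04:01:51.033406 1 event.go:285] Event(v1.ObjectReference{Kind:"Lease", Name:"cluster-authentication-operator-lock"}): type: 'Normal' reason: 'LeaderElection' pod-a became leader`
	fmt.Println(leaderEventKinds(logs)) // [ConfigMap Lease]
}
```

Following the comparison above, 4.10 logs would be expected to yield both kinds, while 4.9 logs would yield only ConfigMap.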

Based on the above, no further testing can be done IMO, so moving to VERIFIED. Per https://github.com/kubernetes/kubernetes/issues/107454 , we should watch QE upgrades from 4.9 (i.e. x) to 4.10 (i.e. x+1) for any leader election issues; if any appear, we'll file a separate bug. Since 4.11 is not yet rebased onto k8s 1.24, we cannot watch the upgrade from 4.10 to 4.11 (i.e. x+2) yet.
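The x / x+1 / x+2 plan relies on adjacent versions sharing at least one lock object during a rolling upgrade, so that old and new pods still exclude each other. The sketch below checks that property for an assumed mapping: x (4.9) uses the ConfigMap only and x+1 (4.10) uses the ConfigMap plus Lease MultiLock verified in this bug, while x+2 using the Lease only is an assumption. The version labels and the helper `sharesLock` are illustrative.

```go
package main

import "fmt"

// lockSets maps a release step to the lock objects it acquires.
// x and x+1 reflect what this bug observed for 4.9 and 4.10; x+2 is assumed.
var lockSets = map[string][]string{
	"x":   {"ConfigMap"},
	"x+1": {"ConfigMap", "Lease"},
	"x+2": {"Lease"},
}

// sharesLock reports whether two versions contend on at least one common
// lock object, which keeps old and new pods mutually exclusive mid-upgrade.
func sharesLock(a, b string) bool {
	for _, la := range lockSets[a] {
		for _, lb := range lockSets[b] {
			if la == lb {
				return true
			}
		}
	}
	return false
}

func main() {
	for _, pair := range [][2]string{{"x", "x+1"}, {"x+1", "x+2"}, {"x", "x+2"}} {
		fmt.Printf("%s -> %s: shares a lock = %v\n", pair[0], pair[1], sharesLock(pair[0], pair[1]))
	}
}
```

Only the direct x to x+2 jump shares no lock object, which is why skipping the intermediate MultiLock release could let two leaders run at once.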

Comment 6 errata-xmlrpc 2022-03-10 16:40:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

