Bug 2037856 - use lease for leader election
Summary: use lease for leader election
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Abu Kashem
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On:
Blocks: 2041554
Reported: 2022-01-06 17:10 UTC by Abu Kashem
Modified: 2022-05-03 00:25 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2041554 2042501
Environment:
Last Closed: 2022-03-10 16:37:33 UTC
Target Upstream Version:
Embargoed:




Links (system, ID, status, summary, last updated):
- GitHub openshift/cluster-csi-snapshot-controller-operator pull 110, Merged, "Bug 2037856: Update library-go to get leader election updates", 2022-01-25 12:54:00 UTC
- GitHub openshift/cluster-csi-snapshot-controller-operator pull 112, open, "Bug 2037856: Fix typo in VolumeSnapshotContents RBAC", 2022-01-25 16:51:22 UTC
- GitHub openshift/cluster-kube-apiserver-operator pull 1294, Merged, "Bug 2037856: bump library go", 2022-01-21 08:21:00 UTC
- GitHub openshift/cluster-kube-controller-manager-operator pull 590, Merged, "Bug 2037856: bump library go", 2022-01-24 18:55:20 UTC
- GitHub openshift/cluster-kube-descheduler-operator pull 236, Merged, "Bug 2037856: bump library go", 2022-01-21 08:21:03 UTC
- GitHub openshift/cluster-storage-operator pull 255, Merged, "Bug 2037856: Update library-go to get leader election updates", 2022-01-21 08:21:04 UTC
- GitHub openshift/library-go pull 1282, Merged, "Bug 2037856: use 'configmapsleases' based leader election", 2022-01-21 08:21:04 UTC
- GitHub openshift/vsphere-problem-detector pull 71, Merged, "Bug 2037856: Update library-go to get leader election updates", 2022-01-21 08:21:26 UTC
- Red Hat Product Errata RHSA-2022:0056, 2022-03-10 16:37:45 UTC

Description Abu Kashem 2022-01-06 17:10:25 UTC
Our operators use a ConfigMap in their respective namespaces as the leader election lock. We need to use a Lease instead.

This warrants a change in library-go: in https://github.com/openshift/library-go/blob/master/pkg/config/leaderelection/leaderelection.go#L53, we need to use 'configmapsleases' instead of 'configmaps' to handle version skew. I hope that is the only change we need to make.


The context is that ConfigMap/Endpoints-based leader election is being removed in Kubernetes 1.24: https://github.com/kubernetes/kubernetes/pull/106852

We need to update the following operators:
- kas
- oas
- etcd
- auth

Are there other operators we need to update?


We also need to add a dedicated FlowSchema for each operator to make sure its leader election traffic falls into the 'leader-election' priority level. We only need to cover ConfigMap-based traffic in the FlowSchema; Lease-based traffic will be covered by the APF bootstrap configuration in 1.24.
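A minimal Go sketch of the matching idea behind such a FlowSchema, under a heavily simplified model: the `request` and `flowSchema` types, the precedence value, the service account name and the `workload-low` fallback are illustrative assumptions, not the real `flowcontrol.apiserver.k8s.io` objects. It only shows the rule that schemas are tried in ascending matching precedence and the first match picks the priority level, so the operator's ConfigMap lock calls land in `leader-election`.

```go
package main

import (
	"fmt"
	"sort"
)

// request is a simplified, hypothetical view of an API request as seen by
// API Priority and Fairness (APF).
type request struct {
	User      string
	Verb      string
	Resource  string
	Namespace string
}

// flowSchema is a hypothetical, trimmed-down stand-in for a FlowSchema.
type flowSchema struct {
	Name          string
	Precedence    int // lower value is evaluated first
	Users         []string
	Verbs         []string
	Resources     []string
	Namespaces    []string
	PriorityLevel string
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// matches requires every dimension of the request to be listed in the schema.
func (fs flowSchema) matches(r request) bool {
	return contains(fs.Users, r.User) && contains(fs.Verbs, r.Verb) &&
		contains(fs.Resources, r.Resource) && contains(fs.Namespaces, r.Namespace)
}

// classify returns the priority level of the first matching schema in
// ascending precedence order (ties keep their input order in this sketch),
// or fallback when nothing matches.
func classify(schemas []flowSchema, r request, fallback string) string {
	ordered := append([]flowSchema(nil), schemas...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Precedence < ordered[j].Precedence
	})
	for _, fs := range ordered {
		if fs.matches(r) {
			return fs.PriorityLevel
		}
	}
	return fallback
}

func main() {
	// Hypothetical service account and schema for the kube-apiserver operator.
	sa := "system:serviceaccount:openshift-kube-apiserver-operator:kube-apiserver-operator"
	schemas := []flowSchema{{
		Name:          "kube-apiserver-operator-leader-election",
		Precedence:    500,
		Users:         []string{sa},
		Verbs:         []string{"get", "create", "update"},
		Resources:     []string{"configmaps"},
		Namespaces:    []string{"openshift-kube-apiserver-operator"},
		PriorityLevel: "leader-election",
	}}
	lockCall := request{User: sa, Verb: "update", Resource: "configmaps", Namespace: "openshift-kube-apiserver-operator"}
	otherCall := request{User: sa, Verb: "list", Resource: "secrets", Namespace: "openshift-kube-apiserver"}
	fmt.Println(classify(schemas, lockCall, "workload-low"))  // leader-election
	fmt.Println(classify(schemas, otherCall, "workload-low")) // workload-low
}
```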


We will be able to remove the ConfigMaps in 4.12. The lock migration will be as follows:
- 4.10: use 'configmapsleases' for leader election. This client uses both ConfigMap and Lease objects, so the new (4.10) client can work with old (4.9) clients.
- 4.11: use 'leases' for leader election. The new (4.11) client uses the Lease only, and the old (4.10) client can work with the Lease object. But we can't remove the ConfigMaps yet: it looks like the MultiLock (4.10 client) relies on the ConfigMap being present.
- 4.12: remove the ConfigMaps.
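The staged plan above can be checked with a small Go sketch under a simplified model: two adjacent releases are treated as safe to run side by side during an upgrade if their lock types share at least one lock object, and the ConfigMaps become removable once neither a release nor the release it upgrades from touches them. The helper names are hypothetical; the lock type strings are client-go's `configmaps`, `configmapsleases` and `leases`.

```go
package main

import "fmt"

// lockTypeFor returns the lock type used in OpenShift 4.<minor> according to
// the plan in this bug (releases before 4.10 use plain "configmaps").
func lockTypeFor(minor int) string {
	switch {
	case minor < 10:
		return "configmaps"
	case minor == 10:
		return "configmapsleases"
	default:
		return "leases"
	}
}

// objectsUsed lists the lock objects a lock type reads or writes.
func objectsUsed(lockType string) map[string]bool {
	switch lockType {
	case "configmaps":
		return map[string]bool{"ConfigMap": true}
	case "configmapsleases":
		return map[string]bool{"ConfigMap": true, "Lease": true}
	case "leases":
		return map[string]bool{"Lease": true}
	}
	return map[string]bool{}
}

// compatible reports whether two releases share at least one lock object,
// which this simplified model treats as the condition for mutual exclusion.
func compatible(a, b int) bool {
	ub := objectsUsed(lockTypeFor(b))
	for obj := range objectsUsed(lockTypeFor(a)) {
		if ub[obj] {
			return true
		}
	}
	return false
}

// canDeleteConfigMaps reports whether neither 4.<minor> nor its predecessor
// still uses the ConfigMap lock.
func canDeleteConfigMaps(minor int) bool {
	return !objectsUsed(lockTypeFor(minor))["ConfigMap"] &&
		!objectsUsed(lockTypeFor(minor-1))["ConfigMap"]
}

func main() {
	for minor := 9; minor <= 12; minor++ {
		fmt.Printf("4.%d lock=%s compatibleWithPrevious=%v deleteConfigMaps=%v\n",
			minor, lockTypeFor(minor), compatible(minor-1, minor), canDeleteConfigMaps(minor))
	}
	fmt.Println("4.9 with 4.11 compatible:", compatible(9, 11))
}
```

In this model every one-step upgrade shares a lock object, a direct 4.9 to 4.11 pairing would not, and deleting the ConfigMaps first becomes safe in 4.12, which matches the plan.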

Related issue in k/k: https://github.com/kubernetes/kubernetes/issues/107454

Slack thread: https://coreos.slack.com/archives/CC3CZCQHM/p1641487878025300

Comment 5 Abu Kashem 2022-01-19 16:15:18 UTC
kewang,

We also need to verify this for the oas operator: https://github.com/openshift/cluster-openshift-apiserver-operator/pull/490

Basically, all operators in the control plane group.
BZs for other operators:
- https://bugzilla.redhat.com/show_bug.cgi?id=2041554
- https://bugzilla.redhat.com/show_bug.cgi?id=2042501

Comment 12 Xingxing Xia 2022-01-30 05:01:29 UTC
Auth verification FYI: https://bugzilla.redhat.com/show_bug.cgi?id=2041554#c3

Comment 13 Ke Wang 2022-01-30 14:41:19 UTC
Verification steps for the apiserver-related operators:

I checked library-go and found the relevant PR: https://github.com/openshift/library-go/pull/1282 . Reading its code, the only difference is that leaderelection.go now returns ConfigMapsLeasesResourceLock instead of ConfigMapsResourceLock. ConfigMapsLeasesResourceLock is defined and used in the vendored client-go at https://github.com/openshift/cluster-kube-apiserver-operator/blob/master/vendor/k8s.io/client-go/tools/leaderelection/resourcelock/interface.go#L137-L140 :
	case ConfigMapsLeasesResourceLock:
		return &MultiLock{
			Primary:   configmapLock,
			Secondary: leaseLock,
		}, nil

This means 4.10 indeed uses both the old ConfigMap-based election and the new Lease-based election.
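To illustrate how the MultiLock lets a 4.10 operator and an older ConfigMap-only operator see the same leader, here is a simplified in-memory Go sketch. It is not client-go's implementation: the types are hypothetical, there is no optimistic concurrency or expiry, and client-go's MultiLock.Get also inspects the secondary lock, which this sketch omits. It only shows the write-both, read-primary idea.

```go
package main

import "fmt"

// record is a trimmed-down leader election record: only the holder identity.
type record struct{ HolderIdentity string }

// lock is a simplified, hypothetical stand-in for client-go's
// resourcelock.Interface that only reads and writes the record.
type lock interface {
	Get() (record, bool)
	Update(r record)
}

// objectLock models a single lock object (a ConfigMap or a Lease) in memory.
type objectLock struct{ rec *record }

func (l *objectLock) Get() (record, bool) {
	if l.rec == nil {
		return record{}, false
	}
	return *l.rec, true
}

func (l *objectLock) Update(r record) { l.rec = &r }

// multiLock writes every update to both objects and reads the primary.
// client-go's MultiLock.Get additionally inspects the secondary; omitted here.
type multiLock struct{ Primary, Secondary lock }

func (m multiLock) Get() (record, bool) { return m.Primary.Get() }

func (m multiLock) Update(r record) {
	m.Primary.Update(r)
	m.Secondary.Update(r)
}

func main() {
	configMap, lease := &objectLock{}, &objectLock{}

	// A 4.10 operator using the ConfigMap+Lease MultiLock takes the lock.
	newOp := multiLock{Primary: configMap, Secondary: lease}
	newOp.Update(record{HolderIdentity: "operator-4.10"})

	// A 4.9 operator reads only the ConfigMap, a Lease-only client only the
	// Lease; both see the same holder.
	fromCM, _ := configMap.Get()
	fromLease, _ := lease.Get()
	fmt.Println(fromCM.HolderIdentity, fromLease.HolderIdentity) // operator-4.10 operator-4.10
}
```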

To check further, I looked for both locks in the kube-apiserver-operator and openshift-apiserver-operator namespaces.
For OCP 4.10:
$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-fc.4   True        False         3h50m   Cluster version is 4.10.0-fc.4

$ oc get cm -n openshift-kube-apiserver-operator | grep lock
kube-apiserver-operator-lock          0      4h12m

$ oc get lease -n openshift-kube-apiserver-operator 
NAME                           HOLDER                                                                         AGE
kube-apiserver-operator-lock   kube-apiserver-operator-55f775964-5jtwp_905d5e54-b695-492c-a6f8-01b3997b22e0   4h18m

$ oc get lease kube-apiserver-operator-lock -n openshift-kube-apiserver-operator -o yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  creationTimestamp: "2022-01-30T01:45:41Z"
  name: kube-apiserver-operator-lock
  namespace: openshift-kube-apiserver-operator
  resourceVersion: "101626"
  uid: 9513411e-a2b0-4309-97c9-f5a2325ecc4b
spec:
  acquireTime: "2022-01-30T01:45:41.000000Z"
  holderIdentity: kube-apiserver-operator-55f775964-5jtwp_905d5e54-b695-492c-a6f8-01b3997b22e0
  leaseDurationSeconds: 137
  leaseTransitions: 0
  renewTime: "2022-01-30T06:03:28.351945Z"


$ oc get cm -n openshift-apiserver-operator | grep lock
openshift-apiserver-operator-lock     0      4h13m

$ oc get lease -n openshift-apiserver-operator 
NAME                                HOLDER                                                                               AGE
openshift-apiserver-operator-lock   openshift-apiserver-operator-54cb9c8457-pngzh_e2ad5ac7-4690-49ce-aec8-4a87bd1c838d   4h20m

$ oc get lease openshift-apiserver-operator-lock  -n openshift-apiserver-operator -o yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  creationTimestamp: "2022-01-30T01:45:43Z"
  name: openshift-apiserver-operator-lock
  namespace: openshift-apiserver-operator
  resourceVersion: "102601"
  uid: ea4ee887-33b2-4df2-9892-2d4e9121c2c7
spec:
  acquireTime: "2022-01-30T01:45:43.000000Z"
  holderIdentity: openshift-apiserver-operator-54cb9c8457-pngzh_e2ad5ac7-4690-49ce-aec8-4a87bd1c838d
  leaseDurationSeconds: 137
  leaseTransitions: 0
  renewTime: "2022-01-30T06:06:27.629485Z"

There are both configmap and lease locks.
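The `0` in the DATA column is expected: client-go's ConfigMap lock keeps its leader record in the `control-plane.alpha.kubernetes.io/leader` annotation rather than in data. A minimal Go sketch that checks whether both locks name the same holder, assuming the objects were fetched with `oc get cm <lock> -o json` and `oc get lease <lock> -o json`; the sample inputs and the holder `op-1` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// leaderAnnotation is the annotation under which client-go's ConfigMap lock
// stores its JSON-encoded leader election record.
const leaderAnnotation = "control-plane.alpha.kubernetes.io/leader"

type configMap struct {
	Metadata struct {
		Annotations map[string]string `json:"annotations"`
	} `json:"metadata"`
}

type leaseObj struct {
	Spec struct {
		HolderIdentity string `json:"holderIdentity"`
	} `json:"spec"`
}

// sameHolder returns the holder recorded in the ConfigMap annotation, the
// holder in the Lease spec, and whether they agree.
func sameHolder(cmJSON, leaseJSON []byte) (string, string, bool, error) {
	var cm configMap
	if err := json.Unmarshal(cmJSON, &cm); err != nil {
		return "", "", false, err
	}
	var rec struct {
		HolderIdentity string `json:"holderIdentity"`
	}
	if err := json.Unmarshal([]byte(cm.Metadata.Annotations[leaderAnnotation]), &rec); err != nil {
		return "", "", false, fmt.Errorf("decoding %s annotation: %w", leaderAnnotation, err)
	}
	var l leaseObj
	if err := json.Unmarshal(leaseJSON, &l); err != nil {
		return "", "", false, err
	}
	return rec.HolderIdentity, l.Spec.HolderIdentity, rec.HolderIdentity == l.Spec.HolderIdentity, nil
}

func main() {
	// Hypothetical, trimmed inputs in the shape of the two lock objects.
	cmJSON := []byte(`{"metadata":{"annotations":{"control-plane.alpha.kubernetes.io/leader":"{\"holderIdentity\":\"op-1\",\"leaseDurationSeconds\":137}"}}}`)
	leaseJSON := []byte(`{"spec":{"holderIdentity":"op-1"}}`)
	fmt.Println(sameHolder(cmJSON, leaseJSON)) // op-1 op-1 true <nil>
}
```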

To confirm that this behavior is new in 4.10, compare with OCP 4.9:
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.18    True        False         3h51m   Cluster version is 4.9.18

$ oc get cm -n openshift-kube-apiserver-operator | grep lock
kube-apiserver-operator-lock          0      4h12m

$ oc get lease -n openshift-kube-apiserver-operator 
No resources found in openshift-kube-apiserver-operator namespace.

$ oc get cm -n openshift-apiserver-operator | grep lock
openshift-apiserver-operator-lock     0      4h16m

$ oc get lease -n openshift-apiserver-operator
No resources found in openshift-apiserver-operator namespace.

Only configmap locks exist.
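The 4.9 versus 4.10 comparison boils down to which lock objects exist. A tiny Go sketch of that inference, using a hypothetical helper; it is only a heuristic, because during 4.11 the ConfigMap is deliberately kept even though only the Lease is used, so the same observation would be misread there.

```go
package main

import "fmt"

// inferLockType is a hypothetical helper mapping which lock objects exist in
// an operator namespace to the most likely client-go lock type. Heuristic
// only: a leftover ConfigMap (as planned for 4.11) would make a "leases"
// client look like "configmapsleases".
func inferLockType(hasConfigMap, hasLease bool) string {
	switch {
	case hasConfigMap && hasLease:
		return "configmapsleases"
	case hasConfigMap:
		return "configmaps"
	case hasLease:
		return "leases"
	default:
		return "none"
	}
}

func main() {
	fmt.Println("4.9: ", inferLockType(true, false)) // observed above: ConfigMap only
	fmt.Println("4.10:", inferLockType(true, true))  // observed above: ConfigMap and Lease
}
```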

To check the kube-apiserver-operator and openshift-apiserver-operator logs for both locks:

On both the 4.9 and 4.10 clusters, apply the following patches:
oc patch kubeapiservers.operator.openshift.io/cluster --type=merge -p="
spec:
  operatorLogLevel: TraceAll
"

oc patch openshiftapiservers.operator.openshift.io/cluster --type=merge -p="
spec:
  operatorLogLevel: TraceAll
"

On 4.10:
I deleted the kube-apiserver-operator pod, waited for the new pod to be created, then checked its logs:
I0130 06:15:24.012794       1 leaderelection.go:258] successfully acquired lease openshift-kube-apiserver-operator/kube-apiserver-operator-lock
I0130 06:15:24.012918       1 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator-lock", UID:"37a44dd4-46e3-4681-931f-5be67996f7fc", APIVersion:"v1", ResourceVersion:"105463", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' kube-apiserver-operator-55f775964-7jvr5_6edb975a-4d8f-414b-9a2a-6d508cf8f682 became leader
I0130 06:15:24.012934       1 event.go:285] Event(v1.ObjectReference{Kind:"Lease", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator-lock", UID:"9513411e-a2b0-4309-97c9-f5a2325ecc4b", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"105464", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' kube-apiserver-operator-55f775964-7jvr5_6edb975a-4d8f-414b-9a2a-6d508cf8f682 became leader

This means configmap-based and lease-based elections both work well in 4.10.
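Scanning the logs by eye works for one pod; a minimal Go sketch of the same check, assuming the log text is captured with `oc logs`, collects the object kinds from `event.go` lines with reason `LeaderElection` that report `became leader`. The regular expression is tailored to the log format shown above, and the sample lines are trimmed copies of it.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// eventKind captures the Kind field of the ObjectReference in event.go lines.
var eventKind = regexp.MustCompile(`Kind:"([A-Za-z]+)"`)

// leaderKinds returns the set of lock object kinds for which the logs report
// a LeaderElection "became leader" event.
func leaderKinds(logs string) map[string]bool {
	kinds := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(logs))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // event lines can be long
	for sc.Scan() {
		line := sc.Text()
		if !strings.Contains(line, "reason: 'LeaderElection'") || !strings.Contains(line, "became leader") {
			continue
		}
		if m := eventKind.FindStringSubmatch(line); m != nil {
			kinds[m[1]] = true
		}
	}
	return kinds
}

func main() {
	logs := `I0130 06:15:24.012918 1 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator-lock"}): type: 'Normal' reason: 'LeaderElection' op-7jvr5 became leader
I0130 06:15:24.012934 1 event.go:285] Event(v1.ObjectReference{Kind:"Lease", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator-lock"}): type: 'Normal' reason: 'LeaderElection' op-7jvr5 became leader`
	k := leaderKinds(logs)
	fmt.Println(k["ConfigMap"], k["Lease"]) // true true
}
```

On a 4.9 operator, only `ConfigMap` would be reported, which is the comparison made below.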

I deleted the openshift-apiserver-operator pod, waited for the new pod to be created, then checked its logs:
I0130 06:15:42.533693       1 leaderelection.go:258] successfully acquired lease openshift-apiserver-operator/openshift-apiserver-operator-lock
I0130 06:15:42.533921       1 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"openshift-apiserver-operator", Name:"openshift-apiserver-operator-lock", UID:"f0fae852-6140-4a1f-8f06-10cb646e6877", APIVersion:"v1", ResourceVersion:"105633", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' openshift-apiserver-operator-54cb9c8457-66vwf_394f63fa-c25b-428c-8b34-2f7d18e8d00a became leader
I0130 06:15:42.533945       1 event.go:285] Event(v1.ObjectReference{Kind:"Lease", Namespace:"openshift-apiserver-operator", Name:"openshift-apiserver-operator-lock", UID:"ea4ee887-33b2-4df2-9892-2d4e9121c2c7", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"105634", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' openshift-apiserver-operator-54cb9c8457-66vwf_394f63fa-c25b-428c-8b34-2f7d18e8d00a became leader

The same goes for openshift-apiserver-operator. 

By comparison, on 4.9 the kube-apiserver-operator and openshift-apiserver-operator pod logs only show ConfigMap-based election lines and no Lease-based election lines. This further verifies that 4.10 works as intended by the bug's PR.

Based on the above, IMO no further testing can be done, so moving to VERIFIED. Per https://github.com/kubernetes/kubernetes/issues/107454 , we should watch QE upgrades from 4.9 (x) to 4.10 (x+1) for leader election issues; if any show up, we'll file a separate bug. Since 4.11 has not yet been rebased onto Kubernetes 1.24, we cannot watch the upgrade from 4.10 to 4.11 (x+2) yet.

Comment 16 errata-xmlrpc 2022-03-10 16:37:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

