Bug 2037856 - use lease for leader election

Summary: use lease for leader election

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	kube-apiserver
Sub Component:
Version:	4.10
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.10.0
Assignee:	Abu Kashem
QA Contact:	Ke Wang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2041554

Reported:	2022-01-06 17:10 UTC by Abu Kashem
Modified:	2022-05-03 00:25 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	2041554 2042501
Environment:
Last Closed:	2022-03-10 16:37:33 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift cluster-csi-snapshot-controller-operator pull 110	None	Merged	Bug 2037856: Update library-go to get leader election updates	2022-01-25 12:54:00 UTC
Github	openshift cluster-csi-snapshot-controller-operator pull 112	None	open	Bug 2037856: Fix typo in VolumeSnapshotContents RBAC	2022-01-25 16:51:22 UTC
Github	openshift cluster-kube-apiserver-operator pull 1294	None	Merged	Bug 2037856: bump library go	2022-01-21 08:21:00 UTC
Github	openshift cluster-kube-controller-manager-operator pull 590	None	Merged	Bug 2037856: bump library go	2022-01-24 18:55:20 UTC
Github	openshift cluster-kube-descheduler-operator pull 236	None	Merged	Bug 2037856: bump library go	2022-01-21 08:21:03 UTC
Github	openshift cluster-storage-operator pull 255	None	Merged	Bug 2037856: Update library-go to get leader election updates	2022-01-21 08:21:04 UTC
Github	openshift library-go pull 1282	None	Merged	Bug 2037856: use 'configmapsleases' based leader election	2022-01-21 08:21:04 UTC
Github	openshift vsphere-problem-detector pull 71	None	Merged	Bug 2037856: Update library-go to get leader election updates	2022-01-21 08:21:26 UTC
Red Hat Product Errata	RHSA-2022:0056	None	None	None	2022-03-10 16:37:45 UTC

Description Abu Kashem 2022-01-06 17:10:25 UTC

Our operators use a ConfigMap in their respective namespaces as the leader election lock. We need to use a Lease instead.

This warrants a change in library-go: in https://github.com/openshift/library-go/blob/master/pkg/config/leaderelection/leaderelection.go#L53 we need to use 'configmapsleases' instead of 'configmaps' (to handle version skew). I hope that's the only change we need to make.


The context: ConfigMap/Endpoints-based leader election is being removed in Kubernetes 1.24, see https://github.com/kubernetes/kubernetes/pull/106852

We need to update the following operators:
- kas (kube-apiserver)
- oas (openshift-apiserver)
- etcd
- auth

Are there other operators we need to update?


We also need to add a dedicated FlowSchema for each operator to make sure leader election traffic falls into the 'leader-election' priority level. We only need to cover ConfigMap-based traffic in the FlowSchema; Lease-based traffic will be handled by the APF bootstrap configuration in 1.24.


We will be able to remove the ConfigMaps in 4.12. The lock migration will be as follows:
- 4.10: use 'configmapsleases' for leader election. This client uses both the ConfigMap and the Lease object, so that new (4.10) clients can work with old (4.9) clients.
- 4.11: use 'leases' for leader election, so new (4.11) clients use the Lease only, and old (4.10) clients can work with the Lease object. But we can't remove the ConfigMaps yet: it looks like the MultiLock (4.10 client) relies on the ConfigMap being present.
- 4.12: remove the ConfigMaps.
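The migration plan above can be checked with a small Go model. It is a hypothetical simplification, not library-go or client-go code: it assumes two releases can run side by side only if their lock types contend on at least one common lock object, and it assumes 4.12 stays on Lease-only locking. Under those assumptions it shows why each step of the plan is safe and why skipping 4.10 would not be.

```go
package main

// LockType mirrors the resource-lock type strings discussed in this bug.
// The object mapping below is a simplified, hypothetical model, not client-go code.
type LockType string

const (
	ConfigMaps       LockType = "configmaps"
	ConfigMapsLeases LockType = "configmapsleases"
	Leases           LockType = "leases"
)

// objectsFor returns the lock objects a client of the given type contends on.
func objectsFor(t LockType) []string {
	switch t {
	case ConfigMaps:
		return []string{"ConfigMap"}
	case ConfigMapsLeases:
		return []string{"ConfigMap", "Lease"} // primary, secondary
	case Leases:
		return []string{"Lease"}
	}
	return nil
}

// sharedObjects returns the lock objects that clients of both types contend on.
func sharedObjects(a, b LockType) []string {
	inA := map[string]bool{}
	for _, o := range objectsFor(a) {
		inA[o] = true
	}
	var shared []string
	for _, o := range objectsFor(b) {
		if inA[o] {
			shared = append(shared, o)
		}
	}
	return shared
}

// releaseLock records the lock type per release according to the plan above
// (4.12 is assumed to stay on Lease-only locking once the ConfigMaps are gone).
var releaseLock = map[string]LockType{
	"4.9":  ConfigMaps,
	"4.10": ConfigMapsLeases,
	"4.11": Leases,
	"4.12": Leases,
}

// compatible reports whether operator pods from two releases can run side by
// side (e.g. mid-upgrade) and still elect a single leader, assuming a shared
// lock object is what keeps the election mutually exclusive.
func compatible(from, to string) bool {
	return len(sharedObjects(releaseLock[from], releaseLock[to])) > 0
}
```

With this model, compatible("4.9", "4.10") and compatible("4.10", "4.11") return true, while compatible("4.9", "4.11") returns false because a ConfigMap-only client and a Lease-only client share no lock object.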

Related issue in k/k: https://github.com/kubernetes/kubernetes/issues/107454

slack thread: https://coreos.slack.com/archives/CC3CZCQHM/p1641487878025300

Comment 1 Jan Safranek 2022-01-11 16:20:11 UTC

> are there other operators we need to update?

There are a number of storage operators based on library-go, and all of them use leader election:

https://github.com/openshift/cluster-storage-operator
https://github.com/openshift/cluster-csi-snapshot-controller-operator
https://github.com/openshift/vsphere-problem-detector

https://github.com/openshift/aws-ebs-csi-driver-operator
https://github.com/openshift/aws-efs-csi-driver-operator
https://github.com/openshift/azure-disk-csi-driver-operator
https://github.com/openshift/azure-file-csi-driver-operator
https://github.com/openshift/openstack-cinder-csi-driver-operator
https://github.com/openshift/csi-driver-manila-operator
https://github.com/openshift/gcp-pd-csi-driver-operator
https://github.com/openshift/vmware-vsphere-csi-driver-operator
https://github.com/openshift/alibaba-disk-csi-driver-operator
https://github.com/openshift/ibm-vpc-block-csi-driver-operator

Comment 3 Jan Safranek 2022-01-17 13:44:27 UTC

(In reply to Jan Safranek from comment #1)
> There is number of storage operators based on library-go and all of them
> have leader election:
> 
> https://github.com/openshift/cluster-storage-operator
> https://github.com/openshift/cluster-csi-snapshot-controller-operator
> https://github.com/openshift/vsphere-problem-detector

I created PRs in the three repos above.

> https://github.com/openshift/aws-ebs-csi-driver-operator
> https://github.com/openshift/aws-efs-csi-driver-operator
> https://github.com/openshift/azure-disk-csi-driver-operator
> https://github.com/openshift/azure-file-csi-driver-operator
> https://github.com/openshift/openstack-cinder-csi-driver-operator
> https://github.com/openshift/csi-driver-manila-operator
> https://github.com/openshift/gcp-pd-csi-driver-operator
> https://github.com/openshift/vmware-vsphere-csi-driver-operator
> https://github.com/openshift/alibaba-disk-csi-driver-operator
> https://github.com/openshift/ibm-vpc-block-csi-driver-operator

These repos are being (or will be) updated as part of the fix for https://bugzilla.redhat.com/show_bug.cgi?id=2038934. We will not track them here.

Comment 5 Abu Kashem 2022-01-19 16:15:18 UTC

kewang,

We also need to verify this for the oas (openshift-apiserver) operator: https://github.com/openshift/cluster-openshift-apiserver-operator/pull/490

Basically all operators in the control plane group.
BZs for other operators:
- https://bugzilla.redhat.com/show_bug.cgi?id=2041554
- https://bugzilla.redhat.com/show_bug.cgi?id=2042501

Comment 12 Xingxing Xia 2022-01-30 05:01:29 UTC

Auth verification FYI: https://bugzilla.redhat.com/show_bug.cgi?id=2041554#c3

Comment 13 Ke Wang 2022-01-30 14:41:19 UTC

Verification steps for the apiserver-related operators:

I checked library-go; the relevant PR is https://github.com/openshift/library-go/pull/1282. Reading its code, the only difference is that leaderelection.go now returns ConfigMapsLeasesResourceLock instead of ConfigMapsResourceLock. ConfigMapsLeasesResourceLock is defined and used in https://github.com/openshift/cluster-kube-apiserver-operator/blob/master/vendor/k8s.io/client-go/tools/leaderelection/resourcelock/interface.go#L137-L140:
	case ConfigMapsLeasesResourceLock:
		return &MultiLock{
			Primary:   configmapLock,
			Secondary: leaseLock,
		}, nil
This means 4.10 indeed uses both the old ConfigMap-based election and the new Lease-based election.

Next, I checked kube-apiserver-operator and openshift-apiserver-operator for both locks.
For OCP 4.10:
$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-fc.4   True        False         3h50m   Cluster version is 4.10.0-fc.4

$ oc get cm -n openshift-kube-apiserver-operator | grep lock
kube-apiserver-operator-lock          0      4h12m

$ oc get lease -n openshift-kube-apiserver-operator 
NAME                           HOLDER                                                                         AGE
kube-apiserver-operator-lock   kube-apiserver-operator-55f775964-5jtwp_905d5e54-b695-492c-a6f8-01b3997b22e0   4h18m

$ oc get lease kube-apiserver-operator-lock -n openshift-kube-apiserver-operator -o yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  creationTimestamp: "2022-01-30T01:45:41Z"
  name: kube-apiserver-operator-lock
  namespace: openshift-kube-apiserver-operator
  resourceVersion: "101626"
  uid: 9513411e-a2b0-4309-97c9-f5a2325ecc4b
spec:
  acquireTime: "2022-01-30T01:45:41.000000Z"
  holderIdentity: kube-apiserver-operator-55f775964-5jtwp_905d5e54-b695-492c-a6f8-01b3997b22e0
  leaseDurationSeconds: 137
  leaseTransitions: 0
  renewTime: "2022-01-30T06:03:28.351945Z"


$ oc get cm -n openshift-apiserver-operator | grep lock
openshift-apiserver-operator-lock     0      4h13m

$ oc get lease -n openshift-apiserver-operator 
NAME                                HOLDER                                                                               AGE
openshift-apiserver-operator-lock   openshift-apiserver-operator-54cb9c8457-pngzh_e2ad5ac7-4690-49ce-aec8-4a87bd1c838d   4h20m

$ oc get lease openshift-apiserver-operator-lock  -n openshift-apiserver-operator -o yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  creationTimestamp: "2022-01-30T01:45:43Z"
  name: openshift-apiserver-operator-lock
  namespace: openshift-apiserver-operator
  resourceVersion: "102601"
  uid: ea4ee887-33b2-4df2-9892-2d4e9121c2c7
spec:
  acquireTime: "2022-01-30T01:45:43.000000Z"
  holderIdentity: openshift-apiserver-operator-54cb9c8457-pngzh_e2ad5ac7-4690-49ce-aec8-4a87bd1c838d
  leaseDurationSeconds: 137
  leaseTransitions: 0
  renewTime: "2022-01-30T06:06:27.629485Z"

Both ConfigMap and Lease locks exist.

To confirm this behavior is new in 4.10, compare with OCP 4.9:
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.18    True        False         3h51m   Cluster version is 4.9.18

$ oc get cm -n openshift-kube-apiserver-operator | grep lock
kube-apiserver-operator-lock          0      4h12m

$ oc get lease -n openshift-kube-apiserver-operator 
No resources found in openshift-kube-apiserver-operator namespace.

$ oc get cm -n openshift-apiserver-operator | grep lock
openshift-apiserver-operator-lock     0      4h16m

$ oc get lease -n openshift-apiserver-operator
No resources found in openshift-apiserver-operator namespace.

Only configmap locks exist.

To check the kube-apiserver-operator and openshift-apiserver-operator logs for both locks, apply the following patches on both the 4.9 and 4.10 clusters:
oc patch kubeapiservers.operator.openshift.io/cluster --type=merge -p="
spec:
  operatorLogLevel: TraceAll
"

oc patch openshiftapiservers.operator.openshift.io/cluster --type=merge -p="
spec:
  operatorLogLevel: TraceAll
"
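`--type=merge` makes oc send a JSON merge patch (RFC 7386), so only spec.operatorLogLevel is changed and the rest of the operator spec is left alone. A minimal Go sketch of the RFC 7386 merge algorithm over decoded JSON values illustrates this; mergePatch is a hypothetical helper name, not an oc or library-go function.

```go
package main

// mergePatch applies a JSON merge patch (RFC 7386) to a decoded JSON value,
// where objects are map[string]any. A nil (JSON null) in the patch deletes the
// key; any non-object patch value replaces the target outright.
// Note: target maps are modified in place.
func mergePatch(target, patch any) any {
	p, ok := patch.(map[string]any)
	if !ok {
		return patch
	}
	t, ok := target.(map[string]any)
	if !ok {
		t = map[string]any{}
	}
	for k, v := range p {
		if v == nil {
			delete(t, k)
			continue
		}
		t[k] = mergePatch(t[k], v)
	}
	return t
}
```

For instance, merging {"spec":{"operatorLogLevel":"TraceAll"}} into a spec that also has managementState: Managed keeps managementState and only overwrites operatorLogLevel.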

On 4.10, delete the kube-apiserver-operator pod, wait for the new pod to be created, then check the new pod's logs, which contain:
I0130 06:15:24.012794       1 leaderelection.go:258] successfully acquired lease openshift-kube-apiserver-operator/kube-apiserver-operator-lock
I0130 06:15:24.012918       1 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator-lock", UID:"37a44dd4-46e3-4681-931f-5be67996f7fc", APIVersion:"v1", ResourceVersion:"105463", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' kube-apiserver-operator-55f775964-7jvr5_6edb975a-4d8f-414b-9a2a-6d508cf8f682 became leader
I0130 06:15:24.012934       1 event.go:285] Event(v1.ObjectReference{Kind:"Lease", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator-lock", UID:"9513411e-a2b0-4309-97c9-f5a2325ecc4b", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"105464", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' kube-apiserver-operator-55f775964-7jvr5_6edb975a-4d8f-414b-9a2a-6d508cf8f682 became leader

This shows that both ConfigMap-based and Lease-based election work in 4.10.

Delete the openshift-apiserver-operator pod, wait for the new pod to be created, then check the new pod's logs, which contain:
I0130 06:15:42.533693       1 leaderelection.go:258] successfully acquired lease openshift-apiserver-operator/openshift-apiserver-operator-lock
I0130 06:15:42.533921       1 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"openshift-apiserver-operator", Name:"openshift-apiserver-operator-lock", UID:"f0fae852-6140-4a1f-8f06-10cb646e6877", APIVersion:"v1", ResourceVersion:"105633", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' openshift-apiserver-operator-54cb9c8457-66vwf_394f63fa-c25b-428c-8b34-2f7d18e8d00a became leader
I0130 06:15:42.533945       1 event.go:285] Event(v1.ObjectReference{Kind:"Lease", Namespace:"openshift-apiserver-operator", Name:"openshift-apiserver-operator-lock", UID:"ea4ee887-33b2-4df2-9892-2d4e9121c2c7", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"105634", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' openshift-apiserver-operator-54cb9c8457-66vwf_394f63fa-c25b-428c-8b34-2f7d18e8d00a became leader

The same holds for openshift-apiserver-operator.

By comparison, on 4.9 the kube-apiserver-operator and openshift-apiserver-operator pod logs show only ConfigMap-based election events and no Lease-based ones. This further verifies that 4.10 works as intended by the bug's PR.
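The log comparison above can also be scripted. The Go sketch below is a hypothetical helper, not part of any OpenShift tooling: it assumes log lines in the event.go format shown above and collects the lock Kinds for which a 'LeaderElection ... became leader' event appears. For a 4.10 operator it should return ConfigMap and Lease; for a 4.9 operator, only ConfigMap.

```go
package main

import (
	"regexp"
	"sort"
)

// leaderEvent captures the Kind of a LeaderElection "became leader" event in
// the event.go log format shown above.
var leaderEvent = regexp.MustCompile(`ObjectReference\{Kind:"([^"]+)".*reason: 'LeaderElection' .* became leader`)

// leaderElectionKinds returns the sorted, de-duplicated lock Kinds for which a
// "became leader" event appears in the given log lines.
func leaderElectionKinds(lines []string) []string {
	seen := map[string]bool{}
	for _, line := range lines {
		if m := leaderEvent.FindStringSubmatch(line); m != nil {
			seen[m[1]] = true
		}
	}
	kinds := make([]string, 0, len(seen))
	for k := range seen {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds)
	return kinds
}
```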

Based on the above, IMO no further testing can be done, so moving to VERIFIED. Per https://github.com/kubernetes/kubernetes/issues/107454, we should watch QE upgrades from 4.9 (x) to 4.10 (x+1) for leader election issues; if any appear, we'll file a separate bug. Since 4.11 has not yet been rebased to Kubernetes 1.24, we cannot watch upgrades from 4.10 to 4.11 (x+2) right now.

Comment 16 errata-xmlrpc 2022-03-10 16:37:33 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
