Bug 2047929 - use lease for leader election

Summary: use lease for leader election

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Etcd
Sub Component:
Version:	4.10
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.10.0
Assignee:	Wally
QA Contact:	ge liu
Docs Contact:
URL:
Whiteboard:	EmergencyRequest
Depends On:	2042501
Blocks:

Reported:	2022-01-28 21:32 UTC by Abu Kashem
Modified:	2022-05-03 00:25 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	2042501
Environment:
Last Closed:	2022-03-10 16:42:52 UTC
Target Upstream Version:
Embargoed:
Flags:	wlewis: needinfo-

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-etcd-operator pull 734	0	None	Merged	Bug 2047929: [release-4.10] bump library-go	2022-02-01 08:55:31 UTC
Red Hat Product Errata	RHSA-2022:0056	0	None	None	None	2022-03-10 16:43:04 UTC

Comment 1 Michal Fojtik 2022-01-28 21:41:43 UTC

** A NOTE ABOUT USING URGENT **

This BZ has been set to urgent severity and priority. When a BZ is marked urgent priority Engineers are asked to stop whatever they are doing, putting everything else on hold.
Please be prepared to have reasonable justification ready to discuss, and ensure your own and engineering management are aware and agree this BZ is urgent. Keep in mind, urgent bugs are very expensive and have maximal management visibility.

NOTE: This bug was automatically assigned to an engineering manager with the severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.

** INFORMATION REQUIRED **

Please answer these questions before escalation to engineering:

1. Has a link to must-gather output been provided in this BZ? We cannot work without it. If must-gather fails to run, attach all relevant logs and provide the must-gather error message.
2. Give the output of "oc get clusteroperators -o yaml".
3. In case of degraded/unavailable operators, have all their logs and the logs of the operands been analyzed? [yes/no]
4. List the top 5 relevant errors from the logs of the operators and operands in (3).
5. Order the list of degraded/unavailable operators according to which is likely the cause of the failure of the other, root-cause at the top.
6. Explain why (5) is likely the right order and list the information used for that assessment.
7. Explain why Engineering is necessary to make progress.

Comment 5 Xingxing Xia 2022-01-30 05:03:33 UTC

To the QA Contact of this bug:
  FYI, the Auth verification is at https://bugzilla.redhat.com/show_bug.cgi?id=2041554#c3

Comment 7 Sandeep 2022-02-04 14:25:09 UTC

Verified on the following version:
oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-02-04-082923   True        False         82m     Cluster version is 4.10.0-0.nightly-2022-02-04-082923

Both ConfigMap-based and Lease-based leader election work in 4.10.


oc get lease
NAME                                   HOLDER                                                                AGE
openshift-cluster-etcd-operator-lock   etcd-operator-859d4cf76f-wqx57_a6976261-ac4b-44e7-a260-4bccea70e451   102m

oc get cm
NAME                                   DATA   AGE
etcd-ca-bundle                         1      104m
etcd-operator-config                   1      104m
etcd-service-ca-bundle                 1      104m
kube-root-ca.crt                       1      104m
openshift-cluster-etcd-operator-lock   0      102m
openshift-service-ca.crt               1      104m


The following lines are from the etcd-operator logs:

I0204 11:54:32.328032       1 leaderelection.go:258] successfully acquired lease openshift-etcd-operator/openshift-cluster-etcd-operator-lock
I0204 11:54:32.328184       1 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"openshift-etcd-operator", Name:"openshift-cluster-etcd-operator-lock", UID:"6e054200-6e44-49cd-a309-27bf21e08329", APIVersion:"v1", ResourceVersion:"19672", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' etcd-operator-859d4cf76f-wqx57_a6976261-ac4b-44e7-a260-4bccea70e451 became leader
I0204 11:54:32.328214       1 event.go:285] Event(v1.ObjectReference{Kind:"Lease", Namespace:"openshift-etcd-operator", Name:"openshift-cluster-etcd-operator-lock", UID:"2a39dee7-4ad7-4edb-bb72-22c54e4b214d", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"19673", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' etcd-operator-859d4cf76f-wqx57_a6976261-ac4b-44e7-a260-4bccea70e451 became leader
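
The two LeaderElection events above can be cross-checked mechanically. The Python sketch below is not part of the original verification. It assumes the event line layout seen in the log lines quoted above (not a documented log format), and it assumes a holder identity is a pod name and a UUID joined by an underscore, as the HOLDER column suggests. All function names are hypothetical.

```python
import re

# Pattern for the 'LeaderElection' event lines quoted above. The field layout
# is inferred from those log lines only; it is an assumption, not a spec.
EVENT_RE = re.compile(
    r'Kind:"(?P<kind>\w+)".*?Name:"(?P<name>[^"]+)".*?'
    r"reason: 'LeaderElection' (?P<holder>\S+) became leader"
)


def leader_events(log_text):
    """Return (kind, lock_name, holder) for each LeaderElection event line."""
    return [
        (m.group("kind"), m.group("name"), m.group("holder"))
        for m in EVENT_RE.finditer(log_text)
    ]


def split_holder(holder):
    """Split 'pod_uuid' at the last underscore into (pod, uuid).

    Assumes the identity contains an underscore; without one, the pod part
    comes back empty.
    """
    pod, _, uid = holder.rpartition("_")
    return pod, uid


def same_leader_on_both_locks(log_text):
    """True if ConfigMap and Lease events both exist and name one holder."""
    events = leader_events(log_text)
    kinds = {kind for kind, _, _ in events}
    holders = {holder for _, _, holder in events}
    return {"ConfigMap", "Lease"} <= kinds and len(holders) == 1
```

Given the operator log as a string, same_leader_on_both_locks(log_text) reports whether a single identity was recorded as leader on both the ConfigMap and the Lease, which is the condition the verification above checks by eye.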

Comment 9 errata-xmlrpc 2022-03-10 16:42:52 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
