Bug 2042501 - use lease for leader election [NEEDINFO]
Summary: use lease for leader election
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.11.0
Assignee: Allen Ray
QA Contact: ge liu
Depends On: 2024643 2040533 2050250 2050253
Blocks: 2041554 2047929
Reported: 2022-01-19 16:00 UTC by Abu Kashem
Modified: 2022-08-10 10:43 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 2037856
Clones: 2047929
Last Closed: 2022-08-10 10:43:03 UTC
Target Upstream Version:
mfojtik: needinfo?

System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-etcd-operator pull 729 | 0 | None | Merged | Bug 2042501: bump library-go | 2022-02-01 08:55:54 UTC
Red Hat Bugzilla | 2040533 | 1 | medium | CLOSED | Install fails to bootstrap, complaining about DefragControllerDegraded and sad members | 2023-11-18 04:25:02 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 10:43:25 UTC

Comment 1 Michal Fojtik 2022-01-19 16:11:43 UTC

This BZ has been set to urgent severity and priority. When a BZ is marked urgent priority, engineers are asked to stop whatever they are doing and put everything else on hold.
Please be prepared to have reasonable justification ready to discuss, and ensure that your own management and engineering management are aware and agree that this BZ is urgent. Keep in mind that urgent bugs are very expensive and have maximal management visibility.

NOTE: This bug was automatically assigned to an engineering manager with the severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.


Please answer these questions before escalation to engineering:

1. Has a link to must-gather output been provided in this BZ? We cannot work without it. If must-gather fails to run, attach all relevant logs and provide the must-gather error message.
2. Give the output of "oc get clusteroperators -o yaml".
3. In case of degraded/unavailable operators, have all their logs and the logs of the operands been analyzed? [yes/no]
4. List the top 5 relevant errors from the logs of the operators and operands in (3).
5. Order the list of degraded/unavailable operators according to which is likely the cause of the failure of the other, root-cause at the top.
6. Explain why (5) is likely the right order and list the information used for that assessment.
7. Explain why Engineering is necessary to make progress.

Comment 6 Xingxing Xia 2022-01-30 05:02:35 UTC
QA Contact of this bug:
  Auth verification FYI: https://bugzilla.redhat.com/show_bug.cgi?id=2041554#c3

Comment 7 ge liu 2022-01-30 07:25:26 UTC
Checked with 4.10.0-0.nightly-2022-01-29-061534: both the configmap-based and the lease-based elections work in 4.10, whereas in 4.9 only the configmap-based election works. This does not seem to match the description of this bug (we need to use 'configmapsleases' instead of 'configmaps'). I suppose the fix works as designed, so could someone confirm that BOTH WORKING is the exact fix for this bug before I change the bug to VERIFIED status? The auth component verification got a similar result. cc: alray, akashem, xxia

# oc get lease 
NAME                                   HOLDER                                                                AGE
openshift-cluster-etcd-operator-lock   etcd-operator-55b976c5b7-jt5ss_5aa9d47d-012a-4369-ab8a-c19fd538e4aa   20h
[root@preserved-geliurhel-1 tmp]# oc get cm
NAME                                   DATA   AGE
etcd-ca-bundle                         1      20h
etcd-operator-config                   1      20h
etcd-service-ca-bundle                 1      20h
kube-root-ca.crt                       1      20h
openshift-cluster-etcd-operator-lock   0      20h
openshift-service-ca.crt               1      20h

I0129 10:29:00.500003       1 leaderelection.go:258] successfully acquired lease openshift-etcd-operator/openshift-cluster-etcd-operator-lock
I0129 10:29:00.502598       1 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"openshift-etcd-operator", Name:"openshift-cluster-etcd-operator-lock", UID:"76ff29a2-6090-408d-8989-7ea43482c2b7", APIVersion:"v1", ResourceVersion:"8641", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' etcd-operator-55b976c5b7-jt5ss_5aa9d47d-012a-4369-ab8a-c19fd538e4aa became leader
I0129 10:29:00.502632       1 event.go:285] Event(v1.ObjectReference{Kind:"Lease", Namespace:"openshift-etcd-operator", Name:"openshift-cluster-etcd-operator-lock", UID:"701c041a-26d1-4830-9ae4-b13f08e837b3", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"8642", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' etcd-operator-55b976c5b7-jt5ss_5aa9d47d-012a-4369-ab8a-c19fd538e4aa became leader

Comment 10 errata-xmlrpc 2022-08-10 10:43:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

