Bug 2052599 - kube-controller-manager should use configmap lease [NEEDINFO]
Summary: kube-controller-manager should use configmap lease
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: ravig
QA Contact: zhou ying
URL:
Whiteboard: EmergencyRequest
Depends On: 2052700
Blocks:
Reported: 2022-02-09 16:16 UTC by ravig
Modified: 2022-03-10 16:44 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 2052598
: 2052700 (view as bug list)
Environment:
Last Closed: 2022-03-10 16:43:51 UTC
Target Upstream Version:
Embargoed:
mfojtik: needinfo?


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-kube-controller-manager-operator pull 603 | 0 | None | open | Bug 2052599: [release-4.10] Update to use configmapleases | 2022-02-09 20:35:26 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:44:04 UTC

Description ravig 2022-02-09 16:16:45 UTC
+++ This bug was initially created as a clone of Bug #2052598 +++

Description of problem:

https://bugzilla.redhat.com/show_bug.cgi?id=2037856 wanted to make sure that the operators use configmapleases, whereas they currently use leases only.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Michal Fojtik 2022-02-09 16:41:40 UTC
** A NOTE ABOUT USING URGENT **

This BZ has been set to urgent severity and priority. When a BZ is marked urgent priority Engineers are asked to stop whatever they are doing, putting everything else on hold.
Please be prepared to have reasonable justification ready to discuss, and ensure your own and engineering management are aware and agree this BZ is urgent. Keep in mind, urgent bugs are very expensive and have maximal management visibility.

NOTE: This bug was automatically assigned to an engineering manager with the severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.

** INFORMATION REQUIRED **

Please answer these questions before escalation to engineering:

1. Has a link to must-gather output been provided in this BZ? We cannot work without it. If must-gather fails to run, attach all relevant logs and provide the error message of must-gather.
2. Give the output of "oc get clusteroperators -o yaml".
3. In case of degraded/unavailable operators, have all their logs and the logs of the operands been analyzed [yes/no]
4. List the top 5 relevant errors from the logs of the operators and operands in (3).
5. Order the list of degraded/unavailable operators according to which is likely the cause of the failure of the other, root-cause at the top.
6. Explain why (5) is likely the right order and list the information used for that assessment.
7. Explain why Engineering is necessary to make progress.

Comment 4 Michal Fojtik 2022-02-09 20:09:39 UTC
(Automated notice identical to Comment 1.)

Comment 10 zhou ying 2022-02-14 03:23:11 UTC
Checked with the latest payload and confirmed that both configmap and lease locks exist for KCM:

[root@localhost roottest]# oc get clusterversion 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-02-11-123954   True        False         47m     Cluster version is 4.10.0-0.nightly-2022-02-11-123954
[root@localhost roottest]# oc project openshift-kube-controller-manager-operator
Now using project "openshift-kube-controller-manager-operator" on server "https://api.yinzhou214.qe.devcluster.openshift.com:6443".
[root@localhost roottest]# oc get lease
NAME                                    HOLDER                                                                                   AGE
kube-controller-manager-operator-lock   kube-controller-manager-operator-668cf96f7b-9bwvl_6928e020-58ba-4223-a054-1f8b5e19e953   66m
[root@localhost roottest]# oc get lease kube-controller-manager-operator-lock -o yaml 
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  creationTimestamp: "2022-02-14T02:13:29Z"
  name: kube-controller-manager-operator-lock
  namespace: openshift-kube-controller-manager-operator
  resourceVersion: "45913"
  uid: 7c875d77-f28f-4de9-96b6-b899ab46b797
spec:
  acquireTime: "2022-02-14T02:14:48.000000Z"
  holderIdentity: kube-controller-manager-operator-668cf96f7b-9bwvl_6928e020-58ba-4223-a054-1f8b5e19e953
  leaseDurationSeconds: 137
  leaseTransitions: 1
  renewTime: "2022-02-14T03:19:59.656523Z"
[root@localhost roottest]# oc get cm |grep lock
kube-controller-manager-operator-lock     0      67m
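The manual check above could be scripted roughly as follows. This is only a sketch: `has_lock` and `check_both_locks` are hypothetical helper names, not part of `oc`, and the namespace and lock name are taken from the output above. Unlike `grep lock`, the helper matches the NAME column exactly, so a similarly named object would not pass the check.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc): reads `oc get <resource> --no-headers`
# output on stdin and succeeds only if some row's first column equals $1.
has_lock() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Assumed values, copied from the verification output above.
NS=openshift-kube-controller-manager-operator
LOCK=kube-controller-manager-operator-lock

# Succeeds only if both a Lease and a ConfigMap with the lock name exist.
# Requires a logged-in oc session; nothing is run until this is called.
check_both_locks() {
  oc get lease -n "$NS" --no-headers | has_lock "$LOCK" &&
    oc get configmap -n "$NS" --no-headers | has_lock "$LOCK"
}
```

With a logged-in session, `check_both_locks` returns a zero exit status only when both objects are present, so it can be used directly in an `if` or chained with `&&`.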

Comment 12 errata-xmlrpc 2022-03-10 16:43:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

