Bug 1927017
Summary: | CCO does not relinquish leadership when restarting for proxy CA change
---|---
Product: | OpenShift Container Platform
Component: | Cloud Credential Operator
Status: | CLOSED ERRATA
Severity: | medium
Priority: | unspecified
Version: | 4.7
Target Milestone: | ---
Target Release: | 4.8.0
Hardware: | Unspecified
OS: | Unspecified
Reporter: | Matthew Staebler <mstaeble>
Assignee: | sumehta
QA Contact: | wang lin <lwan>
CC: | arane, dgoodwin, gshereme, lwan, sumehta
Doc Type: | No Doc Update
Story Points: | ---
Last Closed: | 2021-07-27 22:43:10 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
Category: | ---
oVirt Team: | ---
Cloudforms Team: | ---

Description
Matthew Staebler
2021-02-09 20:37:17 UTC
Created attachment 1756047: in-cluster CCO log

From the in-cluster CCO:

> time="2021-02-09T20:21:36Z" level=info msg="generated leader election ID" id=7fca7590-766b-4709-a7be-104708a1260f
> I0209 20:21:36.858036 1 leaderelection.go:243] attempting to acquire leader lease openshift-cloud-credential-operator/cloud-credential-operator-leader...
> time="2021-02-09T20:21:37Z" level=info msg="current leader: 6706a11c-b8b2-409f-b9bb-3d26b65685b3"
> I0209 20:30:05.713252 1 leaderelection.go:253] successfully acquired lease openshift-cloud-credential-operator/cloud-credential-operator-leader
> time="2021-02-09T20:30:05Z" level=info msg="became leader" id=7fca7590-766b-4709-a7be-104708a1260f

Created attachment 1756048: openshift-machine-api-aws cred request

Of particular interest to me is the openshift-machine-api-aws cred request. The cred request was created at 2021-02-09T20:12:17Z, but was not fulfilled until 2021-02-09T20:30:14Z.

> time="2021-02-09T20:30:14Z" level=info msg="secret created successfully" actuator=aws cr=openshift-cloud-credential-operator/openshift-machine-api-aws targetSecret=openshift-machine-api/aws-cloud-credentials

Via Abhinav: we are likely using os.Exit when we detect the proxy CA change, which would bypass our leader election lease release. We should instead have a global context that gets cancelled, so that the lease release can execute.

The issue is not fixed. When I changed the proxy CA, the CCO pod did not restart.

version:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-03-24-200346   True        False         33m     Cluster version is 4.8.0-0.nightly-2021-03-24-200346

######logs######
time="2021-03-25T07:01:39Z" level=info msg="Proxy CA configmap change detected, restarting pod" configmap=openshift-cloud-credential-operator/cco-trusted-ca controller=configmap
time="2021-03-25T07:01:54Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2021-03-25T07:01:54Z" level=info msg="reconcile complete" controller=metrics elapsed=2.498132ms
time="2021-03-25T07:01:55Z" level=info msg="reconciling clusteroperator status"
time="2021-03-25T07:01:55Z" level=info msg="clusteroperator status updated" controller=status
W0325 07:02:40.994149 1 warnings.go:67] admissionregistration.k8s.io/v1beta1 MutatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 MutatingWebhookConfiguration
time="2021-03-25T07:03:54Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2021-03-25T07:03:54Z" level=info msg="reconcile complete" controller=metrics elapsed=2.38714ms
time="2021-03-25T07:05:54Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2021-03-25T07:05:54Z" level=info msg="reconcile complete" controller=metrics elapsed=2.523944ms
time="2021-03-25T07:06:18Z" level=info msg="Proxy CA configmap change detected, restarting pod" configmap=openshift-cloud-credential-operator/cco-trusted-ca controller=configmap
time="2021-03-25T07:06:55Z" level=info msg="reconciling clusteroperator status"
time="2021-03-25T07:06:55Z" level=info msg="clusteroperator status updated" controller=status
time="2021-03-25T07:07:54Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2021-03-25T07:07:54Z" level=info msg="reconcile complete" controller=metrics elapsed=2.539316ms
time="2021-03-25T07:09:54Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2021-03-25T07:09:54Z" level=info msg="reconcile complete" controller=metrics elapsed=2.433577ms
time="2021-03-25T07:11:54Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2021-03-25T07:11:54Z" level=info msg="reconcile complete" controller=metrics elapsed=2.361834ms
time="2021-03-25T07:11:55Z" level=info msg="reconciling clusteroperator status"

Verified on 4.8.0-0.nightly-2021-04-15-202330.

1. Change the proxy CA; the old pod detects the change and restarts.

old pod:
I0416 12:38:46.968666 1 observer_polling.go:120] Observed file "/var/run/configmaps/trusted-ca-bundle/tls-ca-bundle.pem" has been modified (old="e0c433f773e598341811fd40d74c581fbfe04f864739e80ea763c9e1291f6c5d", new="7336a74c27fc8c30928e9c8f8f275ec4b656221a18ba15aab4aeda42f546da1d")
time="2021-04-16T12:38:46Z" level=info msg="Proxy CA configmap change detected, restarting pod"
time="2021-04-16T12:38:46Z" level=info msg="leader lost" id=9b649244-8804-4c66-aabd-a3a0b1953d34

2. The new pod becomes the leader immediately, without waiting 8 minutes.

new pod:
Copying system trust bundle
time="2021-04-16T12:38:47Z" level=info msg="setting up client for manager"
time="2021-04-16T12:38:47Z" level=info msg="running file observer" file=/var/run/configmaps/trusted-ca-bundle/tls-ca-bundle.pem
I0416 12:38:47.554489 1 observer_polling.go:159] Starting file observer
time="2021-04-16T12:38:47Z" level=info msg="generated leader election ID" id=7e23979a-7897-47c3-9ab2-ab47e084895a
I0416 12:38:47.556316 1 leaderelection.go:243] attempting to acquire leader lease openshift-cloud-credential-operator/cloud-credential-operator-leader...
I0416 12:38:47.568094 1 leaderelection.go:253] successfully acquired lease openshift-cloud-credential-operator/cloud-credential-operator-leader
time="2021-04-16T12:38:47Z" level=info msg="became leader" id=7e23979a-7897-47c3-9ab2-ab47e084895a
time="2021-04-16T12:38:47Z" level=info msg="setting up manager"
I0416 12:38:48.618720 1 request.go:645] Throttling request took 1.046397293s, request: GET:https://172.30.0.1:443/apis/monitoring.coreos.com/v1?timeout=32s

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438