Bug 1952282
Summary: | CSR approver races with nodelink controller and does not requeue | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Michael Gugino <mgugino> |
Component: | Cloud Compute | Assignee: | Michael Gugino <mgugino> |
Cloud Compute sub component: | Other Providers | QA Contact: | Milind Yadav <miyadav> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | | |
Priority: | medium | CC: | kborup |
Version: | 4.8 | | |
Target Milestone: | --- | | |
Target Release: | 4.8.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | No Doc Update |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-07-27 23:02:36 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 1949288 | | |
Description
Michael Gugino
2021-04-21 23:55:54 UTC
Hi @Michael, can you help with the steps to reproduce the condition, maybe using logs to confirm?

Hi Milind,

This will be a tricky one as it doesn't always happen. Look at some recent CI runs, and search the cluster-machine-approver logs like this one (the original test in this BZ):

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-autoscaler-operator/198/pull-ci-openshift-cluster-autoscaler-operator-master-e2e-aws/1382831777525010432/artifacts/e2e-aws/gather-extra/artifacts/pods/openshift-cluster-machine-approver_machine-approver-6644869bdb-d62gk_machine-approver-controller.log

Here, we can see the following log lines:

E0415 23:29:07.305130 1 csr_check.go:196] csr-rxkvd: Serving Cert: No target machine for node "ip-10-0-242-145.us-east-2.compute.internal"
I0415 23:29:07.305136 1 controller.go:172] csr-rxkvd: CSR not authorized

csr-rxkvd (randomly generated name) has the message "No target machine for node". Subsequently, we see "CSR not authorized", and then we never see csr-rxkvd show up again in the logs.

If this is behaving correctly, we might see the "No target machine for node" message (this is the non-deterministic bit due to the race) for a given CSR in a newer run. We should see "CSR not authorized", and then we should see the same csr-xxxx again; eventually it should be approved.

Thanks Michael, I could see them in an Azure serial run. The details are below; moved to VERIFIED based on them.

Validated on 4.8.0-0.nightly-2021-05-12-072240 for the certificate csr-z7z44, from the logs:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-serial-4.8/1392380279766650880/artifacts/e2e-azure-serial/pods/openshift-cluster-machine-approver_machine-approver-6458fd8964-wnv7q_machine-approver-controller.log

...
I0512 07:57:59.359043 1 csr_check.go:442] retrieving serving cert from ci-op-khktrhkg-ce9aa-bvlxm-worker-centralus1-g59bw (10.0.32.4:10250)
I0512 07:57:59.361271 1 csr_check.go:186] Failed to retrieve current serving cert: remote error: tls: internal error
I0512 07:57:59.361300 1 csr_check.go:191] Falling back to machine-api authorization for ci-op-khktrhkg-ce9aa-bvlxm-worker-centralus1-g59bw
E0512 07:57:59.361310 1 csr_check.go:196] csr-z7z44: Serving Cert: No target machine for node "ci-op-khktrhkg-ce9aa-bvlxm-worker-centralus1-g59bw"
I0512 07:57:59.361325 1 controller.go:172] csr-z7z44: CSR not authorized
...
I0512 07:59:21.282281 1 controller.go:114] Reconciling CSR: csr-z7z44
I0512 07:59:21.282797 1 csr_check.go:150] csr-z7z44: CSR does not appear to be client csr
I0512 07:59:21.282851 1 csr_check.go:442] retrieving serving cert from ci-op-khktrhkg-ce9aa-bvlxm-worker-centralus1-g59bw (10.0.32.4:10250)
I0512 07:59:21.285221 1 csr_check.go:186] Failed to retrieve current serving cert: remote error: tls: internal error
I0512 07:59:21.285244 1 csr_check.go:191] Falling back to machine-api authorization for ci-op-khktrhkg-ce9aa-bvlxm-worker-centralus1-g59bw
I0512 07:59:21.297589 1 controller.go:179] CSR csr-z7z44 approved
...

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
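
The log check described in the comments above (find a CSR that logs "CSR not authorized", then confirm the same CSR name shows up again and is eventually approved) could be automated when scanning many CI runs. The following is only a minimal sketch in Python and is not part of the cluster-machine-approver. The function name `find_stuck_csrs` is hypothetical, and the two regular expressions are assumptions inferred solely from the log lines quoted in this bug, so they may need adjusting for other versions.

```python
import re

# Assumption: message formats inferred from the log lines quoted in this bug.
NOT_AUTHORIZED = re.compile(r"\b(csr-[a-z0-9]+): CSR not authorized")
APPROVED = re.compile(r"CSR (csr-[a-z0-9]+) approved")


def find_stuck_csrs(log_text):
    """Return CSR names that were rejected and never approved afterwards.

    A non-empty result matches the symptom in this bug: the CSR is logged
    as "CSR not authorized" and no later approval appears in the same log.
    """
    pending = set()
    for line in log_text.splitlines():
        rejected = NOT_AUTHORIZED.search(line)
        if rejected:
            pending.add(rejected.group(1))
            continue
        approved = APPROVED.search(line)
        if approved:
            # A later approval clears any earlier rejection of the same CSR.
            pending.discard(approved.group(1))
    return sorted(pending)


if __name__ == "__main__":
    # Two log lines taken from the verification comment in this bug.
    sample = "\n".join([
        "I0512 07:57:59.361325 1 controller.go:172] csr-z7z44: CSR not authorized",
        "I0512 07:59:21.297589 1 controller.go:179] CSR csr-z7z44 approved",
    ])
    print(find_stuck_csrs(sample))
```

An empty list for a log that contains at least one "CSR not authorized" line is consistent with the fixed behavior. A CSR that is still listed may also simply mean the log was captured before the retry happened, so a hit is a prompt to read the log rather than proof of the bug.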