Bug 1779390
Summary: failed to grant creds: error syncing creds in mint-mode: secrets "installer-cloud-credentials" already exists

Product: OpenShift Container Platform
Component: Cloud Credential Operator
Version: 4.4
Target Release: 4.4.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: low
Priority: unspecified
Target Milestone: ---
Reporter: W. Trevor King <wking>
Assignee: Joel Diaz <jdiaz>
QA Contact: Johnny Liu <jialiu>
CC: jdiaz, jialiu
Type: Bug
Last Closed: 2020-05-04 11:18:30 UTC

Doc Type: Bug Fix
Doc Text:
Cause: Some CredentialsRequests took a long time to be re-reconciled even though they had error conditions set.
Consequence: The retry interval between sync attempts could be long, leaving failing requests unresolved.
Fix: Always reconcile a CredentialsRequest that has conditions set on it, or that is marked as not yet provisioned.
Result: CredentialsRequests with conditions are now retried at a more regular rate until fully reconciled.
Description
W. Trevor King
2019-12-03 21:46:16 UTC
Spotted by Clayton again in:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_node_exporter/48/pull-ci-openshift-node_exporter-master-e2e-aws/45

Appears to be about a 1% failure rate:

https://search.svc.ci.openshift.org/chart?search=CredentialsFailing:%20.*credentials%20requests%20are%20failing%20to%20sync

From the CCO logs:

time="2020-01-08T17:55:34Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-image-registry
time="2020-01-08T17:55:34Z" level=debug msg="found secret namespace" controller=credreq cr=openshift-cloud-credential-operator/openshift-image-registry secret=openshift-image-registry/installer-cloud-credentials
time="2020-01-08T17:55:34Z" level=debug msg="lastsyncgeneration is current and lastsynctimestamp was less than an hour ago, so no need to sync" controller=credreq cr=openshift-cloud-credential-operator/openshift-image-registry secret=openshift-image-registry/installer-cloud-credentials
time="2020-01-08T17:55:34Z" level=debug msg="syncing cluster operator status" controller=credreq_status
time="2020-01-08T17:55:34Z" level=debug msg="4 cred requests" controller=credreq_status
time="2020-01-08T17:55:34Z" level=debug msg="set ClusterOperator condition" controller=credreq_status message="1 of 4 credentials requests are failing to sync." reason=CredentialsFailing status=True type=Degraded
time="2020-01-08T17:55:34Z" level=debug msg="set ClusterOperator condition" controller=credreq_status message="3 of 4 credentials requests provisioned, 1 reporting errors." reason=Reconciling status=True type=Progressing

This appears to be the broken cred, but it is not reconciling due to the logic above.
The cred request itself does show a failing CR:

{
  "apiVersion": "cloudcredential.openshift.io/v1",
  "kind": "CredentialsRequest",
  "metadata": {
    "creationTimestamp": "2020-01-08T17:39:34Z",
    "finalizers": ["cloudcredential.openshift.io/deprovision"],
    "generation": 1,
    "labels": {"controller-tools.k8s.io": "1.0"},
    "name": "openshift-image-registry",
    "namespace": "openshift-cloud-credential-operator",
    "resourceVersion": "1526",
    "selfLink": "/apis/cloudcredential.openshift.io/v1/namespaces/openshift-cloud-credential-operator/credentialsrequests/openshift-image-registry",
    "uid": "726e65ec-3577-42d7-acd8-f2427a45422a"
  },
  "spec": {
    "providerSpec": {
      "apiVersion": "cloudcredential.openshift.io/v1",
      "kind": "AWSProviderSpec",
      "statementEntries": [
        {
          "action": ["s3:CreateBucket", "s3:DeleteBucket", "s3:PutBucketTagging", "s3:GetBucketTagging", "s3:PutBucketPublicAccessBlock", "s3:GetBucketPublicAccessBlock", "s3:PutEncryptionConfiguration", "s3:GetEncryptionConfiguration", "s3:PutLifecycleConfiguration", "s3:GetLifecycleConfiguration", "s3:GetBucketLocation", "s3:ListBucket", "s3:HeadBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucketMultipartUploads", "s3:AbortMultipartUpload"],
          "effect": "Allow",
          "resource": "*"
        }
      ]
    },
    "secretRef": {
      "name": "installer-cloud-credentials",
      "namespace": "openshift-image-registry"
    }
  },
  "status": {
    "conditions": [
      {
        "lastProbeTime": "2020-01-08T17:39:35Z",
        "lastTransitionTime": "2020-01-08T17:39:35Z",
        "message": "failed to grant creds: error syncing creds in mint-mode: secrets \"installer-cloud-credentials\" already exists",
        "reason": "CredentialsProvisionFailure",
        "status": "True",
        "type": "CredentialsProvisionFailure"
      }
    ],
    "lastSyncGeneration": 1,
    "lastSyncTimestamp": "2020-01-08T17:39:35Z",
    "providerStatus": {
      "apiVersion": "cloudcredential.openshift.io/v1",
      "kind": "AWSProviderStatus",
      "policy": "ci-op-jr96gzdy-37df0-openshift-image-registry-fnf4r-policy",
      "user": "ci-op-jr96gzdy-37df0-openshift-image-registry-fnf4r"
    },
    "provisioned": false
  }
}

And yet we are not reconciling. It feels like a weird race kicks this off, but then bad logic in how we set and interpret lastSyncGeneration and lastSyncTimestamp in the status keeps it stuck.

From the description, this issue is not easy to reproduce. I checked recent CI failure jobs and did not find a similar issue yet, so here I only did some testing to ensure there is no regression. Verified with 4.4.0-0.nightly-2020-01-17-003449; installation completed successfully. If anyone can provide more detailed steps for validation, please let me know.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581