In CI [1]: level=info msg="Waiting up to 30m0s for the cluster at https://api.ci-op-wild7jfc-d89b3.origin-ci-int-aws.dev.rhcloud.com:6443 to initialize..." level=error msg="Cluster operator cloud-credential Degraded is True with CredentialsFailing: 1 of 4 credentials requests are failing to sync." level=info msg="Cluster operator cloud-credential Progressing is True with Reconciling: 3 of 4 credentials requests provisioned, 1 reporting errors." level=info msg="Cluster operator image-registry Available is False with DeploymentNotFound: The deployment does not exist" level=info msg="Cluster operator image-registry Progressing is True with Error: Unable to apply resources: unable to sync storage configuration: InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.\n\tstatus code: 403, request id: 2FBC89FCFA529CC8, host id: jM0bV0q16Z7Xn4OE3iMO73jmh4ETbFQF8hCCQnDtz9PWVrBtosEzvNvOFk7xQaW161LV2yIaUzo=" level=info msg="Cluster operator insights Disabled is False with : " level=fatal msg="failed to initialize the cluster: Cluster operator image-registry is still updating" apparently due to a failure to create the registry creds: $ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-sdn-multitenant-4.3/148/artifacts/e2e-aws/must-gather/registry-svc-ci-openshift-org-ocp4-3-2019-11-22-122829-sha256-64c63eedf863406fbc6c7515026f909a7221472cf70283708fb7010dd5e6139e/namespaces/openshift-cloud-credential-operator/cloudcredential.openshift.io/credentialsrequests/openshift-image-registry.yaml | yaml2json | jq -r '{name: .metadata.name, status}' | json2yaml name: openshift-image-registry status: conditions: - lastProbeTime: '2019-12-03T11:50:53Z' lastTransitionTime: '2019-12-03T11:50:53Z' message: 'failed to grant creds: error syncing creds in mint-mode: secrets "installer-cloud-credentials" already exists' reason: CredentialsProvisionFailure status: 'True' type: CredentialsProvisionFailure lastSyncGeneration: '1' lastSyncTimestamp: '2019-12-03T11:50:53Z' providerStatus: apiVersion: cloudcredential.openshift.io/v1 kind: AWSProviderStatus policy: ci-op-wild7jfc-d89b3-openshift-image-registry-kg65h-policy user: ci-op-wild7jfc-d89b3-openshift-image-registry-kg65h provisioned: 'false' Failure seems rare, with 10 jobs hitting this over the past 24h [3]. [1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-sdn-multitenant-4.3/148 [2]: https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-sdn-multitenant-4.3/148/artifacts/e2e-aws/must-gather/registry-svc-ci-openshift-org-ocp-4-3-2019-11-22-122829-sha256-64c63eedf863406fbc6c7515026f909a7221472cf70283708fb7010dd5e6139e/namespaces/openshift-cloud-credential-operator/cloudcredential.openshift.io/credentialsrequests/openshift-image-registry.yaml [3]: https://search.svc.ci.openshift.org/chart?search=CredentialsFailing:%20.*credentials%20requests%20are%20failing%20to%20sync
Spotted by Clayton again in: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_node_exporter/48/pull-ci-openshift-node_exporter-master-e2e-aws/45 Appears to be about a 1% failure rate: https://search.svc.ci.openshift.org/chart?search=CredentialsFailing:%20.*credentials%20requests%20are%20failing%20to%20sync From the cco logs: time="2020-01-08T17:55:34Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-image-registry time="2020-01-08T17:55:34Z" level=debug msg="found secret namespace" controller=credreq cr=openshift-cloud-credential-operator/openshift-image-registry secret=openshift-image-registry/installer-cloud-credentials time="2020-01-08T17:55:34Z" level=debug msg="lastsyncgeneration is current and lastsynctimestamp was less than an hour ago, so no need to sync" controller=credreq cr=openshift-cloud-credential-operator/openshift-image-registry secret=openshift-image-registry/installer-cloud-credentials time="2020-01-08T17:55:34Z" level=debug msg="syncing cluster operator status" controller=credreq_status time="2020-01-08T17:55:34Z" level=debug msg="4 cred requests" controller=credreq_status time="2020-01-08T17:55:34Z" level=debug msg="set ClusterOperator condition" controller=credreq_status message="1 of 4 credentials requests are failing to sync." reason=CredentialsFailing status=True type=Degraded time="2020-01-08T17:55:34Z" level=debug msg="set ClusterOperator condition" controller=credreq_status message="3 of 4 credentials requests provisioned, 1 reporting errors." reason=Reconciling status=True type=Progressing This appears to be the broken cred but it's not reconciling due to the logic above. The cred request itself does show a failing CR: { "apiVersion": "cloudcredential.openshift.io/v1", "kind": "CredentialsRequest", "metadata": { "creationTimestamp": "2020-01-08T17:39:34Z", "finalizers": [ "cloudcredential.openshift.io/deprovision" ], "generation": 1, "labels": { "controller-tools.k8s.io": "1.0" }, "name": "openshift-image-registry", "namespace": "openshift-cloud-credential-operator", "resourceVersion": "1526", "selfLink": "/apis/cloudcredential.openshift.io/v1/namespaces/openshift-cloud-credential-operator/credentialsrequests/openshift-image-registry", "uid": "726e65ec-3577-42d7-acd8-f2427a45422a" }, "spec": { "providerSpec": { "apiVersion": "cloudcredential.openshift.io/v1", "kind": "AWSProviderSpec", "statementEntries": [ { "action": [ "s3:CreateBucket", "s3:DeleteBucket", "s3:PutBucketTagging", "s3:GetBucketTagging", "s3:PutBucketPublicAccessBlock", "s3:GetBucketPublicAccessBlock", "s3:PutEncryptionConfiguration", "s3:GetEncryptionConfiguration", "s3:PutLifecycleConfiguration", "s3:GetLifecycleConfiguration", "s3:GetBucketLocation", "s3:ListBucket", "s3:HeadBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucketMultipartUploads", "s3:AbortMultipartUpload" ], "effect": "Allow", "resource": "*" } ] }, "secretRef": { "name": "installer-cloud-credentials", "namespace": "openshift-image-registry" } }, "status": { "conditions": [ { "lastProbeTime": "2020-01-08T17:39:35Z", "lastTransitionTime": "2020-01-08T17:39:35Z", "message": "failed to grant creds: error syncing creds in mint-mode: secrets \"installer-cloud-credentials\" already exists", "reason": "CredentialsProvisionFailure", "status": "True", "type": "CredentialsProvisionFailure" } ], "lastSyncGeneration": 1, "lastSyncTimestamp": "2020-01-08T17:39:35Z", "providerStatus": { "apiVersion": "cloudcredential.openshift.io/v1", "kind": "AWSProviderStatus", "policy": "ci-op-jr96gzdy-37df0-openshift-image-registry-fnf4r-policy", "user": "ci-op-jr96gzdy-37df0-openshift-image-registry-fnf4r" }, "provisioned": false } }, And yet we're not reconciling. Feels like a weird race to kick this off, but then bad logic on how we set and interpret last sync generation and timestamp in the status.
From description, this issue is not easy to be produced. I checked recent ci failure job, did not find similar issue yet. Here I only do some testing to ensure no regression issue. Verified with 4.4.0-0.nightly-2020-01-17-003449, installation is completed successfully. If anyone provide more detailed steps for validation, pls let me know.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581