Bug 1684738 - Image Registry operator wedged on waiting for installer-cloud-credentials secret
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.1.0
Assignee: Corey Daley
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-02 01:25 UTC by Abhinav Dahiya
Modified: 2019-06-04 10:44 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:44:51 UTC
Target Upstream Version:


Links:
Red Hat Product Errata RHBA-2019:0758 (last updated 2019-06-04 10:44:58 UTC)

Description Abhinav Dahiya 2019-03-02 01:25:53 UTC
Description of problem:

The CI run https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/1348/pull-ci-openshift-installer-master-e2e-aws/4334 failed because cluster-image-registry-operator never completed.

According to https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1348/pull-ci-openshift-installer-master-e2e-aws/4334/artifacts/e2e-aws/clusteroperators.json, the image registry operator has been reporting failure since `2019-03-01T23:51:37Z`:
```
        {
           "apiVersion": "config.openshift.io/v1",
           "kind": "ClusterOperator",
           "metadata": {
               "creationTimestamp": "2019-03-01T23:41:36Z",
               "generation": 1,
               "name": "image-registry",
               "resourceVersion": "19813",
               "selfLink": "/apis/config.openshift.io/v1/clusteroperators/image-registry",
               "uid": "8d76e26d-3c7b-11e9-8519-0ed54486e940"
           },
           "spec": {},
           "status": {
               "conditions": [
                   {
                       "lastTransitionTime": "2019-03-01T23:51:37Z",
                       "message": "Deployment does not exist",
                       "status": "False",
                       "type": "Available"
                   },
                   {
                       "lastTransitionTime": "2019-03-01T23:51:37Z",
                       "message": "Unable to apply resources: unable to sync storage configuration: unable to get cluster minted credentials \"kube-system/installer-cloud-credentials\": timed out waiting for the condition",
                       "status": "True",
                       "type": "Progressing"
                   },
                   {
                       "lastTransitionTime": "2019-03-01T23:51:37Z",
                       "status": "False",
                       "type": "Failing"
                   }
               ],
               "extension": null,
               "relatedObjects": null,
               "versions": [
                   {
                       "name": "operator",
                       "version": "4.0.0-87-gbf6c0c9-dirty"
                   }
               ]
           }
       },
```

According to the same file, https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1348/pull-ci-openshift-installer-master-e2e-aws/4334/artifacts/e2e-aws/clusteroperators.json, the cloud-credential-operator has been reporting that it finished creating credentials since 2019-03-01T23:52:23Z:

```
        {
           "apiVersion": "config.openshift.io/v1",
           "kind": "ClusterOperator",
           "metadata": {
               "creationTimestamp": "2019-03-01T23:38:30Z",
               "generation": 1,
               "name": "openshift-cloud-credential-operator",
               "resourceVersion": "20301",
               "selfLink": "/apis/config.openshift.io/v1/clusteroperators/openshift-cloud-credential-operator",
               "uid": "1e79a670-3c7b-11e9-8519-0ed54486e940"
           },
           "spec": {},
           "status": {
               "conditions": [
                   {
                       "lastTransitionTime": "2019-03-01T23:52:23Z",
                       "message": "No credentials requests reporting errors.",
                       "reason": "NoCredentialsFailing",
                       "status": "False",
                       "type": "Failing"
                   },
                   {
                       "lastTransitionTime": "2019-03-01T23:52:23Z",
                       "message": "4 of 4 credentials requests provisioned and reconciled.",
                       "reason": "ReconcilingComplete",
                       "status": "False",
                       "type": "Progressing"
                   },
                   {
                       "lastTransitionTime": "2019-03-01T23:38:30Z",
                       "status": "True",
                       "type": "Available"
                   }
               ],
               "extension": null,
               "version": ""
           }
       },
```

The secret that cluster-image-registry-operator is waiting on was created at 2019-03-01T23:52:18Z:

```
        {
            "apiVersion": "v1",
            "data": {
                "aws_access_key_id": "QUtJQUpZTkVTVktaNEtNNk9MT1E=",
                "aws_secret_access_key": "WHp6QmpXd0E4SW5PTzZNTTZHV1VMU0cvTnc2WmgrSlVGKzhlemJFTg=="
            },
            "kind": "Secret",
            "metadata": {
                "annotations": {
                    "cloudcredential.openshift.io/aws-policy-last-applied": "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Action\":[\"s3:CreateBucket\",\"s3:DeleteBucket\",\"s3:PutBucketTagging\",\"s3:GetBucketTagging\",\"s3:PutEncryptionConfiguration\",\"s3:GetEncryptionConfiguration\",\"s3:PutLifecycleConfiguration\",\"s3:GetLifecycleConfiguration\",\"s3:GetBucketLocation\",\"s3:ListBucket\",\"s3:HeadBucket\",\"s3:GetObject\",\"s3:PutObject\",\"s3:DeleteObject\",\"s3:ListBucketMultipartUploads\",\"s3:AbortMultipartUpload\"],\"Resource\":\"*\"},{\"Effect\":\"Allow\",\"Action\":[\"iam:GetUser\"],\"Resource\":\"arn:aws:iam::460538899914:user/ci-op-3xwgvjmw-1d3f3-openshift-image-registry-5n6bs\"}]}",
                    "cloudcredential.openshift.io/credentials-request": "openshift-cloud-credential-operator/openshift-image-registry"
                },
                "creationTimestamp": "2019-03-01T23:52:18Z",
                "name": "installer-cloud-credentials",
                "namespace": "openshift-image-registry",
                "resourceVersion": "20249",
                "selfLink": "/api/v1/namespaces/openshift-image-registry/secrets/installer-cloud-credentials",
                "uid": "0c578f99-3c7d-11e9-8519-0ed54486e940"
            },
            "type": "Opaque"
        },
```
from https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1348/pull-ci-openshift-installer-master-e2e-aws/4334/artifacts/e2e-aws/secrets.json

The installer reported a failure to initialize the cluster at 2019-03-02T00:30:02Z, according to https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1348/pull-ci-openshift-installer-master-e2e-aws/4334/artifacts/e2e-aws/installer/.openshift_install.log:
```
time="2019-03-02T00:30:02Z" level=fatal msg="failed to initialize the cluster: Cluster operator image-registry has not yet reported success"
```
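
The ordering of these timestamps is the core of the report, so it may help to compute the gaps explicitly. The sketch below only does arithmetic on the four timestamps quoted above; the dictionary keys and the helper names `parse` and `seconds_between` are hypothetical labels chosen for illustration.

```python
from datetime import datetime, timezone


def parse(ts):
    """Parse an RFC 3339 UTC timestamp of the form used in the artifacts."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def seconds_between(start, end):
    """Whole seconds from `start` to `end` (both timestamp strings)."""
    return int((parse(end) - parse(start)).total_seconds())


# Timestamps copied from the excerpts in this report.
EVENTS = {
    "operator_timeout": "2019-03-01T23:51:37Z",  # registry operator gives up waiting
    "secret_created": "2019-03-01T23:52:18Z",    # installer-cloud-credentials appears
    "cco_reconciled": "2019-03-01T23:52:23Z",    # credential operator reports done
    "installer_fatal": "2019-03-02T00:30:02Z",   # installer reports failure
}

first = EVENTS["operator_timeout"]
for label, ts in sorted(EVENTS.items(), key=lambda kv: parse(kv[1])):
    print(f"{ts}  +{seconds_between(first, ts):>5}s  {label}")
```

The secret appears 41 seconds after the operator's wait timed out, and the installer does not give up until roughly 38 minutes after that, so the operator had ample time to recover had it picked up the secret on a later sync.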

Comment 1 Ben Parees 2019-03-02 01:29:53 UTC
Digging into the actual operator logs, the operator is reporting missing region:

E0302 00:28:33.356279       1 controller.go:222] unable to sync: unable to sync storage configuration: MissingRegion: could not find region configuration, requeuing


logs are here:
https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1348/pull-ci-openshift-installer-master-e2e-aws/4334/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-59f7c6cc56-62zrf_cluster-image-registry-operator.log.gz

So there are a couple of issues:

1) We're bubbling up the wrong error condition.
2) I don't know why we'd be missing region information.

Comment 2 Ben Parees 2019-03-02 01:32:00 UTC
One possibility is that, because we failed to get the secret initially:

E0301 23:51:37.805743       1 controller.go:222] unable to sync: unable to sync storage configuration: unable to get cluster minted credentials "kube-system/installer-cloud-credentials": timed out waiting for the condition, requeuing


we ended up in a bad state in terms of region configuration that we could never get out of.

Anyway, suffice it to say that the S3 config and credential management/syncing logic needs to be looked into. We should be able to handle the scenario where the S3 credentials secret isn't there when we come up and shows up later.
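
The behaviour asked for here can be sketched as a sync loop that rebuilds its configuration from scratch on every pass and always reports the most recent error. This is an illustrative sketch only, not the operator's actual code (the operator is not written in Python): `SecretNotFound`, `sync_storage`, `reconcile`, and the `get_secret` callback are all hypothetical names.

```python
import time


class SecretNotFound(Exception):
    """Raised by the hypothetical secret lookup when the secret is absent."""


def sync_storage(get_secret, region):
    """One sync pass that builds the storage config from scratch.

    Nothing from a previous failed attempt is reused, so a secret that
    appears later is picked up on the next pass.
    """
    if not region:
        raise ValueError("region is not configured")
    creds = get_secret()  # may raise SecretNotFound
    return {"region": region, "credentials": creds}


def reconcile(get_secret, region, max_attempts=5, delay=1.0, sleep=time.sleep):
    """Retry sync_storage; return (config, None) or (None, latest error)."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return sync_storage(get_secret, region), None
        except (SecretNotFound, ValueError) as err:
            # Overwrite rather than keep the first error, so the reported
            # condition reflects what is failing now.
            last_error = f"attempt {attempt}: {err}"
            sleep(delay)
    return None, last_error
```

Because `last_error` is overwritten on each attempt, a status built from it would show the current failure (for example a missing region) rather than a stale "timed out waiting" message from the first pass.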

Comment 4 Oleg Bulatov 2019-03-04 13:09:56 UTC
https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1348/pull-ci-openshift-installer-master-e2e-aws/4334/artifacts/e2e-aws/configmaps.json:

> \nplatform:\n  aws:\n    region: us-east-1\n

The region is set in the kube-system/cluster-config-v1.
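
To double-check a value like this without reading the escaped string by eye, the install-config text can be walked by indentation. The helper below is a rough sketch that assumes simple two-space-nested mappings as in the quoted fragment; `nested_value` is a hypothetical name, and a real tool would use a proper YAML parser. The input is the string after JSON decoding, so `\n` is a real newline.

```python
def nested_value(text, path):
    """Look up a scalar by key path in simple, indentation-nested YAML."""
    stack = []  # (indent, key) pairs for the enclosing mappings
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or ":" not in stripped:
            continue
        indent = len(line) - len(line.lstrip(" "))
        key, _, value = stripped.partition(":")
        # Leave any mapping that is at the same or a deeper indent level.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, key))
        if [k for _, k in stack] == list(path):
            return value.strip() or None
    return None


fragment = "\nplatform:\n  aws:\n    region: us-east-1\n"
print(nested_value(fragment, ["platform", "aws", "region"]))
```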

Corey, do you know what could cause "unable to sync storage configuration: MissingRegion: could not find region configuration"?

Comment 5 Corey Daley 2019-03-05 15:11:00 UTC
I'm looking into this. It seems like that might be coming from the AWS SDK.

Comment 6 Ben Parees 2019-03-05 15:37:28 UTC
It's certainly coming from the SDK, but presumably it implies we did not configure things properly when we invoked the SDK.

Comment 9 XiuJuan Wang 2019-03-22 03:46:11 UTC
In the e2e logs of the latest PR, https://github.com/openshift/installer/pull/1448, the test suite "operator Run template e2e-aws - e2e-aws container setup" has passed.

So this bug has been fixed with https://github.com/openshift/cluster-image-registry-operator/pull/238

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/1448/pull-ci-openshift-installer-master-e2e-aws/4564/

Comment 11 errata-xmlrpc 2019-06-04 10:44:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

