I believe the root of the issue is that the AWS account lacks permissions. The error ends up coming from here:
https://github.com/openshift/cloud-credential-operator/blob/release-4.0/pkg/controller/credentialsrequest/status.go#L108-L143
Without IAM authorization, the cloud-credential-operator fails to create the credentials. The bug here is more that the operator fails because of the permission issues. It should just pass the original creds through in this case (with an install-time warning about this less-secure approach) instead of failing later.
What are the annotations on the secret kube-system/aws-creds? More interestingly, what AWS IAM permissions was the cluster installed with?
I suppose it depends on exactly how the IAM permissions are organized/granted. You should be able to get your IAM permissions with something like this (using the IAM creds used for cluster installation):

  aws iam get-user | jq -r .User.UserName
  aws iam list-groups-for-user --user-name <USER_NAME>

For each group above:

  aws iam list-group-policies --group-name <GROUP_NAME>
  aws iam list-attached-group-policies --group-name <GROUP_NAME>

For each inline policy returned by list-group-policies:

  aws iam get-group-policy --group-name <GROUP_NAME> --policy-name <POLICY_NAME>

Each of those 'get-group-policy' commands should dump the actual policies. Note that get-group-policy only covers inline policies; managed policies returned by list-attached-group-policies are read by ARN with 'aws iam get-policy' and 'aws iam get-policy-version' instead.
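The manual walk above can be scripted. What follows is a minimal sketch, not something from the installer or the operator: the helper names run_aws and collect_group_policies are made up for illustration, it assumes the aws CLI is on PATH and configured with the install-time creds, and it only shells out to the same iam subcommands listed above. It will fail the same way the manual commands do if the creds lack iam:GetUser or iam:GetGroupPolicy.

```python
import json
import subprocess


def run_aws(args):
    """Run one aws CLI command and return its JSON output as a dict.

    Hypothetical helper: assumes the aws CLI is installed and configured.
    """
    out = subprocess.run(
        ["aws", *args, "--output", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def collect_group_policies(run=run_aws):
    """Return {group: {"inline": {name: document}, "attached": [arn, ...]}}.

    `run` is injectable so the walk can be exercised without AWS access.
    """
    user = run(["iam", "get-user"])["User"]["UserName"]
    groups = run(["iam", "list-groups-for-user", "--user-name", user])["Groups"]

    result = {}
    for group in groups:
        name = group["GroupName"]

        # Inline policies: list the names, then dump each document.
        inline = {}
        names = run(["iam", "list-group-policies", "--group-name", name])
        for policy in names["PolicyNames"]:
            doc = run(["iam", "get-group-policy",
                       "--group-name", name, "--policy-name", policy])
            inline[policy] = doc["PolicyDocument"]

        # Managed policies: only the ARNs are recorded here.
        attached = run(["iam", "list-attached-group-policies",
                        "--group-name", name])["AttachedPolicies"]

        result[name] = {
            "inline": inline,
            "attached": [p["PolicyArn"] for p in attached],
        }
    return result
```

Calling print(json.dumps(collect_group_policies(), indent=2)) would then dump everything in one go for attaching to the bug.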
The installer can/does change permissions from release to release. We can't really see the IAM permissions since you don't have GetGroupPolicy permission. Can you grab all the CredentialsRequest objects:

  oc get credentialsrequest --all-namespaces -o yaml

And the logs from the cloud-credential-operator:

  oc get pods -n openshift-cloud-credential-operator

then for that pod get the logs:

  oc logs -n openshift-cloud-credential-operator POD_NAME_FROM_PREV_COMMAND
If this issue is fixed in the 0.14 installer, maybe this BZ should be closed?
If I had to guess, this might be a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1685729

Is the original issue reproducible with 0.14?
We still see something like this in CI:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/127/artifacts/e2e-aws-upgrade/clusterversion.json | jq -r '.items[].status.conditions[] | select(.type == "Failing").message'
Cluster operator openshift-cloud-credential-operator is reporting a failure: 2 of 4 credentials requests are failing to sync.
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/127/artifacts/e2e-aws-upgrade/pods/openshift-cloud-credential-operator_cloud-credential-operator-5b5b6bd77-kmbs7_manager.log.gz | gunzip | grep -2 level=error | tail
--
time="2019-03-13T22:44:54Z" level=debug msg="creating read AWS client" actuator=aws cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-cloud-credential-operator/cloud-credential-operator-iam-ro-creds
time="2019-03-13T22:45:56Z" level=info msg="validating cloud cred secret" controller=secretannotator
time="2019-03-13T22:47:34Z" level=error msg="error while validating cloud credentials: failed checking create cloud creds: error querying current username: RequestError: send request failed\ncaused by: Post https://iam.amazonaws.com/: dial tcp: lookup iam.amazonaws.com on 172.30.0.10:53: read udp 10.128.0.65:55377->172.30.0.10:53: i/o timeout" controller=secretannotator
time="2019-03-13T22:48:34Z" level=error msg="error getting user: {\n\n}" actuator=aws cr=openshift-cloud-credential-operator/openshift-ingress error="RequestError: send request failed\ncaused by: Post https://iam.amazonaws.com/: dial tcp: i/o timeout"
time="2019-03-13T22:48:34Z" level=error msg="error determining whether a credentials update is needed" actuator=aws cr=openshift-cloud-credential-operator/openshift-ingress error="unable to read info for username {\n\n}: RequestError: send request failed\ncaused by: Post https://iam.amazonaws.com/: dial tcp: i/o timeout"
time="2019-03-13T22:48:34Z" level=error msg="error syncing credentials: <nil>" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
time="2019-03-13T22:48:34Z" level=error msg="errored with condition: CredentialsProvisionFailure" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
time="2019-03-13T22:48:34Z" level=debug msg="updating credentials request status" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
time="2019-03-13T22:48:34Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials

Dunno what's up with that; maybe networking died?
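For anyone triaging a longer manager.log, a rough alternative to the grep is to group the error messages by credentials request. This is only a sketch: parse_logfmt and errors_by_cr are illustrative names, not part of the operator, and the parser assumes every line is key=value pairs with optionally double-quoted values, like the lines quoted in this comment.

```python
import re

# key=value, where value is either a double-quoted string (backslash
# escapes allowed) or a run of non-whitespace characters.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line):
    """Split one log line into a dict of its key=value fields."""
    fields = {}
    for key, value in _PAIR.findall(line):
        # Strip the surrounding quotes; escapes such as \n are left as-is.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def errors_by_cr(lines):
    """Group level=error messages by their cr= field.

    Lines without a cr= field (e.g. from the secretannotator controller)
    are collected under the placeholder key "<none>".
    """
    grouped = {}
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") == "error":
            key = fields.get("cr", "<none>")
            grouped.setdefault(key, []).append(fields.get("msg", ""))
    return grouped
```

Feeding it the decompressed log, for example errors_by_cr(gzip.open(path, "rt")), gives one list of messages per CredentialsRequest, which makes it easier to see whether every failing request shares the same underlying error.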
I'm going to close this as a dup of bug 1687881. If you can reproduce this issue since that bug was fixed, comment here with logs and we can re-open.

*** This bug has been marked as a duplicate of bug 1687881 ***