Bug 1683648
Summary: | Installation in AWS environment fails with 'level=fatal msg="failed to initialize the cluster: Cluster operator openshift-cloud-credential-operator is reporting a failure: 4 of 4 credentials requests are failing to sync."' | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | David Caldwell <dcaldwel> |
Component: | Cloud Compute | Assignee: | Joel Diaz <jdiaz> |
Status: | CLOSED DUPLICATE | QA Contact: | Jianwei Hou <jhou> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | 4.1.0 | CC: | aos-bugs, aos-cloud, dcaldwel, dgoodwin, jdiaz, jokerman, mmccomas, ocasalsa, rhowe, sperezto, sponnaga, suchaudh, wking |
Target Milestone: | --- | | |
Target Release: | 4.1.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2019-03-15 23:48:29 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 1664187 | | |
Comment 6
Ryan Howe
2019-03-04 14:27:27 UTC
What are the annotations on the secret kube-system/aws-creds?

More interestingly, what AWS IAM permissions was the cluster installed with? I suppose it depends on exactly how the IAM permissions are organized/granted. You should be able to get your IAM permissions with something like this:

    aws iam get-user | jq -r .User.UserName
    (using the IAM creds used for cluster installation)
    aws iam list-groups-for-user --user-name <USER_NAME>

For each group above:

    aws iam list-group-policies --group-name <GROUP_NAME>
    aws iam list-attached-group-policies --group-name <GROUP_NAME>

For each policy:

    aws iam get-group-policy --group-name <GROUP_NAME> --policy-name <POLICY_NAME>

Each of those 'get-group-policy' commands should dump the actual policies. The installer can/does change permissions from release-to-release.

Can't really see the IAM permissions since you don't have GetGroupPolicy permission. Can you grab all the CredentialsRequest objects:

    oc get credentialsrequest --all-namespaces -o yaml

And the logs from the cloud-credential-operator:

    oc get pods -n openshift-cloud-credential-operator

then for that pod get the logs:

    oc logs -n openshift-cloud-credential-operator POD_NAME_FROM_PREV_COMMAND

If this issue is fixed in 0.14 installer, maybe this BZ should be closed?

If I had to guess, this might be a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1685729

Is the original issue reproducible with 0.14?

We still see something like this in CI:

    $ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/127/artifacts/e2e-aws-upgrade/clusterversion.json | jq -r '.items[].status.conditions[] | select(.type == "Failing").message'
    Cluster operator openshift-cloud-credential-operator is reporting a failure: 2 of 4 credentials requests are failing to sync.
    $ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/127/artifacts/e2e-aws-upgrade/pods/openshift-cloud-credential-operator_cloud-credential-operator-5b5b6bd77-kmbs7_manager.log.gz | gunzip | grep -2 level=error | tail
    --
    time="2019-03-13T22:44:54Z" level=debug msg="creating read AWS client" actuator=aws cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-cloud-credential-operator/cloud-credential-operator-iam-ro-creds
    time="2019-03-13T22:45:56Z" level=info msg="validating cloud cred secret" controller=secretannotator
    time="2019-03-13T22:47:34Z" level=error msg="error while validating cloud credentials: failed checking create cloud creds: error querying current username: RequestError: send request failed\ncaused by: Post https://iam.amazonaws.com/: dial tcp: lookup iam.amazonaws.com on 172.30.0.10:53: read udp 10.128.0.65:55377->172.30.0.10:53: i/o timeout" controller=secretannotator
    time="2019-03-13T22:48:34Z" level=error msg="error getting user: {\n\n}" actuator=aws cr=openshift-cloud-credential-operator/openshift-ingress error="RequestError: send request failed\ncaused by: Post https://iam.amazonaws.com/: dial tcp: i/o timeout"
    time="2019-03-13T22:48:34Z" level=error msg="error determining whether a credentials update is needed" actuator=aws cr=openshift-cloud-credential-operator/openshift-ingress error="unable to read info for username {\n\n}: RequestError: send request failed\ncaused by: Post https://iam.amazonaws.com/: dial tcp: i/o timeout"
    time="2019-03-13T22:48:34Z" level=error msg="error syncing credentials: <nil>" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
    time="2019-03-13T22:48:34Z" level=error msg="errored with condition: CredentialsProvisionFailure" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
    time="2019-03-13T22:48:34Z" level=debug msg="updating credentials request status" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
    time="2019-03-13T22:48:34Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials

Dunno what's up with that; maybe networking died?

I'm going to close this as a dup of bug 1687881. If you can reproduce this issue since that bug was fixed, comment here with logs and we can re-open.

*** This bug has been marked as a duplicate of bug 1687881 ***
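
For anyone who needs to re-collect the diagnostics requested in this bug, the manual IAM walk-through and the operator log collection could be scripted roughly as below. This is only a sketch under a few assumptions: the AWS CLI is configured with the credentials used for cluster installation, `oc` is logged in to the affected cluster, and the helper names `dump_group_policies` and `collect_cco_logs` are made up for illustration (they are not part of any tool). It uses the AWS CLI's `--query`/`--output text` options in place of the `jq` call shown in the comments.

```shell
# Hypothetical helper: walk user -> groups -> policies for the IAM user
# behind the currently configured AWS credentials.
dump_group_policies() {
  local user group policy
  # Resolve the user name of the calling credentials.
  user=$(aws iam get-user --query 'User.UserName' --output text) || return 1
  echo "user: ${user}"

  # --output text separates names with tabs, so word splitting is intended.
  for group in $(aws iam list-groups-for-user --user-name "${user}" \
                   --query 'Groups[].GroupName' --output text); do
    echo "group: ${group}"

    # Inline group policies: print each policy document.
    for policy in $(aws iam list-group-policies --group-name "${group}" \
                      --query 'PolicyNames[]' --output text); do
      echo "inline policy: ${policy}"
      aws iam get-group-policy --group-name "${group}" \
          --policy-name "${policy}" \
        || echo "cannot read ${policy} (is iam:GetGroupPolicy granted?)" >&2
    done

    # Managed policies attached to the group: list them only.
    aws iam list-attached-group-policies --group-name "${group}"
  done
}

# Hypothetical helper: print the logs of every pod in the
# cloud-credential-operator namespace, so the pod name does not have to be
# copied by hand. Pods with several containers would additionally need -c.
collect_cco_logs() {
  local ns=openshift-cloud-credential-operator pod
  for pod in $(oc get pods -n "${ns}" -o name); do
    echo "== ${pod}"
    oc logs -n "${ns}" "${pod}"
  done
}
```

Both functions only read state. Something like `dump_group_policies > iam-policies.txt; collect_cco_logs > cco.log` would produce files suitable for attaching to the bug, after checking them for anything sensitive.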