Bug 1497150
| Summary: | atomic-openshift-node randomly failed on AWS due to AWS credentials not set | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Gan Huang <ghuang> |
| Component: | Installer | Assignee: | Michael Gugino <mgugino> |
| Status: | CLOSED ERRATA | QA Contact: | Gan Huang <ghuang> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.7.0 | CC: | aos-bugs, jokerman, mmccomas |
| Target Milestone: | --- | | |
| Target Release: | 3.7.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-11-28 22:13:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

PR submitted: https://github.com/openshift/openshift-ansible/pull/5633

PR merged.

Verified with openshift-ansible-3.7.0-0.147.0.git.0.2fb41ee.el7.noarch.rpm

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

Description of problem:

Installation failed on AWS with the cloud provider enabled. Digging further, I found that the root cause of the atomic-openshift-node failure was that `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` were not set in /etc/sysconfig/atomic-openshift-node.

    TASK [openshift_node : Abort if node failed to start] **************************
    Friday 29 September 2017 09:37:13 +0000 (0:00:01.155) 0:19:38.142 ******
    fatal: [ec2-54-89-99-146.compute-1.amazonaws.com]: FAILED! => {"changed": false, "failed": true, "msg": "Node failed to start please inspect the logs and try again"}

Check the logs:

    Sep 29 05:36:46 ip-172-18-11-50.ec2.internal atomic-openshift-node[30618]: I0929 05:36:46.462338 30627 aws.go:806] Building AWS cloudprovider
    Sep 29 05:36:46 ip-172-18-11-50.ec2.internal atomic-openshift-node[30618]: F0929 05:36:46.464669 30627 start_node.go:141] could not init cloud provider "aws": error finding instance i-054489e66654e2cc8: "error listing AWS instances: \"NoCredentialProviders: no valid providers in chain. Deprecated. \\n\\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors\""
    Sep 29 05:36:46 ip-172-18-11-50.ec2.internal systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
    "log" 6048L, 975083C

    [root@ip-172-18-11-50 ~]# cat /etc/sysconfig/atomic-openshift-node
    OPTIONS=--loglevel=2
    CONFIG_FILE=/etc/origin/node/node-config.yaml
    IMAGE_VERSION=v3.7.0

Version-Release number of the following components:

ansible 2.3
openshift-ansible-3.7.0-0.128.0.git.0.89dcad2.el7.noarch.rpm

How reproducible:

Sometimes

Steps to Reproduce:

1. Trigger an installation against AWS with the cloud provider enabled.

Actual results:

See above.

Expected results:

Additional info:

This issue occurs intermittently because of https://github.com/ansible/ansible/issues/24450. It was introduced by https://github.com/openshift/openshift-ansible/pull/5230, which causes the task `Start and enable node` to run before `Configure AWS Cloud Provider Settings`.
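
To confirm the symptom on an affected host, it can help to check the sysconfig file for the two variables before restarting the node. The following is a minimal sketch, not part of openshift-ansible: the helper names `parse_sysconfig` and `missing_aws_keys` are hypothetical, and the sample text is the file content quoted in this report. It inspects only the sysconfig file and does not consider credentials supplied by other means.

```python
# Hypothetical helper: report which AWS credential variables are absent
# from a sysconfig-style file such as /etc/sysconfig/atomic-openshift-node.

REQUIRED_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def parse_sysconfig(text):
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values like "--loglevel=2" survive.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def missing_aws_keys(text):
    """Return the required keys that are unset or empty in the given text."""
    settings = parse_sysconfig(text)
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


# File content as quoted in this bug report.
sample = (
    "OPTIONS=--loglevel=2\n"
    "CONFIG_FILE=/etc/origin/node/node-config.yaml\n"
    "IMAGE_VERSION=v3.7.0\n"
)

print(missing_aws_keys(sample))
# prints ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY']
```

On a real host, the same function could be given the text read from /etc/sysconfig/atomic-openshift-node. An empty list would suggest that the credentials were written before the node was started.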