Description of problem:
Performing the install in AWS in Ohio (us-east-2c) fails at the step where it tries to start up the master. It gives the following error:
```
Dec 01 18:03:35 ip-172-31-47-71.us-east-2.compute.internal atomic-openshift-master[22654]: F1201 18:03:35.234574 22654 start_master.go:103] could not init cloud provider "aws": not a valid AWS zone (unknown region): us-east-2c
```
The same ansible hosts file works in us-west-1.

Version-Release number of selected component (if applicable):
[root@ip-172-31-13-56 ~]# ansible --version
ansible 2.2.0.0
config file = /etc/ansible/ansible.cfg
configured module search path = Default w/o overrides
[root@ip-172-31-13-56 ~]# oc version
oc v3.3.1.5
kubernetes v1.3.0+52492b4
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://ip-172-31-13-56.us-west-1.compute.internal:8443
openshift v3.3.1.5
kubernetes v1.3.0+52492b4
[root@ip-172-31-13-56 ~]# rpm -q atomic-openshift-utils
atomic-openshift-utils-3.3.54-1.git.0.61a1dee.el7.noarch

How reproducible:
All the time

Steps to Reproduce:
1. Spin up instances in us-east-2c (Ohio)
2. Go through the prereq/host preparation steps
3. Install OpenShift using aws cloud provider profile

Actual results:
Install Fails

Expected results:
Install Succeeds

Additional info:
Ansible Host file - https://paste.fedoraproject.org/495276/41658148/
Is the region actually us-east-2c? I think that is an availability zone.
The region is actually us-east-2 ...but for some reason in the /etc/origin/cloudconfig/aws.conf it says us-east-2c
Just to be clear, are you setting the 'Zone' as in https://docs.openshift.com/container-platform/3.3/install_config/configuring_aws.html ? I'd be curious to know whether us-east-1c works, so we can direct this bug accordingly.
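For anyone following along, here is a rough sketch of reading the 'Zone' value out of an aws.conf-style file and deriving the region from it. The file layout shown (a `[Global]` section with a `Zone` key) is an assumption based on the linked documentation; the thread itself only says the file contains us-east-2c. The helper names `read_zone` and `zone_to_region` are made up for illustration.

```python
import configparser

# Assumed contents of /etc/origin/cloudconfig/aws.conf. The exact layout is an
# assumption; the thread only reports that the file "says us-east-2c".
SAMPLE_AWS_CONF = """\
[Global]
Zone = us-east-2c
"""


def read_zone(conf_text: str) -> str:
    """Return the Zone value from the [Global] section of an aws.conf-style file."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser["Global"]["Zone"].strip()


def zone_to_region(zone: str) -> str:
    """Drop the trailing availability-zone letter: 'us-east-2c' -> 'us-east-2'."""
    if len(zone) < 2 or not zone[-1].isalpha():
        raise ValueError(f"does not look like an availability zone: {zone!r}")
    return zone[:-1]


if __name__ == "__main__":
    zone = read_zone(SAMPLE_AWS_CONF)
    print(zone, "->", zone_to_region(zone))
```

So a 'Zone' of us-east-2c is an availability zone, and the region it implies is us-east-2.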
I am setting it up using the ansible hosts file, so it's whatever the installer populates that file with. I'm going to test it in us-east-1c and report back on whether it works.
Just to make sure there isn't something stale in the openshift ansible facts, can you confirm which zone your host is actually in? I see from the hostname that it's in the us-east-2 region (so it probably doesn't make sense to override aws.conf with us-east-1c).
I got a similar failure in us-east-1 to the one in us-east-2:
```
TASK [openshift_master : Start and enable master] ******************************
FAILED - RETRYING: TASK: openshift_master : Start and enable master (1 retries left).
fatal: [ip-172-31-184-203.ec2.internal]: FAILED! => {
    "attempts": 1,
    "changed": false,
    "failed": true
}

MSG:

Unable to start service atomic-openshift-master: Job for atomic-openshift-master.service failed because the control process exited with error code. See "systemctl status atomic-openshift-master.service" and "journalctl -xe" for details.
```
However, starting it manually in us-east-1 gives more information:
```
[root@ip-172-31-184-203 ~]# /usr/bin/openshift start master --config=${CONFIG_FILE} $OPTIONS
I1202 14:37:39.053311 16045 admission.go:99] Admission plugin ProjectRequestLimit is not enabled. It will not be started.
I1202 14:37:39.053335 16045 admission.go:99] Admission plugin PodNodeConstraints is not enabled. It will not be started.
I1202 14:37:39.053384 16045 admission.go:99] Admission plugin RunOnceDuration is not enabled. It will not be started.
I1202 14:37:39.053401 16045 admission.go:99] Admission plugin PodNodeConstraints is not enabled. It will not be started.
I1202 14:37:39.053408 16045 admission.go:99] Admission plugin ClusterResourceOverride is not enabled. It will not be started.
I1202 14:37:39.053421 16045 admission.go:99] Admission plugin openshift.io/ImagePolicy is not enabled. It will not be started.
I1202 14:37:39.053510 16045 admission.go:99] Admission plugin BuildOverrides is not enabled. It will not be started.
I1202 14:37:39.053517 16045 admission.go:99] Admission plugin AlwaysPullImages is not enabled. It will not be started.
E1202 14:37:39.059925 16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.ClusterPolicy: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.060093 16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.ClusterPolicyBinding: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.060147 16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.Policy: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.060267 16045 reflector.go:214] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:103: Failed to list *api.ServiceAccount: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/serviceaccounts?resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
E1202 14:37:39.060273 16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.PolicyBinding: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.060402 16045 reflector.go:203] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/namespace/lifecycle/admission.go:141: Failed to list *api.Namespace: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/namespaces?resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
E1202 14:37:39.060404 16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.Group: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.060614 16045 reflector.go:203] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/limitranger/admission.go:154: Failed to list *api.LimitRange: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/limitranges?resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
E1202 14:37:39.060706 16045 reflector.go:214] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:119: Failed to list *api.Secret: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/secrets?fieldSelector=type%3Dkubernetes.io%2Fservice-account-token&resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
E1202 14:37:39.060737 16045 reflector.go:203] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/limitranger/admission.go:154: Failed to list *api.LimitRange: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/limitranges?resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
E1202 14:37:39.061369 16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.OAuthAccessToken: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.061405 16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.User: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.061428 16045 reflector.go:214] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/resourcequota/resource_access.go:83: Failed to list *api.ResourceQuota: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/resourcequotas?resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
F1202 14:37:39.061672 16045 start_master.go:103] could not init cloud provider "aws": error finding instance i-faa259ed: error listing AWS instances: NoCredentialProviders: no valid providers in chain
```
NoCredentialProviders seems to be relevant. Do you not see that error when you are using us-east-2c if you start the service manually? If you do see it I would wonder if this is the root cause.
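NoCredentialProviders is what the AWS SDK reports when none of its credential sources yields anything. One of those sources is the environment-variable pair `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`. If the service normally gets those from its unit environment (an assumption, not confirmed anywhere in this thread), a manual start from a plain shell would not have them. A quick sketch for checking the current shell; `missing_aws_env` is a made-up helper name:

```python
import os

# Standard AWS SDK environment variable names for static credentials.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env(env) -> list:
    """Return the credential variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_env(os.environ)
    if missing:
        print("not set in this shell:", ", ".join(missing))
    else:
        print("both credential variables are set")
```

This only covers the environment-variable source; the SDK can also pick up credentials from a shared credentials file or an instance role, so an empty result here is not proof of the root cause.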
Also, I'm still curious if your systems are actually in us-east-2c or a different zone in the us-east-2 region.
Let me see what us-east-2 says... This is what us-east-1 says:
curl http://169.254.169.254/latest/dynamic/instance-identity/document
{
  "devpayProductCodes" : null,
  "accountId" : "701119495576",
  "availabilityZone" : "us-east-1b",
  "privateIp" : "172.31.184.203",
  "version" : "2010-08-31",
  "instanceId" : "i-faa259ed",
  "billingProducts" : [ "bp-6fa54006" ],
  "instanceType" : "t2.large",
  "pendingTime" : "2016-12-02T18:58:48Z",
  "architecture" : "x86_64",
  "imageId" : "ami-b63769a1",
  "kernelId" : null,
  "ramdiskId" : null,
  "region" : "us-east-1"
}
Just tried it in us-east-2 again and hit the same issue.
```
TASK [openshift_master : Start and enable master] ******************************
FAILED - RETRYING: TASK: openshift_master : Start and enable master (1 retries left).
fatal: [ip-172-31-33-216.us-east-2.compute.internal]: FAILED! => {
    "attempts": 1,
    "changed": false,
    "failed": true
}

MSG:
```
Starting it manually gives similar output:
```
[ec2-user@ip-172-31-33-216 ~]$ sudo /usr/bin/openshift start master --config=${CONFIG_FILE} $OPTIONS
I1202 15:52:44.982303 20002 admission.go:99] Admission plugin ProjectRequestLimit is not enabled. It will not be started.
I1202 15:52:44.982329 20002 admission.go:99] Admission plugin PodNodeConstraints is not enabled. It will not be started.
I1202 15:52:44.982378 20002 admission.go:99] Admission plugin RunOnceDuration is not enabled. It will not be started.
I1202 15:52:44.982398 20002 admission.go:99] Admission plugin PodNodeConstraints is not enabled. It will not be started.
I1202 15:52:44.982405 20002 admission.go:99] Admission plugin ClusterResourceOverride is not enabled. It will not be started.
I1202 15:52:44.982419 20002 admission.go:99] Admission plugin openshift.io/ImagePolicy is not enabled. It will not be started.
I1202 15:52:44.982473 20002 admission.go:99] Admission plugin BuildOverrides is not enabled. It will not be started.
I1202 15:52:44.982479 20002 admission.go:99] Admission plugin AlwaysPullImages is not enabled. It will not be started.
F1202 15:52:44.983933 20002 start_master.go:103] could not init cloud provider "aws": not a valid AWS zone (unknown region): us-east-2c
```
Versions again:
```
[ec2-user@ip-172-31-33-216 ~]$ oc version
oc v3.3.1.5
kubernetes v1.3.0+52492b4
features: Basic-Auth GSSAPI Kerberos SPNEGO
[ec2-user@ip-172-31-33-216 ~]$ ansible --version
ansible 2.2.0.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
[ec2-user@ip-172-31-33-216 ~]$ rpm -q atomic-openshift-utils
atomic-openshift-utils-3.3.54-1.git.0.61a1dee.el7.noarch
```
Checked to see if I'm in the right place:
```
[ec2-user@ip-172-31-33-216 ~]$ curl http://169.254.169.254/latest/dynamic/instance-identity//document
{
  "devpayProductCodes" : null,
  "privateIp" : "172.31.33.216",
  "availabilityZone" : "us-east-2c",
  "accountId" : "701119495576",
  "version" : "2010-08-31",
  "instanceId" : "i-0f5caa105f4f0a461",
  "billingProducts" : [ "bp-6fa54006" ],
  "instanceType" : "t2.large",
  "pendingTime" : "2016-12-02T20:15:40Z",
  "imageId" : "ami-0932686c",
  "architecture" : "x86_64",
  "kernelId" : null,
  "ramdiskId" : null,
  "region" : "us-east-2"
}
```
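The identity document above answers the "which zone is this host really in" question. As a small sketch, the check can be automated by parsing the document and confirming that the availability zone is just the region plus a one-letter suffix. The JSON below is an excerpt of the fields from the output above; `zone_matches_region` is a made-up helper name.

```python
import json

# Excerpt of the instance identity document returned by the metadata service.
IDENTITY_DOCUMENT = """
{
  "availabilityZone": "us-east-2c",
  "instanceId": "i-0f5caa105f4f0a461",
  "region": "us-east-2"
}
"""


def zone_matches_region(document_text: str) -> bool:
    """True if availabilityZone equals region followed by a single letter."""
    doc = json.loads(document_text)
    zone = doc["availabilityZone"]
    region = doc["region"]
    return zone[:-1] == region and zone[-1:].isalpha()


if __name__ == "__main__":
    print(zone_matches_region(IDENTITY_DOCUMENT))
```

For this host the two fields agree, so the us-east-2c value is not stale: the instance really is in that zone.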
This needs a backport of https://github.com/kubernetes/kubernetes/pull/35013, and possibly others; reassigning to the kube team.
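To illustrate how a perfectly valid zone can be rejected with "unknown region", here is a hypothetical sketch of a provider that validates the region against a built-in list. This is a guess at the mechanism, not the actual Kubernetes code; the thread does not quote the patch. The region list and the function name `validate_zone` are assumptions for illustration, as is the example zone us-west-1a.

```python
# Hypothetical built-in list that predates the Ohio region (us-east-2).
KNOWN_REGIONS = frozenset({"us-east-1", "us-west-1", "us-west-2", "eu-west-1"})


def validate_zone(zone: str, known_regions=KNOWN_REGIONS) -> str:
    """Derive the region from a zone and reject it if the region is not listed."""
    region = zone[:-1]  # strip the trailing zone letter
    if region not in known_regions:
        raise ValueError(f"not a valid AWS zone (unknown region): {zone}")
    return region


if __name__ == "__main__":
    # A zone in a listed region passes.
    print(validate_zone("us-west-1a"))

    # A zone in a newer region fails, even though the zone itself is real.
    try:
        validate_zone("us-east-2c")
    except ValueError as err:
        print(err)

    # Once the region is added to the list, the same zone is accepted.
    print(validate_zone("us-east-2c", KNOWN_REGIONS | {"us-east-2"}))
```

Under that assumption, nothing in the hosts file or aws.conf is wrong; the fix has to land in the shipped binary, which is why a backport is needed.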
Re-tested this bug with the latest build, atomic-openshift-3.4.0.39-1.git.0.5f32f06.el7.x86_64; it still fails.
# journalctl -f -u atomic-openshift-master
<--snip-->
Jan 16 00:38:34 ip-172-31-37-46.us-east-2.compute.internal atomic-openshift-master[23090]: I0116 00:38:34.845004 23090 aws.go:745] Building AWS cloudprovider
Jan 16 00:38:34 ip-172-31-37-46.us-east-2.compute.internal atomic-openshift-master[23090]: F0116 00:38:34.845068 23090 start_master.go:108] could not init cloud provider "aws": not a valid AWS zone (unknown region): us-east-2c
<--snip-->
It seems the PR has not been merged into the 3.4 RPM package.
Verified this bug with atomic-openshift-3.4.1.0-1.git.0.9e8d48b.el7.x86_64: PASS. A cluster with the cloud provider enabled is set up successfully in the AWS Ohio region.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0218