Bug 1400746
| Field | Value |
|---|---|
| Summary: | [3.4] Installing on AWS in Ohio (us-east-2c) fails |
| Product: | OpenShift Container Platform |
| Component: | Node |
| Version: | 3.3.1 |
| Status: | CLOSED ERRATA |
| Severity: | medium |
| Priority: | medium |
| Target Release: | 3.4.z |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Reporter: | Christian Hernandez <chernand> |
| Assignee: | Derek Carr <decarr> |
| QA Contact: | Johnny Liu <jialiu> |
| CC: | aos-bugs, bleanhar, chernand, decarr, jokerman, mmccomas |
| Doc Type: | Bug Fix |
| Doc Text: | The us-east-2, eu-west-2, ap-south-1, and ca-central-1 AWS regions have been added to the product, enabling cloud provider support for those regions. |
| Type: | Bug |
| Last Closed: | 2017-01-31 20:19:15 UTC |
| Bug Blocks: | 1406889 |
Description
Christian Hernandez
2016-12-02 01:21:38 UTC
Is the region actually us-east-2c? I think that is an availability zone.

The region is actually us-east-2, but for some reason /etc/origin/cloudconfig/aws.conf says us-east-2c.

Just to be clear, are you setting the 'Zone' as in https://docs.openshift.com/container-platform/3.3/install_config/configuring_aws.html ? I'd be curious to know if us-east-1c works so we could direct this bug accordingly.

I am setting it up using the Ansible hosts file, so it's whatever the installer populates that file with. I'm going to test it in us-east-1c and report back on whether it works.

Just to make sure there isn't something stale in the openshift ansible facts, can you confirm which zone your host is actually in? I see from the hostname that it's in the us-east-2 region (so it probably doesn't make sense to override aws.conf with us-east-1c).

So I got a similar failure in us-east-1 as I did in us-east-2:

```
TASK [openshift_master : Start and enable master] ******************************
FAILED - RETRYING: TASK: openshift_master : Start and enable master (1 retries left).
fatal: [ip-172-31-184-203.ec2.internal]: FAILED! => {
    "attempts": 1,
    "changed": false,
    "failed": true
}

MSG:

Unable to start service atomic-openshift-master: Job for atomic-openshift-master.service failed because the control process exited with error code. See "systemctl status atomic-openshift-master.service" and "journalctl -xe" for details.
```

However, starting it manually I get more info in us-east-1:

```
[root@ip-172-31-184-203 ~]# /usr/bin/openshift start master --config=${CONFIG_FILE} $OPTIONS
I1202 14:37:39.053311 16045 admission.go:99] Admission plugin ProjectRequestLimit is not enabled. It will not be started.
I1202 14:37:39.053335 16045 admission.go:99] Admission plugin PodNodeConstraints is not enabled. It will not be started.
I1202 14:37:39.053384 16045 admission.go:99] Admission plugin RunOnceDuration is not enabled. It will not be started.
I1202 14:37:39.053401 16045 admission.go:99] Admission plugin PodNodeConstraints is not enabled. It will not be started.
I1202 14:37:39.053408 16045 admission.go:99] Admission plugin ClusterResourceOverride is not enabled. It will not be started.
I1202 14:37:39.053421 16045 admission.go:99] Admission plugin openshift.io/ImagePolicy is not enabled. It will not be started.
I1202 14:37:39.053510 16045 admission.go:99] Admission plugin BuildOverrides is not enabled. It will not be started.
I1202 14:37:39.053517 16045 admission.go:99] Admission plugin AlwaysPullImages is not enabled. It will not be started.
E1202 14:37:39.059925 16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.ClusterPolicy: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.060093 16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.ClusterPolicyBinding: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.060147 16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.Policy: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.060267 16045 reflector.go:214] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:103: Failed to list *api.ServiceAccount: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/serviceaccounts?resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
E1202 14:37:39.060273 16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.PolicyBinding: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.060402 16045 reflector.go:203] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/namespace/lifecycle/admission.go:141: Failed to list *api.Namespace: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/namespaces?resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
E1202 14:37:39.060404 16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.Group: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.060614 16045 reflector.go:203] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/limitranger/admission.go:154: Failed to list *api.LimitRange: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/limitranges?resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
E1202 14:37:39.060706 16045 reflector.go:214] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:119: Failed to list *api.Secret: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/secrets?fieldSelector=type%3Dkubernetes.io%2Fservice-account-token&resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
E1202 14:37:39.060737 16045 reflector.go:203] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/limitranger/admission.go:154: Failed to list *api.LimitRange: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/limitranges?resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
E1202 14:37:39.061369 16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.OAuthAccessToken: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.061405 16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.User: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.061428 16045 reflector.go:214] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/resourcequota/resource_access.go:83: Failed to list *api.ResourceQuota: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/resourcequotas?resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
F1202 14:37:39.061672 16045 start_master.go:103] could not init cloud provider "aws": error finding instance i-faa259ed: error listing AWS instances: NoCredentialProviders: no valid providers in chain
```

NoCredentialProviders seems to be relevant. Do you not see that error when you are using us-east-2c if you start the service manually? If you do see it I would wonder if this is the root cause. Also, I'm still curious if your systems are actually in us-east-2c or a different zone in the us-east-2 region.

Let me see what us-east-2 says... This is what us-east-1 says:

    curl http://169.254.169.254/latest/dynamic/instance-identity/document
    {
      "devpayProductCodes" : null,
      "accountId" : "701119495576",
      "availabilityZone" : "us-east-1b",
      "privateIp" : "172.31.184.203",
      "version" : "2010-08-31",
      "instanceId" : "i-faa259ed",
      "billingProducts" : [ "bp-6fa54006" ],
      "instanceType" : "t2.large",
      "pendingTime" : "2016-12-02T18:58:48Z",
      "architecture" : "x86_64",
      "imageId" : "ami-b63769a1",
      "kernelId" : null,
      "ramdiskId" : null,
      "region" : "us-east-1"
    }

Just tried it in us-east-2 again and same issue.

```
TASK [openshift_master : Start and enable master] ******************************
FAILED - RETRYING: TASK: openshift_master : Start and enable master (1 retries left).
fatal: [ip-172-31-33-216.us-east-2.compute.internal]: FAILED! => {
    "attempts": 1,
    "changed": false,
    "failed": true
}

MSG:
```

Similar output:

```
[ec2-user@ip-172-31-33-216 ~]$ sudo /usr/bin/openshift start master --config=${CONFIG_FILE} $OPTIONS
I1202 15:52:44.982303 20002 admission.go:99] Admission plugin ProjectRequestLimit is not enabled. It will not be started.
I1202 15:52:44.982329 20002 admission.go:99] Admission plugin PodNodeConstraints is not enabled. It will not be started.
I1202 15:52:44.982378 20002 admission.go:99] Admission plugin RunOnceDuration is not enabled. It will not be started.
I1202 15:52:44.982398 20002 admission.go:99] Admission plugin PodNodeConstraints is not enabled. It will not be started.
I1202 15:52:44.982405 20002 admission.go:99] Admission plugin ClusterResourceOverride is not enabled. It will not be started.
I1202 15:52:44.982419 20002 admission.go:99] Admission plugin openshift.io/ImagePolicy is not enabled. It will not be started.
I1202 15:52:44.982473 20002 admission.go:99] Admission plugin BuildOverrides is not enabled. It will not be started.
I1202 15:52:44.982479 20002 admission.go:99] Admission plugin AlwaysPullImages is not enabled. It will not be started.
F1202 15:52:44.983933 20002 start_master.go:103] could not init cloud provider "aws": not a valid AWS zone (unknown region): us-east-2c
```

Versions again:

```
[ec2-user@ip-172-31-33-216 ~]$ oc version
oc v3.3.1.5
kubernetes v1.3.0+52492b4
features: Basic-Auth GSSAPI Kerberos SPNEGO
[ec2-user@ip-172-31-33-216 ~]$ ansible --version
ansible 2.2.0.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
[ec2-user@ip-172-31-33-216 ~]$ rpm -q atomic-openshift-utils
atomic-openshift-utils-3.3.54-1.git.0.61a1dee.el7.noarch
```

Checked to see if I'm in the right place:

```
[ec2-user@ip-172-31-33-216 ~]$ curl http://169.254.169.254/latest/dynamic/instance-identity//document
{
  "devpayProductCodes" : null,
  "privateIp" : "172.31.33.216",
  "availabilityZone" : "us-east-2c",
  "accountId" : "701119495576",
  "version" : "2010-08-31",
  "instanceId" : "i-0f5caa105f4f0a461",
  "billingProducts" : [ "bp-6fa54006" ],
  "instanceType" : "t2.large",
  "pendingTime" : "2016-12-02T20:15:40Z",
  "imageId" : "ami-0932686c",
  "architecture" : "x86_64",
  "kernelId" : null,
  "ramdiskId" : null,
  "region" : "us-east-2"
}
```

This needs a backport of https://github.com/kubernetes/kubernetes/pull/35013, and possibly others; reassigning to the kube team.
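For anyone following along, here is a minimal Python sketch of why a correct zone value can still be rejected. It assumes the cloud provider derives the region by dropping the trailing zone letter and then checks that region against a built-in list; the function names and the `KNOWN_REGIONS` set are illustrative assumptions, not taken from the product source. Only the error text and the identity document fields come from the output above.

```python
import json

# Hypothetical pre-fix region list for illustration; not copied from any release.
KNOWN_REGIONS = {"us-east-1", "us-west-1", "us-west-2", "eu-west-1"}


def az_to_region(zone):
    """Derive a region name from a zone name by dropping the zone letter."""
    if not zone:
        raise ValueError("empty availability zone")
    return zone[:-1]


def check_zone(zone, known_regions=KNOWN_REGIONS):
    """Return the region for a zone, or fail if the region is not in the list."""
    region = az_to_region(zone)
    if region not in known_regions:
        # Same wording as the fatal log line in this report.
        raise ValueError("not a valid AWS zone (unknown region): " + zone)
    return region


def zone_matches_region(identity_doc_text):
    """Check that the zone in an instance identity document belongs to its region."""
    doc = json.loads(identity_doc_text)
    return az_to_region(doc["availabilityZone"]) == doc["region"]


# Fields trimmed from the us-east-2 instance identity document quoted above.
IDENTITY_DOC = '{"availabilityZone": "us-east-2c", "region": "us-east-2"}'

if __name__ == "__main__":
    print(zone_matches_region(IDENTITY_DOC))  # the instance metadata is consistent
    try:
        check_zone("us-east-2c")
    except ValueError as exc:
        print(exc)  # rejected only because the list lacks us-east-2
    print(check_zone("us-east-2c", KNOWN_REGIONS | {"us-east-2"}))
```

Under these assumptions, the zone in aws.conf is fine and the failure comes from the region list alone, which would be consistent with the fix being a backport rather than a configuration change.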
Retested this bug with the latest build, atomic-openshift-3.4.0.39-1.git.0.5f32f06.el7.x86_64; it still fails.

    # journalctl -f -u atomic-openshift-master
    <--snip-->
    Jan 16 00:38:34 ip-172-31-37-46.us-east-2.compute.internal atomic-openshift-master[23090]: I0116 00:38:34.845004 23090 aws.go:745] Building AWS cloudprovider
    Jan 16 00:38:34 ip-172-31-37-46.us-east-2.compute.internal atomic-openshift-master[23090]: F0116 00:38:34.845068 23090 start_master.go:108] could not init cloud provider "aws": not a valid AWS zone (unknown region): us-east-2c
    <--snip-->

It seems the PR has not been merged into the 3.4 RPM package.

Verified this bug with atomic-openshift-3.4.1.0-1.git.0.9e8d48b.el7.x86_64: PASS. A cluster with the cloud provider enabled is set up successfully in the AWS Ohio region.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0218