Bug 1400746 - [3.4] Installing on AWS in Ohio (us-east-2c) fails
Summary: [3.4] Installing on AWS in Ohio (us-east-2c) fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.3.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.4.z
Assignee: Derek Carr
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks: 1406889
 
Reported: 2016-12-02 01:21 UTC by Christian Hernandez
Modified: 2017-01-31 20:19 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The us-east-2, eu-west-2, ap-south-1, and ca-central-1 AWS regions have been added to the product, enabling cloud provider support for those regions.
Clone Of:
Clones: 1406889 (view as bug list)
Environment:
Last Closed: 2017-01-31 20:19:15 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0218 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.4.1.2 bug fix update 2017-02-01 01:18:20 UTC

Description Christian Hernandez 2016-12-02 01:21:38 UTC
Description of problem:

Performing the install on AWS in Ohio (us-east-2c) fails at the step where it tries to start the master, giving the following error:

```
Dec 01 18:03:35 ip-172-31-47-71.us-east-2.compute.internal atomic-openshift-master[22654]: F1201 18:03:35.234574   22654 start_master.go:103] could not init cloud provider "aws": not a valid AWS zone (unknown region): us-east-2c
```

The same Ansible hosts file works in us-west-1.
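For context on the error above: the AWS cloud provider derives the region from the configured availability zone and validates it against a compiled-in region table. A minimal Python sketch of that logic (not the actual Kubernetes Go source; the region table below is reconstructed for illustration and, like the affected builds, lacks us-east-2):

```python
# Illustrative region table only; the real one lives in the Kubernetes
# AWS cloud provider. Builds affected by this bug did not know us-east-2.
KNOWN_REGIONS = {
    "us-east-1", "us-west-1", "us-west-2",
    "eu-west-1", "eu-central-1",
    "ap-southeast-1", "ap-southeast-2",
    "ap-northeast-1", "ap-northeast-2",
    "sa-east-1", "cn-north-1", "us-gov-west-1",
}

def az_to_region(az: str) -> str:
    """An AZ name is its region plus one trailing letter (us-east-2c -> us-east-2)."""
    region = az[:-1]
    if region not in KNOWN_REGIONS:
        raise ValueError(f"not a valid AWS zone (unknown region): {az}")
    return region

print(az_to_region("us-west-1b"))   # accepted: us-west-1 is in the table
try:
    az_to_region("us-east-2c")      # rejected: us-east-2 missing from the table
except ValueError as err:
    print(err)
```

This reproduces the shape of the failure: the zone string itself is fine, but the derived region is unknown to the provider's table.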

Version-Release number of selected component (if applicable):

[root@ip-172-31-13-56 ~]# ansible --version
ansible 2.2.0.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides

[root@ip-172-31-13-56 ~]# oc version
oc v3.3.1.5
kubernetes v1.3.0+52492b4
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-31-13-56.us-west-1.compute.internal:8443
openshift v3.3.1.5
kubernetes v1.3.0+52492b4

[root@ip-172-31-13-56 ~]# rpm -q atomic-openshift-utils
atomic-openshift-utils-3.3.54-1.git.0.61a1dee.el7.noarch


How reproducible:

All the time

Steps to Reproduce:
1. Spin up instances in us-east-2c (Ohio)
2. Go through the prereq/host preparation steps
3. Install OpenShift using aws cloud provider profile

Actual results:

Install Fails

Expected results:

Install Succeeds

Additional info:

Ansible Host file - https://paste.fedoraproject.org/495276/41658148/

Comment 1 Brenton Leanhardt 2016-12-02 14:03:00 UTC
Is the region actually us-east-2c?  I think that is an availability zone.

Comment 2 Christian Hernandez 2016-12-02 17:04:48 UTC
The region is actually us-east-2, but for some reason /etc/origin/cloudconfig/aws.conf says us-east-2c.

Comment 3 Brenton Leanhardt 2016-12-02 18:08:29 UTC
Just to be clear, are you setting the 'Zone' as described in https://docs.openshift.com/container-platform/3.3/install_config/configuring_aws.html ?

I'd be curious to know if us-east-1c works so we could direct this bug accordingly.
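For reference, the value under discussion lands in the cloud provider config the master reads at startup. A plausible, purely illustrative /etc/origin/cloudconfig/aws.conf matching what Comment 2 describes (not copied from the reporter's system):

```ini
# Illustrative only; the Zone key holds the availability zone,
# and the cloud provider derives the region (us-east-2) from it.
[Global]
Zone = us-east-2c
```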

Comment 4 Christian Hernandez 2016-12-02 18:20:55 UTC
I am setting it up using the Ansible hosts file, so it's whatever the installer populates that file with.

I'm going to test it in us-east-1c and report back on whether it works.

Comment 5 Brenton Leanhardt 2016-12-02 18:51:05 UTC
Just to make sure there isn't something stale in the openshift ansible facts, can you confirm which zone your host is actually in?  I see from the hostname that it's in the us-east-2 region (so it probably doesn't make sense to override aws.conf with us-east-1c).

Comment 6 Christian Hernandez 2016-12-02 19:39:28 UTC
So I got a similar failure in us-east-1 as I did in us-east-2:

```
TASK [openshift_master : Start and enable master] ******************************
FAILED - RETRYING: TASK: openshift_master : Start and enable master (1 retries left).
fatal: [ip-172-31-184-203.ec2.internal]: FAILED! => {
    "attempts": 1, 
    "changed": false, 
    "failed": true
}

MSG:

Unable to start service atomic-openshift-master: Job for atomic-openshift-master.service failed because the control process exited with error code. See "systemctl status atomic-openshift-master.service" and "journalctl -xe" for details.

```

However, starting it manually in us-east-1 gives more info:

```
[root@ip-172-31-184-203 ~]# /usr/bin/openshift start master --config=${CONFIG_FILE} $OPTIONS
I1202 14:37:39.053311   16045 admission.go:99] Admission plugin ProjectRequestLimit is not enabled.  It will not be started.
I1202 14:37:39.053335   16045 admission.go:99] Admission plugin PodNodeConstraints is not enabled.  It will not be started.
I1202 14:37:39.053384   16045 admission.go:99] Admission plugin RunOnceDuration is not enabled.  It will not be started.
I1202 14:37:39.053401   16045 admission.go:99] Admission plugin PodNodeConstraints is not enabled.  It will not be started.
I1202 14:37:39.053408   16045 admission.go:99] Admission plugin ClusterResourceOverride is not enabled.  It will not be started.
I1202 14:37:39.053421   16045 admission.go:99] Admission plugin openshift.io/ImagePolicy is not enabled.  It will not be started.
I1202 14:37:39.053510   16045 admission.go:99] Admission plugin BuildOverrides is not enabled.  It will not be started.
I1202 14:37:39.053517   16045 admission.go:99] Admission plugin AlwaysPullImages is not enabled.  It will not be started.
E1202 14:37:39.059925   16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.ClusterPolicy: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.060093   16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.ClusterPolicyBinding: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.060147   16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.Policy: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.060267   16045 reflector.go:214] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:103: Failed to list *api.ServiceAccount: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/serviceaccounts?resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
E1202 14:37:39.060273   16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.PolicyBinding: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.060402   16045 reflector.go:203] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/namespace/lifecycle/admission.go:141: Failed to list *api.Namespace: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/namespaces?resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
E1202 14:37:39.060404   16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.Group: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.060614   16045 reflector.go:203] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/limitranger/admission.go:154: Failed to list *api.LimitRange: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/limitranges?resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
E1202 14:37:39.060706   16045 reflector.go:214] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:119: Failed to list *api.Secret: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/secrets?fieldSelector=type%3Dkubernetes.io%2Fservice-account-token&resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
E1202 14:37:39.060737   16045 reflector.go:203] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/limitranger/admission.go:154: Failed to list *api.LimitRange: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/limitranges?resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
E1202 14:37:39.061369   16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.OAuthAccessToken: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.061405   16045 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.User: client: etcd cluster is unavailable or misconfigured
E1202 14:37:39.061428   16045 reflector.go:214] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/resourcequota/resource_access.go:83: Failed to list *api.ResourceQuota: Get https://ip-172-31-184-203.ec2.internal:8443/api/v1/resourcequotas?resourceVersion=0: dial tcp: lookup ip-172-31-184-203.ec2.internal on 172.31.0.2:53: no such host
F1202 14:37:39.061672   16045 start_master.go:103] could not init cloud provider "aws": error finding instance i-faa259ed: error listing AWS instances: NoCredentialProviders: no valid providers in chain
```

Comment 7 Brenton Leanhardt 2016-12-02 19:56:03 UTC
NoCredentialProviders seems to be relevant. Do you also see that error in us-east-2c when you start the service manually? If you do, I would suspect it is the root cause.
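For anyone hitting the same message: a hedged Python sketch of the AWS SDK's default credential chain, which yields the "NoCredentialProviders" failure when every provider comes up empty. This illustrates the documented order (environment variables, shared credentials file, EC2 instance profile); it is not the SDK's actual code.

```python
def resolve_aws_credentials(env, have_shared_credentials_file, have_instance_profile):
    """Walk the default credential chain in order; raise when it is exhausted."""
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "env"
    if have_shared_credentials_file:
        return "shared-credentials-file"
    if have_instance_profile:
        return "ec2-instance-profile"
    raise RuntimeError("NoCredentialProviders: no valid providers in chain")

# On a master with no env vars, no shared credentials file, and no IAM
# instance profile attached, the chain is exhausted:
try:
    resolve_aws_credentials({}, False, False)
except RuntimeError as err:
    print(err)
```

The practical fix is supplying credentials through any provider in the chain, for example an IAM instance profile attached to the master.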

Comment 8 Brenton Leanhardt 2016-12-02 20:04:00 UTC
Also, I'm still curious if your systems are actually in us-east-2c or a different zone in the us-east-2 region.

Comment 9 Christian Hernandez 2016-12-02 20:13:27 UTC
Let me see what us-east-2 says...

This is what us-east-1 says:

```
curl http://169.254.169.254/latest/dynamic/instance-identity/document
{
  "devpayProductCodes" : null,
  "accountId" : "701119495576",
  "availabilityZone" : "us-east-1b",
  "privateIp" : "172.31.184.203",
  "version" : "2010-08-31",
  "instanceId" : "i-faa259ed",
  "billingProducts" : [ "bp-6fa54006" ],
  "instanceType" : "t2.large",
  "pendingTime" : "2016-12-02T18:58:48Z",
  "architecture" : "x86_64",
  "imageId" : "ami-b63769a1",
  "kernelId" : null,
  "ramdiskId" : null,
  "region" : "us-east-1"
}
```

Comment 10 Christian Hernandez 2016-12-02 20:55:41 UTC
Just tried it in us-east-2 again and same issue.

```
TASK [openshift_master : Start and enable master] ******************************
FAILED - RETRYING: TASK: openshift_master : Start and enable master (1 retries left).
fatal: [ip-172-31-33-216.us-east-2.compute.internal]: FAILED! => {
    "attempts": 1, 
    "changed": false, 
    "failed": true
}

MSG:
```

Similar output

```
[ec2-user@ip-172-31-33-216 ~]$ sudo /usr/bin/openshift start master --config=${CONFIG_FILE} $OPTIONS
I1202 15:52:44.982303   20002 admission.go:99] Admission plugin ProjectRequestLimit is not enabled.  It will not be started.
I1202 15:52:44.982329   20002 admission.go:99] Admission plugin PodNodeConstraints is not enabled.  It will not be started.
I1202 15:52:44.982378   20002 admission.go:99] Admission plugin RunOnceDuration is not enabled.  It will not be started.
I1202 15:52:44.982398   20002 admission.go:99] Admission plugin PodNodeConstraints is not enabled.  It will not be started.
I1202 15:52:44.982405   20002 admission.go:99] Admission plugin ClusterResourceOverride is not enabled.  It will not be started.
I1202 15:52:44.982419   20002 admission.go:99] Admission plugin openshift.io/ImagePolicy is not enabled.  It will not be started.
I1202 15:52:44.982473   20002 admission.go:99] Admission plugin BuildOverrides is not enabled.  It will not be started.
I1202 15:52:44.982479   20002 admission.go:99] Admission plugin AlwaysPullImages is not enabled.  It will not be started.
F1202 15:52:44.983933   20002 start_master.go:103] could not init cloud provider "aws": not a valid AWS zone (unknown region): us-east-2c
```
Versions again

```
[ec2-user@ip-172-31-33-216 ~]$ oc version
oc v3.3.1.5
kubernetes v1.3.0+52492b4
features: Basic-Auth GSSAPI Kerberos SPNEGO
[ec2-user@ip-172-31-33-216 ~]$ ansible --version
ansible 2.2.0.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
[ec2-user@ip-172-31-33-216 ~]$ rpm -q atomic-openshift-utils
atomic-openshift-utils-3.3.54-1.git.0.61a1dee.el7.noarch
```

Checked to see if I'm in the right place

```
[ec2-user@ip-172-31-33-216 ~]$ curl http://169.254.169.254/latest/dynamic/instance-identity//document
{
  "devpayProductCodes" : null,
  "privateIp" : "172.31.33.216",
  "availabilityZone" : "us-east-2c",
  "accountId" : "701119495576",
  "version" : "2010-08-31",
  "instanceId" : "i-0f5caa105f4f0a461",
  "billingProducts" : [ "bp-6fa54006" ],
  "instanceType" : "t2.large",
  "pendingTime" : "2016-12-02T20:15:40Z",
  "imageId" : "ami-0932686c",
  "architecture" : "x86_64",
  "kernelId" : null,
  "ramdiskId" : null,
  "region" : "us-east-2"
}
```
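The identity document above can also be checked programmatically. A small Python sketch, using only the fields quoted above, confirming that the zone and region agree (an AZ name is its region plus one trailing letter):

```python
import json

# Fields taken from the instance-identity document quoted above.
doc = json.loads("""
{
  "availabilityZone" : "us-east-2c",
  "region" : "us-east-2"
}
""")

az = doc["availabilityZone"]
region = doc["region"]
# The trailing zone letter is the only difference between the two names.
assert az[:-1] == region, f"zone {az} not in region {region}"
print(f"zone {az} is in region {region}")
```

So the metadata is self-consistent; the failure is purely that the provider does not recognize the us-east-2 region.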

Comment 11 Scott Dodson 2016-12-05 15:33:38 UTC
This needs a backport of https://github.com/kubernetes/kubernetes/pull/35013 and possibly others; re-assigning to the kube team.

Comment 18 Johnny Liu 2017-01-16 05:42:46 UTC
Re-tested this bug with the latest build, atomic-openshift-3.4.0.39-1.git.0.5f32f06.el7.x86_64; it still failed.


```
# journalctl -f  -u atomic-openshift-master
<--snip-->
Jan 16 00:38:34 ip-172-31-37-46.us-east-2.compute.internal atomic-openshift-master[23090]: I0116 00:38:34.845004   23090 aws.go:745] Building AWS cloudprovider
Jan 16 00:38:34 ip-172-31-37-46.us-east-2.compute.internal atomic-openshift-master[23090]: F0116 00:38:34.845068   23090 start_master.go:108] could not init cloud provider "aws": not a valid AWS zone (unknown region): us-east-2c
<--snip-->
```

Seems like the PR has not been merged into the 3.4 RPM package.

Comment 20 Johnny Liu 2017-01-22 06:08:14 UTC
Verified this bug with atomic-openshift-3.4.1.0-1.git.0.9e8d48b.el7.x86_64, and it PASSED.

A cluster with the cloud provider enabled was set up successfully in the AWS Ohio region.

Comment 22 errata-xmlrpc 2017-01-31 20:19:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0218

