1507083 – openshift_master_etcd_hosts list get wrong in rpm install.


Summary: openshift_master_etcd_hosts list get wrong in rpm install.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	3.7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	high
Target Milestone:	---
Target Release:	3.7.0
Assignee:	Andrew Butcher
QA Contact:	Johnny Liu
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-10-27 16:00 UTC by Johnny Liu
Modified:	2017-11-28 22:20 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-11-28 22:20:01 UTC
Target Upstream Version:
Embargoed:

Attachments
installation log (4.28 MB, text/plain) 2017-10-27 16:08 UTC, Johnny Liu	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:3188	0	normal	SHIPPED_LIVE	Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update	2017-11-29 02:34:54 UTC

Description Johnny Liu 2017-10-27 16:00:25 UTC

Description of problem:
TASK [set_fact] ****************************************************************
Friday 27 October 2017  15:11:38 +0000 (0:00:00.069)       0:10:28.609 ******** 
ok: [qe-jialiu-jijm-master-etcd-1.1027-qtx.qe.rhcloud.com] => {"ansible_facts": {"openshift_master_etcd_hosts": ["qe-jialiu-jijm-master-etcd-1", "qe-jialiu-jijm-master-etcd-1", "qe-jialiu-jijm-master-etcd-1"], "openshift_master_etcd_port": "2379"}, "changed": false, "failed": false}
ok: [qe-jialiu-jijm-master-etcd-2.1027-qtx.qe.rhcloud.com] => {"ansible_facts": {"openshift_master_etcd_hosts": ["qe-jialiu-jijm-master-etcd-1", "qe-jialiu-jijm-master-etcd-1", "qe-jialiu-jijm-master-etcd-1"], "openshift_master_etcd_port": "2379"}, "changed": false, "failed": false}
ok: [qe-jialiu-jijm-master-etcd-3.1027-qtx.qe.rhcloud.com] => {"ansible_facts": {"openshift_master_etcd_hosts": ["qe-jialiu-jijm-master-etcd-1", "qe-jialiu-jijm-master-etcd-1", "qe-jialiu-jijm-master-etcd-1"], "openshift_master_etcd_port": "2379"}, "changed": false, "failed": false}

This causes the master to connect to the wrong etcd cluster:
etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
  - https://qe-jialiu-jijm-master-etcd-1:2379
  - https://qe-jialiu-jijm-master-etcd-1:2379
  - https://qe-jialiu-jijm-master-etcd-1:2379
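In the output above, each `urls` entry has the form `https://<host>:<openshift_master_etcd_port>`, one per entry in `openshift_master_etcd_hosts`, so a repeated host turns into a repeated URL. The Python sketch below models only that inferred mapping; it is not the openshift-ansible template that renders `etcdClientInfo`, and the helper names `etcd_client_urls` and `urls_are_distinct` (as well as the `etcd-1` to `etcd-3` hostnames) are hypothetical.

```python
def etcd_client_urls(hosts: list[str], port: str) -> list[str]:
    """Map each etcd host to an https://<host>:<port> client URL, keeping order."""
    return [f"https://{host}:{port}" for host in hosts]


def urls_are_distinct(urls: list[str]) -> bool:
    """True when no client URL is listed twice."""
    return len(set(urls)) == len(urls)


if __name__ == "__main__":
    # Reproduces the broken list from this report: one host repeated three times.
    broken = etcd_client_urls(["qe-jialiu-jijm-master-etcd-1"] * 3, "2379")
    # Hypothetical three-member cluster with distinct hostnames.
    healthy = etcd_client_urls(["etcd-1", "etcd-2", "etcd-3"], "2379")
    print(broken[0])  # https://qe-jialiu-jijm-master-etcd-1:2379
    print(urls_are_distinct(broken))   # False
    print(urls_are_distinct(healthy))  # True
```

A three-member etcd cluster should yield three distinct URLs; the broken configuration gives the master only one real endpoint.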

I tried another install on AWS and did not hit this issue.

Is this because the etcd hostnames are similar?

Version-Release number of the following components:
openshift-ansible-3.7.0-0.178.1.git.0.43f8486.el7.noarch

How reproducible:
Always

Comment 1 Johnny Liu 2017-10-27 16:07:10 UTC

Please see the attachment for the inventory host file and the installation log.

Comment 2 Johnny Liu 2017-10-27 16:08:42 UTC

Created attachment 1344385
installation log

Comment 3 Johnny Liu 2017-10-27 16:13:49 UTC

Okay, it seems I reproduced it on another RPM install on AWS, so the issue appears to be unrelated to the hostnames.

The containerized install passed:
ok: [ec2-34-227-98-143.compute-1.amazonaws.com] => {"ansible_facts": {"openshift_master_etcd_hosts": ["ip-172-18-9-222.ec2.internal", "ip-172-18-4-165.ec2.internal", "ip-172-18-12-109.ec2.internal"], "openshift_master_etcd_port": "2379"}, "changed": false, "failed": false}
ok: [ec2-52-90-152-31.compute-1.amazonaws.com] => {"ansible_facts": {"openshift_master_etcd_hosts": ["ip-172-18-9-222.ec2.internal", "ip-172-18-4-165.ec2.internal", "ip-172-18-12-109.ec2.internal"], "openshift_master_etcd_port": "2379"}, "changed": false, "failed": false}
ok: [ec2-52-86-178-3.compute-1.amazonaws.com] => {"ansible_facts": {"openshift_master_etcd_hosts": ["ip-172-18-9-222.ec2.internal", "ip-172-18-4-165.ec2.internal", "ip-172-18-12-109.ec2.internal"], "openshift_master_etcd_port": "2379"}, "changed": false, "failed": false}


The RPM install failed:
TASK [set_fact] ****************************************************************
Friday 27 October 2017  15:53:42 +0000 (0:00:00.076)       0:08:26.101 ******** 
ok: [ec2-52-202-232-150.compute-1.amazonaws.com] => {"ansible_facts": {"openshift_master_etcd_hosts": ["ip-172-18-14-244.ec2.internal", "ip-172-18-14-244.ec2.internal", "ip-172-18-14-244.ec2.internal"], "openshift_master_etcd_port": "2379"}, "changed": false, "failed": false}
ok: [ec2-34-229-115-245.compute-1.amazonaws.com] => {"ansible_facts": {"openshift_master_etcd_hosts": ["ip-172-18-14-244.ec2.internal", "ip-172-18-14-244.ec2.internal", "ip-172-18-14-244.ec2.internal"], "openshift_master_etcd_port": "2379"}, "changed": false, "failed": false}
ok: [ec2-52-90-116-202.compute-1.amazonaws.com] => {"ansible_facts": {"openshift_master_etcd_hosts": ["ip-172-18-14-244.ec2.internal", "ip-172-18-14-244.ec2.internal", "ip-172-18-14-244.ec2.internal"], "openshift_master_etcd_port": "2379"}, "changed": false, "failed": false}

Comment 4 Johnny Liu 2017-10-28 08:15:40 UTC

Once this issue happens, the API service on the other masters (every master except the first) fails to start:

Oct 28 04:09:09 qe-jialiu-xlxf-master-etcd-3 atomic-openshift-master-api[5136]: F1028 04:09:09.247495    5136 hooks.go:133] PostStartHook "oauth.openshift.io-EnsureBootstrapOAuthClients" failed: Post https://qe-jialiu-xlxf-master-etcd-3:8443/apis/oauth.openshift.io/v1/oauthclients: x509: certificate is valid for kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, openshift, openshift.default, openshift.default.svc, openshift.default.svc.cluster.local, qe-jialiu-xlxf-lb-1.1028-v-k.qe.rhcloud.com, qe-jialiu-xlxf-master-etcd-1, qe-jialiu-xlxf-master-etcd-1.1028-v-k.qe.rhcloud.com, 10.240.0.2, 172.30.0.1, 35.202.242.152, not qe-jialiu-xlxf-master-etcd-3


This means the whole multi-master environment setup fails.

This is blocking RPM multi-master testing.
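The failure in the log above comes from Go's x509 hostname verification: the API server on `qe-jialiu-xlxf-master-etcd-3` posted to `https://qe-jialiu-xlxf-master-etcd-3:8443`, but the certificate it received lists only the `qe-jialiu-xlxf-master-etcd-1` names (plus the service, load-balancer and IP entries). The Python sketch below is a simplified model of that check, assuming exact, case-insensitive name matching; real Go verification also handles wildcard names and IP SANs separately. The helper names are hypothetical, and `ERROR` is an abridged excerpt of the logged message.

```python
def parse_x509_hostname_error(message: str) -> tuple[list[str], str]:
    """Split Go's 'certificate is valid for A, B, not C' text into ([A, B], C)."""
    tail = message.split("certificate is valid for ", 1)[1]
    names_part, requested = tail.rsplit(", not ", 1)
    return [name.strip() for name in names_part.split(",")], requested.strip()


def name_is_covered(requested: str, valid_names: list[str]) -> bool:
    """Exact, case-insensitive match only; wildcard SANs are not handled."""
    return requested.lower() in (name.lower() for name in valid_names)


# Abridged excerpt of the error logged on qe-jialiu-xlxf-master-etcd-3.
ERROR = (
    "x509: certificate is valid for kubernetes, openshift, "
    "qe-jialiu-xlxf-master-etcd-1, "
    "qe-jialiu-xlxf-master-etcd-1.1028-v-k.qe.rhcloud.com, "
    "10.240.0.2, 172.30.0.1, 35.202.242.152, not qe-jialiu-xlxf-master-etcd-3"
)

if __name__ == "__main__":
    names, requested = parse_x509_hostname_error(ERROR)
    print(requested)  # qe-jialiu-xlxf-master-etcd-3
    print(name_is_covered(requested, names))  # False
```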

Comment 5 Scott Dodson 2017-10-30 14:11:02 UTC

(In reply to Johnny Liu from comment #1)
> Pls get inventory host file and installation log from attachment.

yeah can we get the inventory

Comment 6 Johnny Liu 2017-10-31 02:48:47 UTC

What info do you need from me? The inventory host file? As I said, the inventory host file is included in the attachment (search the attachment for the "openshift-ansible-inventory-start" keyword).

> yeah can we get the inventory
I guess Scott made a typo.

Comment 8 Andrew Butcher 2017-11-01 19:01:16 UTC

https://github.com/openshift/openshift-ansible/pull/5978

Comment 10 Johnny Liu 2017-11-03 03:39:54 UTC

Verified this bug with openshift-ansible-3.7.0-0.190.0.git.0.129e91a.el7.noarch: PASS.


TASK [set_fact] ****************************************************************
Friday 03 November 2017  02:44:58 +0000 (0:00:00.068)       0:08:07.903 ******* 
ok: [ec2-54-242-50-70.compute-1.amazonaws.com] => {"ansible_facts": {"openshift_master_etcd_hosts": ["ip-172-18-8-57.ec2.internal", "ip-172-18-12-243.ec2.internal", "ip-172-18-14-135.ec2.internal"], "openshift_master_etcd_port": "2379"}, "changed": false, "failed": false}
ok: [ec2-52-206-149-174.compute-1.amazonaws.com] => {"ansible_facts": {"openshift_master_etcd_hosts": ["ip-172-18-8-57.ec2.internal", "ip-172-18-12-243.ec2.internal", "ip-172-18-14-135.ec2.internal"], "openshift_master_etcd_port": "2379"}, "changed": false, "failed": false}
ok: [ec2-52-91-66-4.compute-1.amazonaws.com] => {"ansible_facts": {"openshift_master_etcd_hosts": ["ip-172-18-8-57.ec2.internal", "ip-172-18-12-243.ec2.internal", "ip-172-18-14-135.ec2.internal"], "openshift_master_etcd_port": "2379"}, "changed": false, "failed": false}

Comment 13 errata-xmlrpc 2017-11-28 22:20:01 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
