Bug 1375110

Summary: Not able to choose etcd IP/interface during installation
Product: OpenShift Container Platform
Reporter: jtudelag
Component: Installer
Assignee: Jason DeTiberus <jdetiber>
Status: CLOSED DUPLICATE
QA Contact: Johnny Liu <jialiu>
Severity: low
Docs Contact:
Priority: unspecified
Version: 3.2.0
CC: aos-bugs, javier.ramirez, jokerman, kyle.bassett, mmccomas
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-14 15:03:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description jtudelag 2016-09-12 08:21:34 UTC
Description of problem:

The installer offers no way to choose the IP/interface on which etcd listens.

In some environments, such as certain public clouds, instances have more than one network interface, usually one private and one public. By default, etcd always listens on the default interface.

I assume the right variable to set is etcd_ip:

https://github.com/openshift/openshift-ansible/blob/ee9413cebdb8a7c5ff03a5da767b1c74742bc898/roles/openshift_etcd_facts/vars/main.yml#L5

https://github.com/openshift/openshift-ansible/blob/1b4bf065f84a28426a010cdc47669b88d5515e34/roles/etcd_common/defaults/main.yml#L32

Also, it would make sense to me for the installer to take these two variables into account:
openshift_ip and openshift_public_ip.

https://docs.openshift.com/enterprise/3.2/install_config/install/advanced_install.html#configuring-host-variables
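
To make the request concrete, here is a minimal sketch of the kind of per-host inventory line I have in mind. openshift_ip and openshift_public_ip are the documented host variables linked above. Whether the installer honors etcd_ip when it is set in the inventory is only my assumption, not something confirmed here. The helper name, hostname, and addresses are hypothetical.

```python
def inventory_host_line(hostname, private_ip, public_ip, etcd_ip=None):
    """Build one Ansible INI inventory host line with per-host IP variables."""
    parts = [
        hostname,
        f"openshift_ip={private_ip}",
        f"openshift_public_ip={public_ip}",
    ]
    # ASSUMPTION: etcd_ip as an inventory variable is what this report
    # guesses at; it is not confirmed to be honored by the installer.
    if etcd_ip is not None:
        parts.append(f"etcd_ip={etcd_ip}")
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical host and addresses, for illustration only.
    print("[etcd]")
    print(
        inventory_host_line(
            "master1.example.com", "10.0.0.11", "203.0.113.11", etcd_ip="10.0.0.11"
        )
    )
```

The point of the sketch is only that the etcd address would be selectable per host, next to the existing IP variables.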

Version-Release number of selected component (if applicable):

OpenShift Enterprise 3.2, OpenShift Origin 3.2

How reproducible:

Deploy OpenShift on instances with multiple NICs, and try to make etcd listen on the second interface instead of the default one.
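
A rough way to check which address etcd is actually reachable on after a deployment is to probe its client and peer ports (2379 and 2380, as seen in the logs below) on each of the host's addresses. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the addresses are whatever you pass on the command line.

```python
import socket
import sys


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False


def probe(addresses, ports=(2379, 2380)):
    """Map each (address, port) pair to whether it accepted a connection."""
    return {
        (addr, port): port_is_open(addr, port)
        for addr in addresses
        for port in ports
    }


if __name__ == "__main__":
    # Pass the host's private and public addresses as arguments;
    # falls back to localhost when none are given.
    targets = sys.argv[1:] or ["127.0.0.1"]
    for (addr, port), ok in sorted(probe(targets).items()):
        print(f"{addr}:{port} {'open' if ok else 'closed or unreachable'}")
```

A successful connection only shows that something accepts TCP connections on that port; it does not prove that the listener is a healthy etcd member.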

Steps to Reproduce:
1. Deploy OpenShift on instances with more than one NIC.
2. Check the master API service: origin-master-api.service does NOT start.

Actual results:

systemctl status origin-master-api.service
● origin-master-api.service - Atomic OpenShift Master API
   Loaded: loaded (/usr/lib/systemd/system/origin-master-api.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2016-09-12 08:12:18 UTC; 6min ago
     Docs: https://github.com/openshift/origin
  Process: 29048 ExecStart=/usr/bin/openshift start master api --config=${CONFIG_FILE} $OPTIONS (code=exited, status=255)
 Main PID: 29048 (code=exited, status=255)

Sep 12 08:11:48 master1.example.com atomic-openshift-master-api[29048]: I0912 08:11:48.246794   29048 start_master.go:384] Public master address is https://master1.example.com:8443
Sep 12 08:11:48 master1.example.com atomic-openshift-master-api[29048]: I0912 08:11:48.246821   29048 start_master.go:388] Using images from "openshift/origin-<component>:v1.2.1"
Sep 12 08:11:48 master1.example.com atomic-openshift-master-api[29048]: I0912 08:11:48.339139   29048 run_components.go:204] Using default project node label selector: region=primary
Sep 12 08:12:18 master1.example.com atomic-openshift-master-api[29048]: F0912 08:12:18.829371   29048 controller.go:86] Unable to perform initial IP allocation check: unable to persist the updated service IP allocations: ... has no leader
Sep 12 08:12:18 master1.example.com atomic-openshift-master-api[29048]: error #1: client: etcd member https://master2.example.com:2379 has no leader
Sep 12 08:12:18 master1.example.com atomic-openshift-master-api[29048]: error #2: client: etcd member https://master1.example.com:2379 has no leader
Sep 12 08:12:18 master1.example.com systemd[1]: origin-master-api.service: main process exited, code=exited, status=255/n/a
Sep 12 08:12:18 master1.example.com systemd[1]: Failed to start Atomic OpenShift Master API.
Sep 12 08:12:18 master1.example.com systemd[1]: Unit origin-master-api.service entered failed state.
Sep 12 08:12:18 master1.example.com systemd[1]: origin-master-api.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

journalctl -r -u etcd
-- Logs begin at Sun 2016-09-11 18:29:54 UTC, end at Mon 2016-09-12 08:20:14 UTC. --
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream MsgApp v2 (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream Message (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream Message (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream MsgApp v2 (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream MsgApp v2 (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream Message (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream MsgApp v2 (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream Message (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream MsgApp v2 (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream Message (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream Message (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream MsgApp v2 (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: publish error: etcdserver: request timed out
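
Because the journal output is very repetitive, it can help to tally the dial failures per peer and address. The following is a minimal sketch that assumes the "failed to dial" line format shown above; the regular expression and function name are my own and are not part of etcd.

```python
import re
from collections import Counter

# Matches the "failed to dial" lines in the etcd journal output above.
DIAL_FAILURE = re.compile(
    r"failed to dial (?P<peer>[0-9a-f]+) on stream (?P<stream>.+?) "
    r"\(dial tcp (?P<addr>[\d.]+):(?P<port>\d+): .*connection refused\)"
)


def refused_peers(lines):
    """Count connection-refused dial failures per (peer id, address, port)."""
    counts = Counter()
    for line in lines:
        match = DIAL_FAILURE.search(line)
        if match:
            key = (match["peer"], match["addr"], int(match["port"]))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the journal output above.
    sample = [
        "Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial "
        "6aae31043c66d966 on stream MsgApp v2 (dial tcp 139.59.129.75:2380: "
        "getsockopt: connection refused)",
        "Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial "
        "d3bac7b65ff86f14 on stream Message (dial tcp 139.59.129.93:2380: "
        "getsockopt: connection refused)",
    ]
    for (peer, addr, port), n in refused_peers(sample).items():
        print(f"peer {peer} at {addr}:{port}: {n} refused connection(s)")
```

The tally shows which addresses the local member dials for its peers, which can then be compared with the addresses the peers actually listen on.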

Expected results:


Additional info:

Comment 1 Javier Ramirez 2016-11-14 15:03:07 UTC

*** This bug has been marked as a duplicate of bug 1375111 ***

Comment 2 Kyle Bassett 2017-03-06 19:47:13 UTC
I ran into this, and it took a while to find a workaround. I documented it here: http://www.arctiq.ca/our-blog/2017/3/6/openshift-install-not-able-to-choose-etcd-ipinterface-during-installation
Hope it helps someone ...