Bug 1375111

Summary: [DOCS] Not able to choose etcd IP/interface during installation
Product: OpenShift Container Platform
Reporter: jtudelag
Component: Documentation
Assignee: Ashley Hardin <ahardin>
Status: CLOSED CURRENTRELEASE
QA Contact: Gaoyun Pei <gpei>
Severity: low
Docs Contact: Vikram Goyal <vigoyal>
Priority: low
Version: 3.2.1
CC: abutcher, aos-bugs, bleanhar, dapark, dmoessne, jokerman, kyle.bassett, mmccomas, naoto30, stwalter
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: 3.10-release-plan
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-11 21:10:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description jtudelag 2016-09-12 08:21:49 UTC
Description of problem:

It is not possible to choose the IP address or interface on which etcd listens during installation.

In some environments, for example certain public clouds, instances have more than one network interface, usually one private and one public. By default, etcd always listens on the default interface.

I assume the right variable to set is etcd_ip:

https://github.com/openshift/openshift-ansible/blob/ee9413cebdb8a7c5ff03a5da767b1c74742bc898/roles/openshift_etcd_facts/vars/main.yml#L5

https://github.com/openshift/openshift-ansible/blob/1b4bf065f84a28426a010cdc47669b88d5515e34/roles/etcd_common/defaults/main.yml#L32

Also, it would make sense to me to take these two variables into consideration:
openshift_ip and openshift_public_ip.

https://docs.openshift.com/enterprise/3.2/install_config/install/advanced_install.html#configuring-host-variables

Version-Release number of selected component (if applicable):

OpenShift Enterprise 3.2, OpenShift Origin 3.2

How reproducible:

Deploy OpenShift on instances with multiple NICs and try to make etcd listen on the second interface instead of the default one.

Steps to Reproduce:
1. Deploy OpenShift.
2. Observe that origin-master-api.service does NOT start.

Actual results:

systemctl status origin-master-api.service
● origin-master-api.service - Atomic OpenShift Master API
   Loaded: loaded (/usr/lib/systemd/system/origin-master-api.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2016-09-12 08:12:18 UTC; 6min ago
     Docs: https://github.com/openshift/origin
  Process: 29048 ExecStart=/usr/bin/openshift start master api --config=${CONFIG_FILE} $OPTIONS (code=exited, status=255)
 Main PID: 29048 (code=exited, status=255)

Sep 12 08:11:48 master1.example.com atomic-openshift-master-api[29048]: I0912 08:11:48.246794   29048 start_master.go:384] Public master address is https://master1.example.com:8443
Sep 12 08:11:48 master1.example.com atomic-openshift-master-api[29048]: I0912 08:11:48.246821   29048 start_master.go:388] Using images from "openshift/origin-<component>:v1.2.1"
Sep 12 08:11:48 master1.example.com atomic-openshift-master-api[29048]: I0912 08:11:48.339139   29048 run_components.go:204] Using default project node label selector: region=primary
Sep 12 08:12:18 master1.example.com atomic-openshift-master-api[29048]: F0912 08:12:18.829371   29048 controller.go:86] Unable to perform initial IP allocation check: unable to persist the updated service IP allocations: ... has no leader
Sep 12 08:12:18 master1.example.com atomic-openshift-master-api[29048]: error #1: client: etcd member https://master2.example.com:2379 has no leader
Sep 12 08:12:18 master1.example.com atomic-openshift-master-api[29048]: error #2: client: etcd member https://master1.example.com:2379 has no leader
Sep 12 08:12:18 master1.example.com systemd[1]: origin-master-api.service: main process exited, code=exited, status=255/n/a
Sep 12 08:12:18 master1.example.com systemd[1]: Failed to start Atomic OpenShift Master API.
Sep 12 08:12:18 master1.example.com systemd[1]: Unit origin-master-api.service entered failed state.
Sep 12 08:12:18 master1.example.com systemd[1]: origin-master-api.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

journalctl -r -u etcd
-- Logs begin at Sun 2016-09-11 18:29:54 UTC, end at Mon 2016-09-12 08:20:14 UTC. --
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream MsgApp v2 (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream Message (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream Message (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream MsgApp v2 (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream MsgApp v2 (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream Message (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream MsgApp v2 (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream Message (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream MsgApp v2 (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream Message (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream Message (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream MsgApp v2 (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: publish error: etcdserver: request timed out

Expected results:


Additional info:

Comment 1 Javier Ramirez 2016-11-14 15:03:07 UTC
*** Bug 1375110 has been marked as a duplicate of this bug. ***

Comment 4 Kyle Bassett 2017-03-06 19:46:24 UTC
I ran into this and it took a while to find a workaround. I documented it here: http://www.arctiq.ca/our-blog/2017/3/6/openshift-install-not-able-to-choose-etcd-ipinterface-during-installation
Hope it helps someone.

Comment 5 Scott Dodson 2017-04-03 13:37:21 UTC
From the blog post in comment 4, the workaround is to set openshift_ip to the internal IP address of the host. This is a generic workaround that can be applied in any scenario where we incorrectly determine the IP address that hosts should use to communicate with each other.

Lowering severity based on the workaround.
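
For illustration only, the workaround might look roughly like the following inventory fragment. This is a sketch, not taken from the blog post or from a tested deployment: the 10.0.0.x addresses are hypothetical private IPs, and the host names are reused from the logs in the description. Only the openshift_ip variable itself comes from this report.

```ini
# Sketch of an openshift-ansible inventory fragment (assumed layout).
# The 10.0.0.x addresses are hypothetical; replace them with the
# internal IP of each host's private interface.

[masters]
master1.example.com openshift_ip=10.0.0.11
master2.example.com openshift_ip=10.0.0.12

[etcd]
master1.example.com openshift_ip=10.0.0.11
master2.example.com openshift_ip=10.0.0.12
```

Setting the variable per host, rather than once for the whole group, reflects that each host has its own internal address.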

Comment 13 Gaoyun Pei 2018-04-09 08:59:37 UTC
LGTM, move it to verified.