Bug 1375111 - [DOCS] Not able to choose etcd IP/interface during installation
Summary: [DOCS] Not able to choose etcd IP/interface during installation
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 3.2.1
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: Ashley Hardin
QA Contact: Gaoyun Pei
Docs Contact: Vikram Goyal
URL:
Whiteboard: 3.10-release-plan
Duplicates: 1375110
Depends On:
Blocks:
Reported: 2016-09-12 08:21 UTC by jtudelag
Modified: 2021-09-09 11:55 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-11 21:10:20 UTC
Target Upstream Version:
Embargoed:



Description jtudelag 2016-09-12 08:21:49 UTC
Description of problem:

The installer does not let me choose the IP/interface on which etcd listens.

In some environments, certain public clouds for example, instances have more than one network interface, usually one private and one public. By default, etcd always listens on the default interface.

I assume the right variable to set is etcd_ip:

https://github.com/openshift/openshift-ansible/blob/ee9413cebdb8a7c5ff03a5da767b1c74742bc898/roles/openshift_etcd_facts/vars/main.yml#L5

https://github.com/openshift/openshift-ansible/blob/1b4bf065f84a28426a010cdc47669b88d5515e34/roles/etcd_common/defaults/main.yml#L32

It would also make sense to me for the installer to take these two variables into consideration:
openshift_ip and openshift_public_ip.

https://docs.openshift.com/enterprise/3.2/install_config/install/advanced_install.html#configuring-host-variables

Version-Release number of selected component (if applicable):

OpenShift Enterprise 3.2, OpenShift Origin 3.2

How reproducible:

Deploy OpenShift on instances with multiple NICs and try to make etcd listen on the second interface instead of the default one.

Steps to Reproduce:
1. Deploy OpenShift on instances with multiple NICs.
2. Observe that origin-master-api.service does NOT start.

Actual results:

systemctl status origin-master-api.service
● origin-master-api.service - Atomic OpenShift Master API
   Loaded: loaded (/usr/lib/systemd/system/origin-master-api.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2016-09-12 08:12:18 UTC; 6min ago
     Docs: https://github.com/openshift/origin
  Process: 29048 ExecStart=/usr/bin/openshift start master api --config=${CONFIG_FILE} $OPTIONS (code=exited, status=255)
 Main PID: 29048 (code=exited, status=255)

Sep 12 08:11:48 master1.example.com atomic-openshift-master-api[29048]: I0912 08:11:48.246794   29048 start_master.go:384] Public master address is https://master1.example.com:8443
Sep 12 08:11:48 master1.example.com atomic-openshift-master-api[29048]: I0912 08:11:48.246821   29048 start_master.go:388] Using images from "openshift/origin-<component>:v1.2.1"
Sep 12 08:11:48 master1.example.com atomic-openshift-master-api[29048]: I0912 08:11:48.339139   29048 run_components.go:204] Using default project node label selector: region=primary
Sep 12 08:12:18 master1.example.com atomic-openshift-master-api[29048]: F0912 08:12:18.829371   29048 controller.go:86] Unable to perform initial IP allocation check: unable to persist the updated service IP allocations: ... has no leader
Sep 12 08:12:18 master1.example.com atomic-openshift-master-api[29048]: error #1: client: etcd member https://master2.example.com:2379 has no leader
Sep 12 08:12:18 master1.example.com atomic-openshift-master-api[29048]: error #2: client: etcd member https://master1.example.com:2379 has no leader
Sep 12 08:12:18 master1.example.com systemd[1]: origin-master-api.service: main process exited, code=exited, status=255/n/a
Sep 12 08:12:18 master1.example.com systemd[1]: Failed to start Atomic OpenShift Master API.
Sep 12 08:12:18 master1.example.com systemd[1]: Unit origin-master-api.service entered failed state.
Sep 12 08:12:18 master1.example.com systemd[1]: origin-master-api.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

journalctl -r -u etcd
-- Logs begin at Sun 2016-09-11 18:29:54 UTC, end at Mon 2016-09-12 08:20:14 UTC. --
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream MsgApp v2 (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream Message (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream Message (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream MsgApp v2 (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream MsgApp v2 (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream Message (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream MsgApp v2 (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream Message (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream MsgApp v2 (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial 6aae31043c66d966 on stream Message (dial tcp 139.59.129.75:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream Message (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: failed to dial d3bac7b65ff86f14 on stream MsgApp v2 (dial tcp 139.59.129.93:2380: getsockopt: connection refused)
Sep 12 08:20:14 master1.example.com etcd[29160]: publish error: etcdserver: request timed out

Expected results:


Additional info:

Comment 1 Javier Ramirez 2016-11-14 15:03:07 UTC
*** Bug 1375110 has been marked as a duplicate of this bug. ***

Comment 4 Kyle Bassett 2017-03-06 19:46:24 UTC
I ran into this, and it took a while to find a workaround. I documented it here: http://www.arctiq.ca/our-blog/2017/3/6/openshift-install-not-able-to-choose-etcd-ipinterface-during-installation
Hope it helps someone.

Comment 5 Scott Dodson 2017-04-03 13:37:21 UTC
According to the blog post in comment 4, the workaround is to set openshift_ip to the internal IP address of the host. This is a generic workaround that can be applied in any scenario where we incorrectly determine the IP address that hosts should use to communicate with each other.

Lowering severity based on the workaround.
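
For illustration only, the workaround described above might look like the following inventory fragment. This is a sketch, not the exact configuration from the blog post: the host names are the ones that appear in the logs in the description, and the 10.0.0.x addresses are placeholder private IPs that are not taken from this report. The same openshift_ip setting would need to be applied to every host whose address is detected incorrectly.

```ini
# Sketch of an openshift-ansible inventory fragment (INI format).
# The 10.0.0.x values are placeholder internal addresses; replace each one
# with the private IP of the interface the hosts should use to reach each other.
[etcd]
master1.example.com openshift_ip=10.0.0.11
master2.example.com openshift_ip=10.0.0.12
```

Because these are host variables, setting them on a host line in one group applies them to that host wherever else it is listed in the inventory.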

Comment 13 Gaoyun Pei 2018-04-09 08:59:37 UTC
LGTM, move it to verified.

