Description of problem:
On a host with multiple interfaces, /etc/etcd/etcd.conf was configured to use the address of eth1, but the liveness probe is hitting the address of eth0. We need to make sure that we configure /etc/etcd/etcd.conf to match the address that will be used for the liveness probe.

Version-Release number of the following components:
master branch / 3.10

How reproducible:
unknown

Steps to Reproduce:
1. Provision host with two interfaces for use as a master
2. Install OpenShift

Actual results:
etcd static pod is killed due to liveness probe failures

Expected results:
etcd static pod runs successfully

Additional info:

Network config

docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 02:42:23:ed:77:2d  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.121.138  netmask 255.255.255.0  broadcast 192.168.121.255
        inet6 fe80::5054:ff:fea3:a6e2  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:a3:a6:e2  txqueuelen 1000  (Ethernet)
        RX packets 170661  bytes 1001384775 (954.9 MiB)
        RX errors 0  dropped 2  overruns 0  frame 0
        TX packets 138811  bytes 10348076 (9.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.120.4  netmask 255.255.255.0  broadcast 192.168.120.255
        inet6 fe80::5054:ff:fede:66fb  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:de:66:fb  txqueuelen 1000  (Ethernet)
        RX packets 1573  bytes 83138 (81.1 KiB)
        RX errors 0  dropped 292  overruns 0  frame 0
        TX packets 24  bytes 2812 (2.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Local Loopback)
        RX packets 263009  bytes 116104090 (110.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 263009  bytes 116104090 (110.7 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Node logs showing liveness probe failure

May 03 19:48:22 localhost.localdomain atomic-openshift-node[3463]: I0503 19:48:22.785710    3463 prober.go:111] Liveness probe for "master-etcd-localhost.localdomain_kube-system(e41c2a57b9d15b28cb8a240c7780597f):etcd" failed (failure): dial tcp 192.168.121.138:2379: getsockopt: connection refused

etcd config and logs (from a different startup but same behavior)

ETCD_ADVERTISE_CLIENT_URLS=https://192.168.120.4:2379
ETCD_CERT_FILE=/etc/etcd/server.crt
ETCD_CLIENT_CERT_AUTH=true
ETCD_DATA_DIR=/var/lib/etcd/
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.120.4:2380
ETCD_INITIAL_CLUSTER=192.168.120.4.nip.io=https://192.168.120.4:2380
ETCD_LISTEN_CLIENT_URLS=https://192.168.120.4:2379
ETCD_LISTEN_PEER_URLS=https://192.168.120.4:2380
ETCD_NAME=192.168.120.4.nip.io
...
2018-05-03 19:37:36.722605 I | etcdserver: published {Name:192.168.120.4.nip.io ClientURLs:[https://192.168.120.4:2379]} to cluster f576e02791b30b8a
2018-05-03 19:37:36.722721 I | embed: ready to serve client requests
...
2018-05-03 19:37:52.994948 D | auth: found common name 192.168.120.4.nip.io
2018-05-03 19:37:53.089500 N | pkg/osutil: received terminated signal, shutting down...
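
For anyone triaging a similar host, the mismatch above can be confirmed with a short script. This is only a rough sketch and not part of openshift-ansible: the helper names parse_etcd_conf and listens_on are made up for illustration, and it assumes etcd.conf is a plain KEY=VALUE file like the one quoted above. It treats a 0.0.0.0 listen URL as covering every IPv4 address.

```python
from urllib.parse import urlsplit


def parse_etcd_conf(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only; values such as ETCD_INITIAL_CLUSTER
        # contain further '=' characters.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def listens_on(settings, host, port):
    """Return True if ETCD_LISTEN_CLIENT_URLS covers host:port."""
    for url in settings.get("ETCD_LISTEN_CLIENT_URLS", "").split(","):
        url = url.strip()
        if not url:
            continue
        parts = urlsplit(url)
        # 0.0.0.0 is treated as a wildcard for all IPv4 interfaces.
        if parts.port == port and parts.hostname in (host, "0.0.0.0"):
            return True
    return False


if __name__ == "__main__":
    # Values copied from the report: etcd listens on eth1, the probe dials eth0.
    conf = "ETCD_LISTEN_CLIENT_URLS=https://192.168.120.4:2379\n"
    settings = parse_etcd_conf(conf)
    for probe_host in ("192.168.120.4", "192.168.121.138"):
        print(probe_host, listens_on(settings, probe_host, 2379))
```

With the values from this report, the eth1 address is covered by the listen URL while the eth0 address the probe dials is not, which is consistent with the "connection refused" in the node log.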
PR https://github.com/openshift/openshift-ansible/pull/8495
Fix is available in openshift-ansible-3.10.0-0.51.0
Verified in openshift-ansible-3.10.0-0.53.0.git.0.53fe016.el7.noarch.rpm

1) Spin up two instances with two interfaces

# ip addr
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:7b:08:8f brd ff:ff:ff:ff:ff:ff
    inet 172.16.120.67/24 brd 172.16.120.255 scope global noprefixroute dynamic eth0
       valid_lft 77221sec preferred_lft 77221sec
    inet6 fe80::f816:3eff:fe7b:88f/64 scope link
       valid_lft forever preferred_lft forever
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:d5:bf:9b brd ff:ff:ff:ff:ff:ff
    inet 192.168.33.5/24 brd 192.168.33.255 scope global noprefixroute dynamic eth1
       valid_lft 79793sec preferred_lft 79793sec
    inet6 fe80::f816:3eff:fed5:bf9b/64 scope link

2) Specify openshift_ip and openshift_hostname in the etcd host group to use the eth1 interface.

[etcd]
host-8-246-185.host.xx.redhat.com openshift_public_hostname=host-8-246-185.host.xx.redhat.com openshift_hostname=192.168.33.5 openshift_ip=192.168.33.5

3) etcd server was running successfully

# cat /etc/origin/node/pods/etcd.yaml
<--snip-->
    livenessProbe:
      exec:
        command:
        - etcdctl
        - --cert-file
        - /etc/etcd/peer.crt
        - --key-file
        - /etc/etcd/peer.key
        - --ca-file
        - /etc/etcd/ca.crt
        - -C
        - https://192.168.33.5:2379
        - cluster-health
      initialDelaySeconds: 15
      timeoutSeconds: 10
<--snip-->

# cat /etc/etcd/etcd.conf
ETCD_NAME=192.168.33.5
ETCD_LISTEN_PEER_URLS=https://192.168.33.5:2380
ETCD_DATA_DIR=/var/lib/etcd/
#ETCD_WAL_DIR=
#ETCD_SNAPSHOT_COUNT=10000
ETCD_HEARTBEAT_INTERVAL=500
ETCD_ELECTION_TIMEOUT=2500
ETCD_LISTEN_CLIENT_URLS=https://192.168.33.5:2379
#ETCD_MAX_SNAPSHOTS=5
#ETCD_MAX_WALS=5
#ETCD_CORS=

#[cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.33.5:2380
ETCD_INITIAL_CLUSTER=192.168.33.5=https://192.168.33.5:2380
ETCD_INITIAL_CLUSTER_STATE=new
ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
#ETCD_DISCOVERY=
#ETCD_DISCOVERY_SRV=
#ETCD_DISCOVERY_FALLBACK=proxy
#ETCD_DISCOVERY_PROXY=
ETCD_ADVERTISE_CLIENT_URLS=https://192.168.33.5:2379
<--snip-->
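
The check in step 3 was done by eye. It could be scripted roughly as follows. This is an illustrative sketch only: probe_endpoints and unmatched_endpoints are hypothetical helper names, the probe command is assumed to be already loaded as a list of strings (for example from the static pod YAML), and only the -C form shown above is handled.

```python
def probe_endpoints(command):
    """Return the endpoints that follow -C in an etcdctl argument list."""
    for flag, value in zip(command, command[1:]):
        if flag == "-C":
            return [e.strip() for e in value.split(",") if e.strip()]
    return []


def unmatched_endpoints(command, listen_client_urls):
    """Return probe endpoints that are absent from ETCD_LISTEN_CLIENT_URLS."""
    listening = {u.strip() for u in listen_client_urls.split(",") if u.strip()}
    return [e for e in probe_endpoints(command) if e not in listening]


if __name__ == "__main__":
    # Probe command and listen URL as shown in the verification output above.
    command = [
        "etcdctl",
        "--cert-file", "/etc/etcd/peer.crt",
        "--key-file", "/etc/etcd/peer.key",
        "--ca-file", "/etc/etcd/ca.crt",
        "-C", "https://192.168.33.5:2379",
        "cluster-health",
    ]
    print(unmatched_endpoints(command, "https://192.168.33.5:2379"))
```

An empty result means every endpoint the probe contacts is one etcd is configured to listen on. The comparison is a plain string match, so a listen URL of 0.0.0.0 or a hostname in place of the IP would need extra handling.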
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1816