1462652 – Containerized installation with proxy fails

Summary: Containerized installation with proxy fails

Keywords:
Status:	CLOSED DUPLICATE of bug 1432020
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	3.5.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	---
Target Release:	3.7.0
Assignee:	Tim Bielawa
QA Contact:	Johnny Liu
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-06-19 07:50 UTC by Marko Myllynen
Modified:	2018-04-18 18:04 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-04-18 18:04:28 UTC
Target Upstream Version:
Embargoed:

Description Marko Myllynen 2017-06-19 07:50:48 UTC

Description of problem:
Using the following configuration where the proxy IP is invalid (to test that nothing really tries to use it):

openshift_http_proxy=http://10.10.10.10:8080
openshift_https_proxy=http://10.10.10.10:8080
openshift_no_proxy=registry.example.com
openshift_generate_no_proxy_hosts=True

This leads to an installation failure on containerized 3.5; the node container/service won't start:

I0616 11:52:33.236370   19986 start_node.go:250] Reading node configuration from /etc/origin/node/node-config.yaml
F0616 11:53:03.246969   19986 start_node.go:139] cannot fetch "default" cluster network: Get https://lb.test.example.com:8443/oapi/v1/clusternetworks/default: http: error connecting to proxy http://10.10.10.10:8080: dial tcp 10.10.10.10:8080: i/o timeout

Also, on the etcd container there are errors:

2017-06-16 12:50:30.483903 E | rafthttp: failed to dial 284945f473024b5b on stream Message (dial tcp 192.168.200.149:2380: getsockopt: connection refused)
2017-06-16 12:52:56.036837 E | rafthttp: failed to dial 683d77bc33ccbbe0 on stream Message (peer 683d77bc33ccbbe0 failed to find local node db8b5c1ad6077b1b)
2017-06-16 13:00:11.382006 E | etcdserver/api/v2http: etcdserver: request timed out, possibly due to previous leader failure
2017-06-16 13:00:12.800166 E | etcdserver/api/v2http: etcdserver: request timed out, possibly due to previous leader failure [merged 1 repeated lines in 1.42s]

So it looks like NO_PROXY is not properly set up / propagated everywhere.

Version-Release number of selected component (if applicable):
openshift-ansible-playbooks-3.5.78-1.git.0.f7be576.el7.noarch

Comment 1 Scott Dodson 2017-06-19 13:36:27 UTC

We don't configure etcd hosts for proxies. Is this something you've done manually? The expectation is that etcd hosts are all peers and should never need a proxy to reach each other.

Can you please provide /etc/etcd/etcd.conf? thanks

Comment 2 Scott Dodson 2017-06-19 14:17:08 UTC

Also, why can't your node reach the proxy? 

F0616 11:53:03.246969   19986 start_node.go:139] cannot fetch "default" cluster network: Get https://lb.test.example.com:8443/oapi/v1/clusternetworks/default: http: error connecting to proxy http://10.10.10.10:8080: dial tcp 10.10.10.10:8080: i/o timeout

Comment 3 Scott Dodson 2017-06-19 14:18:15 UTC

Can you please provide /etc/sysconfig/{etcd,atomic-openshift-node,atomic-openshift-master}

Also, does this not happen in RPM-based installs?

Comment 4 Marko Myllynen 2017-06-19 14:23:59 UTC

> Also, why can't your node reach the proxy?

The address used here is non-existent; there is nobody behind that IP.

In some environments the proxy works only for external addresses, and trying to reach internal addresses via such a proxy will only cause a hang or an error.

In such an environment, any attempt to connect to the proxy should be avoided unless the target really is an external, non-OpenShift network.

This should be 100% reproducible; I haven't done any local etcd or other configuration.

I will provide /etc/etcd/etcd.conf tomorrow but it is what the installer generated, nothing local / manual in it.

Thanks.

Comment 6 Scott Dodson 2017-06-19 15:07:24 UTC

Sorry, I missed that it was intentionally configured to point at a proxy that doesn't exist. We'll see what we can do to reproduce.

Comment 8 Marko Myllynen 2017-06-20 18:54:05 UTC

In the inventory file I have:

openshift_master_cluster_method=native
openshift_master_cluster_hostname=lb.test.example.com
openshift_master_cluster_public_hostname=something.somewhere.com

The installation went a bit further after I added the openshift_master_cluster_hostname host to the no_proxy list. I think that should be done automatically when openshift_generate_no_proxy_hosts=True.

There are still some etcd errors. /etc/etcd/etcd.conf on the first master looks like (pasting the relevant parts only):

ETCD_NAME=master01.test.example.com
ETCD_LISTEN_PEER_URLS=https://192.168.200.254:2380
ETCD_LISTEN_CLIENT_URLS=https://192.168.200.254:2379

#[cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.200.254:2380
ETCD_INITIAL_CLUSTER=master01.test.example.com=https://192.168.200.254:2380,master02.test.example.com=https://192.168.200.149:2380,master03.test.example.com=https://192.168.200.209:2380
ETCD_INITIAL_CLUSTER_STATE=new
ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
#ETCD_DISCOVERY=
#ETCD_DISCOVERY_SRV=
#ETCD_DISCOVERY_FALLBACK=proxy
#ETCD_DISCOVERY_PROXY=
ETCD_ADVERTISE_CLIENT_URLS=https://192.168.200.254:2379

#[proxy]
#ETCD_PROXY=off

And /etc/sysconfig/{etcd,atomic-openshift-node,atomic-openshift-master}* all have the same proxy-related content:

HTTP_PROXY=http://10.10.10.10:8080
HTTPS_PROXY=http://10.10.10.10:8080
NO_PROXY=.cluster.local,infra01.test.example.com,infra02.test.example.com,lb.test.example.com,master01.test.example.com,master02.test.example.com,master03.test.example.com,node01.test.example.com,node02.test.example.com,node03.test.example.com,registry.example.com,172.30.0.0/16,10.1.0.0/16
OPTIONS=--loglevel=2 --listen=https://0.0.0.0:8443 --master=https://master01.test.example.com:8443

Thanks.
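
For anyone wanting to sanity-check a NO_PROXY value like the one above, here is a minimal Python sketch. It uses deliberately simplified matching rules (exact hostname, domain suffix, and CIDR for IP literals), which are an assumption for illustration and not the exact behaviour of the OpenShift binaries or any other client. The function name bypasses_proxy and the "before" value (the list above with lb.test.example.com removed) are hypothetical.

```python
import ipaddress


def bypasses_proxy(host, no_proxy):
    """Return True if host matches an entry in a comma-separated NO_PROXY value.

    Simplified rules (an assumption, not any specific client's behaviour):
    exact hostname match, domain-suffix match, CIDR match for IP literals.
    """
    host = host.lower()
    try:
        host_ip = ipaddress.ip_address(host)
    except ValueError:
        host_ip = None  # host is a name, not an IP literal

    for entry in (e.strip().lower() for e in no_proxy.split(",")):
        if not entry:
            continue
        if "/" in entry:
            # CIDR entries can only match IP-literal hosts.
            if host_ip is not None:
                try:
                    if host_ip in ipaddress.ip_network(entry, strict=False):
                        return True
                except ValueError:
                    pass  # not a valid network, ignore the entry
            continue
        if host == entry:
            return True
        # ".cluster.local" and "cluster.local" both match subdomains.
        if host.endswith("." + entry.lstrip(".")):
            return True
    return False


# NO_PROXY as pasted above (after the workaround).
after = (".cluster.local,infra01.test.example.com,infra02.test.example.com,"
         "lb.test.example.com,master01.test.example.com,"
         "master02.test.example.com,master03.test.example.com,"
         "node01.test.example.com,node02.test.example.com,"
         "node03.test.example.com,registry.example.com,"
         "172.30.0.0/16,10.1.0.0/16")
# Hypothetical pre-workaround value: same list without the load balancer.
before = after.replace("lb.test.example.com,", "")

print(bypasses_proxy("lb.test.example.com", before))  # False
print(bypasses_proxy("lb.test.example.com", after))   # True
```

Under these simplified rules, the load balancer hostname is only exempt from the proxy once it is listed explicitly, which is consistent with the node failure in the description.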

Comment 10 Marko Myllynen 2017-06-21 11:48:51 UTC

I retested a few times and I think it works now after adding the openshift_master_cluster_hostname host to the openshift_no_proxy list. The earlier failure was most likely due to stale local caches in the environment.

So to resolve this, I'd suggest that when openshift_generate_no_proxy_hosts=True and openshift_master_cluster_hostname is defined, the installer should add openshift_master_cluster_hostname to openshift_no_proxy.

Thanks.
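
The suggestion above could look roughly like the following Python sketch. It is only an illustration of the proposed logic, not the openshift-ansible implementation; the function build_no_proxy and its parameters are hypothetical names that mirror the inventory variables.

```python
def build_no_proxy(user_no_proxy, generated_hosts, generate_hosts,
                   cluster_hostname=None):
    """Assemble a NO_PROXY string (hypothetical helper, for illustration).

    user_no_proxy    -- value of openshift_no_proxy (comma-separated)
    generated_hosts  -- hostnames the installer would generate
    generate_hosts   -- value of openshift_generate_no_proxy_hosts
    cluster_hostname -- value of openshift_master_cluster_hostname, if set
    """
    entries = [e.strip() for e in user_no_proxy.split(",") if e.strip()]
    if generate_hosts:
        entries.extend(generated_hosts)
        # Proposed change: also exempt the internal load balancer hostname.
        if cluster_hostname:
            entries.append(cluster_hostname)

    # De-duplicate while preserving the original order.
    seen = set()
    result = []
    for entry in entries:
        if entry not in seen:
            seen.add(entry)
            result.append(entry)
    return ",".join(result)


print(build_no_proxy(
    "registry.example.com",
    ["master01.test.example.com", "node01.test.example.com"],
    True,
    cluster_hostname="lb.test.example.com",
))
```

With openshift_generate_no_proxy_hosts=False the sketch leaves the user-supplied list untouched, so the current opt-in behaviour would be preserved.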

Comment 11 Scott Dodson 2018-04-18 18:04:28 UTC


*** This bug has been marked as a duplicate of bug 1432020 ***
