Description of problem:

Using the following configuration, where the proxy IP is invalid (to test that nothing really tries to use it):

    openshift_http_proxy=http://10.10.10.10:8080
    openshift_https_proxy=http://10.10.10.10:8080
    openshift_no_proxy=registry.example.com
    openshift_generate_no_proxy_hosts=True

This leads to installation failure on containerized 3.5; the node container/service won't start:

    I0616 11:52:33.236370 19986 start_node.go:250] Reading node configuration from /etc/origin/node/node-config.yaml
    F0616 11:53:03.246969 19986 start_node.go:139] cannot fetch "default" cluster network: Get https://lb.test.example.com:8443/oapi/v1/clusternetworks/default: http: error connecting to proxy http://10.10.10.10:8080: dial tcp 10.10.10.10:8080: i/o timeout

Also, on the etcd container there are errors:

    2017-06-16 12:50:30.483903 E | rafthttp: failed to dial 284945f473024b5b on stream Message (dial tcp 192.168.200.149:2380: getsockopt: connection refused)
    2017-06-16 12:52:56.036837 E | rafthttp: failed to dial 683d77bc33ccbbe0 on stream Message (peer 683d77bc33ccbbe0 failed to find local node db8b5c1ad6077b1b)
    2017-06-16 13:00:11.382006 E | etcdserver/api/v2http: etcdserver: request timed out, possibly due to previous leader failure
    2017-06-16 13:00:12.800166 E | etcdserver/api/v2http: etcdserver: request timed out, possibly due to previous leader failure [merged 1 repeated lines in 1.42s]

So it looks like NO_PROXY is not properly set up / propagated everywhere.

Version-Release number of selected component (if applicable):
openshift-ansible-playbooks-3.5.78-1.git.0.f7be576.el7.noarch
We don't configure etcd hosts for proxies. Is this something you've done manually? The expectation is that etcd hosts are all peers and should never need a proxy to reach each other. Can you please provide /etc/etcd/etcd.conf? thanks
Also, why can't your node reach the proxy?

    F0616 11:53:03.246969 19986 start_node.go:139] cannot fetch "default" cluster network: Get https://lb.test.example.com:8443/oapi/v1/clusternetworks/default: http: error connecting to proxy http://10.10.10.10:8080: dial tcp 10.10.10.10:8080: i/o timeout
Can you please provide /etc/sysconfig/{etcd,atomic-openshift-node,atomic-openshift-master}? Also, does this not happen in RPM-based installs?
> Also, why can't your node reach the proxy?

The address used here is non-existent; there is nobody behind that IP. In some environments the proxy works only for external addresses, and trying to reach internal addresses through such a proxy will only cause a hang or an error. In such an environment, any attempt to connect to the proxy should be avoided unless the target really is an external, non-OpenShift network.

This should be 100% reproducible; I haven't done any local etcd or other configuration. I will provide /etc/etcd/etcd.conf tomorrow, but it is what the installer generated, with nothing local / manual in it. Thanks.
Sorry, I missed that it was intentionally configured to point at a proxy that doesn't exist. We'll see what we can do to reproduce.
In the inventory file I have:

    openshift_master_cluster_method=native
    openshift_master_cluster_hostname=lb.test.example.com
    openshift_master_cluster_public_hostname=something.somewhere.com

The installation went a bit further after I added the openshift_master_cluster_hostname host to the no_proxy list. I think when openshift_generate_no_proxy_hosts=True that should be done automatically.

There are still some etcd errors. /etc/etcd/etcd.conf on the first master looks like this (pasting the relevant parts only):

    ETCD_NAME=master01.test.example.com
    ETCD_LISTEN_PEER_URLS=https://192.168.200.254:2380
    ETCD_LISTEN_CLIENT_URLS=https://192.168.200.254:2379
    #[cluster]
    ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.200.254:2380
    ETCD_INITIAL_CLUSTER=master01.test.example.com=https://192.168.200.254:2380,master02.test.example.com=https://192.168.200.149:2380,master03.test.example.com=https://192.168.200.209:2380
    ETCD_INITIAL_CLUSTER_STATE=new
    ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
    #ETCD_DISCOVERY=
    #ETCD_DISCOVERY_SRV=
    #ETCD_DISCOVERY_FALLBACK=proxy
    #ETCD_DISCOVERY_PROXY=
    ETCD_ADVERTISE_CLIENT_URLS=https://192.168.200.254:2379
    #[proxy]
    #ETCD_PROXY=off

And /etc/sysconfig/{etcd,atomic-openshift-node,atomic-openshift-master}* have the same proxy-related content:

    HTTP_PROXY=http://10.10.10.10:8080
    HTTPS_PROXY=http://10.10.10.10:8080
    NO_PROXY=.cluster.local,infra01.test.example.com,infra02.test.example.com,lb.test.example.com,master01.test.example.com,master02.test.example.com,master03.test.example.com,node01.test.example.com,node02.test.example.com,node03.test.example.com,registry.example.com,172.30.0.0/16,10.1.0.0/16
    OPTIONS=--loglevel=2 --listen=https://0.0.0.0:8443 --master=https://master01.test.example.com:8443

Thanks.
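When comparing a NO_PROXY value against the hosts a service actually dials, a small check can help. The following is only a rough sketch: `is_covered` is a hypothetical helper, and its matching rules (exact match, domain suffix, CIDR membership) are a simplification. Real clients differ in how they interpret NO_PROXY, and some do not honor CIDR entries at all, so a `True` here does not guarantee that a given binary bypasses the proxy.

```python
import ipaddress


def is_covered(host, no_proxy):
    """Return True if `host` matches an entry in a comma-separated NO_PROXY value.

    Simplified rules (an assumption, not any specific client's behavior):
    exact match, domain-suffix match for hostnames, CIDR membership for IPs.
    """
    entries = [e.strip() for e in no_proxy.split(",") if e.strip()]
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None  # not an IP literal, treat as a hostname

    for entry in entries:
        if host == entry:
            return True
        if addr is not None and "/" in entry:
            # CIDR entry: check whether the IP falls inside the network.
            try:
                if addr in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                pass  # entry contains "/" but is not a valid network
            continue
        if addr is None and host.endswith("." + entry.lstrip(".")):
            # ".cluster.local" and "cluster.local" both match "x.cluster.local".
            return True
    return False
```

Under these simplified rules, `lb.test.example.com` is covered by the NO_PROXY value shown above, while an address such as 192.168.200.149 is not, because only hostnames and the 172.30.0.0/16 and 10.1.0.0/16 ranges are listed. Whether that matters for the etcd errors is not established in this report.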
I retested a few times and I think it works now after adding the openshift_master_cluster_hostname host to the openshift_no_proxy list. The earlier failure was most likely due to stale local caches in the environment.

So to resolve this I'd suggest that when openshift_generate_no_proxy_hosts=True and openshift_master_cluster_hostname is defined, the installer adds openshift_master_cluster_hostname to openshift_no_proxy. Thanks.
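The suggested behavior can be sketched roughly as follows. This is an illustration only, not the installer's actual code: `build_no_proxy` and its parameter names are hypothetical, and `generated_hosts` stands in for whatever host list the installer derives from the inventory.

```python
def build_no_proxy(openshift_no_proxy, generated_hosts,
                   generate_no_proxy_hosts=True, cluster_hostname=None):
    """Build a NO_PROXY value, including the cluster hostname when defined.

    Hypothetical helper illustrating the proposal in this report.
    """
    # Start from the user-supplied, comma-separated list.
    entries = [h.strip() for h in openshift_no_proxy.split(",") if h.strip()]

    if generate_no_proxy_hosts:
        entries.extend(generated_hosts)
        # Proposed addition: the load-balanced master hostname, if set.
        if cluster_hostname:
            entries.append(cluster_hostname)

    # Drop duplicates while keeping the original order.
    seen = set()
    result = []
    for entry in entries:
        if entry not in seen:
            seen.add(entry)
            result.append(entry)
    return ",".join(result)
```

With the inventory from this report, calling it with `cluster_hostname="lb.test.example.com"` would append that host to the generated list, which is the entry that had to be added by hand.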
*** This bug has been marked as a duplicate of bug 1432020 ***