Bug 1375031 - OCP 3.2 ansible installer doesn't support proxy setting for OpenShift Node
Summary: OCP 3.2 ansible installer doesn't support proxy setting for OpenShift Node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.2.1
Hardware: All
OS: Linux
Importance: high medium
Target Milestone: ---
Target Release: 3.2.1
Assignee: Scott Dodson
QA Contact: Gan Huang
URL:
Whiteboard:
Duplicates: 1375271 1375414
Depends On:
Blocks: 1375723
 
Reported: 2016-09-12 00:36 UTC by Kenjiro Nakayama
Modified: 2016-11-22 08:12 UTC (History)
13 users

Fixed In Version: openshift-ansible-3.2.29-1.git.0.2b76696.el7
Doc Type: Bug Fix
Doc Text:
Previously, the installer did not configure proxy settings for the node service. In some cases this is required for the node service to communicate with the cloud provider which would have prevented the node from starting properly. The installer has been updated to configure proxy settings for the node service ensuring the node can communicate with the cloud API when a proxy is required to do so.
Clone Of:
Clones: 1375723
Environment:
Last Closed: 2016-10-03 14:51:45 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Ansible install inventory file used to install (22.77 KB, text/plain)
2016-09-13 17:32 UTC, Peter Larsen


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1984 0 normal SHIPPED_LIVE OpenShift Container Platform 3.2 atomic-openshift-utils bug fix update 2016-10-03 18:51:33 UTC

Description Kenjiro Nakayama 2016-09-12 00:36:34 UTC
Description of problem:

  As described in [1], openshift_{http,https,no}_proxy set the proxy for the master and docker only. There is no way to add proxy variables to the OpenShift node's /etc/sysconfig/atomic-openshift-node.

Version-Release number of selected component (if applicable):

  atomic-openshift-3.2.1.13-1.git.0.e438b0e.el7.x86_64
  atomic-openshift-utils-3.2.24-1.git.0.337259b.el7.noarch

Actual results:

  There is no way to set proxy variables in /etc/sysconfig/atomic-openshift-node via the ansible installer.

Expected results:

  A new variable openshift_node_http_proxy (or the existing openshift_http_proxy) should support setting HTTP_PROXY for the node service.

Additional info:

  Without setting the variables, the installation fails if there is a proxy between the OpenStack API and the OpenShift node.

(e.g)
     Sep 07 02:47:24 knakayam-ose32-single-master.os1.phx2.redhat.com atomic-openshift-node[113143]: F0907 02:47:24.304683  113143 start_node.go:124] could not init cloud provider "openstack": Post http://foo.openstack.redhat.com:5000/v2.0/tokens: http: error connecting to proxy 10.0.1.1:8080/: dial tcp 10.0.1.1:8080: i/o timeout

[1] https://docs.openshift.com/enterprise/3.2/install_config/install/advanced_install.html#advanced-install-configuring-global-proxy
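For reference, the global proxy variables described in [1] go in the [OSEv3:vars] section of the inventory; at the time of this report they affected only the master and docker, not the node service. Host and port values below are illustrative, not taken from any real environment:

```ini
[OSEv3:vars]
# Illustrative values; replace with your environment's proxy and exclusions.
openshift_http_proxy=http://proxy.example.com:8080
openshift_https_proxy=http://proxy.example.com:8080
openshift_no_proxy=.example.com,169.254.169.254
```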

Comment 2 Scott Dodson 2016-09-12 17:52:25 UTC
*** Bug 1375271 has been marked as a duplicate of this bug. ***

Comment 4 Scott Dodson 2016-09-13 13:59:05 UTC
*** Bug 1375414 has been marked as a duplicate of this bug. ***

Comment 11 Peter Larsen 2016-09-13 17:32:32 UTC
Created attachment 1200567 [details]
Ansible install inventory file used to install

The proxy server (squid) is running on proxy.rhdemo.net.

Comment 14 Scott Dodson 2016-09-13 20:36:00 UTC
The fix included in this build applies proxy configuration settings to the node services as well. The user will need to properly determine the correct http_proxy, https_proxy, and no_proxy values to define for their environment whether that's AWS, GCE, or OpenStack.

Comment 16 Gan Huang 2016-09-14 06:36:57 UTC
1. Reproduced with openshift-ansible-3.2.28-1.git.0.5a85fc5.el7.noarch.rpm

Installation failed at TASK [openshift_node : Start and enable node again]
#cat /var/log/messages
<--snip-->
- Sep 13 23:36:22 qe-ghuang-master-nfs-1 atomic-openshift-node: F0913 23:36:22.078780    4870 start_node.go:124] could not init cloud provider "openstack": Post http://xxxxxx.redhat.com:5000/v2.0/tokens: dial tcp: lookup xxxxxx.redhat.com: Temporary failure in name resolution
<--snip-->

2. Then tested against openshift-ansible-3.2.29-1.git.0.2b76696.el7.noarch.rpm
Installation succeeded.

Will test again once pushed to new puddle.

Comment 18 Gan Huang 2016-09-18 08:30:34 UTC
1, Checked the "NO_PROXY" variable after installation:
# grep NO_PROXY /etc/sysconfig/atomic-openshift-node
NO_PROXY=.cluster.local,169.254.169.254,qe-ghuang-preserve-master,qe-ghuang-preserve-node

2, Docker registry failed to deploy and was in status "CrashLoopBackOff"
# oc describe po docker-registry-1-jyqvx

<--snip-->
  1m	1m	1	{kubelet qe-ghuang-preserve-node}	spec.containers{registry}	Normal	Started		Started container with docker id 33eda6dc7a63
  1m	1m	1	{kubelet qe-ghuang-preserve-node}	spec.containers{registry}	Normal	Created		Created container with docker id 33eda6dc7a63
  1m	1m	1	{kubelet qe-ghuang-preserve-node}	spec.containers{registry}	Normal	Killing		Killing container with docker id 33eda6dc7a63: pod "docker-registry-1-jyqvx_default(47555aad-7d70-11e6-b5e8-fa163ea980d4)" container "registry" is unhealthy, it will be killed and re-created.
  6m	1m	16	{kubelet qe-ghuang-preserve-node}	spec.containers{registry}	Warning	Unhealthy	Readiness probe failed: HTTP probe failed with statuscode: 503
  6m	1m	9	{kubelet qe-ghuang-preserve-node}	spec.containers{registry}	Warning	Unhealthy	Liveness probe failed: HTTP probe failed with statuscode: 503
  1m	14s	8	{kubelet qe-ghuang-preserve-node}					Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "registry" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=registry pod=docker-registry-1-jyqvx_default(47555aad-7d70-11e6-b5e8-fa163ea980d4)"

  4m	14s	22	{kubelet qe-ghuang-preserve-node}	spec.containers{registry}	Warning	BackOff	Back-off restarting failed docker container

3, Added the cluster network and pod network into NO_PROXY like this:
# grep NO_PROXY /etc/sysconfig/atomic-openshift-node

NO_PROXY=.cluster.local,169.254.169.254,qe-ghuang-preserve-master,qe-ghuang-preserve-node,172.30.0.0/16,10.1.0.0/16

4, Re-deployed docker-registry and it succeeded.

@Scott, from the testing above, it looks like we also need to add the cluster network and pod network into NO_PROXY.
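The manual workaround from step 3 can be sketched as the shell below. The CIDRs are the ones from this comment (172.30.0.0/16 is the services subnet, 10.1.0.0/16 the pod SDN) and may differ per environment; for illustration it operates on a scratch copy rather than the real sysconfig file.

```shell
# Sketch of the manual workaround: append the cluster networks to the node's
# NO_PROXY. On a real node the file is /etc/sysconfig/atomic-openshift-node.
SYSCONFIG=$(mktemp)
printf 'NO_PROXY=.cluster.local,169.254.169.254,master,node\n' > "$SYSCONFIG"
EXTRA="172.30.0.0/16,10.1.0.0/16"   # services subnet and pod SDN CIDR
# Append EXTRA to the existing NO_PROXY line in place ('&' is the matched text).
sed -i "s|^NO_PROXY=.*|&,${EXTRA}|" "$SYSCONFIG"
grep '^NO_PROXY=' "$SYSCONFIG"
# On a real node, follow up with: systemctl restart atomic-openshift-node
```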

Comment 19 Peter Larsen 2016-09-18 16:05:55 UTC
(In reply to Gan Huang from comment #18)
> 1, Check the variable of "NO_PROXY" after installation:
> # grep NO_PROXY /etc/sysconfig/atomic-openshift-node
> NO_PROXY=.cluster.local,169.254.169.254,qe-ghuang-preserve-master,qe-ghuang-
> preserve-node
> 
> 2, Docker registry failed to deploy and was in status "CrashLoopBackOff"
> # oc describe po docker-registry-1-jyqvx
> 
> <--snip-->

> 
> @Scott, from the testing above, looks like we also need add the cluster
> network and pod network into NO_PROXY.

Here's the NO_PROXY on a working system for an atomic-openshift-node:

NO_PROXY=.cluster.local,.rhdemo.net,proxy-master1.rhdemo.net,proxy-node1.rhdemo.net,proxy-node2.rhdemo.net,proxy-node3.rhdemo.net,172.30.0.0/16,10.1.0.0/16

I'm not sure each node should be in NO_PROXY, but that seems to be what ansible does. Only node-to-master traffic would be an issue here - but that's for a different ticket.

The docker registry IP needs to be part of NO_PROXY of the docker daemon (/etc/sysconfig/docker) - this means that once oadm route has been run, the service IP needs to be injected into the running docker daemon on all nodes, and then the daemon needs to be restarted. If this is skipped, builds will fail when they try to push images into the registry.
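That docker-side step can be sketched the same way. The registry service IP here (172.30.95.24) is a made-up example - on a real cluster it comes from the docker-registry service - and the edit is shown on a scratch copy of /etc/sysconfig/docker:

```shell
# Sketch: add the docker-registry service IP to the docker daemon's NO_PROXY.
# 172.30.95.24 is a hypothetical service IP; look yours up on the cluster.
DOCKER_SYSCONFIG=$(mktemp)
printf 'NO_PROXY=.cluster.local,169.254.169.254\n' > "$DOCKER_SYSCONFIG"
REGISTRY_IP=172.30.95.24
sed -i "s|^NO_PROXY=.*|&,${REGISTRY_IP}|" "$DOCKER_SYSCONFIG"
grep '^NO_PROXY=' "$DOCKER_SYSCONFIG"
# On a real node, follow up with: systemctl restart docker
```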

Comment 20 Steven Walter 2016-09-19 17:31:54 UTC
Customer tried the install packages. This may be related to the issues already presented, but the installer seems to stall out on cluster facts when using the brew packages. Posting details in pc

Comment 23 Scott Dodson 2016-09-19 18:58:59 UTC
Gan,

Thanks, added kubesvc and cluster CIDRs to /etc/sysconfig/atomic-openshift-node.

https://github.com/openshift/openshift-ansible/pull/2466

Backported to 3.2 and 3.3 installers.

Comment 24 Gan Huang 2016-09-20 06:09:19 UTC
Tested against openshift-ansible-3.2.30-1

Installation failed at:
TASK [openshift_node : Configure Proxy Settings] *******************************
[DEPRECATION WARNING]: Skipping task due to undefined Error, in the future this will be a fatal error.: 'dict object' has no attribute 'master'.
This feature will be removed
 in a future release. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
fatal: [qe-ghuang-preserve-node]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'item' is undefined\n\nThe error appears to have been in '/root/openshift-ansible/roles/openshift_node/tasks/systemd_units.yml': line 51, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Configure Proxy Settings\n  ^ here\n"}
changed: [qe-ghuang-preserve-master] => (item={u'regex': u'^HTTP_PROXY=', u'line': u'HTTP_PROXY=http://192.168.1.84:3128'}) => {"backup": "", "changed": true, "item": {"line": "HTTP_PROXY=http://192.168.1.84:3128", "regex": "^HTTP_PROXY="}, "msg": "line added"}
changed: [qe-ghuang-preserve-master] => (item={u'regex': u'^HTTPS_PROXY=', u'line': u'HTTPS_PROXY=http://192.168.1.84:3128'}) => {"backup": "", "changed": true, "item": {"line": "HTTPS_PROXY=http://192.168.1.84:3128", "regex": "^HTTPS_PROXY="}, "msg": "line added"}
changed: [qe-ghuang-preserve-master] => (item={u'regex': u'^NO_PROXY=', u'line': u'NO_PROXY=.cluster.local,169.254.169.254,qe-ghuang-preserve-master,qe-ghuang-preserve-node,172.30.0.0/16,10.1.0.0/16'}) => {"backup": "", "changed": true, "item": {"line": "NO_PROXY=.cluster.local,169.254.169.254,qe-ghuang-preserve-master,qe-ghuang-preserve-node,172.30.0.0/16,10.1.0.0/16", "regex": "^NO_PROXY="}, "msg": "line added"}


Looks good to me when I modify the file like this (hopefully it's useful to you):
diff --git a/roles/openshift_node/tasks/systemd_units.yml b/roles/openshift_node/tasks/systemd_units.yml
index c20eed8..f8d6929 100644
--- a/roles/openshift_node/tasks/systemd_units.yml
+++ b/roles/openshift_node/tasks/systemd_units.yml
@@ -60,7 +60,7 @@
     - regex: '^HTTPS_PROXY='
       line: "HTTPS_PROXY={{ openshift.common.https_proxy }}"
     - regex: '^NO_PROXY='
-      line: "NO_PROXY={{ openshift.common.no_proxy | join(',') }},{{ openshift.common.portal_net }},{{ openshift.master.sdn_cluster_network_cidr }}"
+      line: "NO_PROXY={{ openshift.common.no_proxy | join(',') }},{{ hostvars[groups.oo_first_master.0].openshift.common.portal_net }},{{ hostvars[groups.oo_first_master.0]
   when: "{{ openshift.common.http_proxy is defined and openshift.common.http_proxy != '' }}"
   notify:
   - restart node
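Spelled out, the fixed NO_PROXY item would presumably look like the sketch below. The diff hunk above is truncated mid-line, so the ending shown here - mirroring the portal_net lookup with sdn_cluster_network_cidr - is an assumption, not a quote from the PR:

```yaml
# Sketch of the complete fixed loop item; the final hostvars lookup is assumed
# to mirror the portal_net one, since the hunk above is cut off.
- regex: '^NO_PROXY='
  line: "NO_PROXY={{ openshift.common.no_proxy | join(',') }},{{ hostvars[groups.oo_first_master.0].openshift.common.portal_net }},{{ hostvars[groups.oo_first_master.0].openshift.master.sdn_cluster_network_cidr }}"
```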

Comment 25 Scott Dodson 2016-09-20 14:23:50 UTC
(In reply to Gan Huang from comment #24)
> Looks good to me when I modify the file like this(hopefully it's useful to
> you):

> -      line: "NO_PROXY={{ openshift.common.no_proxy | join(',') }},{{
> openshift.common.portal_net }},{{ openshift.master.sdn_cluster_network_cidr
> }}"
> +      line: "NO_PROXY={{ openshift.common.no_proxy | join(',') }},{{
> hostvars[groups.oo_first_master.0].openshift.common.portal_net }},{{
> hostvars[groups.oo_first_master.0]

Thanks, my test inventory had both nodes as masters so I missed this. Fixing it.

Comment 27 Gan Huang 2016-09-21 03:14:02 UTC
Verified with openshift-ansible-3.2.31-1.git.0.203df76.el7.noarch

Comment 32 errata-xmlrpc 2016-10-03 14:51:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1984

