| Summary: | OCP 3.2 ansible installer doesn't support proxy setting for OpenShift Node | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Kenjiro Nakayama <knakayam> |
| Component: | Installer | Assignee: | Scott Dodson <sdodson> |
| Status: | CLOSED ERRATA | QA Contact: | Gan Huang <ghuang> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.2.1 | CC: | abutcher, aos-bugs, erich, erjones, ghuang, gkeegan, jialiu, jokerman, mmccomas, pforsber, plarsen, stwalter, wmeng |
| Target Milestone: | --- | | |
| Target Release: | 3.2.1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | openshift-ansible-3.2.29-1.git.0.2b76696.el7 | Doc Type: | Bug Fix |
| Doc Text: | Previously, the installer did not configure proxy settings for the node service. In some cases these settings are required for the node service to communicate with the cloud provider, and without them the node could fail to start properly. The installer has been updated to configure proxy settings for the node service, ensuring the node can communicate with the cloud API when a proxy is required to do so. | Story Points: | --- |
| Clone Of: | | | |
| Cloned To: | 1375723 (view as bug list) | Environment: | |
| Last Closed: | 2016-10-03 14:51:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | | | |
| Bug Blocks: | 1375723 | | |
| Attachments: | Ansible install inventory file used to install (attachment 1200567) | | |
*** Bug 1375271 has been marked as a duplicate of this bug. ***

*** Bug 1375414 has been marked as a duplicate of this bug. ***

Created attachment 1200567 [details]
Ansible install inventory file used to install
The proxy server (squid) is running on proxy.rhdemo.net.
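For context, the global proxy settings for an install come from the openshift_http_proxy, openshift_https_proxy, and openshift_no_proxy variables in the [OSEv3:vars] section of the inventory (see link [1] in the description at the end of this report). A minimal hedged sketch of what the proxy-related lines in such an inventory might look like, since the attachment itself is not reproduced here; the squid port is an assumption (squid's default 3128), not a value taken from the attachment:

openshift_http_proxy=http://proxy.rhdemo.net:3128
openshift_https_proxy=http://proxy.rhdemo.net:3128
openshift_no_proxy=.rhdemo.net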
The fix included in this build applies proxy configuration settings to the node services as well. The user will need to properly determine the correct http_proxy, https_proxy, and no_proxy values to define for their environment, whether that's AWS, GCE, or OpenStack.

1. Reproduced with openshift-ansible-3.2.28-1.git.0.5a85fc5.el7.noarch.rpm

Installation failed at TASK [openshift_node : Start and enable node again]

# cat /var/log/messages
<--snip-->
Sep 13 23:36:22 qe-ghuang-master-nfs-1 atomic-openshift-node: F0913 23:36:22.078780 4870 start_node.go:124] could not init cloud provider "openstack": Post http://xxxxxx.redhat.com:5000/v2.0/tokens: dial tcp: lookup xxxxxx.redhat.com: Temporary failure in name resolution
<--snip-->

2. Then tested against openshift-ansible-3.2.29-1.git.0.2b76696.el7.noarch.rpm

Installation succeeded. Will test again once it is pushed to a new puddle.
1. Check the value of "NO_PROXY" after installation:
# grep NO_PROXY /etc/sysconfig/atomic-openshift-node
NO_PROXY=.cluster.local,169.254.169.254,qe-ghuang-preserve-master,qe-ghuang-preserve-node
2. The docker registry failed to deploy and was in "CrashLoopBackOff" status:
# oc describe po docker-registry-1-jyqvx
<--snip-->
1m 1m 1 {kubelet qe-ghuang-preserve-node} spec.containers{registry} Normal Started Started container with docker id 33eda6dc7a63
1m 1m 1 {kubelet qe-ghuang-preserve-node} spec.containers{registry} Normal Created Created container with docker id 33eda6dc7a63
1m 1m 1 {kubelet qe-ghuang-preserve-node} spec.containers{registry} Normal Killing Killing container with docker id 33eda6dc7a63: pod "docker-registry-1-jyqvx_default(47555aad-7d70-11e6-b5e8-fa163ea980d4)" container "registry" is unhealthy, it will be killed and re-created.
6m 1m 16 {kubelet qe-ghuang-preserve-node} spec.containers{registry} Warning Unhealthy Readiness probe failed: HTTP probe failed with statuscode: 503
6m 1m 9 {kubelet qe-ghuang-preserve-node} spec.containers{registry} Warning Unhealthy Liveness probe failed: HTTP probe failed with statuscode: 503
1m 14s 8 {kubelet qe-ghuang-preserve-node} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "registry" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=registry pod=docker-registry-1-jyqvx_default(47555aad-7d70-11e6-b5e8-fa163ea980d4)"
4m 14s 22 {kubelet qe-ghuang-preserve-node} spec.containers{registry} Warning BackOff Back-off restarting failed docker container
3. Add the service network and cluster (pod) network CIDRs to NO_PROXY like this:
# grep NO_PROXY /etc/sysconfig/atomic-openshift-node
NO_PROXY=.cluster.local,169.254.169.254,qe-ghuang-preserve-master,qe-ghuang-preserve-node,172.30.0.0/16,10.1.0.0/16
4. Re-deployed the docker registry, and it succeeded.
@Scott, from the testing above, it looks like we also need to add the service network and cluster (pod) network CIDRs to NO_PROXY.
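A hedged sketch of the manual adjustment applied in step 3 above, for anyone who needs the workaround before the installer change lands (the CIDRs are this environment's service and cluster networks; substitute your own):

# sed -i 's|^NO_PROXY=.*|&,172.30.0.0/16,10.1.0.0/16|' /etc/sysconfig/atomic-openshift-node
# systemctl restart atomic-openshift-node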
(In reply to Gan Huang from comment #18)
> 1. Check the value of "NO_PROXY" after installation:
> # grep NO_PROXY /etc/sysconfig/atomic-openshift-node
> NO_PROXY=.cluster.local,169.254.169.254,qe-ghuang-preserve-master,qe-ghuang-preserve-node
>
> 2. The docker registry failed to deploy and was in "CrashLoopBackOff" status:
> # oc describe po docker-registry-1-jyqvx
>
> <--snip-->
>
> @Scott, from the testing above, it looks like we also need to add the
> service network and cluster (pod) network CIDRs to NO_PROXY.

Here's the NO_PROXY on a working system for an atomic-openshift-node:

NO_PROXY=.cluster.local,.rhdemo.net,proxy-master1.rhdemo.net,proxy-node1.rhdemo.net,proxy-node2.rhdemo.net,proxy-node3.rhdemo.net,172.30.0.0/16,10.1.0.0/16

I'm not sure each node should be in NO_PROXY, but that seems to be what Ansible does. Only node-to-master traffic would be an issue here, but that's for a different ticket.

The docker registry IP needs to be part of NO_PROXY for the docker daemon in /etc/sysconfig/docker. This means that once the registry has been deployed, the service IP needs to be injected into the docker daemon configuration on all nodes, and then the daemon needs to be restarted. If this is skipped, builds will fail when they try to push images into the registry. (A hedged sketch of this adjustment follows below.)

Customer tried the install packages. This may be related to the issues already presented, but the installer seems to stall out on cluster facts when using the brew packages. Posting details in pc.

Gan, thanks. Added the kube service and cluster CIDRs to /etc/sysconfig/atomic-openshift-node.

https://github.com/openshift/openshift-ansible/pull/2466

Backported to the 3.2 and 3.3 installers.
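As referenced above, a hedged sketch of the manual docker-daemon adjustment (the registry service IP is a placeholder; look it up first, and skip this if the docker NO_PROXY already covers the service network CIDR):

# oc get svc docker-registry -n default
# sed -i 's|^NO_PROXY=.*|&,<registry service IP>|' /etc/sysconfig/docker
# systemctl restart docker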
Tested against openshift-ansible-3.2.30-1.

Installation failed at:
TASK [openshift_node : Configure Proxy Settings] *******************************
[DEPRECATION WARNING]: Skipping task due to undefined Error, in the future this will be a fatal error.: 'dict object' has no attribute 'master'.
This feature will be removed in a future release. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
fatal: [qe-ghuang-preserve-node]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'item' is undefined\n\nThe error appears to have been in '/root/openshift-ansible/roles/openshift_node/tasks/systemd_units.yml': line 51, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Configure Proxy Settings\n ^ here\n"}
changed: [qe-ghuang-preserve-master] => (item={u'regex': u'^HTTP_PROXY=', u'line': u'HTTP_PROXY=http://192.168.1.84:3128'}) => {"backup": "", "changed": true, "item": {"line": "HTTP_PROXY=http://192.168.1.84:3128", "regex": "^HTTP_PROXY="}, "msg": "line added"}
changed: [qe-ghuang-preserve-master] => (item={u'regex': u'^HTTPS_PROXY=', u'line': u'HTTPS_PROXY=http://192.168.1.84:3128'}) => {"backup": "", "changed": true, "item": {"line": "HTTPS_PROXY=http://192.168.1.84:3128", "regex": "^HTTPS_PROXY="}, "msg": "line added"}
changed: [qe-ghuang-preserve-master] => (item={u'regex': u'^NO_PROXY=', u'line': u'NO_PROXY=.cluster.local,169.254.169.254,qe-ghuang-preserve-master,qe-ghuang-preserve-node,172.30.0.0/16,10.1.0.0/16'}) => {"backup": "", "changed": true, "item": {"line": "NO_PROXY=.cluster.local,169.254.169.254,qe-ghuang-preserve-master,qe-ghuang-preserve-node,172.30.0.0/16,10.1.0.0/16", "regex": "^NO_PROXY="}, "msg": "line added"}
Looks good to me when I modify the file like this (hopefully it's useful to you):
diff --git a/roles/openshift_node/tasks/systemd_units.yml b/roles/openshift_node/tasks/systemd_units.yml
index c20eed8..f8d6929 100644
--- a/roles/openshift_node/tasks/systemd_units.yml
+++ b/roles/openshift_node/tasks/systemd_units.yml
@@ -60,7 +60,7 @@
- regex: '^HTTPS_PROXY='
line: "HTTPS_PROXY={{ openshift.common.https_proxy }}"
- regex: '^NO_PROXY='
- line: "NO_PROXY={{ openshift.common.no_proxy | join(',') }},{{ openshift.common.portal_net }},{{ openshift.master.sdn_cluster_network_cidr }}"
+ line: "NO_PROXY={{ openshift.common.no_proxy | join(',') }},{{ hostvars[groups.oo_first_master.0].openshift.common.portal_net }},{{ hostvars[groups.oo_first_master.0]
when: "{{ openshift.common.http_proxy is defined and openshift.common.http_proxy != '' }}"
notify:
- restart node
(In reply to Gan Huang from comment #24)
> Looks good to me when I modify the file like this (hopefully it's useful to
> you):
> - line: "NO_PROXY={{ openshift.common.no_proxy | join(',') }},{{
> openshift.common.portal_net }},{{ openshift.master.sdn_cluster_network_cidr
> }}"
> + line: "NO_PROXY={{ openshift.common.no_proxy | join(',') }},{{
> hostvars[groups.oo_first_master.0].openshift.common.portal_net }},{{
> hostvars[groups.oo_first_master.0]

Thanks, my test inventory had both nodes as masters so I missed this. Fixing it.

Verified with openshift-ansible-3.2.31-1.git.0.203df76.el7.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1984
Description of problem:

As described in [1], openshift_{http,https,no}_proxy set the proxy for the master and docker only. There is no way to add proxy variables to the OpenShift node's /etc/sysconfig/atomic-openshift-node.

Version-Release number of selected component (if applicable):

atomic-openshift-3.2.1.13-1.git.0.e438b0e.el7.x86_64
atomic-openshift-utils-3.2.24-1.git.0.337259b.el7.noarch

Actual results:

There is no way to set proxy variables in /etc/sysconfig/atomic-openshift-node via the Ansible installer.

Expected results:

A new variable openshift_node_http_proxy (or the existing openshift_http_proxy) should set HTTP_PROXY for the node service.

Additional info:

Without setting the variables, the installation fails if there is a proxy between the OpenStack API and the OpenShift node, e.g.:

Sep 07 02:47:24 knakayam-ose32-single-master.os1.phx2.redhat.com atomic-openshift-node[113143]: F0907 02:47:24.304683 113143 start_node.go:124] could not init cloud provider "openstack": Post http://foo.openstack.redhat.com:5000/v2.0/tokens: http: error connecting to proxy 10.0.1.1:8080/: dial tcp 10.0.1.1:8080: i/o timeout

[1] https://docs.openshift.com/enterprise/3.2/install_config/install/advanced_install.html#advanced-install-configuring-global-proxy
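For illustration, a hedged sketch of what the node sysconfig looks like once the fixed installer has configured it; the values mirror the task output quoted earlier in this report rather than being general defaults:

# grep -i _PROXY /etc/sysconfig/atomic-openshift-node
HTTP_PROXY=http://192.168.1.84:3128
HTTPS_PROXY=http://192.168.1.84:3128
NO_PROXY=.cluster.local,169.254.169.254,qe-ghuang-preserve-master,qe-ghuang-preserve-node,172.30.0.0/16,10.1.0.0/16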