Previously, API verification during upgrades was performed from the Ansible control host, which in some network topologies may not have network access to each API server. API server verification now happens from the master hosts, which avoids these network access problems.
Can you confirm that this happens only when the Ansible host does not have access to the API endpoint? That seems like a really odd configuration. Is it expected in this environment?
That said, I agree with the proposed fix. I think the chances of being able to access the API endpoint from the remote host rather than from the local host are probably higher.
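To make the local-versus-remote distinction concrete, here is a minimal sketch of the two task forms. It assumes the variable names used in the playbook excerpt quoted in this report and uses `delegate_to: localhost` as the explicit equivalent of `local_action`. It is an illustration only and not necessarily how the shipped fix was implemented.

```yaml
# Sketch only: contrasts where the port check runs. Variable names are copied
# from the playbook excerpt in this report; the shipped fix may differ.

# Failing form: the check is delegated to the Ansible control host, so the
# control host itself must be able to reach the master on the API port.
- name: Wait for master API to come back online (checked from the control host)
  become: no
  wait_for:
    host: "{{ inventory_hostname }}"
    port: "{{ openshift.master.api_port }}"
    state: started
    delay: 10
  delegate_to: localhost

# Workaround form: no delegation, so the check runs on the master that is
# being restarted and only that host needs to reach the API port.
- name: Wait for master API to come back online (checked from the master)
  become: no
  wait_for:
    host: "{{ inventory_hostname }}"
    port: "{{ openshift.master.api_port }}"
    state: started
    delay: 10
```

In the second form the master must still be able to resolve its own inventory hostname, which is assumed here.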
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2017:0448
Description of problem:
Ansible upgrade from 3.2 to 3.3 fails with "Timeout when waiting for NODE"

Version-Release number of selected component (if applicable):
openshift-ansible-playbooks-3.3.22-1.git.0.6c888c2.el7.noarch

How reproducible:
Always but different nodes may fail

Steps to Reproduce:
1. Install 3.2
2. Ansible upgrade to 3.3
3.

Actual results:
2016-10-11 14:23:40,286 p=15651 u=wnhadm | PLAY [Restart masters] *********************************************************
2016-10-11 14:23:40,296 p=15651 u=wnhadm | TASK [Restart master system] ***************************************************
2016-10-11 14:23:40,333 p=15651 u=wnhadm | TASK [Wait for master API to come back online] *********************************
2016-10-11 14:23:40,370 p=15651 u=wnhadm | TASK [Wait for master to start] ************************************************
2016-10-11 14:23:40,404 p=15651 u=wnhadm | TASK [Wait for master to become available] *************************************
2016-10-11 14:23:40,438 p=15651 u=wnhadm | TASK [fail] ********************************************************************
2016-10-11 14:23:40,473 p=15651 u=wnhadm | TASK [Restart master] **********************************************************
2016-10-11 14:23:40,513 p=15651 u=wnhadm | TASK [Restart master API] ******************************************************
2016-10-11 14:23:49,069 p=15651 u=wnhadm | TASK [Wait for master API to come back online] *********************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-master/restart_services.yml:11
Using module file /usr/lib/python2.7/site-packages/ansible/modules/core/utilities/logic/wait_for.py
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: wnhadm
<localhost> EXEC /bin/sh -c '/usr/bin/python2 && sleep 0'
fatal: [SIY05E97 -> localhost]: FAILED! => {
    "changed": false,
    "elapsed": 301,
    "failed": true,
    "invocation": {
        "module_args": {
            "connect_timeout": 5,
            "delay": 10,
            "exclude_hosts": null,
            "host": "SIY05E97",
            "path": null,
            "port": 8443,
            "search_regex": null,
            "state": "started",
            "timeout": 300
        },
        "module_name": "wait_for"
    },
    "msg": "Timeout when waiting for SIY05E97:8443"
}

NO MORE HOSTS LEFT *************************************************************
        to retry, use: --limit @/home/wnhadm/.ansible-retry/upgrade.retry

PLAY RECAP *********************************************************************
SIY05E85 : ok=86 changed=5 unreachable=0 failed=0
SIY05E86 : ok=86 changed=5 unreachable=0 failed=0
SIY05E87 : ok=86 changed=5 unreachable=0 failed=0
SIY05E88 : ok=86 changed=5 unreachable=0 failed=0
SIY05E89 : ok=86 changed=5 unreachable=0 failed=0
SIY05E90 : ok=86 changed=5 unreachable=0 failed=0
SIY05E91 : ok=86 changed=5 unreachable=0 failed=0
SIY05E92 : ok=86 changed=5 unreachable=0 failed=0
SIY05E93 : ok=86 changed=5 unreachable=0 failed=0
SIY05E94 : ok=86 changed=5 unreachable=0 failed=0
SIY05E95 : ok=86 changed=5 unreachable=0 failed=0
SIY05E96 : ok=86 changed=5 unreachable=0 failed=0
SIY05E97 : ok=195 changed=14 unreachable=0 failed=1
SIY05E98 : ok=189 changed=10 unreachable=0 failed=0
SIY05E99 : ok=189 changed=10 unreachable=0 failed=0
localhost : ok=30 changed=17 unreachable=0 failed=0

Expected results:
Successful upgrade

Additional info:
Issue seems to be in /usr/share/ansible/openshift-ansible/playbooks/common/openshift-master/restart_services.yml

Customer found workaround by commenting out with the following:

- name: Wait for master API to come back online
  become: no
#  local_action:
#    module: wait_for
  wait_for:
    host="{{ inventory_hostname }}"
    state=started
    delay=10
    port="{{ openshift.master.api_port }}"
  when: openshift_master_ha | bool and openshift.master.cluster_method != 'pacemaker'

Doing so, the wait_for module is executed on the remote side.
Same fix can be applied to:
playbooks/common/openshift-master/restart_hosts.yml
/playbooks/common/openshift-master/restart_hosts_pacemaker.yml
/playbooks/common/openshift-master/restart_services.yml
/playbooks/common/openshift-master/restart_services_pacemaker.yml