Description of problem:
Ansible upgrade from 3.2 to 3.3 fails with "Timeout when waiting for NODE"

Version-Release number of selected component (if applicable):
openshift-ansible-playbooks-3.3.22-1.git.0.6c888c2.el7.noarch

How reproducible:
Always, but different nodes may fail

Steps to Reproduce:
1. Install 3.2
2. Ansible upgrade to 3.3

Actual results:
2016-10-11 14:23:40,286 p=15651 u=wnhadm | PLAY [Restart masters] *********************************************************
2016-10-11 14:23:40,296 p=15651 u=wnhadm | TASK [Restart master system] ***************************************************
2016-10-11 14:23:40,333 p=15651 u=wnhadm | TASK [Wait for master API to come back online] *********************************
2016-10-11 14:23:40,370 p=15651 u=wnhadm | TASK [Wait for master to start] ************************************************
2016-10-11 14:23:40,404 p=15651 u=wnhadm | TASK [Wait for master to become available] *************************************
2016-10-11 14:23:40,438 p=15651 u=wnhadm | TASK [fail] ********************************************************************
2016-10-11 14:23:40,473 p=15651 u=wnhadm | TASK [Restart master] **********************************************************
2016-10-11 14:23:40,513 p=15651 u=wnhadm | TASK [Restart master API] ******************************************************
2016-10-11 14:23:49,069 p=15651 u=wnhadm | TASK [Wait for master API to come back online] *********************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-master/restart_services.yml:11
Using module file /usr/lib/python2.7/site-packages/ansible/modules/core/utilities/logic/wait_for.py
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: wnhadm
<localhost> EXEC /bin/sh -c '/usr/bin/python2 && sleep 0'
fatal: [SIY05E97 -> localhost]: FAILED! => {
    "changed": false,
    "elapsed": 301,
    "failed": true,
    "invocation": {
        "module_args": {
            "connect_timeout": 5,
            "delay": 10,
            "exclude_hosts": null,
            "host": "SIY05E97",
            "path": null,
            "port": 8443,
            "search_regex": null,
            "state": "started",
            "timeout": 300
        },
        "module_name": "wait_for"
    },
    "msg": "Timeout when waiting for SIY05E97:8443"
}

NO MORE HOSTS LEFT *************************************************************
        to retry, use: --limit @/home/wnhadm/.ansible-retry/upgrade.retry

PLAY RECAP *********************************************************************
SIY05E85  : ok=86   changed=5   unreachable=0  failed=0
SIY05E86  : ok=86   changed=5   unreachable=0  failed=0
SIY05E87  : ok=86   changed=5   unreachable=0  failed=0
SIY05E88  : ok=86   changed=5   unreachable=0  failed=0
SIY05E89  : ok=86   changed=5   unreachable=0  failed=0
SIY05E90  : ok=86   changed=5   unreachable=0  failed=0
SIY05E91  : ok=86   changed=5   unreachable=0  failed=0
SIY05E92  : ok=86   changed=5   unreachable=0  failed=0
SIY05E93  : ok=86   changed=5   unreachable=0  failed=0
SIY05E94  : ok=86   changed=5   unreachable=0  failed=0
SIY05E95  : ok=86   changed=5   unreachable=0  failed=0
SIY05E96  : ok=86   changed=5   unreachable=0  failed=0
SIY05E97  : ok=195  changed=14  unreachable=0  failed=1
SIY05E98  : ok=189  changed=10  unreachable=0  failed=0
SIY05E99  : ok=189  changed=10  unreachable=0  failed=0
localhost : ok=30   changed=17  unreachable=0  failed=0

Expected results:
Successful upgrade

Additional info:
The issue seems to be in /usr/share/ansible/openshift-ansible/playbooks/common/openshift-master/restart_services.yml

The customer found a workaround: comment out the local_action wrapper and call wait_for directly, as follows:

- name: Wait for master API to come back online
  become: no
#  local_action:
#    module: wait_for
  wait_for:
    host="{{ inventory_hostname }}"
    state=started
    delay=10
    port="{{ openshift.master.api_port }}"
  when: openshift_master_ha | bool and openshift.master.cluster_method != 'pacemaker'

With this change, the wait_for module is executed on the remote side instead of on the Ansible control host.
The same fix can be applied to:
  playbooks/common/openshift-master/restart_hosts.yml
  playbooks/common/openshift-master/restart_hosts_pacemaker.yml
  playbooks/common/openshift-master/restart_services.yml
  playbooks/common/openshift-master/restart_services_pacemaker.yml
Can you confirm that this happens only when the Ansible host does not have access to the API endpoint? That seems like a really odd configuration; is it expected in this environment?

That said, I agree with the proposed fix. The API endpoint is probably more likely to be reachable from the remote host than from the local host.
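To help answer the reachability question above, here is a minimal sketch of the kind of TCP probe that wait_for performs with state=started. It is not the module's actual implementation; the function name wait_for_port and the interval parameter are hypothetical, and the other parameter names simply mirror the module_args shown in the failure output. Running it once on the Ansible control host and once on the master itself should show whether port 8443 is reachable from each side.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, connect_timeout=5.0,
                  delay=0.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds.

    Returns True as soon as one connection attempt succeeds and False
    if no attempt succeeds before `timeout` seconds have elapsed.
    `interval` (an assumption of this sketch) is the pause between attempts.
    """
    # Optional initial delay before the first attempt.
    time.sleep(delay)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection resolves the name and opens a TCP socket;
            # the context manager closes it again immediately.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            # Refused, unreachable, timed out, or name resolution failed.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, calling wait_for_port("SIY05E97", 8443, timeout=10) from the control host and then wait_for_port("localhost", 8443, timeout=10) on the master would distinguish a network path problem from an API service that has not started.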
Proposed fix: https://github.com/openshift/openshift-ansible/pull/3032
Version:
atomic-openshift-utils-3.3.64-1.git.0.43bfb06.el7.noarch

Steps:
1. RPM install of OCP 3.2
2. Upgrade 3.2 to 3.3

Result:
Upgrade succeeded.
Upgrading a containerized environment passes too.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:0448