Description of problem:

The Ansible upgrade from 3.1 to 3.2 fails because of an issue at /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/pre.yml:30. The task there adds a host literally named "groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master", so the expression is used as a hostname instead of being evaluated. The upgrade then fails because that "host" is unreachable.

TASK [Evaluate etcd_hosts_to_backup] *******************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/pre.yml:30
creating host via 'add_host': hostname=groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master
changed: [localhost] => (item=groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master) => {
    "add_host": {
        "groups": [
            "etcd_hosts_to_backup"
        ],
        "host_name": "groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master",
        "host_vars": {}
    },
    "changed": true,
    "invocation": {
        "module_args": {
            "groups": "etcd_hosts_to_backup",
            "name": "groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master"
        },
        "module_name": "add_host"
    },
    "item": "groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master"
}

Version-Release number of selected component (if applicable):
OpenShift 3.2
openshift-ansible-playbooks-3.2.36-1.git.0.164eb4c.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_2/upgrade.yml
2.
3.

Actual results:

TASK [setup] *******************************************************************
Using module file /usr/lib/python2.7/site-packages/ansible/modules/core/system/setup.py
<groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master> ESTABLISH SSH CONNECTION FOR USER: None
<groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/%h-%r 'groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master' '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1478273458.8-15024504637190 `" && echo ansible-tmp-1478273458.8-15024504637190="` echo $HOME/.ansible/tmp/ansible-tmp-1478273458.8-15024504637190 `" ) && sleep 0'"'"''
fatal: [groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master]: UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: OpenSSH_6.6.1, OpenSSL 1.0.1e-fips 11 Feb 2013\r\ndebug1: Reading configuration data /root/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 56: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\nControlPath too long\r\n",
    "unreachable": true
}
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_2/upgrade.retry

PLAY RECAP *********************************************************************
groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master : ok=0 changed=0 unreachable=1 failed=0
localhost    : ok=12 changed=7 unreachable=0 failed=0
infra01.ose  : ok=81 changed=1 unreachable=0 failed=0
master01.ose : ok=91 changed=1 unreachable=0 failed=0
node01.ose   : ok=81 changed=1 unreachable=0 failed=0
node02.ose   : ok=81 changed=1 unreachable=0 failed=0
node03.ose   : ok=81 changed=1 unreachable=0 failed=0
node04.ose   : ok=81 changed=1 unreachable=0 failed=0

The "ControlPath too long" failure is not the real problem, but it does help show the hostname being wrongly set:

ControlPath=/root/.ansible/cp/%h-%r 'groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master'

Expected results:
Upgrade should succeed.

Additional info:
I have reproduced this locally in an all-in-one environment. I was able to work around it and complete the upgrade in a lab environment by changing with_items to contain the FQDN of the master:

#with_items: groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master
with_items: master.lab

I haven't dug in to see where exactly "groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master" is going wrong. Are there any side effects from simply setting just the etcd master hostname there?
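
Note on the likely cause: the log shows the with_items value arriving at add_host as a literal string, which is consistent with this Ansible version no longer evaluating a bare (unquoted, un-templated) expression in with_items. The fragment below is only a sketch of how the expression could be wrapped so that it is templated. The task name and the add_host arguments are inferred from the log output above, the "{{ item }}" value is an assumption, and this is not claimed to be the change that was eventually merged.

```yaml
# Sketch only: task name and module arguments are inferred from the log;
# the quoted "{{ ... }}" wrapper around the original expression is the assumed change.
- name: Evaluate etcd_hosts_to_backup
  add_host:
    name: "{{ item }}"            # assumed; the log only shows the resolved value
    groups: etcd_hosts_to_backup
  with_items: "{{ groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master }}"
```

With the expression templated, the loop would iterate over the members of oo_etcd_to_config (or oo_first_master as a fallback) instead of a single hard-coded hostname such as master.lab.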
Which version of openshift-ansible are you using? I hit the same issue with openshift-ansible-3.2.37-1.git.0.8f013d0.el7.noarch: https://bugzilla.redhat.com/show_bug.cgi?id=1391805
ansible-2.2.0.0-0.62.rc1.el7.noarch
openshift-ansible-3.2.36-1.git.0.164eb4c.el7.noarch
*** Bug 1391805 has been marked as a duplicate of this bug. ***
Fixed in https://github.com/openshift/openshift-ansible/pull/2715
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2016:2778