Bug 1392169

Summary: Evaluate etcd_hosts_to_backup task causes upgrade failure from 3.1 to 3.2
Product: OpenShift Container Platform Reporter: Brendan Mchugh <bmchugh>
Component: Cluster Version Operator Assignee: Andrew Butcher <abutcher>
Status: CLOSED ERRATA QA Contact: Anping Li <anli>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.2.1 CC: anli, aos-bugs, dgoodwin, jokerman, mmccomas, tobias.genannt
Target Milestone: ---   
Target Release: 3.2.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-15 19:11:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 1 Brendan Mchugh 2016-11-07 07:57:01 UTC
Description of problem:

The Ansible upgrade from 3.1 to 3.2 fails at the task defined in /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/pre.yml:30.

The task passes the with_items expression to add_host as a literal string instead of evaluating it, so a host named "groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master" is added to the etcd_hosts_to_backup group.
The upgrade then fails because that host is unreachable.


TASK [Evaluate etcd_hosts_to_backup] *******************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/pre.yml:30
creating host via 'add_host': hostname=groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master
changed: [localhost] => (item=groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master) => {
    "add_host": {
        "groups": [
            "etcd_hosts_to_backup"
        ], 
        "host_name": "groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master", 
        "host_vars": {}
    }, 
    "changed": true, 
    "invocation": {
        "module_args": {
            "groups": "etcd_hosts_to_backup", 
            "name": "groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master"
        }, 
        "module_name": "add_host"
    }, 
    "item": "groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master"
}


Version-Release number of selected component (if applicable):
OpenShift 3.2
openshift-ansible-playbooks-3.2.36-1.git.0.164eb4c.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_2/upgrade.yml

Actual results:

TASK [setup] *******************************************************************
Using module file /usr/lib/python2.7/site-packages/ansible/modules/core/system/setup.py
<groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master> ESTABLISH SSH CONNECTION FOR USER: None
<groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/%h-%r 'groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master' '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1478273458.8-15024504637190 `" && echo ansible-tmp-1478273458.8-15024504637190="` echo $HOME/.ansible/tmp/ansible-tmp-1478273458.8-15024504637190 `" ) && sleep 0'"'"''
fatal: [groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master]: UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: OpenSSH_6.6.1, OpenSSL 1.0.1e-fips 11 Feb 2013\r\ndebug1: Reading configuration data /root/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 56: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\nControlPath too long\r\n",
    "unreachable": true
}
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_2/upgrade.retry

PLAY RECAP *********************************************************************
groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master : ok=0    changed=0    unreachable=1    failed=0
localhost                  : ok=12   changed=7    unreachable=0    failed=0
infra01.ose : ok=81   changed=1    unreachable=0    failed=0
master01.ose : ok=91   changed=1    unreachable=0    failed=0
node01.ose : ok=81   changed=1    unreachable=0    failed=0
node02.ose : ok=81   changed=1    unreachable=0    failed=0
node03.ose : ok=81   changed=1    unreachable=0    failed=0
node04.ose : ok=81   changed=1    unreachable=0    failed=0


The "ControlPath too long" failure is not the real problem, but it does show that the hostname is set to the unevaluated expression:

ControlPath=/root/.ansible/cp/%h-%r 'groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master'


Expected results:
Upgrade should succeed.

Additional info:
I have reproduced this locally in an all-in-one environment.

I was able to work around the problem and complete the upgrade in a lab environment by changing with_items to the FQDN of the master:

#with_items: groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master
with_items: master.lab


I haven't dug in to see exactly where "groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master" goes wrong.
Are there any side effects from setting only the etcd master hostname there?
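
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the log shows the with_items expression being treated as a literal string, which is the behaviour expected when the Ansible version in use no longer templates bare expressions in with_ loops. A minimal sketch of a task that forces evaluation by wrapping the expression in Jinja2 delimiters follows. The task name, module, and group names are taken from the log above; the exact layout of the task in pre.yml is assumed, and this is not necessarily the change that was eventually merged.

```yaml
# Sketch only: task layout is assumed, names are taken from the log output above.
- name: Evaluate etcd_hosts_to_backup
  add_host:
    name: "{{ item }}"
    groups: etcd_hosts_to_backup
  # The "{{ ... }}" delimiters make Ansible evaluate the expression to a list
  # of hosts instead of passing it through as a single literal string.
  with_items: "{{ groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master }}"
```

With the expression evaluated, the task should add each host from oo_etcd_to_config (or from oo_first_master when no etcd hosts are defined) rather than one host named after the expression itself.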

Comment 2 Anping Li 2016-11-07 08:07:38 UTC
Which version of openshift-ansible are you using? I hit the same issue with openshift-ansible-3.2.37-1.git.0.8f013d0.el7.noarch:
https://bugzilla.redhat.com/show_bug.cgi?id=1391805

Comment 3 Brendan Mchugh 2016-11-07 08:14:12 UTC
ansible-2.2.0.0-0.62.rc1.el7.noarch
openshift-ansible-3.2.36-1.git.0.164eb4c.el7.noarch

Comment 4 Devan Goodwin 2016-11-08 12:33:25 UTC
*** Bug 1391805 has been marked as a duplicate of this bug. ***

Comment 5 Andrew Butcher 2016-11-08 20:26:48 UTC
Fixed in https://github.com/openshift/openshift-ansible/pull/2715

Comment 8 errata-xmlrpc 2016-11-15 19:11:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:2778