Bug 1392169 - Evaluate etcd_hosts_to_backup task causes upgrade failure from 3.1 to 3.2
Summary: Evaluate etcd_hosts_to_backup task causes upgrade failure from 3.1 to 3.2
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.2.1
Assignee: Andrew Butcher
QA Contact: Anping Li
URL:
Whiteboard:
Duplicates: 1391805
Depends On:
Blocks:
Reported: 2016-11-05 14:08 UTC by Brendan Mchugh
Modified: 2019-12-16 07:19 UTC (History)
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-15 19:11:10 UTC
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2016:2778 0 normal SHIPPED_LIVE Moderate: atomic-openshift-utils security and bug fix update 2016-11-16 00:08:29 UTC

Comment 1 Brendan Mchugh 2016-11-07 07:57:01 UTC
Description of problem:

The Ansible upgrade from 3.1 to 3.2 fails at the task defined at /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/pre.yml:30.

The task's with_items expression is not evaluated. Instead, the literal string "groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master" is added as a hostname.
The upgrade then fails because that "host" is unreachable.


TASK [Evaluate etcd_hosts_to_backup] *******************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/pre.yml:30
creating host via 'add_host': hostname=groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master
changed: [localhost] => (item=groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master) => {
    "add_host": {
        "groups": [
            "etcd_hosts_to_backup"
        ], 
        "host_name": "groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master", 
        "host_vars": {}
    }, 
    "changed": true, 
    "invocation": {
        "module_args": {
            "groups": "etcd_hosts_to_backup", 
            "name": "groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master"
        }, 
        "module_name": "add_host"
    }, 
    "item": "groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master"
}


Version-Release number of selected component (if applicable):
OpenShift 3.2
openshift-ansible-playbooks-3.2.36-1.git.0.164eb4c.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_2/upgrade.yml

Actual results:

TASK [setup] *******************************************************************
Using module file /usr/lib/python2.7/site-packages/ansible/modules/core/system/setup.py
<groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master> ESTABLISH SSH CONNECTION FOR USER: None
<groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/%h-%r 'groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master' '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1478273458.8-15024504637190 `" && echo ansible-tmp-1478273458.8-15024504637190="` echo $HOME/.ansible/tmp/ansible-tmp-1478273458.8-15024504637190 `" ) && sleep 0'"'"''
fatal: [groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master]: UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: OpenSSH_6.6.1, OpenSSL 1.0.1e-fips 11 Feb 2013\r\ndebug1: Reading configuration data /root/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 56: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\nControlPath too long\r\n",
    "unreachable": true
}
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_2/upgrade.retry

PLAY RECAP *********************************************************************
groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master : ok=0    changed=0    unreachable=1    failed=0
localhost                  : ok=12   changed=7    unreachable=0    failed=0
infra01.ose : ok=81   changed=1    unreachable=0    failed=0
master01.ose : ok=91   changed=1    unreachable=0    failed=0
node01.ose : ok=81   changed=1    unreachable=0    failed=0
node02.ose : ok=81   changed=1    unreachable=0    failed=0
node03.ose : ok=81   changed=1    unreachable=0    failed=0
node04.ose : ok=81   changed=1    unreachable=0    failed=0


The "ControlPath too long" failure is not the real problem, but it does help show the hostname being wrongly set.

ControlPath=/root/.ansible/cp/%h-%r 'groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master'


Expected results:
Upgrade should succeed.

Additional info:
Reproduced locally in an all-in-one environment.

I was able to work around the problem and complete the upgrade in a lab environment by changing with_items to contain the FQDN of the master.

#with_items: groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master
with_items: master.lab


I haven't dug in to see exactly where the evaluation of "groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master" goes wrong.
Are there any side effects of simply setting just the etcd master hostname there?
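
For comparison with the hard-coded workaround above, the following is a minimal sketch of the same task with the original expression wrapped in Jinja2 delimiters, so that with_items receives the resolved host list rather than a literal string. It assumes the cause is that the installed Ansible no longer evaluates bare expressions in with_items (bare variables in loops were deprecated in Ansible 2.0). The task layout is reconstructed from the add_host log output above, not copied from pre.yml, and it is not necessarily the fix that was applied upstream.

```yaml
# Sketch only: task body inferred from the add_host log output in this report.
- name: Evaluate etcd_hosts_to_backup
  add_host:
    name: "{{ item }}"
    groups: etcd_hosts_to_backup
  # Quoted and templated, so the expression is evaluated to a list of hosts
  # instead of being passed through as a single literal string.
  with_items: "{{ groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master }}"
```

Unlike a hard-coded hostname, this form keeps the fallback to groups.oo_first_master when no etcd hosts are defined in the inventory.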

Comment 2 Anping Li 2016-11-07 08:07:38 UTC
Which version of openshift-ansible are you using? I hit the same issue with openshift-ansible-3.2.37-1.git.0.8f013d0.el7.noarch
https://bugzilla.redhat.com/show_bug.cgi?id=1391805

Comment 3 Brendan Mchugh 2016-11-07 08:14:12 UTC
ansible-2.2.0.0-0.62.rc1.el7.noarch
openshift-ansible-3.2.36-1.git.0.164eb4c.el7.noarch

Comment 4 Devan Goodwin 2016-11-08 12:33:25 UTC
*** Bug 1391805 has been marked as a duplicate of this bug. ***

Comment 5 Andrew Butcher 2016-11-08 20:26:48 UTC
Fixed in https://github.com/openshift/openshift-ansible/pull/2715

Comment 8 errata-xmlrpc 2016-11-15 19:11:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:2778

