Description of problem:
The etcd backup fails when upgrading an environment with embedded etcd.

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.2.37-1.git.0.8f013d0.el7.noarch
ansible-2.2.0.0-0.100.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install OCP-3.2
2. Upgrade to OCP-3.2
   ansible-playbook /root/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_2/upgrade.yml

Actual results:
Step 2 fails:

PLAY [Backup etcd] *************************************************************

TASK [setup] *******************************************************************
Using module file /usr/lib/python2.7/site-packages/ansible/modules/core/system/setup.py
<groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master> ESTABLISH SSH CONNECTION FOR USER: None
<groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r 'groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master' '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1478233944.01-82732812709270 `" && echo ansible-tmp-1478233944.01-82732812709270="` echo $HOME/.ansible/tmp/ansible-tmp-1478233944.01-82732812709270 `" ) && sleep 0'"'"''
fatal: [groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master]: UNREACHABLE! => { "changed": false, "msg": "Failed to connect to the host via ssh: ControlPath too long\r\n", "unreachable": true }
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_2/upgrade.retry

PLAY RECAP *********************************************************************
groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master : ok=0 changed=0 unreachable=1 failed=0
localhost : ok=13 changed=8 unreachable=0 failed=0
openshift-223.lab.eng.nay.redhat.com : ok=87 changed=1 unreachable=0 failed=0
openshift-224.lab.eng.nay.redhat.com : ok=77 changed=1 unreachable=0 failed=0

Expected results:

Additional info:
[OSEv3:children]
masters
nodes
nfs

[OSEv3:vars]
ansible_ssh_user=root
openshift_master_default_subdomain_enable=true
openshift_master_default_subdomain=1104-43x.qe.rhcloud.com
openshift_auth_type=htpasswd
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/htpasswd'}]
deployment_type=openshift-enterprise
oreg_url=openshift3/ose-${component}:${version}
osm_use_cockpit=false
osm_cockpit_plugins=['cockpit-kubernetes']
openshift_node_kubelet_args={"minimum-container-ttl-duration": ["10s"], "maximum-dead-containers-per-container": ["1"], "maximum-dead-containers": ["20"], "image-gc-high-threshold": ["80"], "image-gc-low-threshold": ["70"]}
openshift_hosted_registry_selector="role=node,registry=enabled"
openshift_hosted_router_selector="role=node,router=enabled"
debug_level=5
openshift_set_hostname=true
openshift_override_hostname_check=true
openshift_hosted_registry_storage_kind=nfs
openshift_hosted_registry_storage_nfs_options="*(rw,root_squash,sync,no_wdelay)"
openshift_hosted_registry_storage_nfs_directory=/var/lib/exports
openshift_hosted_registry_storage_volume_name=regpv
openshift_hosted_registry_storage_access_modes=["ReadWriteMany"]
openshift_hosted_registry_storage_volume_size=17G
openshift_docker_additional_registries=virt-openshift-05.lab.eng.nay.redhat.com:5000
openshift_docker_insecure_registries=virt-openshift-05.lab.eng.nay.redhat.com:5000

[masters]
openshift-223.lab.eng.nay.redhat.com ansible_user=root ansible_ssh_user=root openshift_public_hostname=openshift-223.lab.eng.nay.redhat.com openshift_hostname=openshift-223.lab.eng.nay.redhat.com

[nodes]
openshift-223.lab.eng.nay.redhat.com ansible_user=root ansible_ssh_user=root openshift_public_hostname=openshift-223.lab.eng.nay.redhat.com openshift_hostname=openshift-223.lab.eng.nay.redhat.com openshift_node_labels="{'role': 'node'}"
openshift-224.lab.eng.nay.redhat.com ansible_user=root ansible_ssh_user=root openshift_public_hostname=openshift-224.lab.eng.nay.redhat.com openshift_hostname=openshift-224.lab.eng.nay.redhat.com openshift_node_labels="{'role': 'node','registry': 'enabled','router': 'enabled'}"

[nfs]
openshift-223.lab.eng.nay.redhat.com ansible_user=root ansible_ssh_user=root
Hit the same issue when upgrading OpenShift 3.2 with external etcd.
Created attachment 1217294: Upgrade logs
I believe this is a known issue with Ansible and hosts with long hostnames. For example, we have to work around it when using AWS by setting this parameter in /etc/ansible/ansible.cfg:

control_path = %(directory)s/ansible-ssh-%%C

More information is available here: http://docs.ansible.com/ansible/intro_configuration.html#control-path
It seems the control path workaround doesn't work, and I am not using a long hostname or a long home directory. The socket name seems to be shorter than 108 characters.

ansible-2.2.0.0-0.100.el7.noarch
openshift-ansible-3.2.37-1.git.0.8f013d0.el7.noarch

PLAY [Backup etcd] *************************************************************

TASK [setup] *******************************************************************
Using module file /usr/lib/python2.7/site-packages/ansible/modules/core/system/setup.py
<groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master> ESTABLISH SSH CONNECTION FOR USER: None
<groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r 'groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master' '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1478505418.18-152149785284243 `" && echo ansible-tmp-1478505418.18-152149785284243="` echo $HOME/.ansible/tmp/ansible-tmp-1478505418.18-152149785284243 `" ) && sleep 0'"'"''
fatal: [groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master]: UNREACHABLE! => { "changed": false, "msg": "Failed to connect to the host via ssh: OpenSSH_6.6.1, OpenSSL 1.0.1e-fips 11 Feb 2013\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 57: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\nControlPath too long\r\n", "unreachable": true }
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_2/upgrade.retry

PLAY RECAP *********************************************************************
groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and groups.oo_etcd_to_config | length > 0 else groups.oo_first_master : ok=0 changed=0 unreachable=1 failed=0
localhost : ok=13 changed=8 unreachable=0 failed=0
openshift-190.lab.eng.nay.redhat.com : ok=86 changed=1 unreachable=0 failed=0
In your previous comment we can see that the control path fix is not in effect: the ssh command still uses "ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r". With the fix applied it should be using "control_path = %(directory)s/%%h-%%r", per the link above. Also note that the setting must be in the [ssh_connection] section of ansible.cfg, and it may be ignored if you are using custom ssh_args. Please attach /etc/ansible/ansible.cfg if the problem persists.
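To make it easier to tell from a log whether the setting took effect, here is a minimal Python sketch of how a control_path template would turn into the ControlPath option passed to ssh. It assumes the value is expanded with ordinary Python %-formatting, which the %(directory)s placeholder and the doubled %% suggest; render_control_path is a hypothetical helper, not an Ansible function, and the first template is only assumed to correspond to the value seen in the log.

```python
def render_control_path(setting, directory):
    """Expand a control_path template using Python %-formatting.

    Assumption: %(directory)s is filled in and %% collapses to a single %,
    leaving %h / %p / %r for ssh itself to expand.
    """
    return setting % {"directory": directory}


cp_dir = "/root/.ansible/cp"  # directory visible in the logged ControlPath

# Template assumed to match the ControlPath seen in the failing run.
seen_in_log = "%(directory)s/ansible-ssh-%%h-%%p-%%r"
# Template suggested in this comment.
suggested = "%(directory)s/%%h-%%r"

print("-o ControlPath=" + render_control_path(seen_in_log, cp_dir))
print("-o ControlPath=" + render_control_path(suggested, cp_dir))
```

If the suggested setting were being picked up, the ssh command in the log would be expected to show the second form, without the "ansible-ssh-" prefix and without %p.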
You may also be able to set it on the command line with the ANSIBLE_SSH_CONTROL_PATH environment variable.
For example: ANSIBLE_SSH_CONTROL_PATH=/root/.ansible/cp/%%h-%%r
This looks to have surfaced with a customer, and the other Bugzilla has caught something we had not noticed yet. Closing this one as a duplicate; let's resume in bug 1392169.

*** This bug has been marked as a duplicate of bug 1392169 ***
So, depending on the generated hostname, /root/.ansible/cp/%%h-%%r could still be too long. Switching to something like /tmp/cp/%%h-%%r could solve the problem, as could using shorter hostnames.
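A rough way to check a candidate control path against a given hostname is sketched below in Python. It rests on several assumptions: that ssh rejects a ControlPath whose expanded length reaches the 108-byte socket path size on Linux (the figure mentioned earlier in this report), that %p is 22, and that %r is root. The helper names are hypothetical. The second host string is the literal, untemplated expression that appears as the host in the logs above.

```python
SUN_PATH_MAX = 108  # assumed limit: size of sockaddr_un.sun_path on Linux


def expand_control_path(template, host, port=22, user="root"):
    """Substitute the ssh tokens %h, %p and %r (port and user are assumptions)."""
    return (template.replace("%h", host)
                    .replace("%p", str(port))
                    .replace("%r", user))


def fits(path, limit=SUN_PATH_MAX):
    """True if the expanded path is strictly shorter than the assumed limit."""
    return len(path.encode()) < limit


templates = [
    "/root/.ansible/cp/ansible-ssh-%h-%p-%r",
    "/root/.ansible/cp/%h-%r",
    "/tmp/cp/%h-%r",
]
hosts = [
    "openshift-223.lab.eng.nay.redhat.com",
    # Host string exactly as it appears in the failing play.
    "groups.oo_etcd_to_config if groups.oo_etcd_to_config is defined and "
    "groups.oo_etcd_to_config | length > 0 else groups.oo_first_master",
]

for host in hosts:
    for template in templates:
        path = expand_control_path(template, host)
        status = "ok" if fits(path) else "too long"
        print(f"{len(path):4d}  {status:8s}  {template}  (host: {host[:30]})")
```

Under these assumptions the real hostname fits with every template, while the untemplated expression is itself longer than the limit, so for that host string no directory prefix would be short enough.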