Bug 1924106

Summary: FFU 13-16.1: Overcloud FFU prepare fails due to host unreachable
Product: Red Hat OpenStack
Component: python-tripleoclient
Version: 16.1 (Train)
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Reporter: Eduardo Olivares <eolivare>
Assignee: mathieu bultel <mbultel>
QA Contact: David Rosenfeld <drosenfe>
CC: acanan, ccamposr, cjeanner, drosenfe, elicohen, hbrock, igallagh, itbrown, jjoyce, jpretori, jschluet, jslagle, ltoscano, mbultel, mburns, michele, myadla, nweinber, pkesavar, sathlang, sbekkerm, spower, tvignaud
Target Milestone: z4
Target Release: 16.1 (Train on RHEL 8.2)
Keywords: Regression, Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: python-tripleoclient-12.3.2-1.20201114043247.el8ost
Type: Bug
Last Closed: 2021-03-17 15:36:38 UTC
Bug Blocks: 1893113, 1895758, 1901157, 1906698

Description Eduardo Olivares 2021-02-02 16:06:19 UTC
Description of problem:
The following error is raised during Overcloud FFU prepare:
2021-01-31 22:15:54 | fatal: [undercloud]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"localhost\". Make sure this host can be reached over ssh: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.\r\nno such identity: /home/stack/.ssh/id_rsa: No such file or directory\r\ntripleo-admin@localhost: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\r\n", "unreachable": true}
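
The error text already names both the missing key and the rejected login. As a minimal sketch, assuming the "msg" format shown in the log line above, the two details can be extracted like this (the function name summarize_ssh_failure is hypothetical, not part of tripleoclient or Ansible):

```python
import re

# Excerpt of the "msg" text from the UNREACHABLE line above, with \r\n decoded.
MSG = (
    "Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.\r\n"
    "no such identity: /home/stack/.ssh/id_rsa: No such file or directory\r\n"
    "tripleo-admin@localhost: Permission denied "
    "(publickey,gssapi-keyex,gssapi-with-mic,password).\r\n"
)


def summarize_ssh_failure(msg):
    """Return the missing identity file and the rejected user@host, if present."""
    identity = re.search(r"no such identity: (\S+): No such file or directory", msg)
    denied = re.search(r"(\S+@\S+): Permission denied", msg)
    return {
        "missing_identity": identity.group(1) if identity else None,
        "denied_login": denied.group(1) if denied else None,
    }


print(summarize_ssh_failure(MSG))
```

For this message the missing identity is /home/stack/.ssh/id_rsa and the denied login is tripleo-admin@localhost.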

It was previously reported in this Jira ticket: https://projects.engineering.redhat.com/browse/UPG-2562


It was found in several FFU jobs, all of them upgrading to the latest 16.1 build, RHOS-16.1-RHEL-8-20210129.n.0:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_3db_3msg_2net_2comp_3ceph-ipv6-vxlan-composable/81/undercloud-0/home/stack/overcloud_upgrade_prepare_containers.log.gz
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp-ipv6-vxlan-HA-no-ceph/90/undercloud-0/home/stack/overcloud_upgrade_prepare_containers.log.gz
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp_1ipa-ipv4-vxlan-HA-no-ceph-tls-everywhere/52/undercloud-0/home/stack/overcloud_upgrade_prepare_containers.log.gz
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/staging/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-HA-no-ceph-ovn-vlan-provider-network/18/undercloud-0/home/stack/overcloud_upgrade_prepare_containers.log.gz


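I compared the failing build with the previous build of the same job and found a difference that could explain the issue. The Undercloud inventory of the failing build sets ansible_ssh_private_key_file to /home/stack/.ssh/id_rsa, while the passing build uses /var/lib/mistral/.ssh/tripleo-admin-rsa: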
fails (http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_3db_3msg_2net_2comp_3ceph-ipv6-vxlan-composable/81/undercloud-0/var/lib/mistral/a3c756f0-4ca5-4805-9c6a-dad1014140e9/inventory.yaml.gz):
Undercloud:
  hosts:
    undercloud: {}
  vars:
    ansible_connection: ssh
    ansible_host: localhost
    ansible_python_interpreter: /usr/bin/python3
    ansible_remote_tmp: /tmp/ansible-${USER}
    ansible_ssh_private_key_file: /home/stack/.ssh/id_rsa
    ansible_ssh_user: tripleo-admin

passes (http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_3db_3msg_2net_2comp_3ceph-ipv6-vxlan-composable/80/undercloud-0/var/lib/mistral/e8cbadc2-a691-4d5f-afa2-1f7237431630/inventory.yaml.gz):
Undercloud:
  hosts:
    undercloud: {}
  vars:
    ansible_connection: ssh
    ansible_host: localhost
    ansible_python_interpreter: /usr/bin/python3
    ansible_remote_tmp: /tmp/ansible-${USER}
    ansible_ssh_private_key_file: /var/lib/mistral/.ssh/tripleo-admin-rsa
    ansible_ssh_user: tripleo-admin
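
To make the difference between the two inventories explicit, here is a small sketch that compares the Undercloud vars. The dictionaries are copied by hand from the two excerpts above rather than parsed from inventory.yaml, and diff_vars is a hypothetical helper name:

```python
# Undercloud vars copied from the two inventory.yaml excerpts above.
PASSING = {
    "ansible_connection": "ssh",
    "ansible_host": "localhost",
    "ansible_python_interpreter": "/usr/bin/python3",
    "ansible_remote_tmp": "/tmp/ansible-${USER}",
    "ansible_ssh_private_key_file": "/var/lib/mistral/.ssh/tripleo-admin-rsa",
    "ansible_ssh_user": "tripleo-admin",
}
FAILING = dict(PASSING, ansible_ssh_private_key_file="/home/stack/.ssh/id_rsa")


def diff_vars(old, new):
    """Map each key whose value differs to an (old, new) pair."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


for key, (before, after) in diff_vars(PASSING, FAILING).items():
    print(f"{key}: {before} -> {after}")
```

The only key that differs is ansible_ssh_private_key_file, which matches the "no such identity" line in the error.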


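I checked this on my environment after the FFU job failed. With the key from the failing inventory, ssh falls back to a password prompt; with the mistral key, the login succeeds: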
[root@undercloud-0 ~]# ssh -i /home/stack/.ssh/id_rsa tripleo-admin@localhost
tripleo-admin@localhost's password:

[root@undercloud-0 ~]# ssh -i  /var/lib/mistral/.ssh/tripleo-admin-rsa tripleo-admin@localhost
[tripleo-admin@undercloud-0 ~]$
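
The manual check above can be repeated non-interactively. This is a rough sketch, assuming the OpenSSH client is installed and using its standard BatchMode and IdentitiesOnly options so that ssh fails instead of prompting for a password. The helper names build_ssh_check and key_works are hypothetical:

```python
import os
import subprocess


def build_ssh_check(key_path, target="tripleo-admin@localhost"):
    """Build an ssh command that tries only the given key and never prompts."""
    return [
        "ssh", "-i", key_path,
        "-o", "BatchMode=yes",       # fail instead of asking for a password
        "-o", "IdentitiesOnly=yes",  # do not try other keys or agent identities
        target, "true",
    ]


def key_works(key_path, target="tripleo-admin@localhost", timeout=15):
    """Return True only if the key file exists and ssh logs in with it."""
    if not os.path.isfile(key_path):
        return False
    try:
        result = subprocess.run(
            build_ssh_check(key_path, target),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # The two key paths taken from the failing and passing inventories.
    for path in ("/home/stack/.ssh/id_rsa", "/var/lib/mistral/.ssh/tripleo-admin-rsa"):
        print(path, "->", "ok" if key_works(path) else "not usable")
```

Run it as a user that can read both key files, for example root on the undercloud as in the session above.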





Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20210129.n.0
It did not fail with the previous 16.1 build, RHOS-16.1-RHEL-8-20210120.n.1.

How reproducible:
Apparently 100% of the time.

Steps to Reproduce:
1. Run a 13-16.1 FFU job (any job appears to trigger it).

Comment 11 David Rosenfeld 2021-02-11 13:43:13 UTC
No longer see the error: fatal: [undercloud]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"localhost\"
when executing: openstack overcloud external-upgrade run -y --stack overcloud --tags container_image_prepare

Comment 23 errata-xmlrpc 2021-03-17 15:36:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0817