Bug 1924106 - FFU 13-16.1: Overcloud FFU prepare fails due to host unreachable
Summary: FFU 13-16.1: Overcloud FFU prepare fails due to host unreachable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z4
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: mathieu bultel
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On:
Blocks: 1893113 1895758 1901157 1906698
Reported: 2021-02-02 16:06 UTC by Eduardo Olivares
Modified: 2021-03-17 15:38 UTC (History)

Fixed In Version: python-tripleoclient-12.3.2-1.20201114043247.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-17 15:36:38 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2021:0817 | 0 | None | None | None | 2021-03-17 15:38:22 UTC

Description Eduardo Olivares 2021-02-02 16:06:19 UTC
Description of problem:
The following error is raised during Overcloud FFU prepare:
2021-01-31 22:15:54 | fatal: [undercloud]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"localhost\". Make sure this host can be reached over ssh: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.\r\nno such identity: /home/stack/.ssh/id_rsa: No such file or directory\r\ntripleo-admin@localhost: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\r\n", "unreachable": true}

It was previously reported in this Jira ticket: https://projects.engineering.redhat.com/browse/UPG-2562


It was found in several FFU jobs, all of them upgrading to the latest 16.1 build, RHOS-16.1-RHEL-8-20210129.n.0:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_3db_3msg_2net_2comp_3ceph-ipv6-vxlan-composable/81/undercloud-0/home/stack/overcloud_upgrade_prepare_containers.log.gz
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp-ipv6-vxlan-HA-no-ceph/90/undercloud-0/home/stack/overcloud_upgrade_prepare_containers.log.gz
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp_1ipa-ipv4-vxlan-HA-no-ceph-tls-everywhere/52/undercloud-0/home/stack/overcloud_upgrade_prepare_containers.log.gz
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/staging/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-HA-no-ceph-ovn-vlan-provider-network/18/undercloud-0/home/stack/overcloud_upgrade_prepare_containers.log.gz


I compared the failing build with the previous build of the same job, and I think I found a difference in the generated inventory that could explain the issue:
fails (http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_3db_3msg_2net_2comp_3ceph-ipv6-vxlan-composable/81/undercloud-0/var/lib/mistral/a3c756f0-4ca5-4805-9c6a-dad1014140e9/inventory.yaml.gz):
Undercloud:
  hosts:
    undercloud: {}
  vars:
    ansible_connection: ssh
    ansible_host: localhost
    ansible_python_interpreter: /usr/bin/python3
    ansible_remote_tmp: /tmp/ansible-${USER}
    ansible_ssh_private_key_file: /home/stack/.ssh/id_rsa
    ansible_ssh_user: tripleo-admin

passes (http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_3db_3msg_2net_2comp_3ceph-ipv6-vxlan-composable/80/undercloud-0/var/lib/mistral/e8cbadc2-a691-4d5f-afa2-1f7237431630/inventory.yaml.gz):
Undercloud:
  hosts:
    undercloud: {}
  vars:
    ansible_connection: ssh
    ansible_host: localhost
    ansible_python_interpreter: /usr/bin/python3
    ansible_remote_tmp: /tmp/ansible-${USER}
    ansible_ssh_private_key_file: /var/lib/mistral/.ssh/tripleo-admin-rsa
    ansible_ssh_user: tripleo-admin
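
The difference between the two inventories is the value of ansible_ssh_private_key_file. A minimal sketch of how that value could be checked automatically is below. It is an illustration only, not part of python-tripleoclient: the function names are hypothetical, and it reads the inventory with a simple regular expression rather than a YAML parser, so it assumes the variable sits on its own line as in the excerpts above. It also assumes the check runs in the same context as Ansible, since the file may be visible to one user or container and not to another.

```python
import os
import re

# Matches the inventory variable that tells Ansible which private key to use.
# Assumption: the variable is written on a single line, unquoted.
KEY_VAR = re.compile(
    r"^[ \t]*ansible_ssh_private_key_file:[ \t]*(\S+)[ \t]*$", re.MULTILINE
)


def private_key_paths(inventory_text):
    """Return every ansible_ssh_private_key_file value found in the text."""
    return KEY_VAR.findall(inventory_text)


def missing_keys(inventory_text, exists=os.path.exists):
    """Return the configured key paths that do not exist on this host.

    The exists argument is injectable so the check can be exercised
    without touching the real filesystem.
    """
    return [path for path in private_key_paths(inventory_text) if not exists(path)]
```

Run against the failing inventory, missing_keys() would be expected to report /home/stack/.ssh/id_rsa, which matches the "no such identity" message in the error above.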


I checked this on my environment after the FFU job failed:
[root@undercloud-0 ~]# ssh -i /home/stack/.ssh/id_rsa tripleo-admin@localhost
tripleo-admin@localhost's password:

[root@undercloud-0 ~]# ssh -i  /var/lib/mistral/.ssh/tripleo-admin-rsa tripleo-admin@localhost
[tripleo-admin@undercloud-0 ~]$
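
The manual check above falls back to a password prompt when the key is not accepted. A non-interactive version can be sketched as follows, assuming the standard OpenSSH client is installed. The helper names are hypothetical; BatchMode and IdentitiesOnly are regular ssh_config options. Note that in batch mode an unknown host key also makes the login fail, so a False result does not by itself prove the key is wrong.

```python
import subprocess


def ssh_check_command(key_path, user="tripleo-admin", host="localhost"):
    """Build an ssh command that attempts a key-only login and runs `true`."""
    return [
        "ssh",
        "-i", key_path,
        "-o", "BatchMode=yes",       # never fall back to a password prompt
        "-o", "IdentitiesOnly=yes",  # offer only explicitly configured keys
        f"{user}@{host}",
        "true",
    ]


def key_login_works(key_path, user="tripleo-admin", host="localhost", timeout=15):
    """Return True if ssh exits 0 when logging in with the given key."""
    try:
        result = subprocess.run(
            ssh_check_command(key_path, user, host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection hung.
        return False
    return result.returncode == 0
```

On the environment described here, key_login_works("/var/lib/mistral/.ssh/tripleo-admin-rsa") would be expected to succeed and key_login_works("/home/stack/.ssh/id_rsa") to fail, mirroring the two manual ssh attempts.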





Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20210129.n.0
It didn't fail with the previous 16.1 build, RHOS-16.1-RHEL-8-20210120.n.1

How reproducible:
It seems to happen 100% of the time.

Steps to Reproduce:
1. Run a 13-16.1 FFU job (possibly any of them).

Comment 11 David Rosenfeld 2021-02-11 13:43:13 UTC
No longer see the error: fatal: [undercloud]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"localhost\"
when executing: openstack overcloud external-upgrade run -y --stack overcloud --tags container_image_prepare

Comment 23 errata-xmlrpc 2021-03-17 15:36:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0817

