Bug 1949290 - cold migration and resize failing in nova-compute: ssh: Host key verification failed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z6
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: RHOS Maint
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks: 1911891 1974985 2058027
 
Reported: 2021-04-13 21:22 UTC by Pavel Sedlák
Modified: 2022-02-24 10:16 UTC

Fixed In Version: tripleo-ansible-0.5.1-1.20210323173504.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2058027
Environment:
Last Closed: 2021-05-26 11:43:47 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1923403 0 None None None 2021-04-14 06:19:52 UTC
OpenStack gerrit 786159 0 None NEW Use ansible_facts in tripleo_ssh_known_hosts role 2021-04-14 14:47:12 UTC
Red Hat Issue Tracker OSP-2621 0 None None None 2022-02-24 10:16:18 UTC
Red Hat Product Errata RHSA-2021:2119 0 None None None 2021-05-26 11:44:24 UTC

Description Pavel Sedlák 2021-04-13 21:22:43 UTC
Tempest tests for cold migration and resize fail:
> tempest.lib.exceptions.TimeoutException: Request timed out
> Details: (ServerDiskConfigTestJSON:test_resize_server_from_auto_to_manual)
> Server b592c193-88cd-4958-bf15-44b90b6531ed failed to reach VERIFY_RESIZE status and task state "None" within the required time (300 s).
> Current status: ACTIVE. Current task state: None.

Reproducible in OSP CI phase 2, though possibly not on every run or in every setup.

Nova compute log shows:
> 2021-04-13 18:12:34.754 8 DEBUG oslo_concurrency.processutils [req-f7a2d6f8-51a7-4f4c-9f47-9640018c0b52 314612978bc24e9eb344156a3fc7f9b8 b0fe0b6a054f468e81b851d79e358729 - default default] 'ssh -o BatchMode=yes 172.17.1.115 mkdir -p /var/lib/nova/instances/b592c193-88cd-4958-bf15-44b90b6531ed' failed. Not Retrying. execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:457
> 2021-04-13 18:12:34.792 8 INFO nova.compute.manager [req-f7a2d6f8-51a7-4f4c-9f47-9640018c0b52 314612978bc24e9eb344156a3fc7f9b8 b0fe0b6a054f468e81b851d79e358729 - default default] [instance: b592c193-88cd-4958-bf15-44b90b6531ed] Setting instance back to active after: Instance rollback performed due to: Resize error: not able to execute ssh command: Unexpected error while running command.
> Command: ssh -o BatchMode=yes 172.17.1.115 mkdir -p /var/lib/nova/instances/b592c193-88cd-4958-bf15-44b90b6531ed
> Exit code: 255
> Stdout: ''
> Stderr: 'Host key verification failed.\r\n'
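For triaging similar failures, the exit code and stderr shown above can be classified mechanically. The following is a minimal sketch, not nova code: the function name `classify_ssh_failure` and the category strings are hypothetical. It relies only on the documented ssh behaviour of exiting with 255 when ssh itself fails, and on the stderr text from the log.

```python
HOST_KEY_ERROR = "Host key verification failed"


def classify_ssh_failure(exit_code, stderr):
    """Map an ssh exit code and stderr text to a coarse failure category.

    Hypothetical helper for log triage; the category names are illustrative.
    """
    if exit_code == 0:
        return "ok"
    # ssh exits with 255 when ssh itself hits an error; other non-zero
    # codes are the exit status of the remote command.
    if exit_code == 255 and HOST_KEY_ERROR in stderr:
        return "host-key-verification"
    if exit_code == 255:
        return "ssh-error"
    return "remote-command-error"


# Values taken from the nova-compute log entry above.
print(classify_ssh_failure(255, "Host key verification failed.\r\n"))
```

A "host-key-verification" result points at the known_hosts content on the source compute node rather than at the remote mkdir command, which never ran.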

This could be the same issue as https://bugs.launchpad.net/tripleo/+bug/1923403.


Versions from undercloud-0:
> ansible.noarch                                2.9.19-1.el8ae                                  @rhosp-ansible-2.9       
> openstack-tempest.noarch                      1:24.0.0-1.20201113224606.c73e6b1.el8ost        @rhelosp-16.1            
> openstack-tripleo-common.noarch               11.4.1-1.20210407183434.75bd92a.el8ost          @rhelosp-16.1            
> openstack-tripleo-common-containers.noarch    11.4.1-1.20210407183434.75bd92a.el8ost          @rhelosp-16.1            
> openstack-tripleo-heat-templates.noarch       11.3.2-1.20210408163446.29a02c1.el8ost          @rhelosp-16.1            
> openstack-tripleo-image-elements.noarch       10.6.2-1.20201113215051.7dc0fa1.el8ost          @rhelosp-16.1            
> openstack-tripleo-puppet-elements.noarch      11.2.2-1.20201114042506.f061f90.el8ost          @rhelosp-16.1            
> openstack-tripleo-validations.noarch          11.3.2-1.20210408103437.4db92ba.el8ost          @rhelosp-16.1            
> tripleo-ansible.noarch                        0.5.1-1.20210323173503.902c3c8.el8ost           @rhelosp-16.1            

Versions from compute-1:
> ansible.noarch                                2.9.19-1.el8ae                                  @rhos-16.1-rhel-8-ansible      
> puppet-nova.noarch                            15.6.1-1.20201114010908.51a6857.el8ost          @rhos-16.1                     
> ### podman images compute:
> undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute                16.1_20210409.1   452734cc0544   8 hours ago    1.94 GB

Versions from compute-1 container nova-compute:
> 2021-04-13T01:37:06Z SUBDEBUG Installed: openstack-nova-common-1:20.4.1-1.20210406183726.1ee93b9.el8ost.noarch
> 2021-04-13T11:08:36Z SUBDEBUG Installed: openstack-nova-compute-1:20.4.1-1.20210406183726.1ee93b9.el8ost.noarch
> 2021-04-09T13:49:12Z SUBDEBUG Installed: puppet-tripleo-11.5.0-1.20210406223722.f716ef5.el8ost.noarch
> 2021-04-09T13:49:12Z SUBDEBUG Installed: openstack-tripleo-common-container-base-11.4.1-1.20210407183434.75bd92a.el8ost.noarch

Comment 2 Martin Schuppert 2021-04-14 06:19:53 UTC
This was introduced by https://bugzilla.redhat.com/show_bug.cgi?id=1911891: with ANSIBLE_INJECT_FACT_VARS=False set, the tripleo_ssh_known_hosts role no longer gets the ansible_ssh_host_key_rsa_public information.
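To illustrate the mechanism: when fact injection is disabled, Ansible stops exposing facts as top-level `ansible_*` variables, and they remain available only under the `ansible_facts` dictionary with the `ansible_` prefix dropped. The sketch below models that lookup with plain Python dictionaries. It is not the role's actual code: `host_key`, `known_hosts_line`, and the sample host data are hypothetical, and the key value is a placeholder.

```python
def host_key(hostvars, key_type="rsa"):
    """Return a host's public key, preferring the ansible_facts namespace."""
    fact = f"ssh_host_key_{key_type}_public"
    facts = hostvars.get("ansible_facts", {})
    if fact in facts:
        return facts[fact]
    # Legacy top-level variable; only present when fact injection is enabled.
    return hostvars.get(f"ansible_{fact}")


def known_hosts_line(names, key, key_type="rsa"):
    """Format one known_hosts entry: host names, key type, then the key."""
    return f"{','.join(names)} ssh-{key_type} {key}"


# Hypothetical host variables as seen with fact injection disabled:
# the key exists only under ansible_facts.
hostvars = {"ansible_facts": {"ssh_host_key_rsa_public": "AAAAB3PLACEHOLDER"}}

# Looking only at the legacy name finds nothing, so no entry gets written.
print(hostvars.get("ansible_ssh_host_key_rsa_public"))
# Looking in ansible_facts still finds the key.
print(known_hosts_line(["compute-1", "172.17.1.115"], host_key(hostvars)))
```

A lookup that reads only the legacy variable name would produce no known_hosts entry for the host, which matches the "Host key verification failed" symptom seen with BatchMode=yes.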

Comment 3 David Peacock 2021-04-14 14:48:59 UTC
Waiting for https://review.opendev.org/c/openstack/tripleo-ansible/+/786159 to hit master; should be the fix once it's merged and backported.

Comment 7 David Rosenfeld 2021-04-16 13:14:17 UTC
The Phase 2 jobs referenced in comment 1 that found this BZ are passing now.

Comment 18 errata-xmlrpc 2021-05-26 11:43:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenStack Platform 16.1.6 (tripleo-ansible) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2119

