Bug 1767894 - [RHOS 16] Live Migration failure: operation failed: Failed to connect to remote libvirt URI
Summary: [RHOS 16] Live Migration failure: operation failed: Failed to connect to remote libvirt URI
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 16.0 (Train on RHEL 8.1)
Assignee: Martin Schuppert
QA Contact: Archit Modi
URL:
Whiteboard:
Depends On:
Blocks: 1716335
 
Reported: 2019-11-01 15:42 UTC by Archit Modi
Modified: 2022-08-11 08:49 UTC
CC: 8 users

Fixed In Version: tripleo-ansible-0.4.1-0.20191120040251.25a5df8.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-06 14:42:47 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Status  Summary                               Last Updated
Launchpad               1852064         None    None                                  2019-11-11 11:33:33 UTC
OpenStack gerrit        694902          MERGED  Fix ssh_known_hosts hostname entries  2021-02-21 23:59:14 UTC
Red Hat Issue Tracker   OSP-6087        None    None                                  2022-08-11 08:49:20 UTC
Red Hat Product Errata  RHEA-2020:0283  None    None                                  2020-02-06 14:43:30 UTC

Description Archit Modi 2019-11-01 15:42:51 UTC
Description of problem:
Failed to connect to remote libvirt URI

How reproducible: always

Steps to Reproduce:
1. Deploy OSP 16 with at least 1 controller and 2 compute nodes.
2. Live-migrate a VM from one compute node to the other (e.g., with the commands sketched below).
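
A minimal command sketch for step 2 (the instance, flavor, image, network, and target host names are hypothetical; Train-era openstack CLI syntax is assumed):

~~~
# Boot a test instance, then live-migrate it to the other compute node.
openstack server create --flavor m1.small --image cirros --network private test-vm
openstack server migrate --live compute-1.redhat.local test-vm

# Check whether the migration completed and where the instance landed.
openstack server show test-vm -c status -c OS-EXT-SRV-ATTR:host
~~~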

Actual results:

2019-10-17 05:18:43.473 7 DEBUG nova.virt.libvirt.driver [-] [instance: bc2268e1-b936-4ab1-9e1a-5aed02958a0a] About to invoke the migrate API _live_migration_operation /usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py:8550
2019-10-17 05:18:43.515 7 ERROR nova.virt.libvirt.driver [-] [instance: bc2268e1-b936-4ab1-9e1a-5aed02958a0a] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+ssh://nova_migration.redhat.local:2022/system?keyfile=/etc/nova/migration/identity: Cannot recv data: Host key verification failed.: Connection reset by peer: libvirt.libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+ssh://nova_migration.redhat.local:2022/system?keyfile=/etc/nova/migration/identity: Cannot recv data: Host key verification failed.: Connection reset by peer
2019-10-17 05:18:43.515 7 DEBUG nova.virt.libvirt.driver [-] [instance: bc2268e1-b936-4ab1-9e1a-5aed02958a0a] Migration operation thread notification thread_finished /usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py:8906
2019-10-17 05:18:43.592 7 DEBUG oslo_concurrency.lockutils [req-3d305166-216f-4003-b698-a8d74605a3ff d142225cb3234a5fb43ca20576bc8586 e72a01a6e6804afcad97897b58c9578b - default default] Lock "c6bf45ca-4175-4095-9802-7437aeb538e9" released by "nova.compute.manager.ComputeManager.terminate_instance.<locals>.do_terminate_instance" :: held 1.230s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2019-10-17 05:18:43.960 7 DEBUG nova.virt.libvirt.migration [-] [instance: bc2268e1-b936-4ab1-9e1a-5aed02958a0a] VM running on src, migration failed find_job_type /usr/lib/python3.6/site-packages/nova/virt/libvirt/migration.py:404
2019-10-17 05:18:43.960 7 DEBUG nova.virt.libvirt.driver [-] [instance: bc2268e1-b936-4ab1-9e1a-5aed02958a0a] Fixed incorrect job type to be 4 _live_migration_monitor /usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py:8720
2019-10-17 05:18:43.960 7 ERROR nova.virt.libvirt.driver [-] [instance: bc2268e1-b936-4ab1-9e1a-5aed02958a0a] Migration operation has aborted

Expected results:
Live migration succeeds.

Additional info:

Comment 5 Martin Schuppert 2019-11-11 10:46:38 UTC
The issue is the network name used in the tripleo-ssh-known-hosts ansible role [1]. The role currently adds `[{{ host }}.{{ networks[network]['name'] }}]*`, which generates entries using the network key `internal_api` instead of the network's hostname component `internalapi`: [compute-0.internal_api]*.
The same applies to the FQDN entry: [compute-0.internal_api.redhat.local]*.

As a result we don't have a matching key for `compute-0.internalapi.redhat.local`:

~~~
# BEGIN ANSIBLE MANAGED BLOCK
[192.168.24.51]*,[compute-0.redhat.local]*,[compute-0]*,[172.17.3.74]*,[compute-0.storage]*,[compute-0.storage.redhat.local]*,[172.17.1.121]*,[compute-0.internal_api]*,[compute-0.internal_api.redhat.local]*,[172.17.2.77]*,[compute-0.tenant]*,[compute-0.tenant.redhat.local]*, ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC4sLXKd8YjMEyofVsFpvwBvnoK34kHzgJt1dF/MBqdvxF6jLc6HlROQ5Qb6dYI4Fjt1jgxUYWR1+iVuS5d08C1JJY2EyTQxII2F6RSw3PRkt3+QWtHmfOhH7ljJ6MlPbYCUPueeIefJSx3wTZcdMCw2cLVqnx9YlRMo30uPJO2Q7zmQ2UM4TwW+x3a7PEJXFhzkXAmER3dkxwX4C832iA3riP4JQ6pPcvUX50ZdK7fWhngPb0D1UAmPrzmS8zf61c1ZWXymmjGc4sEBmjp1RS+wdH/YNTreM5sofewrXqrjIpSwHuuX+YBt/qCXSjtWW99e/CguGaa6wpxYxo03vHZIcrZoupTSC6n8UvEUfk3ZiVBQuyAytv4QQAy9NPeZFaKpyiwDm68n+th4fzB/PX54VRnSNqZXac6qr/dgBcrfXOGjdijISbUas1XHIDSzES7NpPvX6ZoLwG4mN8l2/tz3sGugsK0RtKtmS3/WstFRTkgMi3q094PU6+KHlu+e4M=
[192.168.24.51]*,[compute-1.redhat.local]*,[compute-1]*,[172.17.3.16]*,[compute-1.storage]*,[compute-1.storage.redhat.local]*,[172.17.1.60]*,[compute-1.internal_api]*,[compute-1.internal_api.redhat.local]*,[172.17.2.116]*,[compute-1.tenant]*,[compute-1.tenant.redhat.local]*, ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC1K1+8OcsnKvA0iIpirsyhM554V/2KpA/vfBRik3mxii/VAV2VbXY/pwnAyuqv+2WyjAo7cknJCIXDH8Qz98wgQM11dEmrtwiHho9QHnnTUzODBXIaZe69fQuq9YTHY2egL8SBtMG+ODX1IhRm83qe1+3aeW4HNMcYczf4i3IddO0vCE5gIygW2O4Q5MeCqe3XxQ2em1rHFa/GM0dlsso9EVt92nMfa+hXDj/u2iDFxueBWJ+qP51NkS/l0HcnfTqApcaVCdGWcjbHg4wozBR0IzF3sDj1fXNH3VnlOl9z00wg1QXaDwaShkYUxKhrYIcpsFga1KxlbBq/grK5pS0MzicS8acsogRnETVweTT0RAYYpBUIKOf4k55uRdtnckFrTFi3Fbj3wCCUMRFms9zQPuJQFmC4pK16zvLeI4hbA2JazqjZOMnSzKgBR3S0feUPTrDCyNn3vj4wZ5eW4kETOcynvSuQVKKHPifBS4FqAVtfKLdubXJFKieyhn4P66k=
[192.168.24.51]*,[controller-0.redhat.local]*,[controller-0]*,[172.17.3.83]*,[controller-0.storage]*,[controller-0.storage.redhat.local]*,[172.17.1.56]*,[controller-0.internal_api]*,[controller-0.internal_api.redhat.local]*,[172.17.2.130]*,[controller-0.tenant]*,[controller-0.tenant.redhat.local]*, ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDLxjtFeJXvYfeFgz0s5cQtiaxUHclTXm7wrssf/rwL58DAyQlZs2R8g7mqy8RpD3qy9FL9wij4D+r5ofpbYtvrO5uYgajxw9coRDORGIgw+8ffCI6Fo+Uuyi+gBQfIyRj+T3OdqDwHY1Pvd3E3ZKvcQkU/9WWujgyz6TCvHmJxOaX376RRQzfnMeodlDXFzHnGutw2tanki3CQgr0vTIq3Ifgae00z7ihiBgQqXj+a0HjjscwX7mc3S5bnGJYFIdIvEoFiQzLDCH5P6Lp2xYEq/uWi1EzL2xL+UoolSkbxoCsRPpRu6cbN7vouoyOHcMIAookJvaereWpROpsZJ1V/T0Me8ErNbX9lxesYwbMKxIdfXcO/yaprVColoLcgmcuxrfGRfpDNP9lTNpd2bEQap8/UWynHsyv3Q1EV7CB6L5CPgdi3YAy+2DV5kTWrVU9C/LKPc1penaz6ByLQPcprfHz88lAEqlQSV8lUln10m8hvqswa6k0q5TRZGvEk0Z8=
# END ANSIBLE MANAGED BLOCK
~~~
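
One way to confirm the mismatch on a compute node (a sketch; it assumes the role manages the system-wide /etc/ssh/ssh_known_hosts and that libvirt's SSH transport uses the migration port 2022 seen in the log above):

~~~
# The name the migration actually connects to has no matching entry,
# so host key verification fails (no output from ssh-keygen):
ssh-keygen -F '[compute-0.internalapi.redhat.local]:2022' -f /etc/ssh/ssh_known_hosts

# The underscore variant, which nothing actually connects to, does match
# and prints the entry with a "# Host ... found: line N" header:
ssh-keygen -F '[compute-0.internal_api.redhat.local]:2022' -f /etc/ssh/ssh_known_hosts
~~~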

The role needs to be modified to build the ssh_known_hosts entries from the per-network `<network_name>_hostname` values the inventory already carries (a sketch of that approach follows the excerpt below):

~~~
Controller:
  hosts:
    controller-0:
      ansible_host: 192.168.24.44
      canonical_hostname: controller-0.redhat.local
      ctlplane_hostname: controller-0.ctlplane.redhat.local
      ctlplane_ip: 192.168.24.44
      deploy_server_id: a1ffc38e-cd08-42ee-a271-32b4ce82b546
      enabled_networks: [ctlplane, storage, storage_mgmt, internal_api, tenant, external,
        management]
      external_hostname: controller-0.external.redhat.local
      external_ip: 10.0.0.110
      internal_api_hostname: controller-0.internalapi.redhat.local
      internal_api_ip: 172.17.1.56
      management_ip: 192.168.24.44
      storage_hostname: controller-0.storage.redhat.local
      storage_ip: 172.17.3.83
      storage_mgmt_hostname: controller-0.storagemgmt.redhat.local
      storage_mgmt_ip: 172.17.4.117
      tenant_hostname: controller-0.tenant.redhat.local
      tenant_ip: 172.17.2.130
~~~
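
A sketch of that direction (not the merged patch; the task shape and variable names are illustrative, and it assumes the `enabled_networks` and `<network>_hostname` inventory keys shown above):

~~~
# Sketch: derive the known_hosts host patterns from the per-network
# *_hostname inventory entries instead of networks[network]['name'],
# so internal_api correctly yields compute-0.internalapi.redhat.local.
- name: Assemble per-network host patterns for one host
  vars:
    fqdn: "{{ hostvars[host][network ~ '_hostname'] }}"
  set_fact:
    ssh_known_hosts_names: >-
      {{ ssh_known_hosts_names | default([])
         + ['[' ~ fqdn ~ ']*',
            '[' ~ (fqdn.split('.')[0:2] | join('.')) ~ ']*'] }}
  loop: "{{ hostvars[host]['enabled_networks'] }}"
  loop_control:
    loop_var: network
  when: (network ~ '_hostname') in hostvars[host]
~~~

The `when` guard skips networks like `management` above, which carry an IP in the inventory but no hostname entry.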


[1] https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo-ssh-known-hosts/tasks/main.yml#L55

Comment 16 errata-xmlrpc 2020-02-06 14:42:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0283

