Bug 1926602 - [OSP16.2] Live Migration failure: operation failed: Failed to connect to remote libvirt URI with libvirt-daemon-6.10
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 16.2 (Train)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Martin Schuppert
QA Contact: James Parker
URL:
Whiteboard: libvirt_OSP_INT
Duplicates (2): 1942000 1942881
Depends On:
Blocks: 1936804
Reported: 2021-02-09 07:57 UTC by chhu
Modified: 2021-09-15 07:12 UTC
CC List: 22 users

Fixed In Version: puppet-tripleo-11.5.1-2.20210302224955.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Cloned As: 1936804
Environment:
Last Closed: 2021-09-15 07:11:47 UTC
Target Upstream Version: Train


Attachments
nova-compute.log (4.03 KB, text/plain), 2021-02-09 08:01 UTC, chhu
source_novacompute-1_log.tgz (381.71 KB, application/gzip), 2021-02-23 07:45 UTC, chhu
target_novacompute-2_log.tgz (379.42 KB, application/gzip), 2021-02-23 07:47 UTC, chhu
containers-prepare-parameter-prepare.yaml and Dockerfiles (4.80 KB, application/gzip), 2021-03-05 05:25 UTC, chhu


Links
Launchpad 1918250 (last updated 2021-03-09 07:24:09 UTC)
OpenStack gerrit 779784: Fix live-migration with libvirt >= 6.8.0 (status: NEW; last updated 2021-03-12 09:25:44 UTC)
Red Hat Product Errata RHEA-2021:3483 (last updated 2021-09-15 07:12:21 UTC)

Description chhu 2021-02-09 07:57:00 UTC
Description of problem:
Live Migration failure: operation failed: Failed to connect to remote libvirt URI

Version-Release number of selected component (if applicable):
tripleo-ansible-0.5.1-2.20201223225653.c876e30.el8ost.1.noarch
openstack-nova-migration-20.4.2-2.20201224134938.81a3f4b.el8ost.1.noarch
openstack-nova-compute-20.4.2-2.20201224134938.81a3f4b.el8ost.1.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP16.2 with 1 controller, 3 compute nodes, and Cinder with NFS storage.
2. Create a VM from a volume and check that it is running on overcloud-novacompute-1:
$ openstack server list|grep vol
| 0404929c-b8db-4535-a4e0-bca31c3aaff1 | asb-vm-qcow2-vol | ACTIVE | asb-net1=192.168.33.135 | | |
3. Live migrate asb-vm-qcow2-vol from overcloud-novacompute-1 to overcloud-novacompute-2. The command returns without error:
(overcloud) [stack@dell-per730-44 ~]$ openstack server migrate --live-migration asb-vm-qcow2-vol
(overcloud) [stack@dell-per730-44 ~]$ echo $?
0

4. Check nova-compute.log on the source node; it contains "Failed to connect to remote libvirt URI":

[heat-admin@overcloud-novacompute-1 ~]$ sudo tail -f /var/log/containers/nova/nova-compute.log
2021-02-09 07:46:41.806 7 INFO nova.compute.manager [-] [instance: 0404929c-b8db-4535-a4e0-bca31c3aaff1] Took 3.26 seconds for pre_live_migration on destination host overcloud-novacompute-2.localdomain.
2021-02-09 07:46:42.488 7 ERROR nova.virt.libvirt.driver [-] [instance: 0404929c-b8db-4535-a4e0-bca31c3aaff1] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+ssh://nova_migration@overcloud-novacompute-2.internalapi.localdomain:2022/system?keyfile=/etc/nova/migration/identity: End of file while reading data: Forbidden: Input/output error: libvirt.libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+ssh://nova_migration@overcloud-novacompute-2.internalapi.localdomain:2022/system?keyfile=/etc/nova/migration/identity: End of file while reading data: Forbidden: Input/output error
2021-02-09 07:46:42.716 7 ERROR nova.virt.libvirt.driver [-] [instance: 0404929c-b8db-4535-a4e0-bca31c3aaff1] Migration operation has aborted
2021-02-09 07:46:42.737 7 INFO nova.compute.manager [-] [instance: 0404929c-b8db-4535-a4e0-bca31c3aaff1] Swapping old allocation on dict_keys(['117f60e4-6c4b-4ed6-88c3-5b7add7d197c']) held by migration cfd0a7dd-0e7a-47cb-ab30-e322ecd42d23 for instance
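When triaging this failure it can help to pull the exact libvirt URI out of the error, so that the destination host, SSH port and key file can be checked by hand. This is a minimal sketch, assuming a POSIX shell with grep and sed; the function name `failed_migration_uris` is hypothetical and not part of nova or TripleO.

```shell
# failed_migration_uris LOGFILE
# Hypothetical helper: print each distinct libvirt URI that appears in a
# "Failed to connect to remote libvirt URI" error in a nova-compute.log.
failed_migration_uris() {
  # The URI runs up to the next space; nova appends ":" before the cause.
  grep -o 'remote libvirt URI [^ ]*' "$1" |
    sed -e 's/^remote libvirt URI //' -e 's/:$//' |
    sort -u
}
```

Run it as a user that can read the log, for example `failed_migration_uris /var/log/containers/nova/nova-compute.log`. For the log above it prints the single URI `qemu+ssh://nova_migration@overcloud-novacompute-2.internalapi.localdomain:2022/system?keyfile=/etc/nova/migration/identity`, because the duplicate inside the libvirtError text is removed by `sort -u`.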

Actual results:
In step 3: the command returns without error.
In step 4: the live migration fails.

Expected results:
In step 3: the command should return an error if the live migration fails.
In step 4: the live migration should succeed, with no errors in nova-compute.log.
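The exit status of 0 in step 3 reflects that the Compute API accepted the live-migrate request; Nova then performs the migration asynchronously, so a later failure on the compute nodes does not change the CLI's exit code. A script can instead compare the instance's host before and after the migration. The sketch below assumes admin credentials (the `OS-EXT-SRV-ATTR:host` field is normally only visible to admins) and the same `--live-migration` form used in step 3. The function names, return codes, 5-second poll interval and default timeout are all hypothetical choices for illustration.

```shell
# Sketch only: live migrate SERVER and confirm that it really changed hosts.
# Return codes (hypothetical): 0 moved, 1 still on source, 2 CLI error, 3 timeout.

# server_field SERVER FIELD: print a single field of `openstack server show`.
server_field() {
  openstack server show "$1" -c "$2" -f value
}

# live_migrate_and_verify SERVER [TIMEOUT_SECONDS]
live_migrate_and_verify() {
  server=$1
  timeout=${2:-300}
  before=$(server_field "$server" OS-EXT-SRV-ATTR:host) || return 2
  # Exit status 0 here only means the request was accepted.
  openstack server migrate --live-migration "$server" || return 2
  elapsed=0
  # The migration runs asynchronously; wait while the status is MIGRATING.
  while [ "$(server_field "$server" status)" = "MIGRATING" ]; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for $server" >&2
      return 3
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
  after=$(server_field "$server" OS-EXT-SRV-ATTR:host) || return 2
  if [ "$before" = "$after" ]; then
    echo "$server is still on $before: live migration did not complete" >&2
    return 1
  fi
  echo "$server moved from $before to $after"
}
```

With this bug present, `live_migrate_and_verify asb-vm-qcow2-vol` would be expected to return 1, since the instance rolls back to ACTIVE on overcloud-novacompute-1.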

Additional info:

Comment 1 chhu 2021-02-09 08:01:23 UTC
Created attachment 1755862
nova-compute.log

Comment 2 David Vallee Delisle 2021-02-19 15:46:39 UTC
Can we have full sosreports from the compute node please, or access to the node?

Comment 3 chhu 2021-02-23 07:45:35 UTC
Created attachment 1758774
source_novacompute-1_log.tgz

Comment 4 chhu 2021-02-23 07:47:23 UTC
Created attachment 1758775
target_novacompute-2_log.tgz

Comment 6 chhu 2021-02-24 04:39:42 UTC
Hi, Jiri

This bug blocks live migration on OSP16.2.
I attached the libvirtd logs in source_novacompute-1_log.tgz and target_novacompute-2_log.tgz.
Could you please help confirm whether it needs a libvirt code change? Many thanks!

Please note:
To collect the libvirtd debug log, after reproducing this issue I restarted
tripleo_nova_libvirt.service (systemctl restart tripleo_nova_libvirt.service) and then reran the reproduction steps.

Regards,
Chenli Hu

Comment 7 Jiri Denemark 2021-02-24 09:11:14 UTC
This is most likely a configuration issue on the destination host. Check that
the nova_migration user is allowed to connect to libvirtd on that host. If
not, check the authorization or access control methods enabled in
libvirtd.conf and their configuration.
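One way to follow this advice is to open, by hand, the same URI that nova used, bypassing nova entirely. This is a sketch under assumptions: `virsh` must run somewhere the `/etc/nova/migration/identity` key is readable on the source compute node (for example the container that runs libvirtd there; which container that is depends on the deployment), and the port 2022 and key path defaults are copied from the log in the description. `migration_uri` and `check_migration_target` are hypothetical names.

```shell
# migration_uri HOST [PORT] [KEYFILE]
# Build a qemu+ssh URI in the same form as the one in the nova-compute.log error.
migration_uri() {
  printf 'qemu+ssh://nova_migration@%s:%s/system?keyfile=%s\n' \
    "$1" "${2:-2022}" "${3:-/etc/nova/migration/identity}"
}

# check_migration_target HOST [PORT] [KEYFILE]
# Connect to the destination libvirtd over that URI and run the read-only
# "version" command. If this fails with the same error as nova, the problem
# can be reproduced without nova in the loop.
check_migration_target() {
  virsh -c "$(migration_uri "$@")" version
}
```

For this report the call would be `check_migration_target overcloud-novacompute-2.internalapi.localdomain`.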

Comment 8 chhu 2021-02-25 01:50:12 UTC
Thanks, Jiri!
Normally no configuration for live migration is needed in OSP after deployment,
so the code change should be in OSP. Let's wait for their debugging.

Comment 13 chhu 2021-03-05 05:25:05 UTC
Created attachment 1760815
containers-prepare-parameter-prepare.yaml and Dockerfiles

Comment 23 Martin Schuppert 2021-03-23 12:00:19 UTC
*** Bug 1942000 has been marked as a duplicate of this bug. ***

Comment 24 Jakub Libosvar 2021-03-30 13:44:19 UTC
*** Bug 1942881 has been marked as a duplicate of this bug. ***

Comment 34 errata-xmlrpc 2021-09-15 07:11:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:3483

