Bug 1936804 - nova-migration-wrapper to allow using virt-ssh-helper
Summary: nova-migration-wrapper to allow using virt-ssh-helper
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 17.0 (Wallaby)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: beta
Target Release: 17.0
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On: 1926602
Blocks: 2029501
Reported: 2021-03-09 08:23 UTC by Martin Schuppert
Modified: 2023-03-21 19:40 UTC
CC List: 23 users

Fixed In Version: openstack-nova-23.1.1-0.20220217110407.57bf72b.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1926602
Clones: 2029501
Environment:
Last Closed: 2022-09-21 12:14:01 UTC
Target Upstream Version: Wallaby
Embargoed:


Links
System ID                            Private  Priority  Status  Summary  Last Updated
Launchpad 1918250                    0        None      None    None     2021-03-11 07:09:30 UTC
RDO 36142                            0        None      None    None     2021-10-09 17:01:07 UTC
Red Hat Issue Tracker OSP-3476       0        None      None    None     2021-12-06 16:34:50 UTC
Red Hat Product Errata RHEA-2022:6543 0       None      None    None     2022-09-21 12:14:32 UTC

Description Martin Schuppert 2021-03-09 08:23:09 UTC
This clone is to track the change to nova-migration-wrapper to
support virt-ssh-helper.

+++ This bug was initially created as a clone of Bug #1926602 +++

Description of problem:
Live Migration failure: operation failed: Failed to connect to remote libvirt URI

Version-Release number of selected component (if applicable):
tripleo-ansible-0.5.1-2.20201223225653.c876e30.el8ost.1.noarch
openstack-nova-migration-20.4.2-2.20201224134938.81a3f4b.el8ost.1.noarch
openstack-nova-compute-20.4.2-2.20201224134938.81a3f4b.el8ost.1.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP16.2 with 1 controller, 3 compute nodes, cinder with nfs storage.
2. Create VM from volume and check it's running on overcloud-novacompute-1
$ openstack server list|grep vol
| 0404929c-b8db-4535-a4e0-bca31c3aaff1 | asb-vm-qcow2-vol  | ACTIVE | asb-net1=192.168.33.135                                                                                              |           |        |
3. Try to live migrate asb-vm-qcow2-vol from overcloud-novacompute-1 to overcloud-novacompute-2; the command returns without error
(overcloud) [stack@dell-per730-44 ~]$ openstack server migrate --live-migration asb-vm-qcow2-vol
(overcloud) [stack@dell-per730-44 ~]$ echo $?
0

4. Check nova-compute.log; it contains "error: Failed to connect to remote libvirt URI"

[heat-admin@overcloud-novacompute-1 ~]$ sudo tail -f /var/log/containers/nova/nova-compute.log
2021-02-09 07:46:41.806 7 INFO nova.compute.manager [-] [instance: 0404929c-b8db-4535-a4e0-bca31c3aaff1] Took 3.26 seconds for pre_live_migration on destination host overcloud-novacompute-2.localdomain.
2021-02-09 07:46:42.488 7 ERROR nova.virt.libvirt.driver [-] [instance: 0404929c-b8db-4535-a4e0-bca31c3aaff1] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+ssh://nova_migration.localdomain:2022/system?keyfile=/etc/nova/migration/identity: End of file while reading data: Forbidden: Input/output error: libvirt.libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+ssh://nova_migration.localdomain:2022/system?keyfile=/etc/nova/migration/identity: End of file while reading data: Forbidden: Input/output error
2021-02-09 07:46:42.716 7 ERROR nova.virt.libvirt.driver [-] [instance: 0404929c-b8db-4535-a4e0-bca31c3aaff1] Migration operation has aborted
2021-02-09 07:46:42.737 7 INFO nova.compute.manager [-] [instance: 0404929c-b8db-4535-a4e0-bca31c3aaff1] Swapping old allocation on dict_keys(['117f60e4-6c4b-4ed6-88c3-5b7add7d197c']) held by migration cfd0a7dd-0e7a-47cb-ab30-e322ecd42d23 for instance

Actual results:
In step 3: the command returns without error
In step 4: the live migration fails

Expected results:
In step 3: the command returns an error if the live migration fails
In step 4: the live migration succeeds and there is no error in nova-compute.log

Additional info:

--- Additional comment from David Vallee Delisle on 2021-03-08 16:56:33 UTC ---

Mar 08 16:52:39 overcloud-novacompute-1 nova_migration_wrapper[240622]: Denying connection='192.168.24.18 54668 192.168.24.9 2022' command=['sh', '-c', "'which", 'virt-ssh-helper', '1>/dev/null', '2>&1;', 'if', 'test', '$?', '=', '0;', 'then', '', '', '', '', 'virt-ssh-helper', "'qemu:///system';", 'else', '', '', '', 'if', "'nc'", '-q', '2>&1', '|', 'grep', '"requires', 'an', 'argument"', '>/dev/null', '2>&1;', 'then', 'ARG=-q0;else', "ARG=;fi;'nc'", '$ARG', '-U', '/var/run/libvirt/libvirt-sock;', "fi'"]

At the end of /bin/nova-migration-wrapper:
~~~
if command == live_migration_tunnel_cmd:
~~~

After adding debug in the wrapper:
~~~
command: sh -c 'which virt-ssh-helper 1>/dev/null 2>&1; if test $? = 0; then     virt-ssh-helper 'qemu:///system'; else    if 'nc' -q 2>&1 | grep "requires an argument" >/dev/null 2>&1; then ARG=-q0;else ARG=;fi;'nc' $ARG -U /var/run/libvirt/libvirt-sock; fi'
~~~

~~~
tunnelcmd: sh -c 'if 'nc' -q 2>&1 | grep "requires an argument" >/dev/null 2>&1; then ARG=-q0;else ARG=;fi;'nc' $ARG -U /var/run/libvirt/libvirt-sock'
~~~
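
The mismatch can be reproduced outside the wrapper. A minimal Python sketch, assuming only the two strings from the debug output above and the exact-match comparison quoted from the end of the wrapper; the variable names are illustrative:

```python
# Both strings are copied from the wrapper debug output in this comment.
command = (
    "sh -c 'which virt-ssh-helper 1>/dev/null 2>&1; if test $? = 0; then"
    "     virt-ssh-helper 'qemu:///system'; else"
    "    if 'nc' -q 2>&1 | grep \"requires an argument\" >/dev/null 2>&1;"
    " then ARG=-q0;else ARG=;fi;'nc' $ARG -U /var/run/libvirt/libvirt-sock; fi'"
)
tunnelcmd = (
    "sh -c 'if 'nc' -q 2>&1 | grep \"requires an argument\" >/dev/null 2>&1;"
    " then ARG=-q0;else ARG=;fi;'nc' $ARG -U /var/run/libvirt/libvirt-sock'"
)

# The wrapper's final check is an exact string match, so the
# virt-ssh-helper form sent by libvirt does not pass it.
allowed = command == tunnelcmd
print(allowed)

# The netcat logic is still present, only wrapped in the helper probe:
# strip the leading "sh -c '" and the trailing quote, then search for it.
nc_body = tunnelcmd[len("sh -c '"):-1]
nc_body_present = nc_body in command
print(nc_body_present)
```

The first print shows the comparison failing; the second shows that the expected netcat fragment is embedded unchanged in the new command.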

--- Additional comment from David Vallee Delisle on 2021-03-08 17:04:11 UTC ---

It looks like libvirt has been using virt-ssh-helper since this patch [1].


[1] https://listman.redhat.com/archives/libvir-list/2020-September/msg01394.html

--- Additional comment from Kashyap Chamarthy on 2021-03-08 17:41:16 UTC ---

Summarizing IRC discussion, in addition to what David wrote in comment#20:

- libvirt first seems to check for the `virt-ssh-helper` binary; if it's
  not present, it falls back to `nc`.

- The code where the 'nova-migration-wrapper' script looks for the
  "nc" binary is here[1].

  libvirt used to first check for `nc` (netcat).  But these two libvirt
  commits[2][3] -- which are present in the libvirt build used in this
  bug -- have now changed it to first look for `virt-ssh-helper`; if it
  is not available, it falls back to `nc`  (see David's comment#18 for the
  debug logging).

  This trips up the 'nova-migration-wrapper', which expects the `nc` form
  of the command.

- Workaround: force the use of "netcat" (`nc`) by appending
  "&proxy=netcat" to the migration URI.  The `diff` of the URI:

  - qemu+ssh://nova_migration.redhat.local:2022/system?keyfile=/etc/nova/migration/identity
  + qemu+ssh://nova_migration.redhat.local:2022/system?keyfile=/etc/nova/migration/identity&proxy=netcat
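
The workaround amounts to adding one query parameter to the migration URI. A minimal Python sketch of that string change, assuming the URI carries no fragment; the function name `force_netcat_proxy` is hypothetical and is not part of nova or libvirt:

```python
from urllib.parse import parse_qsl, urlsplit


def force_netcat_proxy(uri: str) -> str:
    """Append proxy=netcat to a libvirt URI unless a proxy is already set."""
    query = urlsplit(uri).query
    # Leave the URI untouched if some proxy mode was chosen explicitly.
    if any(key == "proxy" for key, _ in parse_qsl(query)):
        return uri
    # Use "&" when other parameters (such as keyfile) are already present.
    separator = "&" if query else "?"
    return f"{uri}{separator}proxy=netcat"


uri = ("qemu+ssh://nova_migration.redhat.local:2022/system"
       "?keyfile=/etc/nova/migration/identity")
print(force_netcat_proxy(uri))
```

The URI is extended by plain concatenation rather than re-encoded, so the `keyfile` path is passed through exactly as written.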

                * * *

We still need to take `virt-ssh-helper` into account, because long-term
it is needed for the modular libvirt daemons to work properly.  Here's
the deployment RFE[4] for it.
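
One way the wrapper could take `virt-ssh-helper` into account is to accept both command forms. The following Python sketch is only an illustration and is not the shipped fix: the function names are hypothetical, and the command text is taken from the debug output quoted earlier in this bug.

```python
# Hypothetical sketch: NOT the shipped nova-migration-wrapper fix.
# Command text is taken from the debug output quoted earlier in this bug.
LIBVIRT_SOCKET = "/var/run/libvirt/libvirt-sock"
LIBVIRT_URI = "qemu:///system"


def netcat_body(socket_path: str) -> str:
    """Shell fragment that reaches the libvirt socket through nc."""
    return (
        "if 'nc' -q 2>&1 | grep \"requires an argument\" >/dev/null 2>&1; "
        "then ARG=-q0;else ARG=;fi;'nc' $ARG -U " + socket_path
    )


def allowed_tunnel_commands(socket_path: str = LIBVIRT_SOCKET,
                            uri: str = LIBVIRT_URI) -> frozenset:
    """Return both tunnel commands: plain nc, and helper with nc fallback."""
    nc = netcat_body(socket_path)
    legacy = "sh -c '" + nc + "'"
    helper = (
        "sh -c 'which virt-ssh-helper 1>/dev/null 2>&1; "
        "if test $? = 0; then     virt-ssh-helper '" + uri + "'; "
        "else    " + nc + "; fi'"
    )
    return frozenset({legacy, helper})


def is_tunnel_command(command: str) -> bool:
    """Exact-match check against the allow-list; anything else is denied."""
    return command in allowed_tunnel_commands()
```

Keeping the check an exact match against a fixed set preserves the wrapper's restrictive behaviour: any other command is still denied.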


[1] https://github.com/rdo-packages/nova-distgit/blob/rpm-master/nova-migration-wrapper#L32

[2] https://libvirt.org/git/?p=libvirt.git;a=commit;h=f8ec7c842d (rpc: 
    use new virt-ssh-helper binary for remote tunnelling, 2020-07-08)

[3] https://libvirt.org/git/?p=libvirt.git;a=commit;h=7d959c302d (rpc: 
    Fix virt-ssh-helper detection, 2020-10-27)

[4] https://bugzilla.redhat.com/show_bug.cgi?id=1920022 — [RFE]
    [Deployment] Adjust OSP's deployment configuration to use libvirt's
    modular daemons (likely to be the default in RHEL-9)

Comment 12 errata-xmlrpc 2022-09-21 12:14:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

