1638889 – RFE: allow live migration over unix socket

Summary: RFE: allow live migration over unix socket

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux Advanced Virtualization
Classification:	Red Hat
Component:	libvirt
Sub Component:
Version:	8.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Martin Kletzander
QA Contact:	Fangge Jin
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1639597

Reported:	2018-10-12 17:36 UTC by David Vossel
Modified:	2021-03-04 07:34 UTC
CC List:	19 users
Fixed In Version:	libvirt-6.6.0-11.el8
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1639597
Environment:
Last Closed:	2021-02-22 15:39:38 UTC
Type:	Feature Request
Target Upstream Version:
Embargoed:
Dependent Products:
Flags:	mprivozn: mirror+

Attachments
Script for testing the forwarding (3.42 KB, text/plain) 2020-11-14 01:33 UTC, Martin Kletzander	no flags
Script for python3.6 (4.04 KB, text/x-python) 2020-11-14 21:34 UTC, Martin Kletzander	no flags
Test scenarios (17.60 KB, text/plain) 2020-12-17 08:29 UTC, Fangge Jin	no flags

Description David Vossel 2018-10-12 17:36:01 UTC

Description of problem:

Libvirt prevents performing a live migration when the destination libvirt connection is a unix socket. 

In KubeVirt, we need the ability to tunnel a live migration through a unix socket to another network namespace which then connects to the destination libvirt.  From the libvirtd perspective in the source environment, it looks like we're attempting to migrate the domain to the same host since the destination connection is a local unix socket. 


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. On the source node, create a socat connection that forwards all connections on a unix socket to a remote libvirtd instance.

Example.
socat unix-listen:local-destination-socket,reuseaddr,fork tcp:192.168.66.102:22222


2. Attempt to perform a live migration with the local unix socket 'local-destination-socket' as the destination. 

Example:
virsh migrate --copy-storage-all --tunnelled --p2p --live --xml domain.xml my-vm qemu+unix:///system?socket=local-destination-socket

Actual results:

error "Attempt to migrate guest to the same host"

Expected results:

successful migration

Comment 1 Martin Kletzander 2018-10-16 08:07:00 UTC

This particular error happens because there is no server part in the URI.  However removing that [1] is not enough to work around this, even though it's not blocked by anything.

The real problem I'm facing, even with that logic "fixed", is that the remote driver is no longer consulted for the URI.  I have an idea where to look next and will update this BZ whenever I find something.

In the meantime, would you be so kind as to try reproducing this, but listening on a tcp port instead of a socket?

Something along the lines of (written from head, not checked for errors):

  socat tcp-listen:12345 tcp:192.168.66.102:22222

and then:

  virsh migrate --copy-storage-all --tunnelled --p2p --live --xml domain.xml my-vm qemu+tcp://127.1:12345/system

Thanks.

[1] https://github.com/nertpinx/libvirt/commit/5f0bcbf70f716ac4432635817cd6ff5459ad5615
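
To illustrate the "no server part" observation: a generic URI parser reports an empty host for the qemu+unix URI from the description, while the TCP variant has one. This is only a sketch using Python's standard urllib.parse. The describe() helper is a made-up name, and this is not how libvirt itself parses or checks URIs.

```python
from urllib.parse import parse_qs, urlsplit


def describe(uri):
    """Return (transport, host, port, socket) for a libvirt-style URI."""
    parts = urlsplit(uri)
    # "qemu+unix" -> driver "qemu", transport "unix"
    _, _, transport = parts.scheme.partition("+")
    # The optional "socket" query parameter carries the UNIX socket path.
    socket_path = parse_qs(parts.query).get("socket", [None])[0]
    return transport or None, parts.hostname, parts.port, socket_path


# URI from the description: no host component at all.
print(describe("qemu+unix:///system?socket=local-destination-socket"))
# TCP variant from this comment: host and port are present.
print(describe("qemu+tcp://127.1:12345/system"))
```

The first URI yields no hostname, which is consistent with the source daemon treating the destination as the local host.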

Comment 2 David Vossel 2018-10-16 18:16:28 UTC

"In the meantime, would you be so kind and tried reproducing this, but instead of a socket just listen on a tcp port?
Something along the lines of (written from head, not checked for errors):
socat tcp-listen:12345 tcp:192.168.66.102:22222"

Without testing I know that will work. As a workaround for this tunneling issue, we're doing the equivalent of this. 

----------------------
socat tcp-listen:12345,bind=127.0.0.1,reuseaddr,fork unix-connect:local-destination-socket

socat unix-listen:local-destination-socket,reuseaddr,fork tcp:192.168.66.102:22222
----------------------

qemu+tcp://127.0.0.1:12345/system then works as a destination URI for a migration even though we're proxying that connection through a unix socket. This method tricks libvirt into working, but impacts performance.

Comment 3 Fabian Deutsch 2019-03-16 18:43:35 UTC

Vladik, David, does this issue still exist? If so should this be fixed for CNV?

Comment 4 David Vossel 2019-03-18 15:05:56 UTC

>Vladik, David, does this issue still exist? If so should this be fixed for CNV? 

The issue still exists. We have a work around as I described in a previous comment, but the work around adds additional buffers and copying during the migration. We're essentially adding another leg to the copy to get around this.

Comment 5 Peter Krempa 2019-03-18 15:23:48 UTC

Please note that using --tunnelled prevents the use of new QEMU features. While we should probably allow using a unix socket for migration, using --tunnelled is strongly discouraged even for normal use cases.

Comment 6 Daniel Berrangé 2019-03-18 15:30:22 UTC

(In reply to Peter Krempa from comment #5)
> Please note that using --tunnelled prevents the use of new QEMU
> features. While we should probably allow using a unix socket for migration,
> using --tunnelled is strongly discouraged even for normal use cases.

The primary reason tunnelled exists in the first place was that QEMU's migration stream was clear text. Tunnelling over libvirtd's connection enabled use of TLS / SSH for encryption.

QEMU now has native support for TLS, so the primary reason for tunnelled migration goes away.  This is good because tunnelled migration was always inherently inefficient, creating extra data copies and latency.

QEMU 4.0 is about to get support for using multiple sockets for migration. This will be a major performance boost on high capacity network links. Libvirt's tunnelling feature is limited to a single connection.

So essentially, for modern QEMU, tunnelling via libvirt should always be avoided, as it is harmful to performance due to its architecture and implementation.

Comment 7 David Vossel 2019-03-18 15:40:46 UTC

> Please note that using --tunnelled prevents the use of new QEMU features. While we should probably allow using a unix socket for migration, using --tunnelled is strongly discouraged even for normal use cases.

The tunneled flag no longer exists in our implementation.

Even without tunneling, it doesn't appear we can specify the destination as a unix socket. The same can be said for the qemu connections for disk block migration.  Everything (all connections) needs to go over unix sockets in our environment.

Comment 9 Martin Kletzander 2020-06-03 19:01:53 UTC

I have _something_ working, but my guess is that the information in this BZ is not up to date.  Could you tell me how you are actually running the migration?  I cannot simply modify the example from the description by just removing the --tunnelled and --p2p flags, as that would only use the socket for the libvirt connection and not the qemu one.  I need to make sure I replicate your setup and usage to make it work exactly how you expect, but I cannot find the information in the kubevirt codebase.

Comment 10 Peter Krempa 2020-06-03 19:24:49 UTC

Also don't forget that if --copy-storage-all is requested, the storage migration is done via NBD via a separate connection, so that one probably also needs to be converted to unix socket.

Comment 11 David Vossel 2020-06-03 19:30:18 UTC

Hey,

Here's the function that actually issues the migration client command to libvirt. [1]


This involves 3 connections for us now

port 22222 is what we're exposing for libvirt
port 49152 direct migration port
port 49153 block migration port


We are currently proxying all three of those connections through a unix socket ourselves. Ideally the goal is to remove that extra hop and have libvirt/qemu expose those connections as unix sockets directly for us.


Hopefully that provides more context to the situation as it has evolved today. 



1. https://github.com/kubevirt/kubevirt/blob/master/pkg/virt-launcher/virtwrap/manager.go#L398

Comment 12 Martin Kletzander 2020-07-10 12:41:24 UTC

Just for the sake of completeness and early review, I am trying to eliminate a workaround that looks roughly like this (situation before the fix):

1) I set up a tunnel between the libvirt daemons. It listens on a unix socket (already available with no patches needed, just use `qemu+unix:///system?socket=/tmp/test-sock-driver` as the destination URI). The commands I am using are:

   i) on the source: `socat -v unix-listen:/tmp/test-sock-driver,fork tcp:destination:12344`

   ii) on the destination: `socat -v tcp-listen:12344,reuseaddr,fork unix:/var/run/libvirt/libvirt-sock`

2) I set up a tunnel between the QEMU processes for port 49152 with the commands:

   i) on the source: `socat -v tcp-listen:49152,reuseaddr,fork tcp:destination:12345`

   ii) on the destination: `socat -v tcp-listen:12345,reuseaddr,fork tcp:localhost:49152`

3) I set up a tunnel for QEMU's disk migration (NBD protocol):

   i) on the source: `socat -v tcp-listen:49153,reuseaddr,fork tcp:destination:12346`

   ii) on the destination: `socat -v tcp-listen:12346,reuseaddr,fork tcp:localhost:49153`

Note that the actual communication between source and destination still goes over the network only because that makes it easier for me to test between two remote libvirt daemons (at least for now); it is not supposed to reflect what virt-launcher/virtwrap is doing.  That channel is not what is important here.

The command that I am using to migrate in this case is:

  virsh migrate machinename --verbose --compressed --live --copy-storage-all 'qemu+unix:///system?socket=/tmp/test-sock-driver' tcp:localhost

The `tcp:localhost` in the end only tells libvirt not to try to connect to the destination hostname/address, but rather just locally for everything.

Please verify this resembles your current workaround, at least roughly.


What I am trying to reach in the end is the same thing, but with the tunnels listening on UNIX sockets instead of TCP ports.  These tunnels need not be created if both daemons are running on the same host: the only thing needed to make that work is to make sure the directory with the sockets is mounted in the same place for both the source and the destination daemon.  The setup with non-local daemons could look like this:

1) The setup for the tunnel between libvirt daemons is the same:

   i) on the source: `socat -v unix-listen:/tmp/test-sock-driver,fork tcp:destination:12344`

   ii) on the destination: `socat -v tcp-listen:12344,reuseaddr,fork unix:/var/run/libvirt/libvirt-sock`

2) To set up a tunnel between the QEMU processes, listen on some UNIX socket and forward it the same way as in (1):

   i) on the source: `socat -v unix-listen:/tmp/test-sock-qemu,fork tcp:destination:12345`

   ii) on the destination: `socat -v tcp-listen:12345,reuseaddr,fork unix:/tmp/test-sock-qemu`

3) To set up a tunnel for QEMU's disk migration (NBD protocol), listen on some other UNIX socket and forward it the same way as in (1):

   i) on the source: `socat -v unix-listen:/tmp/test-sock-nbd,fork tcp:destination:12346`

   ii) on the destination: `socat -v tcp-listen:12346,reuseaddr,fork unix:/tmp/test-sock-nbd`


The command to migrate the machine after this is implemented could look like this:

  virsh migrate machinename --verbose --compressed --live --copy-storage-all 'qemu+unix:///system?socket=/tmp/test-sock-driver' unix:/tmp/test-sock-qemu --disks-socket /tmp/test-sock-nbd

or (in case you prefer proper URI for the `desturi` parameter):

  virsh migrate machinename --verbose --compressed --live --copy-storage-all 'qemu+unix:///system?socket=/tmp/test-sock-driver' unix://?socket=/tmp/test-sock-qemu --disks-socket /tmp/test-sock-nbd


Please check that this end result is acceptable for you.  As far as I understand this would solve two major issues for you:

1) No need for a tunnel in case the daemons are running on the same host, just making sure the sockets are (bind) mounted properly (the libvirt socket needs to be bind-mounted somewhere else or forwarded).

2) No need for a network access from libvirt daemon, no matter whether migrating to the same host or to a remote node (in this case the wrapper would still need to forward it, probably through its own network).


I would love if these could be checked before I am done with the implementation.
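
The three forwarder pairs in the proposed setup follow one pattern, which can be sketched as a small generator. This only rebuilds the socat command lines listed above. The function name, the CHANNELS table and the "destination" host name are illustrative assumptions, not part of libvirt or KubeVirt.

```python
def forwarder_commands(sock_path, dest_host, tcp_port, dest_sock_path=None):
    """Return the (source, destination) socat commands for one channel."""
    # By default the socket path is the same on both sides.
    dest_sock_path = dest_sock_path or sock_path
    source = f"socat -v unix-listen:{sock_path},fork tcp:{dest_host}:{tcp_port}"
    dest = f"socat -v tcp-listen:{tcp_port},reuseaddr,fork unix:{dest_sock_path}"
    return source, dest


# (socket on the source, TCP port used between hosts, socket on the destination)
CHANNELS = [
    ("/tmp/test-sock-driver", 12344, "/var/run/libvirt/libvirt-sock"),  # libvirt
    ("/tmp/test-sock-qemu", 12345, None),                               # QEMU migration
    ("/tmp/test-sock-nbd", 12346, None),                                # NBD disks
]

for sock, port, dest_sock in CHANNELS:
    for command in forwarder_commands(sock, "destination", port, dest_sock):
        print(command)
```

Only the libvirt channel ends at a different path on the destination; the QEMU and NBD sockets keep the same path on both sides.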

Comment 13 Martin Kletzander 2020-07-10 12:42:19 UTC

Please see comment #12, thanks.

Comment 14 David Vossel 2020-07-14 20:04:19 UTC

Hey, yes this does look accurate to me. I think you're on the right track.

As soon as you feel like you have something  worth us testing, it would be worthwhile for us to drop an early build of your libvirt work into kubevirt and attempt to remove the tcp proxies to see if we hit any surprises.

Comment 15 Martin Kletzander 2020-07-15 08:34:52 UTC

Thanks. I found out you also use peer2peer migration unconditionally.  My guess is that it's because you used to use tunnelled migration which was since removed.  I'll try to support both peer2peer and direct, but maybe switching to direct makes more sense for you (the libvirt connection to the other side is not initiated by the daemon, but the client).

Comment 16 David Vossel 2020-07-15 19:53:28 UTC

>  I'll try to support both peer2peer and direct, but maybe switching to direct makes more sense for you (the libvirt connection to the other side is not initiated by the daemon, but the client).

oh, good observation. I'm not 100% sure what the implications might be for us to switch to direct. It would be great if you can support both. If you find yourself in a situation where technically it's difficult to support peer2peer, we can look further into the implications of using direct on our side.

Comment 17 Martin Kletzander 2020-08-25 05:52:25 UTC

I have a first version posted here:

https://www.redhat.com/archives/libvir-list/2020-August/msg00879.html

Comment 18 Martin Kletzander 2020-08-25 05:57:58 UTC

(In reply to David Vossel from comment #16)
Well, the only difference between p2p and direct is who is connecting to the destination daemon.  By dropping peer2peer you could remove one socket forward between the daemons.  For more info see https://libvirt.org/migration.html#flowmanageddirect and https://libvirt.org/migration.html#flowpeer2peer which should, hopefully, illustrate the difference.

About the early build, you can grab the current tree here: https://gitlab.com/nertpinx/libvirt/-/tree/unix_migration I'm not sure what build you'd like, so let me know and I'll do my best to help with that.

Comment 19 David Vossel 2020-08-25 20:06:14 UTC

>About the early build, you can grab the current tree here: https://gitlab.com/nertpinx/libvirt/-/tree/unix_migration I'm not sure what build you'd like, so let me know and I'll do my best to help with that. 

your branch is fine.

This Dockerfile [1] checks out your branch, and builds libvirt within a container that we can use as the base image for the kubevirt VMIs. 


I'll see if we can get someone tasked with kubevirt integration using your dev branch as a starting point. I'm not sure how quickly this can be done though. 


1. https://github.com/davidvossel/libvirt/blob/unix-sock-fix/Dockerfile.in

Comment 21 Martin Kletzander 2020-09-01 14:38:00 UTC

Patches for v2 were sent to the mailing list; the gitlab branch is also updated.  The differences are in the URI specification: it is now enough to use "unix:///path/to/socket" for both the destination URI and the disks URI (which used to be a disks socket).  Everything is explained in the documentation (the virsh manpage for the "migrate" command, the "migration" web page with an example use case, and the documentation for the new typed parameter itself). I can provide accurate links after the code is merged (the artifacts built in the CI are not directly accessible since it is only in my branch for now).

Here is the link to the v2:

  https://www.redhat.com/archives/libvir-list/2020-September/msg00049.html
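
As a rough illustration of the v2 URI form, the sketch below builds "unix:///path/to/socket" URIs and a virsh command line from socket paths. The helper names are made up for this example. The flags used (--migrateuri, --disks-uri, --copy-storage-all, --live, --verbose) are the ones that appear in commands elsewhere in this BZ, and the absolute-path check is an assumption of the sketch, not a documented libvirt rule.

```python
import shlex


def unix_uri(path):
    """Turn an absolute socket path into the unix:///path/to/socket form."""
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute path, got {path!r}")
    return "unix://" + path


def migrate_argv(domain, libvirt_socket, migrate_socket, disks_socket=None):
    """Build a virsh argument list for migration over UNIX sockets."""
    argv = [
        "virsh", "migrate", domain,
        f"qemu+unix:///system?socket={libvirt_socket}",
        "--live", "--verbose",
        "--migrateuri", unix_uri(migrate_socket),
    ]
    if disks_socket is not None:
        # Storage migration uses a separate (NBD) connection.
        argv += ["--copy-storage-all", "--disks-uri", unix_uri(disks_socket)]
    return argv


print(shlex.join(migrate_argv("machinename", "/tmp/test-sock-driver",
                              "/tmp/test-sock-qemu", "/tmp/test-sock-nbd")))
```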

Comment 25 Martin Kletzander 2020-09-05 06:11:03 UTC

Pushed upstream with commit v6.7.0-58-gf51cbe92c0d8:

commit f51cbe92c0d84e29ea1f158ad544a4d69ec1cee3
Author: Martin Kletzander <mkletzan>
Date:   Wed Sep 2 12:06:12 2020 +0200

    qemu: Allow migration over UNIX socket

Comment 26 Michal Privoznik 2020-09-16 12:50:28 UTC

Setting Upstream keyword per comment 25.

Comment 31 Fangge Jin 2020-11-10 09:39:41 UTC

Try to verify with libvirt-6.6.0-8.module+el8.3.1+8648+130818f2.x86_64

Steps:
1. On src host: 
# socat unix-listen:/tmp/sock,reuseaddr,fork tcp:<dest host>:16509
# socat unix-listen:/tmp/sock-49152,reuseaddr,fork tcp:<dest host>:49152

2. On dest host:
Start libvirtd-tcp.socket, set auth_tcp="none" and restart libvirtd.service.

3. Migrate a vm from src to dest host, it reported error as below:
# virsh migrate avocado-vt-vm1 qemu+unix:///system?socket=/tmp/sock --live --verbose --migrateuri unix:///tmp/sock-49152
error: Failed to connect socket to '/tmp/sock-49152': Permission denied

Audit log on src host:
type=AVC msg=audit(1604999472.307:1736): avc:  denied  { connectto } for  pid=148647 comm="rpc-worker" path="/tmp/sock-49152" scontext=system_u:system_r:svirt_t:s0:c735,c952 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=0


4. Change the migrateuri to unix:///var/lib/libvirt/qemu/sock-49152, it will report different error:
# virsh migrate avocado-vt-vm1 qemu+unix:///system?socket=/tmp/sock --live --verbose --migrateuri unix:///var/lib/libvirt/qemu/sock-49152
error: Failed to connect socket to '/var/lib/libvirt/qemu/sock-49152': Permission denied

Libvirtd log on dest host:
2020-11-10 09:21:22.972+0000: 512585: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"id": "libvirt-370", "error": {"class": "GenericError", "desc": "Failed to bind socket to /var/lib/libvirt/sock-49152: Permission denied"}}]


5. Change migrateuri back to unix:///tmp/sock-49152, and set selinux to permissive on src host, try migration again:
# virsh migrate avocado-vt-vm1 qemu+unix:///system?socket=/tmp/sock --live --verbose --migrateuri unix:///tmp/sock-49152
error: operation failed: migration out job: Unable to write to socket: Broken pipe
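
To read a denial like the one in step 3, it helps to split the record into its fields. The sketch below is a simple regex-based parser for this one record shape, not a general audit log parser. The parse_avc() and selinux_type() names are invented for the example.

```python
import re


def parse_avc(line):
    """Extract permissions and key=value fields from one AVC record."""
    perms = re.search(r"\{\s*([^}]*?)\s*\}", line)
    fields = {key: value.strip('"')
              for key, value in re.findall(r'(\w+)=("[^"]*"|\S+)', line)}
    fields["perms"] = perms.group(1).split() if perms else []
    return fields


def selinux_type(context):
    """Return the type field of a user:role:type:level context."""
    return context.split(":")[2]


# Record copied from the audit log in step 3.
record = parse_avc(
    'type=AVC msg=audit(1604999472.307:1736): avc:  denied  { connectto } '
    'for  pid=148647 comm="rpc-worker" path="/tmp/sock-49152" '
    'scontext=system_u:system_r:svirt_t:s0:c735,c952 '
    'tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 '
    'tclass=unix_stream_socket permissive=0'
)
print(record["perms"], record["comm"], record["path"])
print(selinux_type(record["scontext"]), "->", selinux_type(record["tcontext"]))
```

The target type here is that of the process that created the listening socket (the unconfined socat), not the label of the socket file.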

Comment 32 Fangge Jin 2020-11-10 09:56:04 UTC

Hi Martin

Could you help to check above comment?

Comment 33 Martin Kletzander 2020-11-10 15:03:13 UTC

The "permission denied" error is weird because libvirt definitely labels them. It might be that the listening socket has its own label, separate from the label on the socket file on the filesystem (which happens in some cases), but it is also odd because the AVC denial appears to come from `rpc-worker`.

Anyway, forwarding the socket to a TCP address on another machine does not work for any connection other than the libvirt one.  Because you said that the migration should happen over a particular socket, both source and destination expect that (the destination listens on a socket, not on the port you picked).

Could you please check whether, if you start socat listening on a socket in the same way (the other end being just `readline`, for example), you can use that socket as the output for a chardev device?  Both of them are labelled the same way, so I think that if one does not work, the other one should not either.

Comment 34 Fangge Jin 2020-11-11 07:11:48 UTC

I tried with following steps, please check:

1. Start a socat to listen on a unix socket:
# socat unix-listen:/var/lib/libvirt/qemu/f16x86_64.agent,reuseaddr,fork tcp:<host ip>:49152

2. Start a guest with unix socket of connect mode:
# virsh dumpxml avocado-vt-vm1
...
    <channel type="unix">
      <source mode="connect" path="/var/lib/libvirt/qemu/f16x86_64.agent" />
      <target type="virtio" />
      <address type="virtio-serial" controller="0" bus="0" port="2" />
    </channel>
...

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: internal error: process exited while connecting to monitor: 2020-11-11T06:47:31.199945Z qemu-kvm: -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/f16x86_64.agent: Failed to connect socket /var/lib/libvirt/qemu/f16x86_64.agent: Permission denied

3. Check audit log:
type=AVC msg=audit(1605077691.158:2144): avc:  denied  { connectto } for  pid=165747 comm="qemu-kvm" path="/var/lib/libvirt/qemu/f16x86_64.agent" scontext=system_u:system_r:svirt_t:s0:c1,c78 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=0

4. I also checked the selinux context of the unix socket file during vm starts, it was labeled correctly:
# ll /var/lib/libvirt/qemu/f16x86_64.agent -Z
srwxr-xr-x. 1 qemu qemu system_u:object_r:svirt_image_t:s0:c1,c78 0 Nov 11 01:45 /var/lib/libvirt/qemu/f16x86_64.agent


Change the unix socket directory to /tmp or /var/lib/libvirt, got same result as above
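
The denial in step 3 names unconfined_t as the target even though the socket file carries the svirt label, which suggests the check is made against the label of the listening socket itself. A minimal sketch of creating a listener with an explicit socket label follows. It assumes an SELinux-enabled Linux kernel where writing to /proc/thread-self/attr/sockcreate sets the label for sockets the thread creates next (the interface behind libselinux's setsockcreatecon()). The helper names are hypothetical, and this is not the attached script.

```python
import contextlib
import os
import socket

# Assumption: present on SELinux-enabled Linux kernels.
SOCKCREATE = "/proc/thread-self/attr/sockcreate"


@contextlib.contextmanager
def socket_create_context(context, attr_path=SOCKCREATE):
    """Label sockets created by this thread inside the block."""
    fd = os.open(attr_path, os.O_WRONLY)
    try:
        os.write(fd, context.encode())
        yield
    finally:
        # Assumption: a zero-length write restores default labelling.
        os.write(fd, b"")
        os.close(fd)


def labelled_unix_listener(path, context, attr_path=SOCKCREATE):
    """Create a listening UNIX socket whose label is `context`."""
    with socket_create_context(context, attr_path):
        # The label is taken when the socket object is created.
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen()
    return server
```

A call such as labelled_unix_listener("/tmp/test.sock", "system_u:system_r:svirt_socket_t:s0") would need sufficient privileges and a policy that permits the transition; the context value is only an example.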

Comment 35 Fangge Jin 2020-11-13 03:01:18 UTC

Hi David

I'm a little confused by the requirement of CNV.

According to the comment 0 of the bug, my understanding about the requirement is as:

   On src host :

       socat unix-listen:local-socket-1,reuseaddr,fork tcp:<dest host>:22222
       socat unix-listen:local-socket-2,reuseaddr,fork tcp:<dest host>:49152
       socat unix-listen:local-socket-3,reuseaddr,fork tcp:<dest host>:49153

   Then do migration:

       virsh migrate [--copy-storage-all] [--p2p] --live --xml domain.xml my-vm qemu+unix:///system?socket=local-socket-1 --migrateuri unix://local-socket-2 [--disks-uri unix://local-socket-3]


While from comment 12 and comment 13, it seems two socat instances are also required on the dest host before migration, as below?
       socat tcp-listen:49152,reuseaddr,fork unix://local-socket-2
       socat tcp-listen:49153,reuseaddr,fork unix://local-socket-3


Could you please clarify again?

Comment 36 David Vossel 2020-11-13 22:22:30 UTC

> While from comment 12 and comment 13, it also require two socat on dest host before migration as below?
>       socat tcp-listen:49152,reuseaddr,fork unix://local-socket-2
>       socat tcp-listen:49153,reuseaddr,fork unix://local-socket-3
> Could you please clarify again?

Yes, the simulated setup will need socat on both src and destination. I know it looks strange, but we're tunneling a migration over unix sockets which are streamed across a TCP connection.

The real world reasoning behind this is that in CNV the container libvirt is running in does not have network access. All we have for communication is unix sockets. So we're tunnelling out of that libvirt container with the unix socket and then streaming in another environment that unix socket over tcp.

so it conceptually looks like this.


---------- SOURCE NODE ------ | ------ DESTINATION NODE -----
libvirt <-> unix socket <-> TCP <-> unix socket <-> libvirt

Comment 37 Martin Kletzander 2020-11-14 01:33:28 UTC

Created attachment 1729265 [details]
Script for testing the forwarding

So the issue with SELinux was the socket labeling one.  The socket needs to be labeled and the label is different from the one that is on the file that represents the socket.  Since socat does not have this functionality, I wrote a small script that forwards the data in a very simple manner, but also supports setting the context the way it is supposed to be done.

So you can run it like so on the source host:

  ./pmsocat.py unix2tcp -c system_u:system_r:svirt_socket_t:s0 /tmp/test.sock destination 12345

or even more times if you are testing nbd as well.  You can also run it on the destination instead of socat if you want:

  ./pmsocat.py tcp2unix 12345 /tmp/test.sock

I found that it also performs better when testing migration with multiple parallel connections (virsh migrate --parallel); I even tried large numbers of parallel connections and it worked very well.  Feel free to ask if you need any more help with this peculiar BZ.

Also, I plan to document this more closely, but if you want to track that as well, then please create another BZ so that a simple doc change does not stall this BZ for no reason.  Thank you.
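
For readers without access to the attachment, the forwarding idea can be sketched with asyncio (Python 3.7 or newer). This is not the attached pmsocat.py: it omits the SELinux labelling and argument handling, and the function names and block size are assumptions made for this example.

```python
import asyncio

BLOCKSIZE = 64 * 1024  # arbitrary read size chosen for this sketch


async def pump(reader, writer):
    """Copy bytes from reader to writer until EOF or an error."""
    try:
        while True:
            data = await reader.read(BLOCKSIZE)
            if not data:
                break
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()  # pass the half-close on to the peer
    except OSError:
        # Covers ConnectionResetError and BrokenPipeError.
        writer.close()


async def serve_unix_to_tcp(sock_path, host, port):
    """Listen on a UNIX socket and forward each client to host:port."""

    async def handle(client_reader, client_writer):
        try:
            remote_reader, remote_writer = await asyncio.open_connection(host, port)
        except OSError:
            client_writer.close()
            return
        try:
            # One pump per direction, both running concurrently.
            await asyncio.gather(
                pump(client_reader, remote_writer),
                pump(remote_reader, client_writer),
            )
        finally:
            client_writer.close()
            remote_writer.close()

    return await asyncio.start_unix_server(handle, path=sock_path)
```

Awaiting serve_unix_to_tcp("/tmp/test.sock", "destination", 12345) and then the returned server's serve_forever() would roughly correspond to the unix2tcp mode shown above. Each accepted connection gets its own pair of pumps, so several parallel migration connections are forwarded independently.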

Comment 38 Fangge Jin 2020-11-14 02:37:52 UTC

(In reply to Martin Kletzander from comment #37)
> Created attachment 1729265 [details]
> Script for testing the forwarding
Thanks, I will try with your script.

Comment 39 Fangge Jin 2020-11-14 08:10:27 UTC

Hi Martin

I tried your script, but some of the functions are not supported with python3.6 (the latest python version on RHEL8). I need some time to modify it, and I would be very grateful if you could provide a script for python3.6.

I also did some testing with selinux disabled on the src host, and met two problems:
1. Migration is stuck at 1% when testing with --parallel and without --copy-storage-all (while it works well when testing with both --parallel and --copy-storage-all)

2. Migration with both --copy-storage-all and --tls failed:
# virsh -k0 migrate avocado-vt-vm1 qemu+unix:///system?socket=/tmp/sock --live --verbose --migrateuri unix:///tmp/sock-49152 --p2p --postcopy --compressed  --comp-methods xbzrle   --copy-storage-all --disks-uri unix:///tmp/sock-49153 --tls 
error: internal error: unable to execute QEMU command 'nbd-server-start': TLS is only supported with IPv4/IPv6


Steps for problem 1:
1). On src host:
# socat unix-listen:/tmp/sock,reuseaddr,fork tcp:dell-per740-04.dell2.lab.eng.bos.redhat.com:16509
# socat unix-listen:/tmp/sock-49152,reuseaddr,fork tcp:dell-per740-04.dell2.lab.eng.bos.redhat.com:49152

2). On dest host:
# socat tcp-listen:49152,reuseaddr,fork unix:/tmp/sock-49152

3). Do migration with --parallel:
# virsh migrate avocado-vt-vm1 qemu+unix:///system?socket=/tmp/sock --live --verbose --p2p --migrateuri unix:///tmp/sock-49152 --parallel
Migration: [ 1 %]

4). Check migration progress after a while, I found the data transferred stayed 2.025MiB forever.
# virsh domjobinfo avocado-vt-vm1
Job type: Unbounded
Operation: Outgoing migration
Time elapsed: 483192 ms
Data processed: 2.025 MiB
Data remaining: 1022.133 MiB
Data total: 1.008 GiB
Memory processed: 2.025 MiB
Memory remaining: 1022.133 MiB
Memory total: 1.008 GiB
Dirty rate: 0 pages/s
Page size: 4096 bytes
Iteration: 1
Postcopy requests: 0
Constant pages: 2064
Normal pages: 639
Normal data: 2.496 MiB
Expected downtime: 300 ms
Setup time: 9 ms

5). Check the opened sockets on both hosts.
Src host:
# netstat -tuxnap|grep 49152
tcp        0 300328 10.16.218.250:59848     10.16.218.252:49152     ESTABLISHED 176024/socat        
tcp        0 270584 10.16.218.250:59852     10.16.218.252:49152     ESTABLISHED 176025/socat        
unix  2      [ ACC ]     STREAM     LISTENING     828327   172796/socat         /tmp/sock-49152
unix  3      [ ]         STREAM     CONNECTED     801621   176025/socat         /tmp/sock-49152
unix  3      [ ]         STREAM     CONNECTED     801620   176024/socat         /tmp/sock-49152

Dest host:
# netstat -tuxnap|grep 49152
tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      518422/socat        
tcp   231616      0 10.16.218.252:49152     10.16.218.250:59848     ESTABLISHED 518971/socat        
tcp   242080      0 10.16.218.252:49152     10.16.218.250:59852     ESTABLISHED 518973/socat        
unix  2      [ ACC ]     STREAM     LISTENING     4980248  518943/qemu-kvm      /tmp/sock-49152
unix  3      [ ]         STREAM     CONNECTED     4993667  518943/qemu-kvm      /tmp/sock-49152
unix  3      [ ]         STREAM     CONNECTED     4993669  518943/qemu-kvm      /tmp/sock-49152

6). Try to cancel migration by Ctrl+C or "virsh domjobabort avocado-vt-vm1", the status stays at "cancelling" forever.

2020-11-11 11:20:15.113+0000: 175406: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7ffaf8003020 reply={"return": {"expected-downtime": 300, "status": "cancelling", "setup-time": 9, "total-time": 819084, "ram": {"total": 1082859520, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 2105216, "pages-per-second": 0, "page-size": 4096, "remaining": 1071783936, "mbps": 0, "transferred": 2123799, "duplicate": 2064, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 2617344, "normal": 639}}, "id": "libvirt-1655"}
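
The stall in step 4 can be detected mechanically by comparing two `virsh domjobinfo` samples taken some time apart. The sketch below parses the "key: value" output shown above. The function names and the stall rule (no growth in "Data processed" while data remains) are assumptions for this illustration, not a libvirt API.

```python
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3, "TiB": 1024 ** 4}


def parse_domjobinfo(text):
    """Turn `virsh domjobinfo` output into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def to_bytes(value):
    """Convert a value such as '2.025 MiB' to a number of bytes."""
    number, unit = value.split()
    return float(number) * UNITS[unit]


def looks_stalled(earlier, later):
    """True if no data was processed between two samples of one job."""
    before, after = parse_domjobinfo(earlier), parse_domjobinfo(later)
    progressed = to_bytes(after["Data processed"]) > to_bytes(before["Data processed"])
    remaining = to_bytes(after["Data remaining"]) > 0
    return remaining and not progressed


sample = "Job type: Unbounded\nData processed: 2.025 MiB\nData remaining: 1022.133 MiB\n"
print(looks_stalled(sample, sample))
```

Two identical samples, as in the hung migration above, are reported as stalled.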

Comment 40 Fangge Jin 2020-11-14 08:13:50 UTC

I also have another question:
Is it needed to provide an option to specify the unix socket path on dest host in case the path on src host is not available on dest host?

Comment 41 Martin Kletzander 2020-11-14 21:34:59 UTC

Created attachment 1729364 [details]
Script for python3.6

(In reply to Fangge Jin from comment #39)
I tried rewriting it to 3.6, please have a look and try it.

ad 1) I'll have to check that out.

ad 2) Seems fine, QEMU properly says that it does not support TLS on UNIX sockets, only on IPv4 and/or IPv6.  We can disable it and stop it earlier, sure, but it does not seem like an issue.  QEMU might be able to do that (at some point).

(In reply to Fangge Jin from comment #40)
There are two sockets, one that the destination listens on and one that the source connects to.  Because those are two different systems, the sockets exist on different machines and you need to connect them somehow.  (Of course, if these were two containers, you would just share a bind-mounted folder between them and would not have to run socat or any custom code.)  But since the source is trying to connect to src_machine:/tmp/test.sock and the destination listens on dst_machine:/tmp/test.sock, the path has to be specified for both socket forwarding programs.  I hope that's clear.

Comment 42 Fangge Jin 2020-11-16 08:04:36 UTC

(In reply to Martin Kletzander from comment #41)
> Created attachment 1729364 [details]
> Script for python3.6
> 
> (In reply to Fangge Jin from comment #39)
> I tried rewriting it to 3.6, please have a look and try it.
> 
> ad 1) I'll have to check that out.

1) I tested with your script: migration with --parallel succeeds occasionally but still gets stuck
most of the time. The error printed by the script is as below:

Task exception was never retrieved
future: <Task finished coro=<forward() done, defined at ./pmsocket-3.6.py:11> exception=ConnectionResetError(104, 'Connection reset by peer')>
Traceback (most recent call last):
  File "./pmsocket-3.6.py", line 24, in forward
    yield from writer.drain()
  File "/usr/lib64/python3.6/asyncio/streams.py", line 329, in drain
    raise exc
  File "./pmsocket-3.6.py", line 15, in forward
    data = yield from reader.read(_args.blocksize)
  File "/usr/lib64/python3.6/asyncio/streams.py", line 634, in read
    yield from self._wait_for_data('read')
  File "/usr/lib64/python3.6/asyncio/streams.py", line 464, in _wait_for_data
    yield from self._waiter
  File "/usr/lib64/python3.6/asyncio/selector_events.py", line 714, in _read_ready
    data = self._sock.recv(self.max_size)
ConnectionResetError: [Errno 104] Connection reset by peer

> 
> ad 2) Seems fine, QEMU properly says that it does not support TLS on UNIX
> sockets, only on IPv4 and/or IPv6.  We can disable it and stop it earlier,
> sure, but it does not seem like an issue.  QEMU might be able to do that (at
> some point).

2) I wonder whether kubevirt will need to use TLS NBD for disk migration.

> 
> (In reply to Fangge Jin from comment #40)
> There are two sockets, one that the destination listens on and one that the
> source connects to.  Because those are two different systems the sockets
> exist on different machines and you need to connect them somehow. Of course,
> if these were two containers, then you would just share a bind-mounted
> folder in them and did not have to run any socat or any custom code).  But
> since the source is trying to connect to src_machine:/tmp/test.sock and the
> destination listens on dst_machine:/tmp/test.sock they have to be specified
> for both socket forwarding programs.  I hope that's clear.

3) Currently the path of the unix socket file is hard-coded to be the same on the src and dest hosts.
Should we provide an option (e.g. --listen-uri) so that the user can specify a different path to
listen on at the dest host?

Comment 43 Martin Kletzander 2020-11-18 12:30:50 UTC

(In reply to Fangge Jin from comment #42)
1) That might be an issue with the script, and if it happens only sometimes it most probably is.  Does the migration stop as well?  It looks like the connection gets broken somewhere, maybe at the source qemu?  I'm also not sure I handle the disconnects properly, maybe I'm closing the socket while also trying to read something from it.  I did not have a problem with the 3.7 script, but maybe I was just lucky.

2) They are in charge of the forwarding and they can encrypt the data themselves, in which case the only unencrypted transfer would go through the unix sockets.  If they want to delegate that to QEMU then they need to create a BZ for qemu to request that feature.  I can't speak for QEMU about whether or not this makes sense for them.

3) Not really, it would only complicate things.  This is going to be used mostly by containers where you'll probably bind-mount the directory/socket anyway.  Outside of containers it is still possible to bind-mount them, but I do not see a reason for having different paths on source and destination, especially when the forwarder is in charge of creating the socket where they want (i.e. you can just create a new directory just for the sockets).  If anything, keeping the names in sync is actually more error proof.

Comment 44 Martin Kletzander 2020-11-18 14:09:51 UTC

I checked once more that --parallel without --copy-storage-all (with nfs-backed disks) is working fine.  I just have problems with the python 3.6 script and I think it might be related to asyncio not being as functional in python 3.6 as it is in 3.7 and 3.8.  Feel free to write anything simpler and smaller for the forwarding or just run a fedora container in which you bind mount the socket directory so that you can run the script in there (it would actually be way closer to what CNV is going to do IMHO).
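
For reference, a smaller forwarder could look roughly like the sketch below. It is not the attached script: the names pump, handle, serve and BLOCKSIZE are made up for illustration, it assumes python 3.7 or newer, and it forwards a UNIX socket to a TCP host and port. A connection reset is treated like EOF so that no exception is left unretrieved in a task.

```python
import asyncio

BLOCKSIZE = 64 * 1024  # hypothetical default, not taken from the attached script


async def pump(reader, writer):
    """Copy bytes from reader to writer until EOF or a peer reset."""
    try:
        while True:
            data = await reader.read(BLOCKSIZE)
            if not data:  # EOF: the peer closed its side
                break
            writer.write(data)
            await writer.drain()
    except (ConnectionResetError, BrokenPipeError):
        pass  # treat a reset like EOF instead of leaking the exception
    finally:
        writer.close()  # closes the whole connection on the other side


async def handle(local_reader, local_writer, host, port):
    """Forward one accepted UNIX-socket connection to host:port."""
    try:
        remote_reader, remote_writer = await asyncio.open_connection(host, port)
    except OSError:
        local_writer.close()
        return
    # Run both directions until each has seen EOF or a reset.
    await asyncio.gather(
        pump(local_reader, remote_writer),
        pump(remote_reader, local_writer),
    )


async def serve(socket_path, host, port):
    """Listen on socket_path and forward every connection to host:port."""
    async def on_connect(reader, writer):
        await handle(reader, writer, host, port)

    return await asyncio.start_unix_server(on_connect, path=socket_path)
```

Note that pump() closes the whole connection as soon as one direction ends, so data still in flight in the other direction can be dropped. That is a simplification of this sketch and may or may not be acceptable for a real migration proxy.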

Feel free to create BZs for the error on NBD TLS migration and for documenting the selinux rules needed for this to work.  I have patches for that already.

Comment 45 Fangge Jin 2020-11-19 03:55:54 UTC

(In reply to Martin Kletzander from comment #43)
> (In reply to Fangge Jin from comment #42)
> 1) That might be an issue with the script and if it happens only sometimes
> it most probably is.  Does the migration stop as well?  It looks like the
> connection gets broken somewhere, maybe the source qemu?  I'm also not sure
> I handle the disconnects properly, maybe I'm closing the socket while also
> trying to read something from it.  I did not have problem with the 3.7
> script, but maybe I was just lucky.
> 
Thanks, I will try with a container.

> 2) They are in charge of the forwarding and they can encrypt the data
> themselves in which case the only unencrypted transfer would only go through
> the unix sockets.  If they want to delegate that to QEMU then they need to
> create a BZ for qemu to request that feature.  I can't speak for QEMU about
> whether or not this makes sense for them.
If CNV is not using NBD TLS, then I think the current test result is acceptable.
So I won't file new BZs for this.

Comment 46 Fangge Jin 2020-11-19 04:00:13 UTC

Another issue I met:
Dst libvirtd crashed when --disks-uri contains no valid scheme. Do you prefer a new BZ, or should it be addressed in this BZ?
# virsh migrate avocado-vt-vm1 qemu+unix:///system?socket=/tmp/test.sock --live --verbose  --compressed --copy-storage-all  --disks-uri fjeifeke
error: End of file while reading data: Input/output error

(gdb) bt
#0  0x00007ff42e73cd4e in __strcmp_avx2 () from /lib64/libc.so.6
#1  0x00007ff3e876579c in qemuMigrationDstStartNBDServer (tls_alias=0x0, nbdURI=0x5584fc300e70 "fjeifeke", nbdPort=0, migrate_disks=0x0, nmigrate_disks=0, 
    listenAddr=<optimized out>, vm=0x5584fc2be6a0, driver=0x7ff3cc107830) at ../../src/qemu/qemu_migration.c:403
#2  qemuMigrationDstPrepareAny (driver=driver@entry=0x7ff3cc107830, dconn=dconn@entry=0x7ff410004010, 
    cookiein=cookiein@entry=0x7ff4180048f0 "<qemu-migration>\n  <name>avocado-vt-vm1</name>\n  <uuid>ee090bb6-714e-4ab7-9625-0483510bf987</uuid>\n  <hostname>dell-per740-03.dell2.lab.eng.bos.redhat.com</hostname>\n  <hostuuid>4c4c4544-0051-4210-805"..., cookieinlen=cookieinlen@entry=1783, cookieout=cookieout@entry=0x7ff4279a7818, 
    cookieoutlen=cookieoutlen@entry=0x7ff4279a780c, def=<optimized out>, origname=<optimized out>, st=<optimized out>, protocol=<optimized out>, port=<optimized out>, 
    autoPort=<optimized out>, listenAddress=<optimized out>, nmigrate_disks=<optimized out>, migrate_disks=<optimized out>, nbdPort=<optimized out>, nbdURI=<optimized out>, 
    migParams=<optimized out>, flags=<optimized out>) at ../../src/qemu/qemu_migration.c:2726
#3  0x00007ff3e87676a7 in qemuMigrationDstPrepareDirect (driver=driver@entry=0x7ff3cc107830, dconn=dconn@entry=0x7ff410004010, 
    cookiein=cookiein@entry=0x7ff4180048f0 "<qemu-migration>\n  <name>avocado-vt-vm1</name>\n  <uuid>ee090bb6-714e-4ab7-9625-0483510bf987</uuid>\n  <hostname>dell-per740-03.dell2.lab.eng.bos.redhat.com</hostname>\n  <hostuuid>4c4c4544-0051-4210-805"..., cookieinlen=cookieinlen@entry=1783, cookieout=cookieout@entry=0x7ff4279a7818, 
    cookieoutlen=cookieoutlen@entry=0x7ff4279a780c, uri_in=<optimized out>, uri_out=<optimized out>, def=<optimized out>, origname=<optimized out>, listenAddress=0x0, 
    nmigrate_disks=<optimized out>, migrate_disks=<optimized out>, nbdPort=<optimized out>, nbdURI=<optimized out>, migParams=<optimized out>, flags=<optimized out>)
    at ../../src/qemu/qemu_migration.c:3030
#4  0x00007ff3e87cf850 in qemuDomainMigratePrepare3Params (dconn=0x7ff410004010, params=<optimized out>, nparams=2, 
    cookiein=0x7ff4180048f0 "<qemu-migration>\n  <name>avocado-vt-vm1</name>\n  <uuid>ee090bb6-714e-4ab7-9625-0483510bf987</uuid>\n  <hostname>dell-per740-03.dell2.lab.eng.bos.redhat.com</hostname>\n  <hostuuid>4c4c4544-0051-4210-805"..., cookieinlen=1783, cookieout=0x7ff4279a7818, cookieoutlen=0x7ff4279a780c, uri_out=0x7ff4180025a0, flags=2113)
    at ../../src/qemu/qemu_driver.c:12536
#5  0x00007ff4329d031c in virDomainMigratePrepare3Params (dconn=dconn@entry=0x7ff410004010, params=0x7ff418004ff0, nparams=2, 
    cookiein=0x7ff4180048f0 "<qemu-migration>\n  <name>avocado-vt-vm1</name>\n  <uuid>ee090bb6-714e-4ab7-9625-0483510bf987</uuid>\n  <hostname>dell-per740-03.dell2.lab.eng.bos.redhat.com</hostname>\n  <hostuuid>4c4c4544-0051-4210-805"..., cookieinlen=1783, cookieout=cookieout@entry=0x7ff4279a7818, cookieoutlen=0x7ff4279a780c, 
    uri_out=0x7ff4180025a0, flags=2113) at ../../src/libvirt-domain.c:4871
#6  0x00005584fa3adda4 in remoteDispatchDomainMigratePrepare3Params (ret=0x7ff418002f10, args=0x7ff418002ee0, rerr=0x7ff4279a78e0, msg=0x5584fc311120, 
    client=<optimized out>, server=<optimized out>) at ../../src/remote/remote_daemon_dispatch.c:5610
#7  remoteDispatchDomainMigratePrepare3ParamsHelper (server=<optimized out>, client=<optimized out>, msg=0x5584fc311120, rerr=0x7ff4279a78e0, args=0x7ff418002ee0, 
    ret=0x7ff418002f10) at ./remote/remote_daemon_dispatch_stubs.h:8789
#8  0x00007ff43289cdfc in virNetServerProgramDispatchCall (msg=0x5584fc311120, client=0x5584fc306050, server=0x5584fc2be080, prog=0x5584fc312810)
    at ../../src/rpc/virnetserverprogram.c:430
#9  virNetServerProgramDispatch (prog=0x5584fc312810, server=server@entry=0x5584fc2be080, client=client@entry=0x5584fc306050, msg=msg@entry=0x5584fc311120)
    at ../../src/rpc/virnetserverprogram.c:302
#10 0x00007ff4328a49ac in virNetServerProcessMsg (srv=srv@entry=0x5584fc2be080, client=0x5584fc306050, prog=<optimized out>, msg=0x5584fc311120)
    at ../../src/rpc/virnetserver.c:137
#11 0x00007ff4328a4e1c in virNetServerHandleJob (jobOpaque=0x5584fc2f6160, opaque=0x5584fc2be080) at ../../src/rpc/virnetserver.c:154
#12 0x00007ff4327192fb in virThreadPoolWorker (opaque=<optimized out>) at ../../src/util/virthreadpool.c:163
#13 0x00007ff4327181d7 in virThreadHelper (data=<optimized out>) at ../../src/util/virthread.c:233
#14 0x00007ff42edcb14a in start_thread () from /lib64/libpthread.so.0
#15 0x00007ff42e6e0f23 in clone () from /lib64/libc.so.6
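
The strcmp frame with nbdURI "fjeifeke" suggests that the scheme of the parsed URI is missing and gets compared without a check. The kind of validation that would avoid this can be sketched in Python as below. The function name and the list of accepted schemes are assumptions for illustration, not libvirt code; the second error text is modelled on the "Unsupported scheme in disks URI" message libvirt prints.

```python
from urllib.parse import urlparse

# Assumption: only these two schemes are accepted for the disks URI,
# based on the tcp:// and unix:// forms discussed in this bug.
SUPPORTED_SCHEMES = ("tcp", "unix")


def check_disks_uri(uri):
    """Return the scheme of a disks URI, or raise ValueError if unusable."""
    scheme = urlparse(uri).scheme
    if not scheme:
        # A bare word such as "fjeifeke" parses with an empty scheme,
        # so it has to be rejected before any scheme comparison.
        raise ValueError("disks URI has no scheme: %r" % uri)
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError("Unsupported scheme in disks URI: %s" % scheme)
    return scheme
```

With this sketch, check_disks_uri("fjeifeke") raises a ValueError instead of reaching a comparison on a missing scheme.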

Comment 47 Fangge Jin 2020-11-19 06:16:39 UTC

Two more issues:
1. Can disk migration not go over the rdma transport? If not, why?
# virsh migrate avocado-vt-vm1 qemu+unix:///system?socket=/tmp/test.sock --live --verbose  --compressed --copy-storage-all  --disks-uri rdma://192.168.100.6:56891 --listen-address  10.16.218.252
error: invalid argument: Unsupported scheme in disks URI: rdma

2. --tls-destination doesn't take effect for disk migration. I think this is a bug, please confirm:
# virsh migrate avocado-vt-vm1 qemu+unix:///system?socket=/tmp/test.sock --live --verbose  --compressed --p2p --migrateuri tcp://10.16.218.252:49156 --bandwidth 200  --tls --tls-destination <dest hostname> --copy-storage-all --disks-uri tcp://192.168.100.6:49156
error: internal error: unable to execute QEMU command 'blockdev-add': Certificate does not match the hostname 192.168.100.6

# virsh migrate avocado-vt-vm1 qemu+unix:///system?socket=/tmp/test.sock --live --verbose  --compressed  --migrateuri tcp://10.16.218.252:49156 --bandwidth 200  --tls --tls-destination <dest hostname> --copy-storage-all 
error: internal error: unable to execute QEMU command 'blockdev-add': Certificate does not match the hostname 10.16.218.252

Comment 48 Fangge Jin 2020-11-20 03:53:46 UTC

For documentation about --disks-uri:
"
This can be tcp://address:port to specify a listen address (which overrides --listen-address for the disk migration) 
"

How about changing it to "which overrides both --migrateuri and --listen-address for the disk migration" to make it more accurate?

Comment 49 Martin Kletzander 2020-11-20 20:06:32 UTC

(In reply to Fangge Jin from comment #48)
Good point.  I'll change that as well.

Comment 50 Fangge Jin 2020-11-25 02:11:52 UTC

(In reply to Fangge Jin from comment #47)

> 2.--tls-destination doesn't take effect for disk migration, I think this is
> a bug, please confirm:

I filed a separate bug for this issue as it is not related to current bug: 
Bug 1901394 - --tls-destination doesn't take effect for disk migration

Comment 51 Fangge Jin 2020-11-25 02:28:14 UTC

Another small issue:
When --migrateuri uses unix and --disks-uri is not specified, migration fails. How about adding a check in the code so that a clearer error (e.g. "--disks-uri must be specified when unix is used for --migrateuri") is printed in this situation? I don't think it is an important issue, so feel free to give your opinion.
# virsh migrate avocado-vt-vm1 qemu+unix:///system?socket=/tmp/test.sock --live --verbose --p2p --migrateuri unix:///var/lib/libvirt/qemu/test-49151.sock --copy-storage-all
error: internal error: unable to execute QEMU command 'nbd-server-start': address resolution failed for /var/lib/libvirt/qemu/test-49151.sock:49152: Name or service not known
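
The check proposed here could be sketched as follows. The function name and arguments are hypothetical and this is not how libvirt implements it; it only shows the rule that a unix migrate URI combined with storage copy needs an explicit disks URI.

```python
from urllib.parse import urlparse


def check_storage_migration(migrate_uri, disks_uri, copy_storage):
    """Reject a unix migrate URI with storage copy but no disks URI."""
    if not copy_storage or disks_uri:
        return  # nothing to check, or the disks URI was given explicitly
    if migrate_uri and urlparse(migrate_uri).scheme == "unix":
        raise ValueError(
            "--disks-uri must be specified when unix is used for "
            "--migrateuri and storage is being copied")
```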

Comment 52 Fangge Jin 2020-11-26 09:33:28 UTC

Issue 1) in comment 42 can still be reproduced (about 20% of the time) when I migrate between two fedora hosts, each with a containerized libvirtd. It can NOT be reproduced when I migrate on a single fedora host with two containerized libvirtd instances (using a bind mount, no packet forwarding). Maybe something goes wrong during packet forwarding. I guess I can verify this bug with two containerized libvirtd instances running on one fedora host, right?

Dst qemu log:
2020-11-26T10:23:07.648598Z qemu-kvm: failed to receive packet via multifd channel 0: multifd: received packet magic 5145564d expected 11223344
2020-11-26 10:23:19.604+0000: shutting down, reason=failed
2020-11-26T10:23:19.605835Z qemu-kvm: terminating on signal 15 from pid 11076 (/usr/sbin/libvirtd)
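
A side note on the magic values in the log line above: rendering the received value as ASCII gives a hint about what arrived on the multifd channel. The constant and function names below are labels chosen for this sketch.

```python
import struct

# Magic values exactly as they appear in the qemu log line.
RECEIVED_MAGIC = 0x5145564D
EXPECTED_MULTIFD_MAGIC = 0x11223344


def magic_as_ascii(value):
    """Render a 32-bit big-endian magic number as ASCII text."""
    return struct.pack(">I", value).decode("ascii")


print(magic_as_ascii(RECEIVED_MAGIC))  # prints "QEVM"
```

"QEVM" looks like the header of the main migration stream rather than a multifd packet. If that reading is right, the connections may have reached the destination in a different order than the source opened them, which would fit the theory that the problem is in the forwarding. This is an inference from the log only and is not confirmed in this bug.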

Comment 53 Fangge Jin 2020-11-26 09:35:08 UTC

Hi Martin

Please check comment 46, 47, 51 and 52. Thanks.

Comment 54 Martin Kletzander 2020-11-27 07:31:23 UTC

I'll fix most of them in this BZ, thanks for the thorough testing.  Regarding comment #52 I think those failures are happening simply because the forwarding script is just very, very simple.  By eliminating the script from the equation you really test what should be tested, the feature added in this BZ, so I think that is also more appropriate.

Comment 55 Fangge Jin 2020-12-04 06:32:59 UTC

I tried to test migration with --parallel using the migration proxy of kubevirt, and got the same error as in comment 52.
Maybe it is a fault in qemu-kvm? KVM QE told me that combining "--parallel" with other features is not
supported for now. So I will just treat this as a known issue for now.

Comment 56 Fangge Jin 2020-12-04 06:36:27 UTC

Hi David

Migration over unix socket with "--parallel" (multifd in qemu-kvm) hits an error (see comment 52, comment 55).
Is "--parallel" used in kubevirt? If not, I will just treat this as a known issue and won't block the verification
of this bug.

Comment 57 David Vossel 2020-12-07 02:27:04 UTC

Hey, we currently do not use the parallel flag. We are under some pressure to optimize migration times, which means it's possible the parallel option might be something we consider to help increase bandwidth at some point. I agree this shouldn't block this work though since we aren't utilizing the parallel functionality now.

Comment 59 Martin Kletzander 2020-12-16 10:27:20 UTC

Sorry for not getting back to this earlier, here are some updates and answers (and some of them repeated, so that it is summed up in one place):

(In reply to Fangge Jin from comment #46)
This looks like it has been an issue even before. But I have a patch for that, so I might as well include it with the other ones for this BZ.

(In reply to Fangge Jin from comment #47)
ad 1) No idea, you could ask someone who was adding support for rdma, but since I am not that familiar with it I'm not sure if there is some specific reason for that. From libvirt's point of view it is probably because QEMU does not support it.

(In reply to Fangge Jin from comment #48)
Fixed in another batch of patches.

(In reply to Fangge Jin from comment #50)
Yes, it is, but we need QEMU for that, more info in the bug you created.

(In reply to Fangge Jin from comment #51)
Yeah, that's not a big deal I think. It even is mentioned in the docs kind of. But I agree it would be nice to have a check for that.

(In reply to Fangge Jin from comment #52)
I'm quite sure that's because of the script I wrote. I wanted to quickly cook up something, so I did it in python, but I am not that well versed in handling multiple I/O from python, so there might be some mishaps there. It might be possible to set the socket create label, create the socket and then spawn socat with fd: and give it the FD, although I *think* that would treat it just as a character stream and not a socket (i.e. not doing accept() on it). If it works with bind-mounted directories on a single host, then it is most certainly fine. You can then try it even with just RHEL so that the test does not depend on Fedora or anything with different component versions.

Comment 60 Martin Kletzander 2020-12-16 11:20:25 UTC

Fixes posted upstream:

https://www.redhat.com/archives/libvir-list/2020-December/msg00755.html

Comment 62 Martin Kletzander 2020-12-16 12:00:57 UTC

Pushed upstream, the additional 5 commits are:

commit 9e93d87c00e65211c584769bf27e7cdb74bd6df2
Author: Martin Kletzander <mkletzan>
Date:   Wed Nov 18 14:05:25 2020 +0100

    docs: Document SELinux caveats when migrating over UNIX sockets

commit 511013b57b50da7c800967cd990f8ae1ad5fa948
Author: Martin Kletzander <mkletzan>
Date:   Wed Nov 25 00:19:41 2020 +0100

    qemu: Tweak debug message for qemuMigrationSrcPerformPeer2Peer3

commit 5db1fc56022642e610c911efd28f3a931279e917
Author: Martin Kletzander <mkletzan>
Date:   Sun Dec 13 15:49:29 2020 +0100

    qemu: Fix possible segfault when migrating disks

commit b17eb7344606dcbe3ec6eee702009c93e46e4d8d
Author: Martin Kletzander <mkletzan>
Date:   Sun Dec 13 22:27:33 2020 +0100

    docs: Slightly alter disks-uri description in virsh man

commit 68164892fe6f5d1b5e4fbd4fe1a02d14c1384096
Author: Martin Kletzander <mkletzan>
Date:   Wed Dec 16 11:34:50 2020 +0100

    qemu: Extra check for NBD URI being specified

Comment 64 Fangge Jin 2020-12-17 08:29:09 UTC

Verified with libvirt-client-6.6.0-11.module+el8.3.1+9196+74a80ca4.x86_64
Test scenarios are uploaded in attachment.

Comment 65 Fangge Jin 2020-12-17 08:29:25 UTC

Created attachment 1739897 [details]
Test scenarios

Comment 67 errata-xmlrpc 2021-02-22 15:39:38 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0639
