Bug 680356

Summary: Live migration failed in ipv6 environment
Product: Red Hat Enterprise Linux 6 Reporter: Mike Cao <bcao>
Component: qemu-kvm Assignee: Amos Kong <akong>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: high    
Version: 6.1 CC: ailan, bcao, bsarathy, ehabkost, jasowang, jdenemar, juzhang, knoel, laine, michen, mkenneth, qzhang, rhod, shuang, shu, tburke, veillard, virt-maint
Target Milestone: rc Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.320.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-02-21 07:30:27 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 580954, 680053, 782183, 798682, 804161, 810856    

Description Mike Cao 2011-02-25 08:44:11 UTC
Description of problem:


Version-Release number of selected component (if applicable):
kernel-2.6.32-117.el6
qemu-kvm-0.12.1.2-2.147.el6

How reproducible:
100%

Steps to Reproduce:
1.configure /etc/hosts.
#cat /etc/hosts
2312::8274 s2
2312::8273 s1
2.start VM on s1 host
eg: /usr/libexec/qemu-kvm -enable-kvm -m 512 -smp 2 -name rhel5U6 -uuid ddcbfb49-3411-1701-3c36-6bdbc00bedb9 -rtc clock=vm -boot c -drive file=/mnt/rhel6.raw,if=none,id=drive-ide0-0-0,boot=on,format=raw,cache=none -device virtio-blk-pci,drive=drive-ide0-0-0,id=drive-ide0-0-0 -netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:50:a4:c2:c1 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc :2 -device virtio-balloon-pci,id=ballooning -monitor stdio
3.start listening on a port on s2 host
<commandLine> -incoming tcp:0:5888
4.start live migration
  
Actual results:
(qemu)migrate:[2312::8274]:5888
migration failed
(qemu)migrate:s2:5888
migrate failed.

Expected results:
Migration should be successful

Additional info:
1.ip6tables all off for both hosts.
2.on host s1
#ping6 s2
PING s2(s2) 56 data bytes
64 bytes from s2: icmp_seq=1 ttl=124 time=0.227 ms
64 bytes from s2: icmp_seq=2 ttl=124 time=0.175 ms

#ping6 2312::0274 #s2's ipv6 addr
PING 2312::0274(2312::0274) 56 data bytes
64 bytes from 2312::0274: icmp_seq=1 ttl=124 time=1.5 ms
64 bytes from 2312::0274: icmp_seq=2 ttl=124 time=0.182 ms

Comment 2 Juan Quintela 2011-02-26 00:02:32 UTC
The code for migration and for the socket char device does not support IPv6 addresses. Looking into how difficult it is to fix.

Comment 4 Dor Laor 2011-03-24 22:05:09 UTC
From Juan: "It is the recommended way of using addresses since 2001.  It is in
posix-2001<something>, and in all my testing, I found zero cases where
it failed to work as expected.

   man getaddrinfo

for a client server example

	http://www.akkadia.org/drepper/userapi-ipv6.html
"

Removing back to 6.1 and asking for an exception
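
The getaddrinfo() approach Juan refers to can be sketched as follows. This is a minimal illustration and not the QEMU patch: the helper name connect_any() is hypothetical. It tries each address that getaddrinfo() returns in turn, which is what lets the same code path serve both IPv4 and IPv6.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not QEMU code): connect to host:port over TCP,
 * trying every address getaddrinfo() returns, IPv6 or IPv4.
 * Returns a connected socket, or -1 on failure. */
int connect_any(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* accept both AF_INET and AF_INET6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    if (getaddrinfo(host, port, &hints, &res) != 0) {
        return -1; /* host or service could not be resolved */
    }
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) {
            continue; /* this family may be unavailable, try the next */
        }
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            break; /* connected */
        }
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

A call such as connect_any("s2", "5888") would return a connected socket or -1. Host and port are passed as separate strings, so the caller has to split "host:port" correctly before the lookup.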

Comment 8 Mike Cao 2011-04-06 08:16:59 UTC
Tested on qemu-kvm-0.12.1.2-2.153.el6.x86_64

steps:
1.configure /etc/hosts.
#cat /etc/hosts
1212::1210 t1
1212::1211 t2

2.start VM on s1 host
3.start listening on a port on t2 host
<commandLine> -incoming tcp:0:5888
4.start live migration

Actual results:
(qemu)migrate:[2312::8274]:5888
qemu: getaddrinfo: Servname not supported for ai_socktype
(qemu)migrate:t2:5888
migrate failed.

Expected results:
Migration should be successful

Additional info:
1.ip6tables all off for both hosts.
2.on host s1
#ping6 t2
PING t2(t2) 56 data bytes
64 bytes from s2: icmp_seq=1 ttl=124 time=0.227 ms
64 bytes from s2: icmp_seq=2 ttl=124 time=0.175 ms

ping6 1212::1211
PING 1212::1211(1212::1211) 56 data bytes
64 bytes from 1212::1211: icmp_seq=1 ttl=64 time=0.165 ms
64 bytes from 1212::1211: icmp_seq=2 ttl=64 time=0.171 ms

Based on the above, re-assigning this issue.

Comment 9 Eduardo Habkost 2011-04-06 18:46:48 UTC
This new feature caused a regression. See https://bugzilla.redhat.com/show_bug.cgi?id=694196

Comment 10 Eduardo Habkost 2011-04-06 19:45:58 UTC
Patches reverted on qemu-kvm-0.12.1.2-2.156.el6 due to bug #694196.

Comment 17 Amos Kong 2012-02-08 07:46:18 UTC
Hi quintela,

After rechecking this bz and 694196, I found that your original two patches are right, and we should not revert them.

(qemu) migrate -d tcp:localhost:5200

localhost has multiple addresses (127.0.0.1 and ::1).
getaddrinfo(.., localhost, ..) returns an address list: "::1" is the first item of the list,
"127.0.0.1" is the second item of the list.

In bz694196, they always use "-incoming tcp:0:$port" on the listen side, where
the first item of the getaddrinfo list is an IPv4 address ("127.0.0.1").
But the client side uses the first item in its address list, the IPv6 address ("::1").

getaddrinfo() is called in tcp_start_common(), which is used by both the server and the client, so the parsing rule is the same. We therefore need to use the same hostname on both the client and the listen side.
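
The mismatch described above can be checked with a small sketch. The helper names first_family() and same_first_family() are hypothetical and not part of QEMU; the code only looks at the first getaddrinfo() result, on the assumption that this is the entry both sides end up using.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: return the address family (AF_INET or AF_INET6)
 * of the first entry getaddrinfo() yields for host, or -1 on error. */
int first_family(const char *host)
{
    struct addrinfo hints, *res;
    int family;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, NULL, &hints, &res) != 0) {
        return -1;
    }
    family = res->ai_family; /* only the first list entry is inspected */
    freeaddrinfo(res);
    return family;
}

/* Nonzero when both names resolve and their first results share a family. */
int same_first_family(const char *listen_host, const char *connect_host)
{
    int a = first_family(listen_host);
    int b = first_family(connect_host);

    return a != -1 && a == b;
}
```

If same_first_family() reports a mismatch for the names given to -incoming and to migrate, the listener and the client may pick different address families, which is the situation described for bz694196.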

Some Examples:
1) success
client side: qemu-kvm ...
             (qemu) migrate -d tcp:localhost:5200
listen side: qemu-kvm .... -incoming tcp:localhost:5200

2) success
client side: qemu-kvm ...
             (qemu) migrate -d tcp:0:5200
listen side: qemu-kvm .... -incoming tcp:0:5200

3) in Comment #8, it should listen on 't2', not '0'
success
client side: qemu-kvm ...
             (qemu) migrate -d tcp:t2:5200
listen side: qemu-kvm .... -incoming tcp:t2:5200

4) in Comment #8
failed
client side: qemu-kvm ...
             (qemu) migrate -d tcp:2312::8274:5200
listen side: qemu-kvm .... -incoming tcp:2312::8274:5200

The only problem we have is parsing right hostname and service, IPV6 addr contains "::", so we should split string by last ":"

diff --git a/net.c b/net.c
index 7d44c31..3b2d4da 100644
--- a/net.c
+++ b/net.c
@@ -78,7 +78,7 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
     const char *p, *p1;
     int len;
     p = *pp;
-    p1 = strchr(p, sep);
+    p1 = strrchr(p, sep);
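
To see what the one-line change does to an unbracketed IPv6 string, here is a small standalone sketch. split_host_port() is a hypothetical stand-in written for illustration; it is not the real get_str_sep() and only mirrors the strchr()/strrchr() choice.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the real get_str_sep(): copy the part of s
 * before a ':' separator into host and point *port at the rest.
 * use_last = 0 splits at the first ':', use_last = 1 at the last one.
 * Returns 0 on success, -1 if there is no ':' or host is too small. */
int split_host_port(const char *s, int use_last,
                    char *host, size_t host_size, const char **port)
{
    const char *sep = use_last ? strrchr(s, ':') : strchr(s, ':');
    size_t len;

    if (sep == NULL) {
        return -1;
    }
    len = (size_t)(sep - s);
    if (len >= host_size) {
        return -1;
    }
    memcpy(host, s, len);
    host[len] = '\0';
    *port = sep + 1; /* text after the chosen separator */
    return 0;
}
```

For "2312::8274:5200", splitting at the first ':' gives host "2312" and leaves ":8274:5200" as the port text, while splitting at the last ':' gives host "2312::8274" and port "5200".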

Comment 18 Amos Kong 2012-02-08 07:55:54 UTC
If my understanding is right, I will re-post quintela's two patches together with my small fix from comment #17.

Comment 19 Laine Stump 2012-02-08 08:56:59 UTC
(In reply to comment #17)
> 4) in Comment #8
> failed
> client side: qemu-kvm ...
>              (qemu) migrate -d tcp:2312::8274:5200
> listen side: qemu-kvm .... -incoming tcp:2312::8274:5200
> 
> The only problem we have is parsing right hostname and service, IPV6 addr
> contains "::", so we should split string by last ":"


That method of representing an IPv6 address with a port is discouraged because of its ambiguity. The recommended (also most common, and required for URIs) format is:

     [2312::8274]:5200

(if there is no port specified, the brackets can be (and usually are) left out).

See Section 6 of RFC5952 for details.

Attempting to imply which ":" is a part of the address, and which (if any) is a separator will (probably sooner than later) lead to incorrect results. I STRONGLY recommend that qemu use the recommended format rather than what you outline above.

This can be easily accounted for in code by checking for a '[' character at the beginning of the address:port string - if it's there, the IP address will be at p+1, and the port (if there is one) will be at strchr(p+1, ']') + 2 (if *(strchr(p+1, ']')+1) == ':', of course). If the address:port string doesn't start with '[', the address starts at *p, and the port (if there is one) is at strchr(p, ':').
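
The bracket check described above can be sketched as follows. parse_addr_port() is a hypothetical helper, not code from qemu. One detail is an added assumption that goes beyond the description: an unbracketed string with more than one ':' is treated as a bare IPv6 address with no port.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: parse "[addr]:port", "[addr]", "host:port" or "host".
 * Copies the address (without brackets) into host and sets *port to the
 * port text inside s, or to NULL when no port is given.
 * Returns 0 on success, -1 on malformed input or a too-small buffer. */
int parse_addr_port(const char *s, char *host, size_t host_size,
                    const char **port)
{
    const char *start = s;
    const char *end;
    const char *first;
    size_t len;

    *port = NULL;
    if (*s == '[') {
        start = s + 1;
        end = strchr(start, ']');
        if (end == NULL) {
            return -1;              /* missing closing bracket */
        }
        if (end[1] == ':') {
            *port = end + 2;        /* port follows "]:" */
        } else if (end[1] != '\0') {
            return -1;              /* junk after the closing bracket */
        }
    } else {
        first = strchr(s, ':');
        if (first != NULL && strchr(first + 1, ':') == NULL) {
            end = first;            /* exactly one ':', so host:port */
            *port = first + 1;
        } else {
            /* Assumption: no ':' or several ':' means no port at all. */
            end = s + strlen(s);
        }
    }
    len = (size_t)(end - start);
    if (len >= host_size) {
        return -1;
    }
    memcpy(host, start, len);
    host[len] = '\0';
    return 0;
}
```

With this sketch, "[2312::8274]:5200" yields host "2312::8274" and port "5200", and "t2:5200" yields host "t2" and port "5200", so neither case requires guessing which ':' is the separator.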

Comment 20 Amos Kong 2012-02-09 02:02:23 UTC
gethostbyname() cannot parse an IPv6 address enclosed in brackets, so the parsing code should strip the brackets.

diff --git a/net.c b/net.c
index 9e1ef9e..9105e3b 100644
--- a/net.c
+++ b/net.c
@@ -88,6 +88,10 @@ static int get_str_sep(char *buf, int buf_size, const char **pp, int sep)
     if (!p1)
         return -1;
     len = p1 - p;
+    if (*p == '[' && *(p1-1) == ']') {
+        p += 1;
+        len -= 2;
+    }


test: Succeeded
client side: qemu-kvm ...
             (qemu) migrate -d tcp:[2312::8274]:5200
listen side: qemu-kvm .... -incoming tcp:[2312::8274]:5200
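
Why the brackets have to be stripped before the address reaches the resolver can be illustrated with inet_pton(). This is a sketch only: the comment above mentions gethostbyname(), inet_pton() is used here simply as a standard numeric-address parser, and is_ipv6_literal() is a hypothetical name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: nonzero if s is a valid numeric IPv6 address
 * as understood by inet_pton(), zero otherwise. */
int is_ipv6_literal(const char *s)
{
    struct in6_addr addr;

    return inet_pton(AF_INET6, s, &addr) == 1;
}
```

is_ipv6_literal("2312::8274") is nonzero, while is_ipv6_literal("[2312::8274]") is zero, because the brackets are a notation for separating address and port and are not part of the address itself.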

Comment 21 Amos Kong 2012-02-10 08:07:57 UTC
Confirmed with Quintela on IRC that his two patches are right.
However, the guest hangs when migrating with an invalid hostname; I have fixed this problem.
I have posted four patches upstream.

http://marc.info/?l=kvm&m=132885521231661&w=2

Comment 23 Amos Kong 2012-05-11 04:58:45 UTC
Hi Jiri,

I posted a patchset to qemu upstream to make TCP migration with IPv6 addresses work.
There is no change to the qemu monitor command interface.

I saw many kinds of migration here: http://libvirt.org/migration.html
My fix might relate only to this one (not sure):
  virsh migrate web1 qemu://desthost/system tcp://10.0.0.1/

Could you help check whether other changes are needed in the qemu / libvirt layer?


http://marc.info/?l=qemu-devel&m=133667372423238&w=2
[PATCH 0/4] support to migrate with IPv6 address

Comment 26 Jiri Denemark 2012-08-06 13:26:48 UTC
I checked migration code in libvirt and it's mostly IPv6-aware because it uses virURI* functions. However, URI handling in qemuMigrationPrepareDirect does not use any of those functions and the custom parsing is done in an IPv4-only way. I'll file a new bug for libvirt to fix this.

Comment 27 Jiri Denemark 2012-08-06 13:51:46 UTC
Bug 846013 filed to fix libvirt.

Comment 32 Qunfang Zhang 2012-10-09 03:36:59 UTC
Reproduced on qemu-kvm-0.12.1.2-2.316.el6 and verified as fixed on qemu-kvm-0.12.1.2-2.320.el6.

Details:
Source host IP: 2002:5::2
Destination host IP: 2002:5::3

Reproduction on version qemu-kvm-0.12.1.2-2.316.el6:
1. Boot a guest on source host.
 /usr/libexec/qemu-kvm -M rhel6.4.0 -cpu Conroe -m 2048 -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name rhel6.4-64 -uuid feebc8fd-f8b0-4e75-abc3-e63fcdb67170 -smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 -k en-us -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/RHEL-Server-6.4-64-virtio.qcow2,if=none,id=disk0 -device virtio-scsi-pci,id=disk0 -device scsi-hd,drive=disk0,scsi-id=0,lun=0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:23:AE:7A:6E:10,bus=pci.0,addr=0x4  -monitor stdio -qmp tcp:0:6666,server,nowait -boot c -bios /usr/share/seabios/bios-pm.bin -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -drive if=none,id=drive-fdc0-0-0,readonly=on,format=raw -global isa-fdc.driveA=drive-fdc0-0-0 -vnc :10

2. Boot the guest located on the shared NFS server in listening mode:
/usr/libexec/qemu-kvm ...... -incoming tcp:0:5800

3. On source host:
(qemu) migrate -d tcp:[2002:5::3]:5800
migration failed

4. If the guest is booted on the destination host with -incoming tcp:[2002:5::3]:5800,
it cannot boot up and prints:
invalid host/port combination: [2002:5::3]:5800
Migration failed. Exit code tcp:[2002:5::3]:5800(-22), exiting.

==========

Verification on fixed version qemu-kvm-0.12.1.2-2.320.el6:

1. Boot guest with the same command line as above.

2. Boot guest with listening mode on destination host:
/usr/libexec/qemu-kvm ...... -incoming tcp:[2002:5::3]:5800
QEMU 0.12.1 monitor - type 'help' for more information
(qemu)

3. On source host: 
(qemu) migrate -d tcp:[2002:5::3]:5800
(qemu) 
(qemu) info migrate 
Migration status: active
transferred ram: 47509 kbytes
remaining ram: 1831528 kbytes
total ram: 2113920 kbytes

(qemu) info migrate 
Migration status: completed

4. Ping-pong migration 5 times between the two hosts.

5. Local host migration with ipv6 address.

Result: Migration finished successfully and both host and guest work well.

So this bug is verified as passed.

Comment 33 errata-xmlrpc 2013-02-21 07:30:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0527.html