Bug 869981

Summary: Cross version migration between different host with spice is broken
Product: Red Hat Enterprise Linux 6
Reporter: Qunfang Zhang <qzhang>
Component: qemu-kvm
Assignee: Alon Levy <alevy>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 6.4
CC: acathrow, areis, bili, bsarathy, dblechte, ddumas, dyasny, dyuan, hdegoede, juzhang, malittle, michen, minovotn, mkenneth, mzhan, ngalvin, owasserm, quintela, virt-maint, weizhan, zhpeng, zpeng
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.353.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-21 07:43:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 890050

Description Qunfang Zhang 2012-10-25 09:44:31 UTC
Description of problem:
Cross-version migration (for example, from a 6.3 host to a 6.4 host) with spice is broken. I'm using -M rhel6.3.0 on both sides, but loading the migration state fails on the destination host. This has actually been broken for a long time (for example 6.0<->6.1), but it would be good to support it because it's a very common scenario.

Version-Release number of selected component (if applicable):
RHEL6.3 host:
kernel-2.6.32-279.15.1.el6.x86_64
qemu-kvm-0.12.1.2-2.295.el6_3.5.x86_64
seabios-0.6.1.2-19.el6.x86_64
spice-server-0.10.1-10.el6.x86_64

RHEL6.4 host:
kernel-2.6.32-336.el6.x86_64
qemu-kvm-0.12.1.2-2.331.el6.x86_64
seabios-0.6.1.2-25.el6.x86_64
spice-server-0.12.0-1.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Boot guest on rhel6.3 host with spice:

 /usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Conroe -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name win2k8-r2 -uuid b6d6f013-2940-4124-946a-13da591a3fba -smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=b6d6f013-2940-4124-946a-13da591a3fba -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/var/lib/libvirt/migrate/win2k8-r2-6.3.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,drive=disk0,id=disk0,scsi=off,bus=pci.0,addr=0x3 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1A:1A:4A:25:28,bus=pci.0,addr=0x4 -netdev tap,id=hostnet1 -device e1000,netdev=hostnet1,id=net1,mac=00:1A:1A:4A:25:20,bus=pci.0,addr=0x5 -netdev tap,id=hostnet2 -device rtl8139,netdev=hostnet2,id=net2,mac=00:1A:1A:4A:25:00,bus=pci.0,addr=0x6  -monitor stdio -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -spice port=5930,disable-ticketing -global qxl-vga.vram_size=33554432 -k en-us -vga qxl -boot c

2. Boot the guest with the same command line on the rhel6.4 host, adding -incoming tcp:0:5800 (-M rhel6.3.0 as well)

3. Migrate guest to the dst host
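
Step 3 can also be driven through the QMP socket that the command line above opens with -qmp tcp:0:6666. The following is a minimal sketch, not a tested procedure: it assumes the source qemu-kvm build exposes the standard QMP `qmp_capabilities` handshake and the `migrate` command with a `uri` argument. The host name `dst-host` is a placeholder, and the helper names `build_migrate_messages` and `start_migration` are hypothetical.

```python
import json
import socket


def build_migrate_messages(dst_host, dst_port=5800):
    """Return the QMP messages (one JSON document each) that start a migration.

    Assumption: the standard QMP handshake and 'migrate' command are available.
    """
    return [
        json.dumps({"execute": "qmp_capabilities"}),
        json.dumps({"execute": "migrate",
                    "arguments": {"uri": f"tcp:{dst_host}:{dst_port}"}}),
    ]


def start_migration(qmp_host, qmp_port, dst_host, dst_port=5800):
    """Send the messages to a QMP TCP socket and return the raw reply lines."""
    replies = []
    with socket.create_connection((qmp_host, qmp_port), timeout=10) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="\n")
        replies.append(stream.readline())      # QMP greeting banner
        for msg in build_migrate_messages(dst_host, dst_port):
            stream.write(msg + "\n")
            stream.flush()
            replies.append(stream.readline())  # reply (or an async event)
    return replies


# Building the messages needs no running guest; "dst-host" is a placeholder.
messages = build_migrate_messages("dst-host")
for line in messages:
    print(line)
```

With a guest running on the source host, `start_migration("127.0.0.1", 6666, "dst-host")` would send these messages; the port 5800 matches the -incoming option used in step 2.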

  
Actual results:
(qemu) [Thread 0x7ffff0094700 (LWP 23602) exited]
qemu: warning: error while loading state for instance 0x0 of device 'ram'

Expected results:
Migration should succeed.

Additional info:

Comment 2 Orit Wasserman 2012-10-25 10:15:24 UTC
Live migration from 6.3 to 6.4 with spice and qxl (excluding seamless migration) should be part of the test plan for RHEL 6.4.

Comment 3 Orit Wasserman 2012-10-25 10:21:18 UTC
Could you try without global qxl-vga.vram_size=33554432 ?

Comment 4 Qunfang Zhang 2012-10-25 10:28:22 UTC
(In reply to comment #3)
> Could you try without global qxl-vga.vram_size=33554432 ?

Orit
It has the same result without "-global qxl-vga.vram_size=33554432".

#/usr/libexec/qemu-kvm ..... -spice port=5930,disable-ticketing -vga qxl -incoming tcp:0:5800
QEMU 0.12.1 monitor - type 'help' for more information
(qemu) qemu: warning: error while loading state for instance 0x0 of device 'ram'

Comment 5 Orit Wasserman 2012-10-25 10:50:38 UTC
by the way did you test with vnc ?

Comment 6 Qunfang Zhang 2012-10-26 02:23:23 UTC
(In reply to comment #5)
> by the way did you test with vnc ?

Cross migration with vnc finished successfully.

Comment 7 Alon Levy 2012-12-04 15:42:44 UTC
I couldn't reproduce this when running the source VM on a qemu-kvm built from source (0.12.0-295), and with the packaged qemu-kvm-0.12.1.2-2.295.el6_3.5 I cannot use the Conroe cpu model:

 Unknown cpu model: Conroe

So are you sure that's the source qemu version & command line used?

Alon

Comment 8 Alon Levy 2012-12-04 17:19:05 UTC
OK, I can reproduce it after changing the cpu model to qemu64.

QEMU 0.12.1 monitor - type 'help' for more information
(qemu) qemu: warning: error while loading state for instance 0x0 of device 'ram'
load of migration failed

Comment 9 Alon Levy 2012-12-05 08:47:41 UTC
Forgot to remove the needinfo.

Comment 10 Qunfang Zhang 2012-12-05 08:53:36 UTC
(In reply to comment #9)
> Forgot to remove the needinfo.

Alon, hah, it seems you added the needinfo again. I guess you have already reproduced this issue, so I'm clearing it now. Please feel free to add a comment if there's anything I can do.

Comment 11 Qunfang Zhang 2012-12-06 08:31:48 UTC
Hi, Alon
Cross migration works between rhel6.1<->rhel6.3 and rhel6.2<->rhel6.3, so this is a regression. 
Refer to bug 698936 that is fixed in rhel6.3.

Comment 12 Alon Levy 2012-12-06 10:08:21 UTC
Thanks for the clue. I spent yesterday creating yet another script to automate the checks for bisection, but stepping through the code turned out to be faster (I tried it after you mentioned bug 698936). Now I just need to find where exactly we set that size (and to add this message to upstream/rhel for the future):

DEBUG: ram
length mismatch: 0000:00:02.0/qxl.vrom: 8192 in != 16384
qemu: warning: error while loading state for instance 0x0 of device 'ram'
load of migration failed
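
The debug output above points at a per-block size comparison on the destination. The following is a toy model of that kind of check, not QEMU's actual code: the function name `check_ram_blocks`, the block name `pc.ram`, and the "unknown ram block" message are hypothetical, while the qxl block name and the two sizes are taken from the log.

```python
def check_ram_blocks(incoming, local):
    """Compare RAM block lengths from the migration stream with local ones.

    'incoming' and 'local' map block names to lengths in bytes.
    Returns a list of error strings; an empty list means the load can proceed.
    """
    errors = []
    for name, in_len in incoming.items():
        local_len = local.get(name)
        if local_len is None:
            # Hypothetical message for a block the destination does not have.
            errors.append(f"unknown ram block: {name}")
        elif local_len != in_len:
            # Same layout as the debug line quoted in this comment.
            errors.append(f"length mismatch: {name}: {in_len} in != {local_len}")
    return errors


# Source (6.3 host) advertises an 8192-byte block; the 6.4 destination
# allocated 16384 bytes for the same block name.
source_blocks = {"pc.ram": 2048 * 1024 * 1024, "0000:00:02.0/qxl.vrom": 8192}
dest_blocks = {"pc.ram": 2048 * 1024 * 1024, "0000:00:02.0/qxl.vrom": 16384}

errors = check_ram_blocks(source_blocks, dest_blocks)
for err in errors:
    print(err)
```

In this model any single mismatch is enough to reject the whole 'ram' section, which matches the observation that only the qxl block differs while the load of the entire migration fails.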

Comment 19 Qunfang Zhang 2013-01-04 02:13:10 UTC
Hi, Alon
Do you know when the build will be ready? We are at a late stage of rhel6.4, and QE needs to verify this issue and also run some additional tests for it.


Thanks,
Qunfang

Comment 21 Ademar Reis 2013-01-17 20:27:45 UTC
Please test the brew-build provided by Alon for bug 876982:
https://bugzilla.redhat.com/show_bug.cgi?id=876982#c26

Alon, I don't see a downstream version of the patch, so it shouldn't be in POST. Please submit a patch to rhvirt-patches and ask for the 3 acks.

Comment 22 Qunfang Zhang 2013-01-18 02:41:36 UTC
(In reply to comment #21)
> Please test the brew-build provided by Alon for bug 876982:
> https://bugzilla.redhat.com/show_bug.cgi?id=876982#c26

Ok, will dive into it soon and update the result here. Thanks. 
> 
> Alon, I don't see a downstream version of the patch, so it shouldn't be in
> POST. Please submit a patch to rhvirt-patches and ask for the 3 acks.

Comment 23 Qunfang Zhang 2013-01-18 08:21:28 UTC
Hi, Ademar and Alon
I just tested this bug with the build in bug 876982#c26, and this bug can NOT be reproduced: cross migration between rhel6.3 and rhel6.4 hosts works well with spice+qxl.
Alon said in bug 876982 that the build has no direct relation to this bug, so is there another patch for this bug itself?
When can the patch get into the official build?


Thanks,
Qunfang

Comment 24 Ademar Reis 2013-01-22 20:54:21 UTC
Just to clarify and make sure we're in sync, the fix for this bug is:

"""
Date: Tue, 22 Jan 2013 19:33:21 +0200
From: Alon Levy <alevy>
Subject: [PATCHv2 RHEL-6.4 qemu-kvm 0/2] fix qxl migration revision 4 bug
To: rhvirt-patches
Cc: hdegoede, mlureau, armbru, kraxel

v1->v2: (Markus +)
 remove unwanted whitespace fix
 disable added trace events by default like all the rest
 fill in commit message for "stop using non revision 4 rom fields"
 remove a non related trace event that got in there (there is no
   interface_client_monitors_config in RHEL 6.4)

Alon Levy (2):
  qxl: stop using non revision 4 rom fields for revision < 4
  qxl: change rom size to 8192

 hw/qxl.c     | 22 +++++++++++++++-------
 trace-events |  1 +
 2 files changed, 16 insertions(+), 7 deletions(-)
"""

v2 got an ACK from Markus, but is missing the ACKs from v1 (Gerd and Hans).

Comment 25 Qunfang Zhang 2013-01-23 02:38:27 UTC
Hi, Gerd and Hans
Following Ademar's comment above, could you please provide the ACKs that are still missing (those from v1)? The fix is planned to be included in snapshot 5. It is urgent for QE because we need to verify this bug and also bug 733302, which is in the qemu-kvm errata. QE also needs to arrange functional tests for "stable guest abi" and "compatibility", which will take several days to finish once we get the official build.

Thanks!
Qunfang

Comment 26 Qunfang Zhang 2013-01-23 02:48:44 UTC
If it misses the rhel6.4 train, cross-version migration with spice from a rhel6.4 host to an older rhel host will be broken, and that is a very common scenario. Since we already have the patch after a lot of effort, please help give it a push and let it proceed. Thanks a lot.

Comment 27 Hans de Goede 2013-01-23 08:31:56 UTC
Acked v2.

Comment 35 errata-xmlrpc 2013-02-21 07:43:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0527.html