Bug 698936

Summary: Migrate failed from RHEL6.1 host to RHEL6.3 host with -M rhel6.1.0 (qxl and usb device related)
Product: Red Hat Enterprise Linux 6
Reporter: Joy Pu <ypu>
Component: qemu-kvm
Assignee: Uri Lublin <uril>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.1
CC: chayang, desktop-qa-list, flang, gcosta, juzhang, michen, minovotn, mkenneth, qzhang, shu, syeghiay, tburke, uril, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.293.el6
Doc Type: Bug Fix
Doc Text:
Cause: Migration to hosts with older versions (notably RHEL-6.1) failed.
Consequence: Guest cannot be migrated to RHEL-6.1 hosts.
Fix: This was caused by an incompatible QXL revision. The revision number has been changed to be the same for both versions.
Result: Guest can be migrated successfully even to RHEL-6.1 hosts.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-20 11:33:03 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 580954    

Description Joy Pu 2011-04-22 11:05:27 UTC
Description:
Migrating between different qemu-kvm versions on RHEL 6.1 hosts fails with this message:
(from qemu-kvm-0.12.1.2-2.153 to qemu-kvm-0.12.1.2-2.159)
Unknown savevm section or instance '0000:00:01.3/piix4_pm' 0
load of migration failed
or:
(from qemu-kvm-0.12.1.2-2.159 to qemu-kvm-0.12.1.2-2.153)
Unknown savevm section type 100
load of migration failed

Version-Release number of selected component (if applicable):
src:
kernel: 2.6.32-131.0.1.el6.x86_64
qemu: rpm -qa |grep qemu
gpxe-roms-qemu-0.9.7-6.4.el6.noarch
qemu-kvm-tools-0.12.1.2-2.153.el6.x86_64
qemu-img-0.12.1.2-2.153.el6.x86_64
qemu-kvm-0.12.1.2-2.153.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.153.el6.x86_64

dst:
kernel: 2.6.32-131.0.7.el6.x86_64
qemu: rpm -qa |grep qemu
qemu-kvm-tools-0.12.1.2-2.159.el6.x86_64
qemu-img-0.12.1.2-2.159.el6.x86_64
qemu-kvm-0.12.1.2-2.159.el6.x86_64
gpxe-roms-qemu-0.9.7-6.7.el6.noarch
qemu-kvm-debuginfo-0.12.1.2-2.159.el6.x86_64



How reproducible:
always

Steps to Reproduce:
1. Boot up a guest on the src host.
2. Boot up a guest on the dst host with -incoming tcp:0:5888.
3. Start the migration:
(qemu) migrate -d tcp:$dst_ip:5888

On the dst host, the guest quits with the error message.
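
The steps above can be sketched as a small helper that derives the destination command line and the monitor command from the source command line. This is only an illustration: the function name build_migration_commands and the example address 192.0.2.10 are hypothetical; the -incoming option and the "migrate -d" monitor command are the ones quoted in the steps.

```python
def build_migration_commands(base_args, dst_ip, port=5888):
    """Return (src_argv, dst_argv, monitor_cmd) for a TCP migration test."""
    # Source side: the guest is started normally.
    src_argv = ["qemu-kvm"] + list(base_args)
    # Destination side: same guest configuration, but waiting for
    # incoming migration data on the given TCP port.
    dst_argv = src_argv + ["-incoming", "tcp:0:%d" % port]
    # Typed at the source's (qemu) monitor prompt to start the migration.
    monitor_cmd = "migrate -d tcp:%s:%d" % (dst_ip, port)
    return src_argv, dst_argv, monitor_cmd


# Hypothetical, shortened guest configuration for illustration only.
src, dst, cmd = build_migration_commands(
    ["-M", "rhel6.1.0", "-vga", "qxl", "-enable-kvm"], "192.0.2.10")
print(" ".join(src))
print(" ".join(dst))
print("(qemu) " + cmd)
```

Keeping both command lines derived from one argument list avoids accidental differences between src and dst other than -incoming.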

Actual results:
Migration fails.
Expected results:
Migration succeeds.

Additional info:
1. cmd line
qemu-kvm -name 'vm1' -chardev socket,id=monitor,path=/tmp/monitor-qmpmonitor1-20110421-151720-HWpY,server,nowait -mon chardev=qmp_monitor_id_qmpmonitor1,mode=readline -chardev socket,id=serial_id_20110421-151720-HWpY,path=/tmp/serial-20110421-151720-HWpY,server,nowait -device isa-serial,chardev=serial_id_20110421-151720-HWpY -drive file='/usr/auto/test/autotest-devel/client/tests/kvm/images/RHEL-Server-5.6-64-virtio.qcow2',index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,snapshot=on,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -device e1000,netdev=idcG0TC3,mac=9a:2e:3f:52:e8:1a,id=ndev00idcG0TC3,bus=pci.0,addr=0x3 -netdev tap,id=idcG0TC3,ifname='t0-151720-HWpY',script='/usr/auto/test/autotest-devel/client/tests/kvm/scripts/qemu-ifup-switch',downscript='no' -m 2048 -smp 2,cores=1,threads=1,sockets=2 -cpu cpu64-rhel6,+sse2,+x2apic -spice port=8000,disable-ticketing -vga qxl -rtc base=utc,clock=host,driftfix=none -M rhel6.1.0 -boot order=cdn,once=c,menu=off   -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm
2. cpu info in both src and dst:
processor	: 2
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: AMD Phenom(tm) 8750 Triple-Core Processor
stepping	: 3
cpu MHz		: 1200.000
cache size	: 512 KB
physical id	: 0
siblings	: 3
core id		: 2
cpu cores	: 3
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock
bogomips	: 4809.88
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

Comment 2 Chao Yang 2011-08-17 08:32:21 UTC
Hit the same issue when testing the stable guest ABI. Raising the TestBlocker flag because it blocks stable guest ABI testing.

(qemu) qemu: warning: error while loading state for instance 0x0 of device '0000:00:02.0/qxl'
load of migration failed

SOURCE host info:
# uname -r
2.6.32-131.12.1.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.160.el6_1.8.x86_64
# rpm -q vgabios
vgabios-0.6b-3.6.el6.noarch


DESTINATION host info:
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.183.el6.x86_64
# uname -r
2.6.32-188.el6.x86_64
# rpm -q vgabios
vgabios-0.6b-3.6.el6.noarch

CLI:
# /usr/libexec/qemu-kvm -M rhel6.1.0 -enable-kvm -m 4096 -smp 4 -name win2k8r2sp1 -uuid 1b6c1dc5-e8f6-412d-9ce1-b3edefcdea36 -rtc base=localtime,clock=host,driftfix=slew -boot c -drive file=/mnt/stable-guest-ABI/win2k8r2sp1.qcow2,if=none,id=drive-ide-0-0,media=disk,format=qcow2,cache=none,werror=stop,rerror=stop -device ide-drive,drive=drive-ide-0-0,id=ide0-0-0,bootindex=1 -drive file=/mnt/stable-guest-ABI/attached-file.raw,if=none,id=drive-virtio-0-0,media=disk,format=raw,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-0-0,id=virt-0-0-0  -netdev tap,id=hostnet1 -device rtl8139,netdev=hostnet1,id=net1,mac=64:31:50:41:e1:13 -netdev tap,id=hostnet2,vhost=on -device virtio-net-pci,netdev=hostnet2,id=net2,mac=64:31:50:41:a1:a3 -netdev tap,id=hostnet3 -device e1000,netdev=hostnet3,id=net3,mac=64:31:50:41:b1:e3  -usb -device usb-tablet,id=input1 -spice port=9000,disable-ticketing -global qxl-vga.vram_size=67108864 -vga qxl -monitor stdio -device virtio-balloon-pci,id=ballooning -qmp tcp:0:8000,server,nowait -drive file=/mnt/stable-guest-ABI/en_windows_server_2008_r2_standard_enterprise_datacenter_and_web_with_sp1_x64_dvd_617601.iso,media=cdrom,if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/mnt/stable-guest-ABI/floppy.img,id=fda,if=none,format=raw,cache=none -global isa-fda.driveA=fda -device virtio-serial-pci,id=virtio-serial0,max_ports=16,vectors=17,bus=pci.0 -chardev socket,path=/mnt/stable-guest-ABI/socket,server,nowait,id=channel0 -device virtserialport,chardev=channel0,name=org.kvm.port.0,bus=virtio-serial0.0,id=port1 -device intel-hda,id=sound0,bus=pci.0 -device hda-duplex

Comment 6 Uri Lublin 2012-04-23 15:26:26 UTC
I was able to reproduce the bug as described in Description (#c0)
When migrating from qemu-kvm version -153 to -159 I always get:
   (qemu) Unknown savevm section or instance '0000:00:01.3/piix4_pm' 0
   load of migration failed

What is so special about version -153 ?
If I understand correctly -160 was released in RHEL-6.1. If that's correct, do we care about migrating between 153 and 159 ?

git bisect claims the problem is this patch:
    0b6269a9988  acpi_piix4: Maintain RHEL6.0 migration

Comment 7 Uri Lublin 2012-04-24 08:10:03 UTC
I was able to reproduce the bug as described in #c2: going from -160 to -183.
git bisect suggests the culprit is ...
    6fc33d6d0d2370935243a04b00abcd15d0aa8658   qxl: bump pci rev

The default qxl revision at -160 is 2 and at -183 is 3.
With -M rhel6.1.0 we should make the default qxl revision 2.

The following patch, which extends the PC_RHEL6_1_COMPAT definition, seems to solve that problem:
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1587,6 +1587,14 @@ static void rhel_common_init(const char *type1_version,
             .property = "event_idx",\
             .value    = "off",\
         },{\
+            .driver   = "qxl-vga",\
+            .property = "revision",\
+            .value    = stringify(2),\
+        },{\
+            .driver   = "qxl",\
+            .property = "revision",\
+            .value    = stringify(2),\
+        },{\
             .driver   = "virtio-balloon",\
             .property = "event_idx",\
             .value    = "off",\
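
The effect of the compat entries in the patch can be modelled roughly as follows. This is a simplified sketch, not qemu-kvm code: the dictionaries and the function effective_properties are hypothetical names, the default revision of 3 is the value reported above for -183, and the empty override list for rhel6.3.0 is an assumption made for the example.

```python
# Device defaults in the newer build (revision 3, as reported for -183).
DEVICE_DEFAULTS = {
    "qxl-vga": {"revision": 3},
    "qxl": {"revision": 3},
}

# Per-machine-type overrides, modelled on the entries the patch adds.
MACHINE_COMPAT = {
    "rhel6.1.0": [
        {"driver": "qxl-vga", "property": "revision", "value": 2},
        {"driver": "qxl", "property": "revision", "value": 2},
    ],
    # Assumption for this sketch: the newest machine type needs no override.
    "rhel6.3.0": [],
}


def effective_properties(machine, driver):
    """Apply the machine type's compat entries on top of the device defaults."""
    props = dict(DEVICE_DEFAULTS[driver])
    for entry in MACHINE_COMPAT.get(machine, []):
        if entry["driver"] == driver:
            props[entry["property"]] = entry["value"]
    return props


print(effective_properties("rhel6.1.0", "qxl-vga"))
print(effective_properties("rhel6.3.0", "qxl-vga"))
```

With such an override in place, a guest started with -M rhel6.1.0 sees the same qxl revision on the old and the new build, which is what the migration stream expects.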

Comment 10 Qunfang Zhang 2012-05-02 11:01:51 UTC
Hi, Uri
Reproduced this issue when migrating a guest from -160 to -183. It can also be reproduced when migrating a guest from -160 to -282 (RHEL 6.3 build).

CLI:
Source host:
/usr/libexec/qemu-kvm -M rhel6.1.0 -enable-kvm -uuid 4c21b4e8-de85-4e31-89b6-8b746d1eeca2 -rtc base=localtime,driftfix=slew -m 2048 -smp 2 -name rhel6.3-64 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive file=/opt/RHEL6.3-20120426.2-Server-x86_64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,werror=stop,rerror=stop -device ide-drive,bus=ide.0,unit=1,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:0b:00,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/tmp/socket-1,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -monitor stdio -boot c -qmp tcp:0:5555,server,nowait -spice port=5930,disable-ticketing -global qxl-vga.vram_size=67108864 -vga qxl  -device virtio-balloon-pci,id=balloon0,bus=pci.0,id=0x6

Dst host:
Append "-incoming tcp:0:5800"

Result:
On the destination host side: 
(qemu) qemu: warning: error while loading state for instance 0x0 of device '0000:00:02.0/qxl'
load of migration failed

===================
Verification:

**********
With "-M rhel6.1.0"

(1) Migrate guest from qemu-kvm-160 (6.1 host) to qemu-kvm-289 (6.3 fixed version)
==> Still failed with the above error.

(2) Migrate guest from qemu-kvm-282 to qemu-kvm-283 (two RHEL 6.3 versions from before the fix)
==> Passed.

(3) Migrate guest from qemu-kvm-282 (unfixed version) to qemu-kvm-284 or -289 (fixed version)
==> Still failed with the above error.

(4) Migrate guest from qemu-kvm-209 (6.2 released version) to qemu-kvm-284.
==> Failed.

**********

With "-M rhel6.2.0"
(1) Migrate guest from qemu-kvm-209 (6.2 released version) to qemu-kvm-284.
==> Passed

**********
With "-M rhel6.3.0"
(1) Migrate guest from qemu-kvm-282 (unfixed version) to qemu-kvm-284 (fixed version).
==> Passed.

===================

So, based on the above, there is still a problem. Reassigning this bug.

Comment 11 juzhang 2012-05-02 12:09:23 UTC
Hi, Uri

Would you please have a look at comment 10? If any further testing is needed, please let us know. Thanks.

Comment 12 Uri Lublin 2012-05-02 15:02:31 UTC
Works for me,  from -160 to -284.

I use a single machine for my testing with two different git trees, each of a different build.

What is the output of the following when run on -160, -283, and -284 or later:
  ((echo "info qtree"; echo "q") | /usr/libexec/qemu-kvm  -M rhel6.1.0 -m 64 -vga qxl -spice port=9000 -S -monitor stdio | sed -n '/dev: *qxl-vga/,/dev-prop: *revision/ p'  | grep 'revision')


I'm getting:
  -160   dev-prop: revision = 2
  -183   dev-prop: revision = 3
  -284   dev-prop: revision = 2
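
For checking many builds, the same extraction can be done on saved "info qtree" output with a few lines of Python. This is a sketch under assumptions: the function name qxl_vga_revision is hypothetical, and the SAMPLE text is made up to match the "dev: qxl-vga" and "dev-prop: revision" patterns used in the sed expression above, not copied from a real qemu-kvm run.

```python
import re


def qxl_vga_revision(qtree_text):
    """Return the revision property of the qxl-vga device, or None if absent."""
    in_qxl = False
    for line in qtree_text.splitlines():
        stripped = line.strip()
        if re.match(r"dev: *qxl-vga", stripped):
            in_qxl = True          # entering the qxl-vga device block
        elif stripped.startswith("dev:"):
            in_qxl = False         # another device block starts
        elif in_qxl:
            m = re.match(r"dev-prop: *revision *= *(\d+)", stripped)
            if m:
                return int(m.group(1))
    return None


# Made-up sample in the assumed layout; "other-device" is a placeholder.
SAMPLE = """\
bus: pci.0
  dev: other-device, id ""
    dev-prop: revision = 9
  dev: qxl-vga, id ""
    dev-prop: ram_size = 67108864
    dev-prop: revision = 2
"""

print(qxl_vga_revision(SAMPLE))
```

Comparing the returned value across builds for the same -M machine type gives a quick check that the compat setting is in effect.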





The qemu-kvm command line I used (for the migration src VM):
../ROOT160/bin/qemu-kvm -M rhel6.1.0 -enable-kvm -m 1024 -smp 4 -name win2k8r2sp1 -uuid 1b6c1dc5-e8f6-412d-9ce1-b3edefcdea36 -rtc base=localtime,clock=host,driftfix=slew -boot c -drive file=/tmp/r6.qcow2,if=none,id=drive-ide-0-0,media=disk,format=qcow2,cache=none,werror=stop,rerror=stop -device ide-drive,drive=drive-ide-0-0,id=ide0-0-0,bootindex=1 -drive file=/tmp/img2.raw,if=none,id=drive-virtio-0-0,media=disk,format=raw,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-0-0,id=virt-0-0-0 -netdev tap,id=hostnet1,script=/etc/kvm/qemu-ifup,downscript=no -device rtl8139,netdev=hostnet1,id=net1,mac=64:31:50:41:e1:13 -netdev tap,id=hostnet3,script=/etc/kvm/qemu-ifup,downscript=no -device e1000,netdev=hostnet3,id=net3,mac=64:31:50:41:b1:e3 -usb -device usb-tablet,id=input1 -spice port=9000,disable-ticketing -vga qxl -monitor stdio -global qxl-vga.vram_size=67108864 -device qxl,vram_size=67108864 -device virtio-balloon-pci,id=ballooning -device virtio-serial-pci,id=virtio-serial0,max_ports=16,vectors=17,bus=pci.0 -device intel-hda,id=sound0,bus=pci.0 -device hda-duplex

Comment 13 Qunfang Zhang 2012-05-03 03:22:17 UTC
Hi, Uri
I found that if I add the USB device part "-usb -device usb-tablet,id=input1" and re-test, migrating the guest from -160 to -284 passes.
But if I remove "-usb -device usb-tablet,id=input1", it fails.

Could you check it again? 

By the way, I ran the command you provided:

((echo "info qtree"; echo "q") | /usr/libexec/qemu-kvm  -M rhel6.1.0 -m 64 -vga qxl -spice port=9000 -S -monitor stdio | sed -n '/dev: *qxl-vga/,/dev-prop: *revision/ p'  | grep 'revision')


I'm getting the same output as yours:
  -160   dev-prop: revision = 2
  -183   dev-prop: revision = 3
  -284   dev-prop: revision = 2

Comment 14 Michal Novotny 2012-05-03 17:09:33 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
Migration to hosts with older versions (notably RHEL-6.1) failed.

Consequence:
Guest cannot be migrated to RHEL-6.1 hosts.

Fix:
This was caused by an incompatible QXL revision. The revision number has been changed to be the same for both versions.

Result:
Guest can be migrated successfully even to RHEL-6.1 hosts.

Comment 15 Uri Lublin 2012-05-10 15:23:48 UTC
(In reply to comment #13)
> Hi, Uri
> I found if attach the usb device part "-usb -device usb-tablet,id=input1" and
> re-test, it will pass when migrate guest from -160 to -284.
> But if remove the "-usb -device usb-tablet,id=input1", it will fail. 
> 
> Could you check it again? 

Qunfang, thanks for letting me use your machines to debug this.

I found that qxl IO BAR size has changed between revision 2 and revision 3.
Migration failed when loading the qxl device PCI config area.
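
Why a different BAR size can break loading of the PCI config area can be illustrated with a simplified model. Everything here is an assumption made for illustration: the model supposes that the loader rejects incoming config bytes that set bits which are read-only on the destination, the function names are hypothetical, and the sizes (8 and 32 bytes) and addresses are invented rather than taken from the qxl device.

```python
def bar_wmask(size):
    """Writable address bits of a 32-bit BAR of the given power-of-two size."""
    assert size > 0 and size & (size - 1) == 0, "BAR sizes are powers of two"
    return ~(size - 1) & 0xFFFFFFFF


def config_loads(incoming_bar, dest_size):
    """True if the incoming BAR value sets no bit that is read-only on the destination."""
    # The low type/flag bits of a real BAR are ignored here for simplicity.
    readonly = ~bar_wmask(dest_size) & 0xFFFFFFFF
    return (incoming_bar & readonly) == 0


# Hypothetical values: the source exposes an 8-byte BAR, the destination 32 bytes.
print(config_loads(0xC040, 32))  # address aligned to 32 bytes
print(config_loads(0xC048, 32))  # aligned to 8 bytes only
```

In this model the outcome depends on which address the guest assigned to the BAR. That would be consistent with the result changing when other devices, such as the USB tablet, are added or removed, but that connection is a guess and is not confirmed in this report.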

Comment 16 Qunfang Zhang 2012-05-11 03:36:08 UTC
No problem. If there is anything you need me to try, or if you need the environment, please ping me and I will take care of it.

Comment 17 Qunfang Zhang 2012-05-14 10:59:51 UTC
Verified this bug on qemu-kvm-0.12.1.2-2.293.el6 with ping-pong migration between a RHEL 6.3 host (qemu-293) and a RHEL 6.1 host, using the command line below. The original problem cannot be reproduced, so it is fixed. However, during many rounds of ping-pong migration I found some other issues:

Command line:
/usr/libexec/qemu-kvm -M rhel6.1.0 -enable-kvm -m 1024 -smp 2 -name rhel6.3-64 -uuid 4c21b4e8-de85-4e31-89b6-8b746d1eeca2 -rtc base=localtime,clock=host,driftfix=slew -boot c  -drive file=/opt/RHEL-Server-6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,werror=stop,rerror=stop -device ide-drive,bus=ide.0,unit=1,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet1,script=/etc/qemu-ifup,downscript=no -device rtl8139,netdev=hostnet1,id=net1,mac=64:31:50:41:e1:13 -netdev tap,id=hostnet3,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet3,id=net3,mac=64:31:50:41:b1:e3  -monitor stdio -qmp tcp:0:5555,server,nowait -spice port=9000,disable-ticketing -vga qxl -global qxl-vga.vram_size=67108864 -device qxl,vram_size=67108864  -device virtio-balloon-pci,id=balloon0,bus=pci.0,id=0x6 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,vectors=17,bus=pci.0,addr=0x7 -device intel-hda,id=sound0,bus=pci.0 -device hda-duplex

(1) After several rounds of ping-pong migration, the guest hangs on the destination host. This mostly happens on the RHEL 6.3 host; I hit it once on the 6.1 host. The guest consumes 100% CPU, and I cannot input anything or move the mouse in the guest. It reproduces about 2 times out of 10.

I then removed "-vga qxl" and re-tested, and did not hit it in 15 attempts.

(2) Only once, I hit the following problem:
After migrating the guest from the RHEL 6.3 host to the RHEL 6.1 host, migration failed with the following message:
(qemu) handle_dev_destroy_surfaces: 
qxl_worker_loadvm_commands: 
handle_dev_input: loadvm_commands
handle_dev_destroy_surfaces: 
savevm: unsupported version 2 for 'hda-audio' v1
load of migration failed

After removing the sound device "-device intel-hda,id=sound0,bus=pci.0 -device hda-duplex", I did not hit it in 15 attempts.

Hi, Uri
Anyway, the original issue in this bug is fixed. Could you help take a look at the above issues? If you have any questions or additional scenarios you need me to test, please ping me.

Could we mark this bug as verified and open new bugs to track the above issues?

Thanks,
Qunfang

Comment 19 Qunfang Zhang 2012-05-15 11:07:10 UTC
According to comment 17, the original bug is fixed, so I would like to change the status to VERIFIED. For the two issues mentioned, I filed new bugs 821655 and 821692.

Comment 22 errata-xmlrpc 2012-06-20 11:33:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0746.html