Bug 596028 - qemu-kvm stuck at 100% CPU after live migration with spice, then guest time goes very fast.
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.0
Hardware: All Linux
Priority: medium Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Gerd Hoffmann
QA Contact: Virtualization Bugs
Keywords: TestBlocker
Reported: 2010-05-26 03:17 EDT by Mike Cao
Modified: 2013-01-09 17:37 EST
Doc Type: Bug Fix
Last Closed: 2010-06-04 07:51:43 EDT
Description Mike Cao 2010-05-26 03:17:24 EDT
Description of problem:
Start a Windows 7 64-bit guest with -spice and -vga.
qemu-kvm is stuck at 100% CPU after live migration; afterwards, guest time runs very fast.


Version-Release number of selected component (if applicable):
# uname -r
2.6.32-28.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.62.el6.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Start a win7_64bit guest with spice.
CLI: /usr/libexec/qemu-kvm -rtc-td-hack -no-hpet -usbdevice tablet -drive file=window7_64bit_ide.qcow2,if=ide,cache=none,werror=stop,rerror=stop -net nic,macaddr=20:aa:30:aa:40:aa,model=virtio,vlan=0 -net tap,script=/etc/qemu-ifup,vlan=0 -uuid `uuidgen` -boot c -cpu qemu64,+sse2 -smp 2 -m 4G -spice port=5930,disable-ticketing -vga qxl -balloon none -monitor stdio 
2. Do live migration.

Actual results:
After live migration, the VM is stuck at 100% CPU for about 3 minutes.
Then the guest comes back to life, but its time runs very fast.


Expected results:


Additional info:
I also tested it with -vnc.
Steps:
1. Start the win7_64bit guest with "-vnc :10", then do live migration.
2. Start the same image with "-spice port=5930,disable-ticketing -vga qxl", then do live migration.
3. Repeat step 1.

Actual results:
After step 1, the VM on the dest host works well.
After step 2, the VM hits this issue.
After step 3, the VM still hits this issue.
Comment 2 Mike Cao 2010-05-26 06:14:42 EDT
1. Start a win7_64bit guest with spice.
CLI: /usr/libexec/qemu-kvm -rtc-td-hack -no-hpet -usbdevice tablet -drive
file=window7_64bit_ide.qcow2,if=ide,cache=none,werror=stop,rerror=stop -net
nic,macaddr=20:aa:30:aa:40:aa,model=virtio,vlan=0 -net
tap,script=/etc/qemu-ifup,vlan=0 -uuid `uuidgen` -boot c -cpu qemu64,+sse2 -smp
2 -m 4G -spice port=5930,disable-ticketing -vga qxl -balloon none -monitor
stdio 
2. Do migration via a compressed file.

Actual results:
Still hits this issue.
Comment 4 Amit Shah 2010-05-28 06:45:38 EDT
Does the guest get stuck if spice is not used?
Comment 5 Mike Cao 2010-05-30 21:24:53 EDT
(In reply to comment #4)
> Does the guest get stuck if spice is not used?    

As described in the additional info:
With a new image created using "-vnc" instead of "-spice", live migration completes successfully.
Starting the VM with "-spice" and then doing live migration hits this issue. If the SAME image is then started with "-vnc" instead of "-spice", the VM still gets stuck during live migration.
Comment 6 Gerd Hoffmann 2010-06-01 06:17:41 EDT
Does this also happen with '-vga std' ?
Do you have guest drivers installed ?
Comment 7 Gerd Hoffmann 2010-06-01 12:39:14 EDT
Fetching win7 64bit iso still in progress, tested with win7 32bit meanwhile.

I see the guest becoming stuck too.  For me it blocks the moment the migration starts, and the guest never recovers from it.  It doesn't happen each time I try.  Not sure what triggers it.  Moving the mouse around seems to make it more likely to happen.  And it seems to stick: once it has happened, migration stops working until I reinstall the guest.

Oh, and it isn't related to spice at all.  I see this happen with vnc and cirrus too.
Comment 8 Mike Cao 2010-06-01 22:06:33 EDT
(In reply to comment #6)
> Does this also happen with '-vga std' ?
> Do you have guest drivers installed ?    

I only installed the virtio-net driver for the virtual NIC.
Using "-vga std" causes a segmentation fault; the steps are as follows.

1. Start the VM on the source host:
#/usr/libexec/qemu-kvm -rtc-td-hack -no-hpet -usbdevice tablet -drive file=win7-64_1.qcow2,if=ide,cache=none,werror=stop,rerror=stop -net nic,macaddr=20:aa:30:aa:40:aa,model=virtio,vlan=0 -net tap,script=/etc/qemu-ifup,vlan=0 -uuid `uuidgen` -boot c -cpu qemu64,+sse2 -smp 2 -m 4G -spice port=5930,disable-ticketing -vga std -balloon none -monitor stdio
2. Start the VM on the dest host:
# gdb /usr/libexec/qemu-kvm
(gdb) r -rtc-td-hack -no-hpet -usbdevice tablet -drive file=win7-64_1.qcow2,if=ide,cache=none,werror=stop,rerror=stop -net nic,macaddr=20:aa:30:aa:40:aa,model=virtio,vlan=0 -net tap,script=/etc/qemu-ifup,vlan=0 -uuid `uuidgen` -boot c -cpu qemu64,+sse2 -smp 2 -m 4G -spice port=5940,disable-ticketing -vga std -balloon none -monitor stdio -incoming tcp:0:5888
3. Do live migration.

Actual result:
After live migration completes, qemu-kvm on the dest host hits a segmentation fault.

(gdb) bt
#0  0x0000003ff5e83e8b in memcpy () from /lib64/libc.so.6
#1  0x0000003ffbe20ea3 in ?? () from /usr/lib64/libspice-server.so.0
#2  0x0000003ffbe72df2 in quic_encode () from /usr/lib64/libspice-server.so.0
#3  0x0000003ffbe245e0 in ?? () from /usr/lib64/libspice-server.so.0
#4  0x0000003ffbe2b7c3 in ?? () from /usr/lib64/libspice-server.so.0
#5  0x0000003ffbe2bf07 in ?? () from /usr/lib64/libspice-server.so.0
#6  0x0000003ffbe2cf37 in ?? () from /usr/lib64/libspice-server.so.0
#7  0x0000003ffbe316be in red_worker_main () from /usr/lib64/libspice-server.so.0
#8  0x0000003ff6607761 in start_thread () from /lib64/libpthread.so.0
#9  0x0000003ff5ee14fd in clone () from /lib64/libc.so.6
Comment 9 Gerd Hoffmann 2010-06-02 15:03:31 EDT
What does "do migration via compressed file" mean?
What does your migrate monitor command look like?
Comment 10 Mike Cao 2010-06-02 21:36:09 EDT
(In reply to comment #9)
> What does "do migration via compressed file" mean?
> How does your migrate monitor command look like?    

Maybe the summary is not suitable and needs to be modified.
I tried both live migration and migration via a compressed file, and both of them trigger this bug.

Live migration:
(qemu) migrate -d tcp:<ip>:5888
dest host: <command line> -incoming tcp:0:5888


Offline migration via a compressed file:
(qemu) stop
(qemu) migrate_set_speed 4095m
(qemu) migrate "exec:gzip -c > test.gz"
dest host: <command line> -incoming "exec:gzip -c -d test.gz"
Comment 11 Qunfang Zhang 2010-06-03 05:18:54 EDT
winXP-32 has this problem, too.
build: RHEL6.0-20100527.2
qemu-kvm: qemu-kvm-0.12.1.2-2.68.el6.x86_64
Comment 12 Qunfang Zhang 2010-06-03 05:22:35 EDT
(In reply to comment #11)
> winXP-32 has this problem,too.
> build:RHEL6.0-20100527.2
> qemu-kvm:qemu-kvm-0.12.1.2-2.68.el6.x86_64    

To be precise: after migration, guest time goes very fast, but I have not seen qemu-kvm stuck at 100% CPU.
Comment 13 Gerd Hoffmann 2010-06-03 07:22:56 EDT
Ok guys.  On time running fast:  With the -rtc-td-hack switch qemu will compensate for time drift between host and guest.  So guest time running fast is expected behaviour when guest time lags behind the host.  It should stop running fast once the guest has caught up and is back in sync with the host.

The guest being stuck for three minutes can certainly make it lag three minutes behind, so that time runs fast for a while.  The time drift compensation logic will also kick in after migration if the clocks of the two host machines are not in sync.  So I think the "time goes fast" you are seeing is normal behaviour.

I wasn't successful in reproducing the other issues: neither the three-minute hang nor the segfault in comment #8, using tcp live migration.

Oh, and note that exec: migration seems to be broken (see bug #585195).
The issues mentioned in comment #7 are most likely just that.  If you have trouble with migrate "exec:gzip ..." check this bug too please.
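
To give a feel for the catch-up described above, here is a rough arithmetic sketch. It assumes a simplified model in which the guest clock advances at a fixed multiple of real time until the lag is gone; the actual rate at which qemu compensates is not stated in this bug, so catch_up_seconds and the speed values are purely hypothetical.

```shell
# Hypothetical model, not qemu's actual algorithm: the guest clock advances
# at speed_pct percent of real time (200 = twice as fast) until a lag of
# lag_s seconds has been recovered.
catch_up_seconds() {
    lag_s=$1
    speed_pct=$2
    # Seconds of lag recovered per 100 seconds of real time.
    gain=$((speed_pct - 100))
    if [ "$gain" -le 0 ]; then
        echo "never"
        return 1
    fi
    # Real seconds needed to recover the lag, rounded up.
    echo $(( (lag_s * 100 + gain - 1) / gain ))
}

catch_up_seconds 180 200   # three-minute lag, assumed 2x speed
catch_up_seconds 180 150   # three-minute lag, assumed 1.5x speed
```

Under these assumptions a three-minute hang would be followed by several minutes of visibly fast guest time, which matches the symptom reported here.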
Comment 14 Gerd Hoffmann 2010-06-03 16:34:36 EDT
Tested win7 64bit too (in addition to 32bit win7+winxp).
Still not reproducible.

Can you please double-check the time synchronisation of the host machines?
Comment 15 Mike Cao 2010-06-04 06:04:18 EDT
(In reply to comment #14)
> Tested win7 64bit too (additionally to 32bit win7+winxp).
> Still not reproducible.
> 
> Can you please double-check the time synchronisation of the host machines?    

Hi Gerd Hoffmann,

Using "-vga std" can cause a qemu-kvm segmentation fault. I can still reproduce it and have opened a new bug for it, bug #600205.

I think the explanation in comment #13 of why "time goes fast" and why qemu-kvm is occasionally stuck is right. It seems that I migrated from a host whose clock was behind the dest host's clock.

Synchronizing both hosts' time with clock.redhat.com before live migration avoids the issue.

But when I migrate from a source host whose time is later than the dest host's (in my test case, I set the dest host's time 5 minutes earlier than the source host's), the guest hangs for a very long time after live migration. Is this still a bug? Or must live migration be done between hosts whose time is the same?

Additional info:
src host:#/usr/libexec/qemu-kvm -rtc-td-hack -no-hpet -usbdevice tablet -drive file=win7_2.qcow,if=ide,cache=none,werror=stop,rerror=stop -net nic,macaddr=20:aa:30:aa:40:aa,model=e1000,vlan=0 -net tap,script=/etc/qemu-ifup,vlan=0 -uuid `uuidgen` -boot c -cpu qemu64,+sse2 -smp 2 -m 2G -spice port=5930,disable-ticketing -vga qxl -balloon none -monitor stdio

dest host:#/usr/libexec/qemu-kvm -rtc-td-hack -no-hpet -usbdevice tablet -drive file=win7_2.qcow,if=ide,cache=none,werror=stop,rerror=stop -net nic,macaddr=20:aa:30:aa:40:aa,model=e1000,vlan=0 -net tap,script=/etc/qemu-ifup,vlan=0 -uuid `uuidgen` -boot c -cpu qemu64,+sse2 -smp 2 -m 2G -spice port=5930,disable-ticketing -vga qxl -balloon none -monitor stdio -incoming tcp:0:5888

(qemu) migrate -d tcp:<ip>:5888
Comment 16 Gerd Hoffmann 2010-06-04 07:51:43 EDT
Yes, having the host clocks synchronized is required.
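
A pre-migration check along these lines could catch unsynchronized hosts early. This is only a sketch: clock_skew, skew_ok, DEST_HOST and the one-second threshold are hypothetical names and values chosen for illustration, and the ssh round trip adds a little to the measured difference.

```shell
# Hypothetical helper: absolute difference between two epoch timestamps.
clock_skew() {
    a=$1
    b=$2
    if [ "$a" -ge "$b" ]; then
        echo $((a - b))
    else
        echo $((b - a))
    fi
}

# Hypothetical helper: succeeds only if the two timestamps differ by at
# most max_s seconds.
skew_ok() {
    max_s=$3
    [ "$(clock_skew "$1" "$2")" -le "$max_s" ]
}

# Example use before running "migrate -d tcp:<ip>:5888"
# (DEST_HOST is a placeholder; requires ssh access to the dest host):
#   src_now=$(date +%s)
#   dst_now=$(ssh "$DEST_HOST" date +%s)
#   skew_ok "$src_now" "$dst_now" 1 || echo "host clocks differ; sync them before migrating"
```

With the 5-minute offset from comment #15, clock_skew would report 300 seconds and skew_ok would fail, so the hosts should be synchronized before migrating.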
