Bug 862401

Summary: Migration does not end in some situations or takes an inappropriately long time
Product: Red Hat Enterprise Linux 6
Component: qemu-kvm
Version: 6.4
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Reporter: Marian Krcmarik <mkrcmari>
Assignee: Orit Wasserman <owasserm>
QA Contact: Virtualization Bugs <virt-bugs>
CC: acathrow, alevy, areis, bsarathy, cfergeau, dblechte, dyasny, hhuang, juzhang, mkenneth, qzhang, virt-maint
Doc Type: Bug Fix
Type: Bug
Last Closed: 2012-10-17 08:41:01 UTC

Description Marian Krcmarik 2012-10-02 19:15:33 UTC
Description of problem:
Seamless migration does not end in some situations or takes an inappropriately long time (10 minutes or more). The QEMU monitor command "info spice" shows that the guest is not migrated (Migrated: false).
I reproduced it in the following situations:
- Constant resolution change in a RHEL guest, for example running this command during migration:
while true; do xrandr --output qxl-0 --mode 1024x768; sleep 1; xrandr --output qxl-0 --mode 1600x900; sleep 1; done

- HD video playback on Windows guest.

I am not sure how to debug this further. Once I kill the resolution change command or stop the video playback, the migration finishes immediately.

Version-Release number of selected component (if applicable):
spice-gtk-python-0.14-2.el6.x86_64
spice-gtk-debuginfo-0.13.29-1.el6.x86_64
spice-gtk-0.14-2.el6.x86_64
spice-server-0.12.0-1.el6.x86_64
virt-viewer-0.5.2-11.el6.x86_64
spice-server-debuginfo-0.12.0-1.el6.x86_64
virt-viewer-debuginfo-0.5.2-10.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.308.el6.alon.bz770842.v1.x86_64
spice-gtk-tools-0.14-2.el6.x86_64
qemu-kvm-0.12.1.2-2.319.el6.x86_64

How reproducible:
90%

Steps to Reproduce:
1. Connect to a guest with remote-viewer.
2. Run, for example, a continuous resolution change or HD video on the guest.
3. Perform seamless migration.
  
Actual results:
The guest stays in migration; the migration does not end or takes a long time.

Expected results:
Finished successful migration.

Additional info:

Comment 2 Marian Krcmarik 2012-10-12 15:27:56 UTC
I am moving this to qemu. It happens even when the spice client is not connected and even without seamless migration.
I am not observing this on RHEL6.3 qemu. The same guest with the same movie playing takes maybe 30 seconds to migrate on (295), but with the qemu in use (-319) I waited up to 1 hour without it finishing. The destination monitor seems to be unresponsive during migration too.

Source monitor command "info migrate" outputs active status.

Comment 3 Orit Wasserman 2012-10-16 09:39:25 UTC
are you using libvirt to run Qemu ?
can you provide the command line used to run Qemu ?

Comment 4 Marian Krcmarik 2012-10-16 09:46:25 UTC
(In reply to comment #3)
> are you using libvirt to run Qemu ?
Nope
> can you provide the command line used to run Qemu ?
/usr/libexec/qemu-kvm -m 1024 -smp 1 -vga qxl -enable-kvm -spice port=3002,disable-ticketing,seamless-migration=on -device virtio-serial-pci,id=virtio-serial0,bus=pci.0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device AC97 /dev/rootvg/Windows7_test -monitor stdio

Comment 5 Alon Levy 2012-10-16 10:14:25 UTC
Hi Marian,

 I tried to reproduce with Orit, and couldn't using the xrandr invocation. Using a video we couldn't either, but there we set the migration speed to 1G. In other words, using the xrandr while loop, both 319 and 295 qemu converged quickly. Using gst-launch to play a 1080 webm video (tears of steel), neither 295 nor 319 converged quickly enough (not in 30 seconds) until we set "migrate_set_speed 1G", and then both migrated almost immediately (1 second).

 One difference is that we used semi-seamless migration, and not seamless. Did you see non-convergence when using semi-seamless?

 Can you perhaps provide more details: the exact command line you used, the version of the qxl driver in the guest (assuming it is related), and whether this happened when using -vga cirrus (to rule out spice/qxl involvement)? The cirrus question is specifically for the video test case, since I'm not sure you can use xrandr to switch resolution in cirrus, at least not to 1600x1200.

Alon

Comment 6 Alon Levy 2012-10-16 10:33:25 UTC
Further information:

 Took the command line you provided (I was using 512M and some extra devices), added seamless_migration (for 319, 295 doesn't support it), same result for xrandr: 319 completes migration in ~10 seconds.

 For video, 200M migrate speed is enough (for 319); with less than that it doesn't converge.

 I was running without audio though (no -device AC97, and playing video only, no audio), so I'll try that now.

 Which app did you use to play the video? (slim chance it is relevant)

Alon

Comment 7 Alon Levy 2012-10-16 10:49:52 UTC
Ran with an audio device, no difference: with 30M there is no convergence on either 295 or 319; with 200M bandwidth it does converge.

Perhaps the older version you used was not 295 but earlier?

Alon

Comment 8 Alon Levy 2012-10-16 10:57:55 UTC
Further info 2:
 Tried with -vga cirrus and got the same result: no convergence with the default bandwidth of 30M/s. This time it did converge with 100M and not only 200M, but I waited longer. This is with 295; I didn't test 319 (since we are looking for a regression I don't think it's required).

 So I don't think it has anything to do with qxl, and if there is a regression it isn't between 295 and 319, or it requires different guests / command lines than what I've been using. The guest is fedora 18 with a custom kernel, same for all runs.

Alon

Comment 9 Marian Krcmarik 2012-10-16 14:08:51 UTC
Alon,

Thanks for all those tests. It looks like it was my mistake in the end: I did not set a sufficient migration speed. First I believed the default was unlimited, and then I tried 100M, which does not seem to be enough.
It also looks like I was just lucky not to hit it with 295. It is harder for me to hit with 295, but I did once get a migration with 295 that took 30 minutes and was still running.
And since the libvirt and vdsm default is unlimited speed, I guess we do not need to be worried.
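
Note on the conclusion above: the behaviour matches how pre-copy live migration is generally expected to behave. Each round resends the memory the guest dirtied during the previous round, so the remaining data only shrinks when the migration bandwidth exceeds the guest's dirty rate. The following is a minimal sketch of that reasoning, not qemu's actual algorithm. The function name, the 60 MB/s dirty rate and the 30 ms downtime budget are assumptions chosen for illustration; only the 1024 MB guest size and the 30M and 200M speeds come from this report.

```python
def precopy_rounds(ram_mb, dirty_mb_per_s, bandwidth_mb_per_s,
                   max_downtime_s=0.03, max_rounds=1000):
    """Simplified pre-copy model (hypothetical helper, not qemu code).

    Returns the number of copy rounds needed before the remaining dirty
    data can be sent within max_downtime_s, or None if that point is not
    reached within max_rounds.
    """
    remaining = float(ram_mb)  # the first round has to send all guest RAM
    # Amount of data that can be sent while the guest is paused.
    threshold = bandwidth_mb_per_s * max_downtime_s
    for completed in range(max_rounds):
        if remaining <= threshold:
            return completed
        send_time = remaining / bandwidth_mb_per_s
        # Memory dirtied while this round was being sent; it cannot
        # exceed the guest's total RAM.
        remaining = min(float(ram_mb), dirty_mb_per_s * send_time)
    return None


if __name__ == "__main__":
    # Assumed workload: 1024 MB guest dirtying 60 MB/s (e.g. video playback).
    for speed in (30, 200):
        rounds = precopy_rounds(1024, 60, speed)
        state = "does not converge" if rounds is None else f"converges after {rounds} rounds"
        print(f"{speed} MB/s: {state}")
```

Under these assumptions the model never converges at the 30M default, because the guest dirties memory faster than it can be sent, while at 200M the remaining data shrinks geometrically each round. Stopping the video or the xrandr loop corresponds to dropping the dirty rate, which is why the migration then finished at once.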