Bug 862401 - Migration does not end in some situations or takes an inappropriately long time
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Orit Wasserman
QA Contact: Virtualization Bugs
Depends On:
Blocks:
 
Reported: 2012-10-02 15:15 EDT by Marian Krcmarik
Modified: 2014-03-03 19:24 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-10-17 04:41:01 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Marian Krcmarik 2012-10-02 15:15:33 EDT
Description of problem:
Seamless migration does not end in some situations, or takes an inappropriately long time (10 minutes or more). The Qemu monitor command "info spice" shows that the guest has not been migrated (Migrated: false).
I reproduced it in the following situations:
- Constant resolution change in a RHEL guest, for example by running this command during migration:
while true; do xrandr --output qxl-0 --mode 1024x768; sleep 1; xrandr --output qxl-0 --mode 1600x900; sleep 1; done

- HD video playback on a Windows guest.

I am not sure how to debug this further. Once I kill the resolution change command or stop the video playback, migration finishes immediately.

Version-Release number of selected component (if applicable):
spice-gtk-python-0.14-2.el6.x86_64
spice-gtk-debuginfo-0.13.29-1.el6.x86_64
spice-gtk-0.14-2.el6.x86_64
spice-server-0.12.0-1.el6.x86_64
virt-viewer-0.5.2-11.el6.x86_64
spice-server-debuginfo-0.12.0-1.el6.x86_64
virt-viewer-debuginfo-0.5.2-10.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.308.el6.alon.bz770842.v1.x86_64
spice-gtk-tools-0.14-2.el6.x86_64
qemu-kvm-0.12.1.2-2.319.el6.x86_64

How reproducible:
90%

Steps to Reproduce:
1. Connect to a guest with remote-viewer.
2. Run, for example, a continuous resolution change or HD video on the guest.
3. Perform seamless migration.
  
Actual results:
The guest remains in migration; the migration does not end or takes a long time.

Expected results:
Finished successful migration.

Additional info:
Comment 2 Marian Krcmarik 2012-10-12 11:27:56 EDT
I am moving this to qemu. It happens even when no spice client is connected, and even without seamless migration.
I am not observing this with the RHEL 6.3 qemu (-295): the same guest with the same movie playing takes maybe 30 seconds to migrate. With the qemu in use here (-319), I waited at most 1 hour before giving up. The destination monitor also seems to be unresponsive during migration.

The source monitor command "info migrate" reports the status as active.
Comment 3 Orit Wasserman 2012-10-16 05:39:25 EDT
Are you using libvirt to run Qemu?
Can you provide the command line used to run Qemu?
Comment 4 Marian Krcmarik 2012-10-16 05:46:25 EDT
(In reply to comment #3)
> Are you using libvirt to run Qemu?
Nope
> Can you provide the command line used to run Qemu?
/usr/libexec/qemu-kvm -m 1024 -smp 1 -vga qxl -enable-kvm -spice port=3002,disable-ticketing,seamless-migration=on -device virtio-serial-pci,id=virtio-serial0,bus=pci.0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device AC97 /dev/rootvg/Windows7_test -monitor stdio
Comment 5 Alon Levy 2012-10-16 06:14:25 EDT
Hi Marian,

 I tried to reproduce this with Orit and couldn't using the xrandr invocation. We couldn't with a video either, but there we had set the migration speed to 1G. In other words: with the xrandr while loop, both the 319 and 295 qemu converged quickly. With gst-launch playing a 1080p webm video (Tears of Steel), neither 295 nor 319 converged quickly enough (not within 30 seconds) until we set "migrate_set_speed 1G", after which both migrated almost immediately (1 second).

 One difference is that we used semi-seamless migration, and not seamless. Did you see non-convergence when using semi-seamless?

 Can you perhaps provide more details: the exact command line you used, and the version of the qxl driver in the guest (assuming it is related)? Did this also happen when using -vga cirrus (to rule out spice/qxl involvement)? That question is specifically about the video test case, since I'm not sure you can use xrandr to switch resolution with cirrus, at least not to 1600x1200.

Alon
Comment 6 Alon Levy 2012-10-16 06:33:25 EDT
Further information:

 I took the command line you provided (I had been using 512 MB and some extra devices) and added seamless_migration (for 319 only; 295 doesn't support it). Same result for xrandr: 319 completes migration in ~10 seconds.

 For video, a migrate speed of 200M is enough (for 319); with less than that it doesn't converge.

 I was running without audio though (no -device AC97, and playing video only, no audio), so I'll try that now.

 Which app did you use to play the video? (slim chance it is relevant)

Alon
Comment 7 Alon Levy 2012-10-16 06:49:52 EDT
Ran with the audio device, no difference: at 30M there is no convergence with either 295 or 319, while with 200M bandwidth it does converge.

Perhaps the older version you used was not 295 but earlier?

Alon
Comment 8 Alon Levy 2012-10-16 06:57:55 EDT
Further info 2:
 Tried with -vga cirrus and got the same result: no convergence with the default bandwidth of 30M/s. This time it did converge with 100M rather than needing 200M, but I waited longer. This is with 295; I didn't test 319 (since we are looking for a regression, I don't think it's required).

 So I don't think it has anything to do with qxl, and if there is a regression it isn't between 295 and 319, or it requires different guests / command lines than what I've been using. The guest is Fedora 18 with a custom kernel, the same for all runs.

Alon
Comment 9 Marian Krcmarik 2012-10-16 10:08:51 EDT
Alon,

Thanks for all those tests. It looks like it was my mistake in the end: I did not set an adequate migration speed. At first I believed the default was unlimited, and then I tried 100M, which seems not to be enough.
It also looks like I was just lucky not to hit it with 295. It is harder for me to hit with 295, but I did once get a migration with 295 that was still running after 30 minutes.
And since the libvirt and vdsm default is unlimited speed, I guess we do not need to worry.
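
Editorial note: the pattern reported in this thread (no convergence at 30M, convergence at 100M to 200M) is what a simple model of pre-copy live migration would predict when the guest dirties memory faster than the link can send it. The sketch below is only a toy model, not QEMU's actual implementation. It assumes a constant dirty rate, and the function name, the 50 MB/s dirty rate and the 30 ms downtime budget are illustrative assumptions rather than values measured in this bug.

```python
def precopy_rounds(ram_mb, dirty_mb_per_s, bandwidth_mb_per_s,
                   max_downtime_s=0.03, max_rounds=1000):
    """Toy pre-copy model (hypothetical helper, not a QEMU API).

    Returns (rounds, total_seconds) if the model converges,
    or None if it is still iterating after max_rounds.
    """
    remaining = float(ram_mb)   # MB that still have to be sent
    elapsed = 0.0
    for round_no in range(1, max_rounds + 1):
        final_pass = remaining / bandwidth_mb_per_s
        # Converged: the rest fits inside the allowed downtime, so the
        # guest can be paused and the last pages sent in one final pass.
        if final_pass <= max_downtime_s:
            return round_no - 1, elapsed + final_pass
        # Otherwise send everything that is currently dirty...
        elapsed += final_pass
        # ...while the guest keeps dirtying pages that must be resent.
        remaining = min(float(ram_mb), dirty_mb_per_s * final_pass)
    return None


if __name__ == "__main__":
    # Assumed workload: 1024 MB guest dirtying 50 MB/s (illustrative only).
    for bandwidth in (30, 100, 200):
        print(bandwidth, "MB/s ->", precopy_rounds(1024, 50, bandwidth))
```

In this model each round shrinks the remaining data by the ratio of dirty rate to bandwidth, so it converges only when the bandwidth exceeds the dirty rate. That would be consistent with raising the migration speed, or stopping the video or the xrandr loop, letting migration finish.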
