Red Hat Bugzilla – Bug 862401
Migration does not end in some situations or takes an inappropriate amount of time
Last modified: 2014-03-03 19:24:44 EST
Description of problem:
Seamless migration does not end in some situations, or takes an inappropriately long time (10 minutes or more). The qemu monitor command "info spice" shows that the guest has not migrated (Migrated: false).
I reproduced it in the following situations:
- Constant resolution changes in a RHEL guest, for example by running this command during migration:
while true; do xrandr --output qxl-0 --mode 1024x768; sleep 1; xrandr --output qxl-0 --mode 1600x900; sleep 1; done
- HD video playback on a Windows guest.
I am not sure how to debug this further. Once I kill the resolution change command or stop video playback, migration finishes immediately.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Connect to a guest with remote-viewer.
2. Run, for example, a continuous resolution change or HD video on the guest.
3. Perform seamless migration.
Actual results:
The guest stays in migration; it does not finish or takes a long time.
Expected results:
Migration finishes successfully.
I am moving this to qemu. It happens even when no spice client is connected, and even without seamless migration.
I am not observing this with the RHEL6.3 qemu. The same guest with the same movie playing takes maybe 30 seconds to migrate on that build (295), but with the qemu I was using (-319) it did not finish even though I waited up to 1 hour. The destination monitor also seems to be unresponsive during migration.
On the source, the monitor command "info migrate" reports the status as active.
Are you using libvirt to run qemu?
Can you provide the command line used to run qemu?
(In reply to comment #3)
> Are you using libvirt to run qemu?
> Can you provide the command line used to run qemu?
/usr/libexec/qemu-kvm -m 1024 -smp 1 -vga qxl -enable-kvm -spice port=3002,disable-ticketing,seamless-migration=on -device virtio-serial-pci,id=virtio-serial0,bus=pci.0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device AC97 /dev/rootvg/Windows7_test -monitor stdio
I tried to reproduce this with Orit and couldn't using the xrandr invocation. We couldn't with a video either, but there we had set the migration speed to 1G. In other words: with the xrandr while loop, both the 319 and the 295 qemu converged quickly. With gst-launch playing a 1080 webm video (Tears of Steel), neither 295 nor 319 converged quickly enough (not within 30 seconds) until we set "migrate_set_speed 1G", and then both migrated almost immediately (1 second).
One difference is that we used semi-seamless migration, not seamless. Did you see non-convergence when using semi-seamless?
Can you provide more details: the exact command line you used and the version of the qxl driver in the guest (assuming it is related)? Also, did this happen when using -vga cirrus (to rule out spice/qxl involvement)? I am asking specifically about the video test case, since I'm not sure you can use xrandr to switch resolution with cirrus, at least not to 1600x1200.
I took the command line you provided (I had been using 512M of memory and some extra devices) and added seamless-migration=on (for 319 only; 295 doesn't support it). Same result for xrandr: 319 completes migration in ~10 seconds.
For video, a migration speed of 200M is enough (for 319); with less than that it doesn't converge.
I was running without audio though (no -device AC97, and playing video only, no audio), so I'll try that now.
Which app did you use to play the video? (slim chance it is relevant)
I ran with the audio device and saw no difference: at 30M neither 295 nor 319 converges, and at 200M bandwidth it does converge.
Perhaps the older version you used was not 295 but earlier?
Further info 2:
Tried with -vga cirrus and got the same result: no convergence with the default bandwidth of 30M/s. This time it converged with 100M rather than needing 200M, but I also waited longer. This was with 295; I didn't test 319 (since we are looking for a regression, I don't think it's required).
So I don't think it has anything to do with qxl, and if there is a regression, it either isn't between 295 and 319 or it requires different guests or command lines than the ones I've been using. The guest is Fedora 18 with a custom kernel, the same for all runs.
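For reference, the pattern in these tests (no convergence at 30M, convergence at 200M, with either qxl or cirrus) is what pre-copy migration does whenever the guest dirties memory faster than the migration bandwidth can send it. Below is a rough Python sketch of that reasoning, not qemu's actual algorithm. The function name precopy_rounds, the 60 MB/s dirty rate, and the 30 ms downtime target are assumptions chosen for illustration; nothing in this bug measured the real dirty rate.

```python
def precopy_rounds(ram_mb, dirty_mb_s, bw_mb_s, max_downtime_s=0.03, max_rounds=1000):
    """Toy model of pre-copy migration (hypothetical helper, not qemu code).

    Each pass sends whatever is currently dirty while the guest keeps
    dirtying memory at dirty_mb_s. Returns the number of passes needed
    before the remaining dirty memory fits into max_downtime_s, or None
    if that point is not reached within max_rounds passes.
    """
    remaining = float(ram_mb)  # the first pass has to send all of RAM
    for rounds in range(max_rounds + 1):
        # Can the rest be sent while the guest is paused?
        if remaining / bw_mb_s <= max_downtime_s:
            return rounds
        send_time = remaining / bw_mb_s
        # Memory dirtied during this pass; it cannot exceed total RAM.
        remaining = min(float(ram_mb), dirty_mb_s * send_time)
    return None


# Assumed numbers: 1024 MB guest (as in -m 1024), 60 MB/s dirty rate.
for bw in (30, 100, 200, 1024):
    print(bw, "MB/s ->", precopy_rounds(1024, 60, bw))
```

With these made-up numbers the 30 MB/s case never reaches the downtime target, while the higher bandwidths do after a handful of passes. The real threshold depends on how fast the video playback actually dirties guest memory.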
Thanks for all those tests. It looks like it was my mistake in the end: I did not set a suitable migration speed. At first I believed the default was unlimited, and then I tried 100M, which seems not to be enough.
It also looks like I was just lucky not to hit it with 295. It is harder for me to hit with 295, but I did once get a migration with 295 that was still running after 30 minutes.
And since the libvirt and vdsm default is unlimited speed, I guess we do not need to worry.