Description of problem:
If you invoke an 'exec'-based migration of a guest, the entire guest stops responding to interaction over VNC the moment the migration starts. The guest remains dead after the migration has completed.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Start the guest with:
/usr/libexec/qemu-kvm -S -M pc-0.12 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name f14i686 -uuid 0874d900-61db-1dac-9396-59fc79090815 -nodefaults -chardev stdio,id=monitor -mon chardev=monitor,mode=readline -rtc base=utc -boot c -drive file=/var/lib/libvirt/images/f14i686.img,if=none,id=drive-virtio-disk0,boot=on,format=raw -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -k en-us -vga cirrus
2. Type 'cont' in monitor
3. Connect with VNC and interact with the guest
4. Type in monitor:
migrate "exec:cat >>/root/test.dump 2>/dev/null"
Actual results:
You can no longer interact with the guest OS over VNC. Further guest interactions, such as querying the memory balloon, also hang.
Expected results:
Migration is live and the guest is fully responsive at all times.
Additional info:
TCP-based migration does not appear to suffer from this problem.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release. Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release. This request is not yet committed for
- Running -no-kvm does not suffer the problem
- Running with kvm, but with -no-kvm-irqchip does not suffer the problem
Thus it looks like a flaw in the kernel IRQ chip integration that somehow impacts only exec-based migration. Perhaps it's a timing problem, since exec-based migration is much slower than TCP migration, which doesn't show this hang.
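For anyone re-running this comparison, the three configurations above can be sketched as a small shell helper. This is only an illustration: BASE is an abbreviated stand-in for the full command line from the steps to reproduce, variant_cmd is a hypothetical name, and dropping -enable-kvm when adding -no-kvm is an assumption about how that test was run.

```shell
#!/bin/sh
# Sketch: print the three qemu-kvm invocations compared in this report.
# BASE is abbreviated here; substitute the full command line from the
# steps to reproduce before actually running anything.
BASE="/usr/libexec/qemu-kvm -S -M pc-0.12 -enable-kvm -m 1024"

# variant_cmd is a hypothetical helper, not part of qemu-kvm.
variant_cmd() {
    case "$1" in
        # KVM with the in-kernel IRQ chip: the configuration that hangs.
        kvm)        echo "$BASE" ;;
        # KVM with the userspace IRQ chip: reported not to hang.
        no-irqchip) echo "$BASE -no-kvm-irqchip" ;;
        # No KVM at all: reported not to hang.
        # Assumption: -enable-kvm is removed when -no-kvm is added.
        no-kvm)     echo "$BASE -no-kvm" | sed 's/ -enable-kvm//' ;;
        *)          echo "unknown variant: $1" >&2; return 1 ;;
    esac
}

for v in kvm no-irqchip no-kvm; do
    printf '%s: %s\n' "$v" "$(variant_cmd "$v")"
done
```

Each printed command can then be started by hand and followed by the same 'cont' and migrate steps in the monitor, so that only the KVM/IRQ chip setting differs between runs.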
Could you recheck with latest kernel/qemu-kvm?
I am not able to reproduce with:
Reproducible with qemu-kvm-0.12.1.2-2.68.el6.x86_64.
Not 100% for me. Load in the guest seems to make it more likely.
FYI, in my testing I was triggering the migration start while the guest was in the middle of initscripts bootup sequence, hence it almost certainly had quite high load, both CPU + disk I/O.
See also this upstream discussion, which is probably relevant: http://firstname.lastname@example.org/msg35799.html
The root cause is probably this kernel bug: https://bugzilla.redhat.com/show_bug.cgi?id=601192
I am not able to reproduce it locally. I can do a migration in the middle of the boot sequence (once we enter X to have a mouse), and I can move the mouse as fast as I can, no problem here.
Wasn't able to reproduce it here on a 4GB guest:
/usr/libexec/qemu-kvm -m 4G -smp 2 -drive file=/var/lib/libvirt/images/f12.img,if=ide,cache=none,boot=on -net nic,model=virtio,vlan=1,macaddr=02:00:40:3F:20:10 -net tap,vlan=1,script=/etc/kvm-ifup.sh -boot c -uuid 17544ecc-d3a1-4d3c-a386-12daf50015f1 -rtc-td-hack -no-kvm-pit-reinjection -monitor stdio -balloon none -startdate now -name f12-source -vnc 127.0.0.1:0
Guest stayed responsive pretty much until the migration command completed.
*** This bug has been marked as a duplicate of bug 601192 ***