Bug 729621
Summary: | ASSERT worker->running failed on source qemu during migration with Spice session | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Marian Krcmarik <mkrcmari> |
Component: | qemu-kvm | Assignee: | Alon Levy <alevy> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.2 | CC: | alevy, bcao, cmeadors, dblechte, gyue, juzhang, kraxel, mkenneth, qzhang, tburke, virt-maint, xwei |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-0.12.1.2-2.192.el6 | Doc Type: | Bug Fix |
Doc Text: |
Cause
pausing all vcpus was done as a special registered handler in the kvm_vm_state_change_handler list
Consequence
on migration, a source vm having been moved to the stopped condition would receive an io exit after it's state change handler was called but before the vcpus are paused, causing an assert on worker->running.
Fix
make vcpu pause (unpause) happen after (before) all handlers are called, respectively.
Result
migration doesn't cause this assert.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2011-12-06 15:56:01 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 743047 |
Description
Marian Krcmarik
2011-08-10 11:50:03 UTC
Reproduced this issue on qemu-kvm-0.12.1.2-2.183 spice-client-0.8.2-3 spice-server-0.8.2.3 *note* :only can reproduced on linux guest (rhel6u1 64bit ) ,tried with windows guest ,can not reproduce. steps: 1.start guest with following CLI: /usr/libexec/qemu-kvm -M rhel6.2.0 -cpu Westmere -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name RHEL6 -uuid 196160f2-7e48-48e1-ab51-8c21e79d74d5 -monitor stdio -rtc base=localtime -boot c -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive file=/rhev/data-center/e390e371-703e-4e36-a95b-cd85aaf0e34b/677134f8-9e35-4bad-a4cb-0f8a088d7756/images/c79d47f2-057c-44e8-97ed-2ff1000826be/2d1f64db-cbb0-4693-9647-f31d0ceb2e65,if=none,id=drive-virtio-disk0,format=raw,serial=e8-97ed-2ff1000826be,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/rhev/data-center/e390e371-703e-4e36-a95b-cd85aaf0e34b/71bd942b-e339-427f-b528-1a45f337483d/images/11111111-1111-1111-1111-111111111111/RHEL6.1-20110510.1-Server-x86_64-DVD1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:0b:03,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/RHEL6.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -usb -spice port=5901,disable-ticketing -k en-us -vga qxl -global qxl-vga.vram_size=67108864 2.start listening port 3.start spice client . 4 in the guest ,open http://www.jiujiuba.com/picnews/shownews.asp?infoid=1363 ,then scroll the mouse 5,during step4 ,doing live migration. Actual Results: migration failed . additional info : bt #0 0x0000003591e32a45 in raise () from /lib64/libc.so.6 #1 0x0000003591e34225 in abort () from /lib64/libc.so.6 #2 0x0000003627230ad9 in handle_dev_update (listener=0x7ff620bb4c00, events=<value optimized out>) at red_worker.c:9725 #3 handle_dev_input (listener=0x7ff620bb4c00, events=<value optimized out>) at red_worker.c:9982 #4 0x000000362722fa75 in red_worker_main (arg=<value optimized out>) at red_worker.c:10304 #5 0x00000035926077e1 in start_thread () from /lib64/libpthread.so.0 #6 0x0000003591ee68ed in clone () from /lib64/libc.so.6 I have a scratchbuild that I'd like you to test: https://brewweb.devel.redhat.com/taskinfo?taskID=3630214 Alon The patch it applies (it also applies a bunch of unrelated patches to fix compilation under fedora 16 because of the newer gcc, which are unrelated) is: commit 00505234300724c18d5758b5b957648f08a127d8 Author: Alon Levy <alevy> Date: Tue Sep 13 15:10:18 2011 +0300 qemu-kvm: pause threads before calling other vmstate handlers RHBZ: 729621 upstream: not relevant (upstream always uses io thread since recently) There are two different paths for handling vm_stop: 1. with io thread - not interesting since it isn't used by RHEL-6.2 qemu-kvm.spec 2. without io thread. Without io thread: during kvm_create_context (via kvm_init_ap) kvm_vm_state_change_handler is registered. This handler is run after any handlers already registered. It is responsible for stopping the vcpu threads (signaling the threads and waiting on a condition variable). The problem: kvm_vm_state_change_handler is run after qxl_state_change_handler, which sets the spice worker thread to stopped state. But since the vcpus are not yet stopped, an io exit then occurs, that leads to a call to the spice thread, which asserts because it is stopped. Possible solutions: 1. register the kvm_vm_state_change_handler before all others. This seems fragile. 2. Use the already existing pause_all_vcpus in vl.c (and corresponding resume_all_vcpus), currenty empty when io thread is not in use, to call the kvm specific functions. This is what this patch does. RFC on: solution. exporting kvm_pause_all_threads and kvm_resume_all_threads (1. is it a good solution? 2. is it the right place to put them in the headers) diff --git a/qemu-kvm.c b/qemu-kvm.c index 732d6ec..6e9c400 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -1874,7 +1874,7 @@ static int all_threads_paused(void) return 1; } -static void pause_all_threads(void) +void kvm_pause_all_threads(void) { CPUState *penv = first_cpu; @@ -1894,7 +1894,7 @@ static void pause_all_threads(void) qemu_cond_wait(&qemu_pause_cond); } -static void resume_all_threads(void) +void kvm_resume_all_threads(void) { CPUState *penv = first_cpu; @@ -1908,14 +1908,6 @@ static void resume_all_threads(void) } } -static void kvm_vm_state_change_handler(void *context, int running, int reason) -{ - if (running) - resume_all_threads(); - else - pause_all_threads(); -} - static void setup_kernel_sigmask(CPUState *env) { sigset_t set; @@ -1937,7 +1929,7 @@ static void qemu_kvm_system_reset(void) { CPUState *penv = first_cpu; - pause_all_threads(); + kvm_pause_all_threads(); qemu_system_reset(); @@ -1946,7 +1938,7 @@ static void qemu_kvm_system_reset(void) penv = (CPUState *) penv->next_cpu; } - resume_all_threads(); + kvm_resume_all_threads(); } static void process_irqchip_events(CPUState *env) @@ -2056,8 +2048,6 @@ int kvm_init_ap(void) { struct sigaction action; - qemu_add_vm_change_state_handler(kvm_vm_state_change_handler, NULL); - signal(SIG_IPI, sig_ipi_handler); memset(&action, 0, sizeof(action)); @@ -2221,7 +2211,7 @@ int kvm_main_loop(void) } bdrv_close_all(); - pause_all_threads(); + kvm_pause_all_threads(); pthread_mutex_unlock(&qemu_mutex); return 0; diff --git a/qemu-kvm.h b/qemu-kvm.h index 1b905f8..0047c16 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -1167,4 +1167,8 @@ int kvm_tpr_enable_vapic(CPUState *env); #endif +// TODO - put it somewhere appropriate +void kvm_pause_all_threads(void); +void kvm_resume_all_threads(void); + #endif diff --git a/vl.c b/vl.c index 5cec717..2c4906c 100644 --- a/vl.c +++ b/vl.c @@ -3397,10 +3397,16 @@ int qemu_cpu_self(void *env) static void resume_all_vcpus(void) { + if (kvm_allowed) { + kvm_resume_all_threads(); + } } static void pause_all_vcpus(void) { + if (kvm_allowed) { + kvm_pause_all_threads(); + } } void qemu_cpu_kick(void *env) (In reply to comment #4) > I have a scratchbuild that I'd like you to test: > > https://brewweb.devel.redhat.com/taskinfo?taskID=3630214 > > Alon I've tested the scratch build and I've not been able to reproduce the assert fail, I could reproduce it on qemu-kvm-0.12.1.2-2.184, but not on the Alon's scratch build. So It fixes the problem for me. Thanks Reproduced at the second time migration according to steps in Comment 2. CLI: /usr/libexec/qemu-kvm -M rhel6.2.0 -cpu Westmere -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name RHEL6 -uuid 196160f2-7e48-48e1-ab51-8c21e79d74d5 -monitor stdio -rtc base=localtime -boot c -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive file=/media/RHEL6.1-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:0b:03,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/tmp/foo,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -usb -spice port=5901,disable-ticketing -k en-us -vga qxl -global qxl-vga.vram_size=67108864 Result: handle_dev_update: ASSERT worker->running failed Aborted (core dumped) Verified with: kernel-2.6.32-206.el6.x86_64 qemu-kvm-0.12.1.2-2.195.el6.x86_64 Test with the same steps and command line, tried more than 20 times ping-pong migration, did not hit the issue any more. So, this bug is fixed. *** Bug 736631 has been marked as a duplicate of this bug. *** I can reproduce this bug on qemu-kvm-0.12.1.2-2.198.el6.x86_64 doing migration when guest rebooting #0 0x0000003e8e832885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #0 0x0000003e8e832885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x0000003e8e834065 in abort () at abort.c:92 #2 0x0000003e7c231639 in handle_dev_update (listener=0x7f7e72e29c00, events=<value optimized out>) at red_worker.c:9725 #3 handle_dev_input (listener=0x7f7e72e29c00, events=<value optimized out>) at red_worker.c:9982 #4 0x0000003e7c2305d5 in red_worker_main (arg=<value optimized out>) at red_worker.c:10304 #5 0x0000003e8f0077f1 in start_thread (arg=0x7f7e72ffe700) at pthread_create.c:301 #6 0x0000003e8e8e570d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 According to comment #13 and comment #14 ,Bug 736631 is not as same as this one . Move this bug's status to VERIFIED according to comment #10. Re-assign BUg 736631 according to comment #13 and comment #14 Mike Adding technical note. Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Cause pausing all vcpus was done as a special registered handler in the kvm_vm_state_change_handler list Consequence on migration, a source vm having been moved to the stopped condition would receive an io exit after it's state change handler was called but before the vcpus are paused, causing an assert on worker->running. Fix make vcpu pause (unpause) happen after (before) all handlers are called, respectively. Result migration doesn't cause this assert. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2011-1531.html |