Bug 733993
Summary: | migration target can crash (assert(d->ssd.running)) | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Yonit Halperin <yhalperi> | |
Component: | qemu-kvm | Assignee: | Yonit Halperin <yhalperi> | |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | |
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 6.1 | CC: | alevy, bcao, dblechte, juzhang, mkenneth, shuang, tburke, virt-maint, xfu | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | qemu-kvm-0.12.1.2-2.192.el6 | Doc Type: | Bug Fix | |
Doc Text: |
Cause
qxl->ssd.running=true was set after telling the target spice server to start.
Spice server thread can call qxl_send_events while qxl->ssd.running is still false.
Consequence
target qemu aborts on assert(d->ssd.running)
Fix
set qxl->ssd.running=true before telling spice to start
Result
target qemu doesn't crash
|
Story Points: | --- | |
Clone Of: | ||||
: | 734784 (view as bug list) | Environment: | ||
Last Closed: | 2011-12-06 15:56:44 UTC | Type: | --- | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 734784, 743047 |
Description
Yonit Halperin
2011-08-29 06:01:40 UTC
(In reply to comment #0)
> Description of problem:
> After migration completes, the target can crash with assert(d->ssd.running)
> in qxl_send_events
>
> When migration completes and the target guest is started, the following
> occurs:
> qemu_spice_vm_change_state_handler is called
> 1.1) qemu_spice_vm_change_state_handler calls qemu_spice_start
> 1.2) qemu_spice_vm_change_state_handler sets ssd->running = true
>
> The problem is that ssd->running is accessed both from spice's red_worker
> thread and the qemu thread.
> 1) qemu thread: qemu_spice_start (but doesn't set ssd->running=true yet)
> 2) red_worker thread: red_worker starts
> 3) red_worker thread: calls qxl->interface_get_command and triggers
> qxl_send_events
> 4) assert(d->ssd.running)
> The simplest solution is to just set ssd.running = true before calling
> qemu_spice_start. Alternatively, we can use locks.

Correction: we can't just move ssd.running. Until start/stop are actually performed in the red_worker, the worker can perform other operations that trigger qxl_send_events, for example, and ssd->running must be synchronized with the current worker state.
In addition, I think that qemu_spice_start should be changed in spice-server to be synchronous.

According to comment 7 and comment 8, would you please tell us whether this information is enough to mark this issue as verified?
Reproduced this issue on qemu-kvm-0.12.1.2-2.159.el6
Verified this issue on qemu-kvm-0.12.1.2-2.200.el6

Steps:
1. Start the guest with -spice.
CLI:
/usr/libexec/qemu-kvm -M rhel6.2.0 -cpu Westmere -enable-kvm -m 2G -smp 2G -name rhel6 -uuid 716f1b4a-32f7-494a-ae38-d6371b7642c8 -monitor stdio -rtc base=utc -boot dc -drive file=/home/rhel6u2,if=none,id=drive-virtio0-0-0,format=raw,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio0-0-0,id=virtio0-0-0 -netdev tap,script=/etc/qemu-ifup,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d0:4d:60,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -spice port=8002,disable-ticketing -vga qxl -global qxl-vga.vram_size=9437184 -device virtio-balloon-pci,id=balloon1 -usb -device usb-tablet,id=input0
2. Install flash-plugin in the guest.
3. Open http://v.youku.com/v_show/id_XMjc0MzU3OTUy.html
4. During step 3, do a live migration.

Actual Results:
On qemu-kvm-0.12.1.2-2.159.el6:
qemu-kvm: /builddir/build/BUILD/qemu-kvm-0.12.1.2/hw/qxl.c:684: qxl_check_state: Assertion `((&ram->cmd_ring)->cons == (&ram->cmd_ring)->prod)' failed.
(gdb) bt
#0 0x00000033d8432885 in raise () from /lib64/libc.so.6
#1 0x00000033d8434065 in abort () from /lib64/libc.so.6
#2 0x00000033d842b9fe in __assert_fail_base () from /lib64/libc.so.6
#3 0x00000033d842bac0 in __assert_fail () from /lib64/libc.so.6
#4 0x000000000047552b in qxl_check_state (d=0x2aa7840) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:684
#5 qxl_soft_reset (d=0x2aa7840) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:707
#6 0x0000000000475ea5 in qxl_hard_reset (d=0x2aa7840, loadvm=1) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:733
#7 0x00000000004764ed in qxl_pre_load (opaque=0x2aa7840) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:1469
#8 0x00000000004c1d4c in vmstate_load_state (f=0x2b1f0a0, vmsd=0x8d7e60, opaque=0x2aa7840, version_id=21) at savevm.c:1301
#9 0x00000000004c2399 in qemu_loadvm_state (f=0x2b1f0a0) at savevm.c:1784
#10 0x00000000004baaf9 in process_incoming_migration (f=<value optimized out>) at migration.c:73
#11 0x00000000004bae0f in tcp_accept_incoming_migration (opaque=<value optimized out>) at migration-tcp.c:165
#12 0x000000000040ba2f in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4430
#13 0x000000000042b52a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2164
#14 0x000000000040ef55 in main_loop (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4640
#15 main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6845

On qemu-kvm-0.12.1.2-2.200.el6, no coredump during migration.

Hi, Yonit

Could you view my comment #11? Do the results on -159 mean I reproduced the issue successfully?

TIA,
Mike

(In reply to comment #12)
> Hi, Yonit
>
> Could you view my comment #11? Do the results on -159 mean I reproduced the
> issue successfully?

Sorry, no. You reproduced bug 728984.

>
> TIA,
> Mike

Hi Yonit,

According to comment 10 and comment 13, we both failed to reproduce this issue but reproduced bz728984 and bz729621 respectively. It seems the same scenario causes different bugs. Would you please double-check that our steps are right? If our steps are right, can we repeat comment 11's steps 1000 times using a script on the fixed qemu-kvm version (qemu-kvm-0.12.1.2-2.200.el6)? If all migration iterations pass, can we change this issue to verified?

Best Regards,
Junyi

(In reply to comment #14)
> Hi Yonit,
>
> According to comment 10 and comment 13, we both failed to reproduce this issue
> but reproduced bz728984 and bz729621 respectively. It seems the same scenario
> causes different bugs. Would you please double-check that our steps are right?
> If our steps are right, can we repeat comment 11's steps 1000 times using a
> script on the fixed qemu-kvm version (qemu-kvm-0.12.1.2-2.200.el6)? If all
> migration iterations pass, can we change this issue to verified?
>

Yes, you can change it to verified if the script passes.

> Best Regards,
> Junyi

Tried to verify this bug with qemu-kvm-0.12.1.2-2.200.el6.x86_64 by executing a script that repeats migration 1000 times.
The guest always core dumps when playing video during the migration. I ran the script 4 times and got the same backtrace below every time.
Comparing the trace file with bug 744518, this always seems to reproduce the new bug 744518.

(gdb) bt
#0 0x00000032c2c32945 in raise () from /lib64/libc.so.6
#1 0x00000032c2c34125 in abort () from /lib64/libc.so.6
#2 0x0000003519831639 in handle_dev_update (listener=0x7f87b5851c00, events=<value optimized out>) at red_worker.c:9725
#3 handle_dev_input (listener=0x7f87b5851c00, events=<value optimized out>) at red_worker.c:9982
#4 0x00000035198305d5 in red_worker_main (arg=<value optimized out>) at red_worker.c:10304
#5 0x00000032c34077e1 in start_thread () from /lib64/libpthread.so.0
#6 0x00000032c2ce57bd in clone () from /lib64/libc.so.6

According to comment 16, I will set this issue as verified and track bz744518. After bz744518 is fixed, we will run more than 1000 times as well.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause
qxl->ssd.running=true was set after telling the target spice server to start.
Spice server thread can call qxl_send_events while qxl->ssd.running is still false.
Consequence
target qemu aborts on assert(d->ssd.running)
Fix
set qxl->ssd.running=true before telling spice to start
Result
target qemu doesn't crash

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1531.html