Bug 874574
| Summary: | VM terminates when changing display configuration during migration | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Tomas Jamrisko <tjamrisk> |
| Component: | qemu-kvm | Assignee: | Yonit Halperin <yhalperi> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | unspecified | Priority: | unspecified |
| Version: | 6.4 | CC: | acathrow, areis, bsarathy, cfergeau, dblechte, dyasny, juzhang, kraxel, mazhang, minovotn, mkenneth, owasserm, quintela, qzhang, sluo, virt-maint, yhalperi |
| Target Milestone: | rc | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Fixed In Version: | qemu-kvm-0.12.1.2-2.340.el6 | Doc Type: | Bug Fix |
| Last Closed: | 2013-02-21 07:44:16 UTC | Type: | Bug |
Description (Tomas Jamrisko, 2012-11-08 13:23:32 UTC)
Any chance you can provide a stack trace or core dump for this one?

It fails on the new host, and this is the trace:

```
[Switching to Thread 0x7fe9f90b8700 (LWP 1203)]
0x00007fea0740f8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64          return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007fea0740f8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fea07411085 in abort () at abort.c:92
#2  0x00007fea07c26b67 in validate_virt (info=<value optimized out>, virt=<value optimized out>, slot_id=<value optimized out>, add_size=<value optimized out>, group_id=<value optimized out>) at red_memslots.c:86
#3  0x00007fea07c26c0c in get_virt (info=<value optimized out>, addr=<value optimized out>, add_size=<value optimized out>, group_id=1) at red_memslots.c:125
#4  0x00007fea07c31d53 in dev_create_primary_surface (worker=0x7fe9f8ee06c0, surface_id=<value optimized out>, surface=...) at red_worker.c:10577
#5  0x00007fea07c32133 in handle_dev_create_primary_surface_async (opaque=<value optimized out>, payload=<value optimized out>) at red_worker.c:10778
#6  0x00007fea07c24af3 in dispatcher_handle_single_read (dispatcher=0x7fea0b1754f8) at dispatcher.c:120
#7  dispatcher_handle_recv_read (dispatcher=0x7fea0b1754f8) at dispatcher.c:143
#8  0x00007fea07c3eabc in red_worker_main (arg=<value optimized out>) at red_worker.c:11335
#9  0x00007fea093e2851 in start_thread (arg=0x7fe9f90b8700) at pthread_create.c:301
#10 0x00007fea074c511d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
```

Looks similar to bug 870972. I guess this is again spice-server processing requests while worker->running == FALSE, this time a display configure request arriving before qemu has finished loading vmstate and reinitializing memslots. Anything on stderr (i.e. /var/log/libvirt/qemu/$guest.log)? IIRC on this assert the memslot info is dumped to stderr before aborting.
Last few lines of the mentioned log:

```
handle_dev_display_connect: connect
handle_new_display_channel: add display channel client
handle_new_display_channel: New display (client 0x7fea0ace3870) dcc 0x7fe9f00fd480 stream 0x7fea0b1b6390
handle_new_display_channel: jpeg disabled
handle_new_display_channel: zlib-over-glz disabled
listen_to_new_client_channel: NEW ID = 0
red_dispatcher_set_cursor_peer:
main_channel_handle_parsed: agent start
display_channel_client_wait_for_init: creating encoder with id == 1
display_channel_client_wait_for_init: creating encoder with id == 0
handle_dev_cursor_connect: cursor connect
red_connect_cursor: add cursor channel client
listen_to_new_client_channel: NEW ID = 1
handle_dev_set_mouse_mode: mouse mode 2
handle_dev_set_mouse_mode: mouse mode 2
display_channel_release_item: not pushed (101)
Domain id=29 is tainted: custom-monitor
id 0, group 0, virt start 0, virt end ffffffffffffffff, generation 0, delta 0
id 2, group 1, virt start 7fe99b600000, virt end 7fe99f600000, generation 0, delta 7fe99b600000
validate_virt: panic: virtual address out of range
    virt=0x0+0x1d4c00 slot_id=1 group_id=1
    slot=0x0-0x0 delta=0x0
2012-11-14 10:35:21.618+0000: shutting down
```

> id 0, group 0, virt start 0, virt end ffffffffffffffff, generation 0, delta 0
> id 2, group 1, virt start 7fe99b600000, virt end 7fe99f600000, generation 0, delta 7fe99b600000
> validate_virt: panic: virtual address out of range
> virt=0x0+0x1d4c00 slot_id=1 group_id=1
> slot=0x0-0x0 delta=0x0
Now that is really strange: memory slot #2 is present but #1 (referenced by the command) isn't.
A trace with all qxl_* tracepoints enabled could help give a clue about what is going on.
Created attachment 648384 [details]
logs from hosts + the qxl_* trace
Adding logs from the hosts on which the crash appeared (hyper04 and hyper06):
The logs from 20121115 are from when it crashed with hyper04 as the source; the newer ones are from when it crashed with hyper06 as the source (search for "panic").
The hyper06 folder also contains a file "vmout", which should contain a backtrace of all qxl_* calls.
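For reference, a per-call backtrace like the one in "vmout" can be captured with a gdb command file along these lines (a sketch, assuming gdb is attached to the qemu-kvm process; the breakpoint-number range must be adjusted to match what `rbreak` actually created):

```
# hypothetical gdb command file: break on every qxl_* function,
# log a backtrace at each hit, and keep the guest running
set pagination off
set logging file vmout
set logging on
rbreak ^qxl_
# attach commands to the breakpoints rbreak just created
# (adjust 1-100 to the breakpoint numbers gdb printed)
commands 1-100
  silent
  bt
  continue
end
continue
```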
Hi Tomas,

Does this crash happen only with active migration? According to the logs, the crash happens on the src side, and it happens before the VM has been stopped. Can you try to reproduce it without migration? Also, you are using an old spice-server; the latest build is 0.12.0-4.el6.

The bug is in qemu, in hw/qxl.c: the active memslots are not reloaded if the device is in UNDEFINED mode. qxl is in UNDEFINED mode after the primary surface is destroyed and before it is recreated (which happens during resolution changes, disabling monitors, etc.). If migration occurs while qxl is in UNDEFINED mode, the destination side doesn't reload the devram memslot (which is added by the driver only once), and the described crash occurs when spice-server tries to access that memslot.

This issue seems reproducible: I changed the display configuration after migration finished, and the VM then terminated.

Package versions:
qemu-kvm-0.12.1.2-2.337.el6.x86_64
spice-server-0.12.0-5.el6.x86_64
kernel-2.6.32-344.el6.x86_64
seabios-0.6.1.2-25.el6.x86_64

Steps to reproduce:

1.
Boot the guest with multiple monitors:

```
(gdb) r -boot menu=on -m 2G -smp 2,cores=2,sockets=1,threads=1 -M rhel6.4.0 -cpu SandyBridge -drive file=/home/win7-32-virtio.raw,format=raw,if=none,id=drive-ide0,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=drive-ide0,id=test0 -netdev tap,id=hostnet1,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet1,mac=00:12:1a:21:62:02,bus=pci.0,addr=0x4,id=virtio-net-pci1 -uuid ac64c74a-a8d5-4c24-9839-fcc491439493 -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -name win7-32 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -chardev socket,id=charchannel0,path=/tmp/qzhang-1,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=5,chardev=charchannel0,id=channel0,name=port-1 -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -boot c -drive if=none,werror=stop,rerror=stop,media=cdrom,id=drive-cdrom -device ide-drive,drive=drive-cdrom,id=cdrom -spice port=5930,disable-ticketing -vga qxl -global qxl-vga.vram_size=33554432 -usb -device usb-tablet -monitor stdio -drive file=/usr/share/virtio-win/virtio-win-1.5.4.vfd,if=none,id=drive-fdc0-0-0,readonly=on,format=raw -global isa-fdc.driveA=drive-fdc0-0-0 -qmp tcp:0:5555,server,nowait -device qxl,id=video1,vram_size=67108864,bus=pci.0,addr=0x7
```

2. Boot the guest in listening mode on the dst host.
3. Migrate the guest to host B.
4. Change the display configuration: enable the secondary display and save the change.
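Steps 2 and 3 above can be driven like this (a sketch; the port number 5800 and the hostname hostB are assumptions, not taken from this report):

```
# destination host: same qemu-kvm command line as step 1, plus an incoming port
qemu-kvm <same options as step 1> -incoming tcp:0:5800

# source host, in the qemu monitor: start the migration and poll its status
(qemu) migrate -d tcp:hostB:5800
(qemu) info migrate
```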
Result:

```
(gdb) bt
#0  0x00007ffff57458a5 in raise () from /lib64/libc.so.6
#1  0x00007ffff5747085 in abort () from /lib64/libc.so.6
#2  0x00007ffff5f9ff75 in spice_logv (log_domain=0x7ffff6016f8e "Spice", log_level=SPICE_LOG_LEVEL_CRITICAL, strloc=0x7ffff601a85a "red_memslots.c:94", function=0x7ffff601a93f "validate_virt", format=0x7ffff601a668 "virtual address out of range\n virt=0x%lx+0x%x slot_id=%d group_id=%d\n slot=0x%lx-0x%lx delta=0x%lx", args=0x7fff5bbfc890) at log.c:109
#3  0x00007ffff5fa00aa in spice_log (log_domain=<value optimized out>, log_level=<value optimized out>, strloc=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at log.c:123
#4  0x00007ffff5f603e3 in validate_virt (info=<value optimized out>, virt=0, slot_id=1, add_size=1228800, group_id=1) at red_memslots.c:90
#5  0x00007ffff5f60533 in get_virt (info=<value optimized out>, addr=<value optimized out>, add_size=<value optimized out>, group_id=1, error=0x7fff5bbfca7c) at red_memslots.c:142
#6  0x00007ffff5f6f6a7 in dev_create_primary_surface (worker=0x7fff500008c0, surface_id=<value optimized out>, surface=...) at red_worker.c:10997
#7  0x00007ffff5f6fc53 in handle_dev_create_primary_surface_async (opaque=<value optimized out>, payload=<value optimized out>) at red_worker.c:11241
#8  0x00007ffff5f5dca7 in dispatcher_handle_single_read (dispatcher=0x7ffff91a3608) at dispatcher.c:139
#9  dispatcher_handle_recv_read (dispatcher=0x7ffff91a3608) at dispatcher.c:162
#10 0x00007ffff5f7e8ee in red_worker_main (arg=<value optimized out>) at red_worker.c:11835
#11 0x00007ffff773c851 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffff57fb90d in clone () from /lib64/libc.so.6
```

A patch was posted and acked upstream: http://patchwork.ozlabs.org/patch/202472/

*** Bug 871306 has been marked as a duplicate of this bug. ***

Reproduced on qemu-kvm-0.12.1.2-2.339.el6 and verified as passing on qemu-kvm-0.12.1.2-2.346.el6 with the command line and steps in comment 12.
In the fixed version, changing the display configuration after migration and enabling the secondary display works: the guest keeps running with no hang or crash.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0527.html