Bug 874574

Summary: VM terminates when changing display configuration during migration
Product: Red Hat Enterprise Linux 6 Reporter: Tomas Jamrisko <tjamrisk>
Component: qemu-kvm Assignee: Yonit Halperin <yhalperi>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.4 CC: acathrow, areis, bsarathy, cfergeau, dblechte, dyasny, juzhang, kraxel, mazhang, minovotn, mkenneth, owasserm, quintela, qzhang, sluo, virt-maint, yhalperi
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.340.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-02-21 07:44:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs from hosts + the qxl_* trace none

Description Tomas Jamrisko 2012-11-08 13:23:32 UTC
Description of problem:

VM crashes when guest display configuration changes in later stages of migration 

Version-Release number of selected component (if applicable):

qemu-kvm-rhev-0.12.1.2-2.295.el6_3.5.x86_64
spice-server-0.10.1-10.el6.x86_64
vdsm-4.9.6-36.0.el6_3.x86_64
virt-viewer-0.5.2-16.el6.x86_64


How reproducible:
100% (if you're fast enough)

Steps to Reproduce:
1. Start a Windows 7 VM with 2 monitors
2. Connect to it using virt-viewer
3. Disable one of the displays
4. Open Screen resolution in your VM
5. Disable your secondary monitor
6. Start migration
----you will have to repeat the following steps repeatedly and as quickly as possible, because the crash occurs only in later stages of migration.
7. Click on the secondary display and click "Extend desktop to this display"
8. Click revert changes
9. Repeat until it crashes, or the VM gets migrated.

  
Actual results:
The VM stops, completely

Expected results:
The VM should keep on working.

Comment 3 Gerd Hoffmann 2012-11-14 09:12:43 UTC
Any chance you can provide stack trace or core dump for this one?

Comment 4 Tomas Jamrisko 2012-11-14 10:40:51 UTC
It falls on the new host, and this is the trace:

[Switching to Thread 0x7fe9f90b8700 (LWP 1203)]
0x00007fea0740f8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007fea0740f8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fea07411085 in abort () at abort.c:92
#2  0x00007fea07c26b67 in validate_virt (info=<value optimized out>, virt=<value optimized out>, 
    slot_id=<value optimized out>, add_size=<value optimized out>, group_id=<value optimized out>)
    at red_memslots.c:86
#3  0x00007fea07c26c0c in get_virt (info=<value optimized out>, addr=<value optimized out>, 
    add_size=<value optimized out>, group_id=1) at red_memslots.c:125
#4  0x00007fea07c31d53 in dev_create_primary_surface (worker=0x7fe9f8ee06c0, 
    surface_id=<value optimized out>, surface=...) at red_worker.c:10577
#5  0x00007fea07c32133 in handle_dev_create_primary_surface_async (opaque=<value optimized out>, 
    payload=<value optimized out>) at red_worker.c:10778
#6  0x00007fea07c24af3 in dispatcher_handle_single_read (dispatcher=0x7fea0b1754f8)
    at dispatcher.c:120
#7  dispatcher_handle_recv_read (dispatcher=0x7fea0b1754f8) at dispatcher.c:143
#8  0x00007fea07c3eabc in red_worker_main (arg=<value optimized out>) at red_worker.c:11335
#9  0x00007fea093e2851 in start_thread (arg=0x7fe9f90b8700) at pthread_create.c:301
#10 0x00007fea074c511d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Comment 5 Gerd Hoffmann 2012-11-14 12:19:31 UTC
Looks similar to bug 870972.  I guess this is again spice-server processing stuff while worker->running == FALSE, this time a display configure request before qemu finished loading vmstate and reinitializing memslots.

Anything on stderr (i.e. /var/log/libvirt/qemu/$guest.log)?  IIRC on this assert memslot info is dumped to stderr before aborting.

Comment 6 Tomas Jamrisko 2012-11-14 13:54:21 UTC
Last few lines of the mentioned log: 

handle_dev_display_connect: connect
handle_new_display_channel: add display channel client
handle_new_display_channel: New display (client 0x7fea0ace3870) dcc 0x7fe9f00fd480 stream 0x7fea0b1b6390
handle_new_display_channel: jpeg disabled
handle_new_display_channel: zlib-over-glz disabled
listen_to_new_client_channel: NEW ID = 0
red_dispatcher_set_cursor_peer:
main_channel_handle_parsed: agent start
display_channel_client_wait_for_init: creating encoder with id == 1
display_channel_client_wait_for_init: creating encoder with id == 0
handle_dev_cursor_connect: cursor connect
red_connect_cursor: add cursor channel client
listen_to_new_client_channel: NEW ID = 1
handle_dev_set_mouse_mode: mouse mode 2
handle_dev_set_mouse_mode: mouse mode 2
display_channel_release_item: not pushed (101)
Domain id=29 is tainted: custom-monitor
id 0, group 0, virt start 0, virt end ffffffffffffffff, generation 0, delta 0
id 2, group 1, virt start 7fe99b600000, virt end 7fe99f600000, generation 0, delta 7fe99b600000
validate_virt: panic: virtual address out of range
    virt=0x0+0x1d4c00 slot_id=1 group_id=1
    slot=0x0-0x0 delta=0x0
2012-11-14 10:35:21.618+0000: shutting down

Comment 7 Gerd Hoffmann 2012-11-14 16:03:40 UTC
> id 0, group 0, virt start 0, virt end ffffffffffffffff, generation 0, delta 0
> id 2, group 1, virt start 7fe99b600000, virt end 7fe99f600000, generation 0,
> delta 7fe99b600000
> validate_virt: panic: virtual address out of range
>     virt=0x0+0x1d4c00 slot_id=1 group_id=1
>     slot=0x0-0x0 delta=0x0

Now that is really strange: memory slot #2 is present but #1 (referenced by the command) isn't.

A trace with all qxl_* tracepoints enabled could help giving a clue about what is going on.
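
The observation above can be checked mechanically against the stderr excerpt from comment 6. The following is only a small sketch: the regular expressions and helper names are made up for illustration and are not part of spice-server or qemu. It parses the memslot dump and confirms that the slot referenced by the failing command is not among the registered slots.

```python
import re

# Excerpt copied from the stderr log in comment 6.
LOG = """\
id 0, group 0, virt start 0, virt end ffffffffffffffff, generation 0, delta 0
id 2, group 1, virt start 7fe99b600000, virt end 7fe99f600000, generation 0, delta 7fe99b600000
validate_virt: panic: virtual address out of range
    virt=0x0+0x1d4c00 slot_id=1 group_id=1
    slot=0x0-0x0 delta=0x0
"""

# Hypothetical patterns written for this log format only.
SLOT_RE = re.compile(
    r"id (\d+), group (\d+), virt start ([0-9a-f]+), virt end ([0-9a-f]+)"
)
PANIC_RE = re.compile(
    r"virt=0x([0-9a-f]+)\+0x([0-9a-f]+) slot_id=(\d+) group_id=(\d+)"
)


def parse_slots(log):
    """Map (group_id, slot_id) -> (virt_start, virt_end) for each dumped slot."""
    slots = {}
    for m in SLOT_RE.finditer(log):
        slot_id, group_id = int(m.group(1)), int(m.group(2))
        slots[(group_id, slot_id)] = (int(m.group(3), 16), int(m.group(4), 16))
    return slots


def parse_panic(log):
    """Extract the address, size, slot and group named in the panic message."""
    m = PANIC_RE.search(log)
    return {
        "virt": int(m.group(1), 16),
        "size": int(m.group(2), 16),
        "slot_id": int(m.group(3)),
        "group_id": int(m.group(4)),
    }


slots = parse_slots(LOG)
panic = parse_panic(LOG)
wanted = (panic["group_id"], panic["slot_id"])

print(sorted(slots))    # [(0, 0), (1, 2)]
print(wanted in slots)  # False: group 1 has slot 2 but not slot 1
```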

Comment 8 Tomas Jamrisko 2012-11-20 10:54:30 UTC
Created attachment 648384 [details]
logs from hosts + the qxl_* trace

Adding logs from hosts on which the crash appeared (hyper04, and hyper06):
logs from 20121115 are when it crashed on hyper04 as source, the newer ones are when it crashed on hyper06 as source (search for panic). 

hyper06 folder also contains a file "vmout" which should contain bt of all qxl_* calls

Comment 9 Yonit Halperin 2012-11-26 16:37:18 UTC
Hi Tomas,

Does this crash happen only with active migration? According to the logs, the crash happens on the src side, and it happens before the vm has been stopped.
Can you try to reproduce it without migration?
Also, you are using an old spice-server. The latest build is 0.12.0-4.el6

Comment 10 Yonit Halperin 2012-11-27 18:18:45 UTC
The bug is in qemu in hw/qxl.c : the active memslots are not being reloaded if the device is in UNDEFINED mode. qxl is in UNDEFINED mode after the primary surface is destroyed and before it is recreated (which happens during resolution changes, disabling monitors, etc.). 

If migration occurs while qxl is in UNDEFINED mode, the destination side doesn't reload the devram memslot (which is added by the driver only once), and the described crash occurs when spice-server tries to access the memslot.
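
The failure mode described in this comment can be illustrated with a toy model. This is a sketch under stated assumptions, not the code in hw/qxl.c and not the posted patch: the class, the `migrate` helper, the `reload_in_undefined` flag and the slot addresses are all hypothetical. It only shows why skipping the memslot reload in UNDEFINED mode leaves the destination's display server without the slot that the guest driver registered once.

```python
from enum import Enum


class Mode(Enum):
    UNDEFINED = "undefined"  # no primary surface exists
    NATIVE = "native"        # primary surface created by the guest driver


class ToyQxl:
    """Toy stand-in for the device; every name here is hypothetical."""

    def __init__(self):
        self.mode = Mode.UNDEFINED
        self.guest_slots = {}   # registered by the guest driver; travels with VM state
        self.server_slots = {}  # display server's view; must be rebuilt after migration

    def add_memslot(self, slot_id, start, end):
        self.guest_slots[slot_id] = (start, end)
        self.server_slots[slot_id] = (start, end)

    def create_primary(self, slot_id):
        # The display server must know the slot that backs the surface.
        if slot_id not in self.server_slots:
            raise LookupError(f"slot {slot_id} unknown to the display server")
        self.mode = Mode.NATIVE

    def destroy_primary(self):
        self.mode = Mode.UNDEFINED


def migrate(src, reload_in_undefined):
    """Copy device state to a fresh destination and run a toy post-load step."""
    dst = ToyQxl()
    dst.mode = src.mode
    dst.guest_slots = dict(src.guest_slots)
    # Post-load: re-register slots with the destination's display server.
    # With reload_in_undefined=False this is skipped in UNDEFINED mode.
    if dst.mode is not Mode.UNDEFINED or reload_in_undefined:
        dst.server_slots = dict(dst.guest_slots)
    return dst


def run(reload_in_undefined):
    src = ToyQxl()
    src.add_memslot(1, 0x1000, 0x2000)  # added once by the driver (made-up addresses)
    src.create_primary(1)
    src.destroy_primary()               # e.g. a resolution change is in flight
    dst = migrate(src, reload_in_undefined)
    try:
        dst.create_primary(1)           # guest recreates the primary surface
        return "ok"
    except LookupError:
        return "slot missing"


print(run(reload_in_undefined=False))  # slot missing
print(run(reload_in_undefined=True))   # ok
```

In this model the guest never re-adds slot 1 on the destination, which mirrors the "added by the driver only once" point: unless the post-load step rebuilds the server-side view in every mode, the next surface creation references an unknown slot.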

Comment 12 Qunfang Zhang 2012-11-29 12:36:34 UTC
This issue seems to reproduce: I changed the display configuration after migration finished, and then the VM terminated.

Package version:
qemu-kvm-0.12.1.2-2.337.el6.x86_64
spice-server-0.12.0-5.el6.x86_64
kernel-2.6.32-344.el6.x86_64
seabios-0.6.1.2-25.el6.x86_64

1. Boot guest with multiple monitors:

(gdb) r -boot menu=on -m 2G -smp 2,cores=2,sockets=1,threads=1 -M rhel6.4.0 -cpu SandyBridge -drive file=/home/win7-32-virtio.raw,format=raw,if=none,id=drive-ide0,cache=none,werror=stop,rerror=stop  -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=drive-ide0,id=test0  -netdev tap,id=hostnet1,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet1,mac=00:12:1a:21:62:02,bus=pci.0,addr=0x4,id=virtio-net-pci1 -uuid ac64c74a-a8d5-4c24-9839-fcc491439493 -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection  -name win7-32  -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5  -chardev socket,id=charchannel0,path=/tmp/qzhang-1,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=5,chardev=charchannel0,id=channel0,name=port-1   -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel1,id=channel1,name=com.redhat.spice.0  -boot c -drive if=none,werror=stop,rerror=stop,media=cdrom,id=drive-cdrom -device ide-drive,drive=drive-cdrom,id=cdrom -spice port=5930,disable-ticketing -vga qxl -global qxl-vga.vram_size=33554432  -usb -device usb-tablet -monitor stdio -drive file=/usr/share/virtio-win/virtio-win-1.5.4.vfd,if=none,id=drive-fdc0-0-0,readonly=on,format=raw -global isa-fdc.driveA=drive-fdc0-0-0 -qmp tcp:0:5555,server,nowait -device qxl,id=video1,vram_size=67108864,bus=pci.0,addr=0x7 

2. Boot the guest in listening mode on the dst host.

3. Migrate guest to host B.

4. Change the display configuration: enable the secondary display and save the change.
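
The comment does not say how the migration in step 3 was started. One possibility, given that the command line above exposes a QMP socket on port 5555, is to send the QMP `migrate` command. The sketch below rests on assumptions: the helper names, the destination URI `tcp:hostB:4444`, and the choice of QMP over the stdio monitor are not taken from this report.

```python
import json
import socket


def qmp_messages(uri):
    """Build the two QMP messages needed to start a migration to `uri`."""
    return [
        {"execute": "qmp_capabilities"},                    # leave negotiation mode
        {"execute": "migrate", "arguments": {"uri": uri}},  # start the migration
    ]


def send_migrate(qmp_host, qmp_port, uri):
    """Send the messages to a QMP socket and print one reply line per message.

    Simplification: asynchronous QMP events may arrive between replies and
    are not filtered out here.
    """
    with socket.create_connection((qmp_host, qmp_port)) as sock:
        stream = sock.makefile("rw")
        stream.readline()  # discard the QMP greeting banner
        for msg in qmp_messages(uri):
            stream.write(json.dumps(msg) + "\n")
            stream.flush()
            print(stream.readline().strip())


# Example (assumed host name and ports; requires a running source qemu):
# send_migrate("localhost", 5555, "tcp:hostB:4444")
print(json.dumps(qmp_messages("tcp:hostB:4444")[1]))
```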

Result:

(gdb) bt
#0  0x00007ffff57458a5 in raise () from /lib64/libc.so.6
#1  0x00007ffff5747085 in abort () from /lib64/libc.so.6
#2  0x00007ffff5f9ff75 in spice_logv (log_domain=0x7ffff6016f8e "Spice", 
    log_level=SPICE_LOG_LEVEL_CRITICAL, strloc=0x7ffff601a85a "red_memslots.c:94", 
    function=0x7ffff601a93f "validate_virt", 
    format=0x7ffff601a668 "virtual address out of range\n    virt=0x%lx+0x%x slot_id=%d group_id=%d\n    slot=0x%lx-0x%lx delta=0x%lx", args=0x7fff5bbfc890) at log.c:109
#3  0x00007ffff5fa00aa in spice_log (log_domain=<value optimized out>, log_level=<value optimized out>, 
    strloc=<value optimized out>, function=<value optimized out>, format=<value optimized out>)
    at log.c:123
#4  0x00007ffff5f603e3 in validate_virt (info=<value optimized out>, virt=0, slot_id=1, 
    add_size=1228800, group_id=1) at red_memslots.c:90
#5  0x00007ffff5f60533 in get_virt (info=<value optimized out>, addr=<value optimized out>, 
    add_size=<value optimized out>, group_id=1, error=0x7fff5bbfca7c) at red_memslots.c:142
#6  0x00007ffff5f6f6a7 in dev_create_primary_surface (worker=0x7fff500008c0, 
    surface_id=<value optimized out>, surface=...) at red_worker.c:10997
#7  0x00007ffff5f6fc53 in handle_dev_create_primary_surface_async (opaque=<value optimized out>, 
    payload=<value optimized out>) at red_worker.c:11241
#8  0x00007ffff5f5dca7 in dispatcher_handle_single_read (dispatcher=0x7ffff91a3608) at dispatcher.c:139
#9  dispatcher_handle_recv_read (dispatcher=0x7ffff91a3608) at dispatcher.c:162
#10 0x00007ffff5f7e8ee in red_worker_main (arg=<value optimized out>) at red_worker.c:11835
#11 0x00007ffff773c851 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffff57fb90d in clone () from /lib64/libc.so.6

Comment 13 Yonit Halperin 2012-11-29 14:54:57 UTC
A patch was posted and acked upstream
 http://patchwork.ozlabs.org/patch/202472/

Comment 14 Sibiao Luo 2012-12-03 01:48:37 UTC
*** Bug 871306 has been marked as a duplicate of this bug. ***

Comment 19 Qunfang Zhang 2012-12-17 06:49:37 UTC
Reproduced on qemu-kvm-0.12.1.2-2.339.el6 and verified as passing on qemu-kvm-0.12.1.2-2.346.el6 with the command line and steps in comment 12.

In the fixed version, changing the display configuration and enabling the secondary display after migration works: the guest keeps running, with no hang or crash.

Comment 21 errata-xmlrpc 2013-02-21 07:44:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0527.html