Bug 874574 - VM terminates when changing display configuration during migration
Summary: VM terminates when changing display configuration during migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Yonit Halperin
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 871306
Depends On:
Blocks:
 
Reported: 2012-11-08 13:23 UTC by Tomas Jamrisko
Modified: 2013-02-21 07:44 UTC
CC List: 17 users

Fixed In Version: qemu-kvm-0.12.1.2-2.340.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-21 07:44:16 UTC
Target Upstream Version:
Embargoed:


Attachments
logs from hosts + the qxl_* trace (117.95 KB, application/x-gzip), 2012-11-20 10:54 UTC, Tomas Jamrisko


Links
Red Hat Product Errata RHBA-2013:0527 (normal, SHIPPED_LIVE): qemu-kvm bug fix and enhancement update, last updated 2013-02-20 21:51:08 UTC

Description Tomas Jamrisko 2012-11-08 13:23:32 UTC
Description of problem:

VM crashes when guest display configuration changes in later stages of migration 

Version-Release number of selected component (if applicable):

qemu-kvm-rhev-0.12.1.2-2.295.el6_3.5.x86_64
spice-server-0.10.1-10.el6.x86_64
vdsm-4.9.6-36.0.el6_3.x86_64
virt-viewer-0.5.2-16.el6.x86_64


How reproducible:
100% (if you're fast enough)

Steps to Reproduce:
1. Start a Windows 7 VM with 2 monitors
2. Connect to it using virt-viewer
3. Disable one of the displays
4. Open Screen Resolution in the VM
5. Disable your secondary monitor
6. Start migration
---- you will have to repeat the following steps as quickly as possible, because the crash occurs only in later stages of migration.
7. Click on the secondary display and click "Extend desktop to this display"
8. Click "Revert changes"
9. Repeat until the VM crashes or the migration completes.

  
Actual results:
The VM stops completely.

Expected results:
The VM should keep on working.

Comment 3 Gerd Hoffmann 2012-11-14 09:12:43 UTC
Any chance you can provide a stack trace or core dump for this one?

Comment 4 Tomas Jamrisko 2012-11-14 10:40:51 UTC
It fails on the new (destination) host, and this is the trace:

[Switching to Thread 0x7fe9f90b8700 (LWP 1203)]
0x00007fea0740f8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007fea0740f8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fea07411085 in abort () at abort.c:92
#2  0x00007fea07c26b67 in validate_virt (info=<value optimized out>, virt=<value optimized out>, 
    slot_id=<value optimized out>, add_size=<value optimized out>, group_id=<value optimized out>)
    at red_memslots.c:86
#3  0x00007fea07c26c0c in get_virt (info=<value optimized out>, addr=<value optimized out>, 
    add_size=<value optimized out>, group_id=1) at red_memslots.c:125
#4  0x00007fea07c31d53 in dev_create_primary_surface (worker=0x7fe9f8ee06c0, 
    surface_id=<value optimized out>, surface=...) at red_worker.c:10577
#5  0x00007fea07c32133 in handle_dev_create_primary_surface_async (opaque=<value optimized out>, 
    payload=<value optimized out>) at red_worker.c:10778
#6  0x00007fea07c24af3 in dispatcher_handle_single_read (dispatcher=0x7fea0b1754f8)
    at dispatcher.c:120
#7  dispatcher_handle_recv_read (dispatcher=0x7fea0b1754f8) at dispatcher.c:143
#8  0x00007fea07c3eabc in red_worker_main (arg=<value optimized out>) at red_worker.c:11335
#9  0x00007fea093e2851 in start_thread (arg=0x7fe9f90b8700) at pthread_create.c:301
#10 0x00007fea074c511d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Comment 5 Gerd Hoffmann 2012-11-14 12:19:31 UTC
Looks similar to bug 870972.  I guess this is again spice-server processing requests while worker->running == FALSE, this time a display configure request before qemu has finished loading vmstate and reinitializing memslots.

Anything on stderr (i.e. /var/log/libvirt/qemu/$guest.log)?  IIRC on this assert memslot info is dumped to stderr before aborting.

Comment 6 Tomas Jamrisko 2012-11-14 13:54:21 UTC
Last few lines of the mentioned log: 

handle_dev_display_connect: connect
handle_new_display_channel: add display channel client
handle_new_display_channel: New display (client 0x7fea0ace3870) dcc 0x7fe9f00fd480 stream 0x7fea0b1b6390
handle_new_display_channel: jpeg disabled
handle_new_display_channel: zlib-over-glz disabled
listen_to_new_client_channel: NEW ID = 0
red_dispatcher_set_cursor_peer:
main_channel_handle_parsed: agent start
display_channel_client_wait_for_init: creating encoder with id == 1
display_channel_client_wait_for_init: creating encoder with id == 0
handle_dev_cursor_connect: cursor connect
red_connect_cursor: add cursor channel client
listen_to_new_client_channel: NEW ID = 1
handle_dev_set_mouse_mode: mouse mode 2
handle_dev_set_mouse_mode: mouse mode 2
display_channel_release_item: not pushed (101)
Domain id=29 is tainted: custom-monitor
id 0, group 0, virt start 0, virt end ffffffffffffffff, generation 0, delta 0
id 2, group 1, virt start 7fe99b600000, virt end 7fe99f600000, generation 0, delta 7fe99b600000
validate_virt: panic: virtual address out of range
    virt=0x0+0x1d4c00 slot_id=1 group_id=1
    slot=0x0-0x0 delta=0x0
2012-11-14 10:35:21.618+0000: shutting down

Comment 7 Gerd Hoffmann 2012-11-14 16:03:40 UTC
> id 0, group 0, virt start 0, virt end ffffffffffffffff, generation 0, delta 0
> id 2, group 1, virt start 7fe99b600000, virt end 7fe99f600000, generation 0,
> delta 7fe99b600000
> validate_virt: panic: virtual address out of range
>     virt=0x0+0x1d4c00 slot_id=1 group_id=1
>     slot=0x0-0x0 delta=0x0

Now that is really strange: memory slot #2 is present but #1 (referenced by the command) isn't.

A trace with all qxl_* tracepoints enabled could help give a clue about what is going on.

Comment 8 Tomas Jamrisko 2012-11-20 10:54:30 UTC
Created attachment 648384 [details]
logs from hosts + the qxl_* trace

Adding logs from the hosts on which the crash appeared (hyper04 and hyper06):
the logs from 20121115 are from when it crashed with hyper04 as the source, the newer ones are from when it crashed with hyper06 as the source (search for "panic").

The hyper06 folder also contains a file "vmout", which should contain a backtrace of all qxl_* calls.
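
One way such a capture of backtraces at qxl_* calls can be produced is a gdb breakpoint script attached to the qemu-kvm process (with qemu-kvm-debuginfo installed); the sketch below uses function names from hw/qxl.c and is only an illustration of the approach, not necessarily how the attached "vmout" was generated:

# qxl-bt.gdb -- sketch; attach with: gdb -p $(pidof qemu-kvm) -x qxl-bt.gdb
set pagination off
set logging file vmout
set logging on
# memslot/surface entry points in hw/qxl.c (assumed present in this build)
break qxl_add_memslot
break qxl_del_memslot
break qxl_reset_memslots
break qxl_create_guest_primary
# print a backtrace at every hit, then let the guest keep running
commands 1-4
  bt
  continue
end
continue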

Comment 9 Yonit Halperin 2012-11-26 16:37:18 UTC
Hi Tomas,

Does this crash happen only with active migration? According to the logs, the crash happens on the src side, and it happens before the VM has been stopped.
Can you try to reproduce it without migration?
Also, you are using an old spice-server; the latest build is 0.12.0-4.el6.

Comment 10 Yonit Halperin 2012-11-27 18:18:45 UTC
The bug is in qemu, in hw/qxl.c: the active memslots are not reloaded if the device is in UNDEFINED mode. qxl is in UNDEFINED mode after the primary surface is destroyed and before it is recreated (which happens during resolution changes, disabling monitors, etc.).

If migration occurs while qxl is in UNDEFINED mode, the destination side doesn't reload the devram memslot (which is added by the guest driver only once), and the described crash occurs when spice-server tries to access the memslot.
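
For illustration, a minimal sketch of the kind of change described here, in the style of qemu's hw/qxl.c vmstate post-load hook; the function names (qxl_post_load, qxl_create_memslots, qxl_create_guest_primary) follow hw/qxl.c, but the body below is an assumption and not the actual upstream patch referenced in comment 13:

/* Sketch only: reload the guest memslots on the destination after an
 * incoming migration even when the device arrives in QXL_MODE_UNDEFINED
 * (primary surface destroyed), so that a later create-primary-surface
 * request no longer references an unregistered slot. */
static int qxl_post_load(void *opaque, int version)
{
    PCIQXLDevice *d = opaque;

    switch (d->mode) {
    case QXL_MODE_UNDEFINED:
        /* Without this, the devram memslot (added by the guest driver only
         * once) stays missing on the destination and spice-server aborts
         * in validate_virt() on the next create-primary-surface request. */
        qxl_create_memslots(d);
        break;
    case QXL_MODE_NATIVE:
        qxl_create_memslots(d);
        qxl_create_guest_primary(d, 1, QXL_SYNC);
        break;
    default:
        /* QXL_MODE_VGA / QXL_MODE_COMPAT handling omitted in this sketch. */
        break;
    }
    return 0;
}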

Comment 12 Qunfang Zhang 2012-11-29 12:36:34 UTC
This seems to reproduce the issue: I changed the display configuration after migration finished, and then the VM terminated.

Package version:
qemu-kvm-0.12.1.2-2.337.el6.x86_64
spice-server-0.12.0-5.el6.x86_64
kernel-2.6.32-344.el6.x86_64
seabios-0.6.1.2-25.el6.x86_64

1. Boot guest with multiple monitors:

(gdb) r -boot menu=on -m 2G -smp 2,cores=2,sockets=1,threads=1 -M rhel6.4.0 -cpu SandyBridge -drive file=/home/win7-32-virtio.raw,format=raw,if=none,id=drive-ide0,cache=none,werror=stop,rerror=stop  -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=drive-ide0,id=test0  -netdev tap,id=hostnet1,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet1,mac=00:12:1a:21:62:02,bus=pci.0,addr=0x4,id=virtio-net-pci1 -uuid ac64c74a-a8d5-4c24-9839-fcc491439493 -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection  -name win7-32  -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5  -chardev socket,id=charchannel0,path=/tmp/qzhang-1,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=5,chardev=charchannel0,id=channel0,name=port-1   -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel1,id=channel1,name=com.redhat.spice.0  -boot c -drive if=none,werror=stop,rerror=stop,media=cdrom,id=drive-cdrom -device ide-drive,drive=drive-cdrom,id=cdrom -spice port=5930,disable-ticketing -vga qxl -global qxl-vga.vram_size=33554432  -usb -device usb-tablet -monitor stdio -drive file=/usr/share/virtio-win/virtio-win-1.5.4.vfd,if=none,id=drive-fdc0-0-0,readonly=on,format=raw -global isa-fdc.driveA=drive-fdc0-0-0 -qmp tcp:0:5555,server,nowait -device qxl,id=video1,vram_size=67108864,bus=pci.0,addr=0x7 

2. Boot the guest in listening mode on the dst host (see the sketch after step 4 for the typical commands).

3. Migrate the guest to host B.

4. Change the display configuration: enable the secondary display and save the change.
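
For reference, a sketch of how steps 2-3 are typically done with a bare qemu-kvm command line; the port number and host name below are placeholders, not values taken from this report:

# dst (host B): same command line as in step 1, plus an incoming-migration port
<step 1 command line> -incoming tcp:0:4444

# src: in the qemu monitor opened by "-monitor stdio"
(qemu) migrate -d tcp:hostB:4444
(qemu) info migrate      # check status until the migration completes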

Result:

(gdb) bt
#0  0x00007ffff57458a5 in raise () from /lib64/libc.so.6
#1  0x00007ffff5747085 in abort () from /lib64/libc.so.6
#2  0x00007ffff5f9ff75 in spice_logv (log_domain=0x7ffff6016f8e "Spice", 
    log_level=SPICE_LOG_LEVEL_CRITICAL, strloc=0x7ffff601a85a "red_memslots.c:94", 
    function=0x7ffff601a93f "validate_virt", 
    format=0x7ffff601a668 "virtual address out of range\n    virt=0x%lx+0x%x slot_id=%d group_id=%d\n    slot=0x%lx-0x%lx delta=0x%lx", args=0x7fff5bbfc890) at log.c:109
#3  0x00007ffff5fa00aa in spice_log (log_domain=<value optimized out>, log_level=<value optimized out>, 
    strloc=<value optimized out>, function=<value optimized out>, format=<value optimized out>)
    at log.c:123
#4  0x00007ffff5f603e3 in validate_virt (info=<value optimized out>, virt=0, slot_id=1, 
    add_size=1228800, group_id=1) at red_memslots.c:90
#5  0x00007ffff5f60533 in get_virt (info=<value optimized out>, addr=<value optimized out>, 
    add_size=<value optimized out>, group_id=1, error=0x7fff5bbfca7c) at red_memslots.c:142
#6  0x00007ffff5f6f6a7 in dev_create_primary_surface (worker=0x7fff500008c0, 
    surface_id=<value optimized out>, surface=...) at red_worker.c:10997
#7  0x00007ffff5f6fc53 in handle_dev_create_primary_surface_async (opaque=<value optimized out>, 
    payload=<value optimized out>) at red_worker.c:11241
#8  0x00007ffff5f5dca7 in dispatcher_handle_single_read (dispatcher=0x7ffff91a3608) at dispatcher.c:139
#9  dispatcher_handle_recv_read (dispatcher=0x7ffff91a3608) at dispatcher.c:162
#10 0x00007ffff5f7e8ee in red_worker_main (arg=<value optimized out>) at red_worker.c:11835
#11 0x00007ffff773c851 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffff57fb90d in clone () from /lib64/libc.so.6

Comment 13 Yonit Halperin 2012-11-29 14:54:57 UTC
A patch was posted and acked upstream
 http://patchwork.ozlabs.org/patch/202472/

Comment 14 Sibiao Luo 2012-12-03 01:48:37 UTC
*** Bug 871306 has been marked as a duplicate of this bug. ***

Comment 19 Qunfang Zhang 2012-12-17 06:49:37 UTC
Reproduced on qemu-kvm-0.12.1.2-2.339.el6 and verified as passing on qemu-kvm-0.12.1.2-2.346.el6 with the command line and steps in comment 12.

In the fixed version, after migration, changing the display configuration and enabling the secondary display works well; the guest shows no hang or crash.

Comment 21 errata-xmlrpc 2013-02-21 07:44:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0527.html

