Bug 1052856

Summary: boot vm -M rhel6.0.0 with qxl would cause qemu crash
Product: Red Hat Enterprise Linux 7 Reporter: xhan
Component: spice Assignee: Default Assignee for SPICE Bugs <rh-spice-bugs>
Status: CLOSED ERRATA QA Contact: Desktop QE <desktop-qa-list>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.0 CC: cfergeau, dblechte, hhuang, juzhang, marcandre.lureau, mazhang, michen, qzhang, rbalakri, shuang, tpelka, virt-maint, xhan, xuhan
Target Milestone: rc Keywords: OtherQA, Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: spice-0.12.4-8.el7 Doc Type: Bug Fix
Doc Text:
Previously, invalid drawing commands from guests using older computer types could cause QEMU to terminate unexpectedly. To fix this bug, drawing commands with an invalid bounding box are now detected and rejected. As a result, QEMU no longer terminates in this situation.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-03-05 07:56:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1135372    
Bug Blocks:    
Attachments:
Description Flags
bt_full none

Description xhan 2014-01-14 08:55:33 UTC
Description of problem:

Booting a VM with -M rhel6.0.0 and qxl causes qemu to crash.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-1.5.3-34.el7.x86_64
kernel-3.10.0-67.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot vm with -M rhel6.0.0 and qxl
/usr/libexec/qemu-kvm \
    -name 'virt-tests-vm1'  \
    -sandbox off  \
    -M rhel6.0.0  \
    -nodefaults  \
    -vga qxl  \
    -global qxl-vga.vram_size=33554432  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20140112-142311-ZB032Q8J,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20140112-142311-ZB032Q8J,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20140112-142311-ZB032Q8J,path=/tmp/seabios-20140112-142311-ZB032Q8J,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20140112-142311-ZB032Q8J,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
    -device ahci,id=ahci0,bus=pci.0,addr=04 \
    -drive id=drive_image1,if=none,cache=unsafe,snapshot=off,aio=native,file=RHEL-Server-6.5-64-virtio.qcow2 \
    -device ide-hd,id=image1,drive=drive_image1,bus=ahci0.0,unit=0 \
    -device virtio-net-pci,mac=9a:68:69:6a:6b:6c,id=idI4O3KO,netdev=idJrP7Mj,bus=pci.0,addr=05  \
    -netdev tap,id=idJrP7Mj,vhost=on,vhostfd=26,fd=25  \
    -m 2048  \
    -smp 1,maxcpus=1,cores=1,threads=1,sockets=2  \
    -cpu 'Opteron_G2',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off  \
    -no-kvm-pit-reinjection \
    -bios /usr/share/seabios/bios.bin \
    -enable-kvm



Actual results:
qemu crashes.
(gdb) bt 
#0  0x00007f13a30a22b2 in __memcpy_sse2 () from /lib64/libc.so.6
#1  0x00007f13a84f805e in memcpy (__len=216, __src=0x7f1407fffba8, __dest=<optimized out>) at /usr/include/bits/string3.h:51
#2  qxl_blit (rect=0x7f13a9739458, qxl=0x7f13a9727ae0) at hw/display/qxl-render.c:51
#3  qxl_render_update_area_unlocked (qxl=qxl@entry=0x7f13a9727ae0) at hw/display/qxl-render.c:140
#4  0x00007f13a84f83d0 in qxl_render_update_area_bh (opaque=0x7f13a9727ae0) at hw/display/qxl-render.c:182
#5  0x00007f13a845df1a in aio_bh_poll (ctx=ctx@entry=0x7f13a9550530) at async.c:70
#6  0x00007f13a845dae8 in aio_poll (ctx=0x7f13a9550530, blocking=blocking@entry=false) at aio-posix.c:185
#7  0x00007f13a845de10 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at async.c:167
#8  0x00007f13a7899af6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#9  0x00007f13a8559c4a in glib_pollfds_poll () at main-loop.c:187
#10 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:232
#11 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:464
#12 0x00007f13a8459470 in main_loop () at vl.c:1984
#13 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4343


Expected results:
qemu should work normally.

Additional info:

Comment 3 Gerd Hoffmann 2014-01-15 15:31:46 UTC
trace:

qxl_io_write: qid=0x0 mode=0x7fdace4e61a6 addr=0x6 val=0x19 size=0x1 async=0x0
qxl_set_mode: qid=0x0 modenr=0x19 x_res=0x400 y_res=0x300 bits=0x20 devmem=0xf8000000

The guest sets the mode the old spice 0.4 way, which is what rhel6.0.0 (aka qxl rev-1) supports: 1024x768.

[ ... events snipped ... ]

qxl_spice_update_area: qid=0x0 surface_id=0x0 left=0x0 right=0x400 top=0x0 bottom=0x300
qxl_spice_update_area_rest: qid=0x0 num_dirty_rects=0x0 clear_dirty_region=0x1
qxl_interface_update_area_complete: qid=0x0 surface_id=0x0 dirty_left=0x135 dirty_right=0x2ca dirty_top=0x123 dirty_bottom=0x1dc
qxl_interface_update_area_complete_rest: qid=0x0 num_updated_rects=0x1
qxl_interface_update_area_complete_schedule_bh: qid=0x0 num_dirty=0x1
qxl_render_update_area_done: cookie=0x7fdacfce8800
qxl_render_blit: stride=0xfffffffffffff000 left=0x135 right=0x2ca top=0x123 bottom=0x1dc

One screen update cycle (probably requested by the vnc server via update_hw).
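
The stride printed by qxl_render_blit is easier to read as a signed number. A small Python sketch, assuming the trace prints a signed 64-bit stride as unsigned hex and that the 32-bit mode uses 4 bytes per pixel (as_signed64 is a name made up for this illustration, not a QEMU function):

```python
def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    return value - (1 << 64) if value & (1 << 63) else value

stride = as_signed64(0xfffffffffffff000)   # value printed by qxl_render_blit
x_res, bits = 0x400, 0x20                  # values printed by qxl_set_mode
bytes_per_line = x_res * (bits // 8)       # assumption: 4 bytes per pixel

print(stride)
print(bytes_per_line)
```

Under these assumptions the stride is exactly minus one scanline (1024 pixels * 4 bytes), which suggests the source surface is walked in reverse scanline order.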

qxl_spice_update_area: qid=0x0 surface_id=0x0 left=0x0 right=0x400 top=0x0 bottom=0x300
qxl_spice_update_area_rest: qid=0x0 num_dirty_rects=0x0 clear_dirty_region=0x1
qxl_interface_update_area_complete: qid=0x0 surface_id=0x0 dirty_left=0x135 dirty_right=0x2ca dirty_top=0x123 dirty_bottom=0x1dc
qxl_interface_update_area_complete_rest: qid=0x0 num_updated_rects=0x5
qxl_interface_update_area_complete_schedule_bh: qid=0x0 num_dirty=0x5
qxl_render_update_area_done: cookie=0x7fdacfce8800
qxl_render_blit: stride=0xfffffffffffff000 left=0x135 right=0x2ca top=0x123 bottom=0x1dc
qxl_render_blit: stride=0xfffffffffffff000 left=0x0 right=0x400 top=0x2fe bottom=0x300
qxl_render_blit: stride=0xfffffffffffff000 left=0x31e right=0x354 top=0x300 bottom=0x320

Next screen update cycle. The third dirty rectangle returned by spice-server is out of bounds (bottom=0x320 > y_res=0x300).
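
The numbers can be checked by hand from the trace. A Python sketch using the three qxl_render_blit rectangles above and the mode from qxl_set_mode; the helper names are illustrative only, and the per-row length assumes 4 bytes per pixel:

```python
X_RES, Y_RES, BYTES_PP = 0x400, 0x300, 4   # 1024x768, assumed 4 bytes/pixel

def row_bytes(left: int, right: int) -> int:
    """Bytes copied per scanline for a rectangle of this width."""
    return (right - left) * BYTES_PP

def rows_outside(top: int, bottom: int, y_res: int = Y_RES) -> int:
    """Number of rows in [top, bottom) that lie at or below y_res."""
    return max(0, bottom - max(top, y_res))

# (left, right, top, bottom) from the three qxl_render_blit lines
rects = [
    (0x135, 0x2ca, 0x123, 0x1dc),
    (0x000, 0x400, 0x2fe, 0x300),
    (0x31e, 0x354, 0x300, 0x320),
]

for left, right, top, bottom in rects:
    print(f"top={top:#x} bottom={bottom:#x} "
          f"row_bytes={row_bytes(left, right)} "
          f"rows_outside={rows_outside(top, bottom)}")
```

For the third rectangle the per-row length is (0x354 - 0x31e) * 4 = 216 bytes, which is consistent with __len=216 in the memcpy frame of the backtrace in the description (a different run, so this is only suggestive), and all 0x20 of its rows lie past the last scanline.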

Comment 4 Marc-Andre Lureau 2014-03-05 16:05:32 UTC
(In reply to Gerd Hoffmann from comment #3)
> qxl_spice_update_area: qid=0x0 surface_id=0x0 left=0x0 right=0x400 top=0x0
> bottom=0x300
> qxl_spice_update_area_rest: qid=0x0 num_dirty_rects=0x0
> clear_dirty_region=0x1
> qxl_interface_update_area_complete: qid=0x0 surface_id=0x0 dirty_left=0x135
> dirty_right=0x2ca dirty_top=0x123 dirty_bottom=0x1dc
> qxl_interface_update_area_complete_rest: qid=0x0 num_updated_rects=0x5
> qxl_interface_update_area_complete_schedule_bh: qid=0x0 num_dirty=0x5
> qxl_render_update_area_done: cookie=0x7fdacfce8800
> qxl_render_blit: stride=0xfffffffffffff000 left=0x135 right=0x2ca top=0x123
> bottom=0x1dc
> qxl_render_blit: stride=0xfffffffffffff000 left=0x0 right=0x400 top=0x2fe
> bottom=0x300
> qxl_render_blit: stride=0xfffffffffffff000 left=0x31e right=0x354 top=0x300
> bottom=0x320
> 
> Next screen update cycle.  Third dirty rectangle returned by spice-server
> has out-of-bounds rectangle (bottom=0x320 > y_res=0x300).

Where did you get the y_res from? Did you manage to reproduce it?

Comment 5 Marc-Andre Lureau 2014-03-05 16:09:19 UTC
I am using a slightly modified command line, and I can't reproduce:
qemu-kvm-1.5.3-47.el7.x86_64
spice-server-0.12.4-5.el7.x86_64

Can you reproduce with the following command line?
thanks

/usr/libexec/qemu-kvm \
    -snapshot \
    -name 'virt-tests-vm1'\
    -sandbox off \
    -M rhel6.0.0 \
    -nodefaults \
    -vga qxl \
    -global qxl-vga.vram_size=33554432 \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20140112-142311-ZB032Q8J,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20140112-142311-ZB032Q8J,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -chardev socket,id=seabioslog_id_20140112-142311-ZB032Q8J,path=/tmp/seabios-20140112-142311-ZB032Q8J,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20140112-142311-ZB032Q8J,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
    -device ahci,id=ahci0,bus=pci.0,addr=04 \
    -drive id=drive_image1,if=none,cache=unsafe,snapshot=off,aio=native,file=/var/lib/libvirt/images/rhel6 \
    -device ide-hd,id=image1,drive=drive_image1,bus=ahci0.0,unit=0 \
    -m 2048 \
    -smp 1,maxcpus=1,cores=1,threads=1,sockets=2 \
    -cpu 'Opteron_G2',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off \
    -no-kvm-pit-reinjection \
    -bios /usr/share/seabios/bios.bin \
    -enable-kvm

Comment 6 xhan 2014-03-20 09:24:21 UTC
Using the command line from comment 5, I also hit this crash on a host with:

qemu-kvm-rhev-1.5.3-52.el7.x86_64
kernel-3.10.0-107.el7.x86_64



(gdb) bt
#0  0x00007f2fe5f28ef9 in __memcpy_ssse3_back ()
   from /lib64/libc.so.6
#1  0x00007f2feb2b6c3e in qxl_render_update_area_unlocked ()
#2  0x00007f2feb2b6f30 in qxl_render_update_area_bh ()
#3  0x00007f2feb22fae7 in aio_bh_poll ()
#4  0x00007f2feb22f738 in aio_poll ()
#5  0x00007f2feb22f9f0 in aio_ctx_dispatch ()
#6  0x00007f2fea66bac6 in g_main_context_dispatch ()
   from /lib64/libglib-2.0.so.0
#7  0x00007f2feb307d1a in main_loop_wait ()
#8  0x00007f2feb22b460 in main ()

To reproduce this problem, wait for around 10 minutes after launching the guest with the qemu-kvm command line, then enter a monitor command such as "info status" to check whether it is still running.

I suggest adding -S and -monitor stdio to the command line to monitor the VM status.

Comment 7 Christophe Fergeau 2014-03-24 13:22:27 UTC
I haven't been able to reproduce this either, though X would fail to start with -M rhel6.0.0 on the f20 livecd I tried.

Couple of questions:
- which guest OS are you testing with?
- do you connect a client to the VM after starting it, or does it happen even without a client connection?
- in comment #6, you mention using -monitor stdio and typing 'info status' in order to reproduce, is it required to type 'info status' in the QEMU monitor to trigger the crash?
- in comment #6 you suggest using -S; what is the next step to follow in order to reproduce the bug after starting qemu with -S?

Could you also try reducing the command line size?
/usr/libexec/qemu-kvm \
    -snapshot \
    -name 'virt-tests-vm1'\
    -sandbox off \
    -M rhel6.0.0 \
    -nodefaults \
    -vga qxl \
    -global qxl-vga.vram_size=33554432 \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20140112-142311-ZB032Q8J,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20140112-142311-ZB032Q8J,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -chardev socket,id=seabioslog_id_20140112-142311-ZB032Q8J,path=/tmp/seabios-20140112-142311-ZB032Q8J,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20140112-142311-ZB032Q8J,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
    -device ahci,id=ahci0,bus=pci.0,addr=04 \
    -drive id=drive_image1,if=none,cache=unsafe,snapshot=off,aio=native,file=/var/lib/libvirt/images/rhel6 \
    -device ide-hd,id=image1,drive=drive_image1,bus=ahci0.0,unit=0 \
    -m 2048 \
    -smp 1,maxcpus=1,cores=1,threads=1,sockets=2 \
    -cpu 'Opteron_G2',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off \
    -no-kvm-pit-reinjection \
    -bios /usr/share/seabios/bios.bin \
    -enable-kvm

E.g., does the bug still happen if you remove -sandbox off? Or if you only keep something like
/usr/libexec/qemu-kvm \
    -snapshot \
    -sandbox off \
    -M rhel6.0.0 \
    -nodefaults \
    -vga qxl \
    -global qxl-vga.vram_size=33554432 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
    -device ahci,id=ahci0,bus=pci.0,addr=04 \
    -drive id=drive_image1,if=none,cache=unsafe,snapshot=off,aio=native,file=/var/lib/libvirt/images/rhel6 \
    -device ide-hd,id=image1,drive=drive_image1,bus=ahci0.0,unit=0 \
    -m 2048 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -boot order=cdn,once=c,menu=off \
    -enable-kvm
(assuming qemu starts at all)? If it no longer happens with this command line, can you find out which option is required? If the bug is still reproducible this way, can you try removing more things (-global qxl-vga.vram_size=33554432 -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 ...)? You could also try to replace
    -device ahci,id=ahci0,bus=pci.0,addr=04 \
    -drive id=drive_image1,if=none,cache=unsafe,snapshot=off,aio=native,file=/var/lib/libvirt/images/rhel6 \
    -device ide-hd,id=image1,drive=drive_image1,bus=ahci0.0,unit=0
with
    -drive file=/var/lib/libvirt/images/rhel6

In short, the shorter you can make the qemu line needed to reproduce, the better ;)
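
The reduction asked for here can be done mechanically. A rough Python sketch of a greedy one-option-at-a-time reduction; the reproduces predicate below is a fake stand-in chosen only so the example runs, and does not reflect which options actually matter for this bug:

```python
from typing import Callable, List

def minimize(options: List[str],
             reproduces: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop options one at a time while the bug still reproduces."""
    kept = list(options)
    for opt in list(options):
        trial = [o for o in kept if o != opt]
        if reproduces(trial):
            kept = trial          # opt was not needed, leave it out
    return kept

# Hypothetical predicate: pretend the crash needs exactly these two options.
def fake_reproduces(opts: List[str]) -> bool:
    return "-M rhel6.0.0" in opts and "-vga qxl" in opts

full = ["-snapshot", "-sandbox off", "-M rhel6.0.0", "-nodefaults",
        "-vga qxl", "-m 2048", "-vnc :0", "-enable-kvm"]
print(minimize(full, fake_reproduces))
```

A real predicate would have to start qemu with the trial options, wait long enough for the crash (around 10 minutes according to comment #6) and report whether the process died, so each step is slow and may need several runs if the crash is intermittent.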

Comment 9 Christophe Fergeau 2014-03-25 13:42:13 UTC
Fwiw, the shortest qemu command line I could use to reproduce is
/usr/libexec/qemu-kvm \
    -M rhel6.0.0 \
    -nodefaults \
    -vga qxl \
    -drive file=rhel6 \
    -vnc :0 \
    -m 2048 \
    -enable-kvm
(I haven't checked whether I could get rid of -nodefaults or -enable-kvm). The -m 2048 seems to be required to get the crash, as I could not reproduce without it or with -m 512.

Comment 10 Marc-Andre Lureau 2014-03-25 14:49:12 UTC
and it doesn't crash with -M rhel6.1.0

Comment 11 xhan 2014-03-26 09:04:39 UTC
The -M argument is crucial for the crash. So the question is why -M rhel6.0.0 causes the crash.

Comment 12 RHEL Program Management 2014-04-03 05:48:29 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 15 Marc-Andre Lureau 2014-07-10 22:41:11 UTC
Although I have not identified the root cause yet, this is due to the qxl device being set to revision 1, and thus the use of the COMPAT flag, although the xorg-qxl guest driver has not supported it since 0.1.1-9 in rhel6. I am not sure this regression was intentional (bug 1078390).

What is the version of qxl driver in your guest?


It could be that guest compat structure is different (layout), since xorg driver doesn't use the definition from spice-protocol.

It could also be that the qemu ring isn't flushed, and some qemu QXLDrawable are wrongly cast to QXLCompatDrawable.

Qemu should probably do surface bound checking before doing qxl_blit()

still investigating this
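
The bound checking suggested above could look roughly like the following. This is a Python sketch of the idea only, not the QEMU or spice-server code; the function name and the choice to drop (rather than clamp) bad rectangles are assumptions:

```python
from typing import Iterable, List, Tuple

Rect = Tuple[int, int, int, int]   # (left, right, top, bottom)

def filter_dirty_rects(rects: Iterable[Rect],
                       width: int, height: int) -> List[Rect]:
    """Keep only non-empty rectangles that fit inside a width x height surface."""
    valid = []
    for left, right, top, bottom in rects:
        if 0 <= left < right <= width and 0 <= top < bottom <= height:
            valid.append((left, right, top, bottom))
    return valid

# First and third rectangles from the trace in comment #3, 1024x768 surface.
dirty = [(0x135, 0x2ca, 0x123, 0x1dc), (0x31e, 0x354, 0x300, 0x320)]
print(filter_dirty_rects(dirty, 0x400, 0x300))
```

With the values from the trace, only the first rectangle survives; the one with bottom=0x320 is dropped before anything would be copied.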

Comment 16 Xu Han 2014-07-11 08:30:18 UTC
(In reply to Marc-Andre Lureau from comment #15)
> Although I have not identified the root cause yet, this is due to qxl device
> being set to revision 1, and thus usage of COMPAT flag, although xorg-qxl
> guest driver does not support it since 0.1.1-9 in rhel6. I am not sure this
> regression was intentional (bug 1078390)
> 
> What is the version of qxl driver in your guest?

I checked this issue with two guests (rhel 6.5 and rhel 7.0).

Booting the rhel 6.5 guest with the following command line produced the segfault described in the comments above:
/usr/libexec/qemu-kvm \
    -M rhel6.0.0 \
    -nodefaults \
    -vga qxl \
    -device ahci,id=ahci0,bus=pci.0,addr=04 \
    -drive id=drive_image1,if=none,cache=unsafe,snapshot=off,aio=native,file=/home/RHEL-Server-6.5-64-virtio.qcow2 \
    -device ide-hd,id=image1,drive=drive_image1,bus=ahci0.0,unit=0 \
    -vnc :0 \
    -m 2048 \
    -enable-kvm \
    -monitor stdio

However, it seems to work fine when the spice protocol is used.

The rhel 7 guest hits Bug 1043851, no matter which protocol is used.

xorg-qxl version in each guest:
    rhel 6.5 - xorg-x11-drv-qxl-0.1.0-7.el6.x86_64
    rhel 7.0 - xorg-x11-drv-qxl-0.1.1-9.el7.x86_64

Comment 17 Marc-Andre Lureau 2014-07-11 15:15:17 UTC
(In reply to Xu Han from comment #16)
> (In reply to Marc-Andre Lureau from comment #15)
> > Although I have not identified the root cause yet, this is due to qxl device
> > being set to revision 1, and thus usage of COMPAT flag, although xorg-qxl
> > guest driver does not support it since 0.1.1-9 in rhel6. I am not sure this
> > regression was intentional (bug 1078390)
> > 
> > What is the version of qxl driver in your guest?

>     rhel 6.5 - xorg-x11-drv-qxl-0.1.0-7.el6.x86_64

that version should support qxlpci version 1

>     rhel 7.0 - xorg-x11-drv-qxl-0.1.1-9.el7.x86_64

that version no longer supports qxlpci version 1.

I.e., we have the same bug with rhel 6.6 due to the rebase in bug 1078390.

Comment 18 Marc-Andre Lureau 2014-07-11 15:35:42 UTC
I can't reproduce the crash with 6.6 and xorg-x11-drv-qxl-0.1.0-7.el6.x86_64, but the display doesn't work either (it resizes 3 times with a gray gdm rectangle)

Comment 19 Marc-Andre Lureau 2014-07-18 14:38:28 UTC
* With xorg-x11-drv-qxl-0.1.0-7

The gray gdm screen comes from X crashing under gdm (see /var/log/gdm/:0.log: fbCopyRegion is missing). Even though starting Xorg manually works, the symbol no longer exists, as can also be seen in the compilation warnings.

Also, interestingly, I haven't been able to reproduce the crash that easily lately, only <5% of the time.

I think the compat code has long been unmaintained and untested; we should declare it officially deprecated.

* With xorg-x11-drv-qxl-0.1.1-12 (from rebase 1078390)

Black screen, no crash or X exit.

Is there really any interest in maintaining the rhel6.0 machine type with spice?

Comment 20 juzhang 2014-07-21 01:35:24 UTC
Hi Xu,

Can you have a look at comment 18 and comment 19?

Best Regards,
Junyi

Comment 22 David Blechter 2014-07-22 17:25:20 UTC
Closing as WONTFIX, as no customers have reported using a rhel 6.0 guest.

Comment 23 Marc-Andre Lureau 2014-09-01 16:16:07 UTC
Reopening, as there is a similar bug 1135372 in rhel6 and this is potentially a security issue.

Comment 30 errata-xmlrpc 2015-03-05 07:56:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0335.html