Bug 1052856

Summary:

Booting a VM with -M rhel6.0.0 and qxl causes qemu to crash

Product:

Red Hat Enterprise Linux 7

Reporter:

xhan

Component:

spice

Assignee:

Default Assignee for SPICE Bugs <rh-spice-bugs>

Status:

CLOSED ERRATA

QA Contact:

Desktop QE <desktop-qa-list>

Severity:

medium

Docs Contact:

Priority:

unspecified

Version:

7.0

CC:

cfergeau, dblechte, hhuang, juzhang, marcandre.lureau, mazhang, michen, qzhang, rbalakri, shuang, tpelka, virt-maint, xhan, xuhan

Target Milestone:

Keywords:

OtherQA, Reopened

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

spice-0.12.4-8.el7

Doc Type:

Bug Fix

Doc Text:

Previously, invalid drawing commands from guests using older machine types could cause QEMU to terminate unexpectedly. To fix this bug, drawing commands with an invalid bounding box are now detected and rejected. As a result, QEMU no longer terminates in this situation.

Last Closed:

2015-03-05 07:56:06 UTC

Type:

Bug

Bug Depends On:

1135372

Bug Blocks:

Attachments:

Description	Flags
bt_full	none

Description xhan 2014-01-14 08:55:33 UTC

Description of problem:

When booting a VM with -M rhel6.0.0 and qxl, qemu crashes.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-1.5.3-34.el7.x86_64
kernel-3.10.0-67.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot a VM with -M rhel6.0.0 and qxl:
/usr/libexec/qemu-kvm \
    -name 'virt-tests-vm1'  \
    -sandbox off  \
    -M rhel6.0.0  \
    -nodefaults  \
    -vga qxl  \
    -global qxl-vga.vram_size=33554432  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20140112-142311-ZB032Q8J,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20140112-142311-ZB032Q8J,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20140112-142311-ZB032Q8J,path=/tmp/seabios-20140112-142311-ZB032Q8J,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20140112-142311-ZB032Q8J,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
    -device ahci,id=ahci0,bus=pci.0,addr=04 \
    -drive id=drive_image1,if=none,cache=unsafe,snapshot=off,aio=native,file=RHEL-Server-6.5-64-virtio.qcow2 \
    -device ide-hd,id=image1,drive=drive_image1,bus=ahci0.0,unit=0 \
    -device virtio-net-pci,mac=9a:68:69:6a:6b:6c,id=idI4O3KO,netdev=idJrP7Mj,bus=pci.0,addr=05  \
    -netdev tap,id=idJrP7Mj,vhost=on,vhostfd=26,fd=25  \
    -m 2048  \
    -smp 1,maxcpus=1,cores=1,threads=1,sockets=2  \
    -cpu 'Opteron_G2',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off  \
    -no-kvm-pit-reinjection \
    -bios /usr/share/seabios/bios.bin \
    -enable-kvm



Actual results:
qemu crashes.
(gdb) bt 
#0  0x00007f13a30a22b2 in __memcpy_sse2 () from /lib64/libc.so.6
#1  0x00007f13a84f805e in memcpy (__len=216, __src=0x7f1407fffba8, __dest=<optimized out>) at /usr/include/bits/string3.h:51
#2  qxl_blit (rect=0x7f13a9739458, qxl=0x7f13a9727ae0) at hw/display/qxl-render.c:51
#3  qxl_render_update_area_unlocked (qxl=qxl@entry=0x7f13a9727ae0) at hw/display/qxl-render.c:140
#4  0x00007f13a84f83d0 in qxl_render_update_area_bh (opaque=0x7f13a9727ae0) at hw/display/qxl-render.c:182
#5  0x00007f13a845df1a in aio_bh_poll (ctx=ctx@entry=0x7f13a9550530) at async.c:70
#6  0x00007f13a845dae8 in aio_poll (ctx=0x7f13a9550530, blocking=blocking@entry=false) at aio-posix.c:185
#7  0x00007f13a845de10 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at async.c:167
#8  0x00007f13a7899af6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#9  0x00007f13a8559c4a in glib_pollfds_poll () at main-loop.c:187
#10 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:232
#11 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:464
#12 0x00007f13a8459470 in main_loop () at vl.c:1984
#13 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4343


Expected results:
qemu should work normally.

Additional info:

Comment 2 xhan 2014-01-14 09:15:57 UTC

Created attachment 849819
bt_full

core file:
http://fileshare.englab.nay.redhat.com/pub/section2/coredump/var/crash/bz1052856.tar

Comment 3 Gerd Hoffmann 2014-01-15 15:31:46 UTC

trace:

qxl_io_write: qid=0x0 mode=0x7fdace4e61a6 addr=0x6 val=0x19 size=0x1 async=0x0
qxl_set_mode: qid=0x0 modenr=0x19 x_res=0x400 y_res=0x300 bits=0x20 devmem=0xf8000000

The guest sets the mode the old spice 0.4 way, which is what rhel6.0.0 (aka qxl rev-1) supports: 1024x768.

[ ... events snipped ... ]

qxl_spice_update_area: qid=0x0 surface_id=0x0 left=0x0 right=0x400 top=0x0 bottom=0x300
qxl_spice_update_area_rest: qid=0x0 num_dirty_rects=0x0 clear_dirty_region=0x1
qxl_interface_update_area_complete: qid=0x0 surface_id=0x0 dirty_left=0x135 dirty_right=0x2ca dirty_top=0x123 dirty_bottom=0x1dc
qxl_interface_update_area_complete_rest: qid=0x0 num_updated_rects=0x1
qxl_interface_update_area_complete_schedule_bh: qid=0x0 num_dirty=0x1
qxl_render_update_area_done: cookie=0x7fdacfce8800
qxl_render_blit: stride=0xfffffffffffff000 left=0x135 right=0x2ca top=0x123 bottom=0x1dc

One screen update cycle (probably requested by the vnc server via update_hw).

qxl_spice_update_area: qid=0x0 surface_id=0x0 left=0x0 right=0x400 top=0x0 bottom=0x300
qxl_spice_update_area_rest: qid=0x0 num_dirty_rects=0x0 clear_dirty_region=0x1
qxl_interface_update_area_complete: qid=0x0 surface_id=0x0 dirty_left=0x135 dirty_right=0x2ca dirty_top=0x123 dirty_bottom=0x1dc
qxl_interface_update_area_complete_rest: qid=0x0 num_updated_rects=0x5
qxl_interface_update_area_complete_schedule_bh: qid=0x0 num_dirty=0x5
qxl_render_update_area_done: cookie=0x7fdacfce8800
qxl_render_blit: stride=0xfffffffffffff000 left=0x135 right=0x2ca top=0x123 bottom=0x1dc
qxl_render_blit: stride=0xfffffffffffff000 left=0x0 right=0x400 top=0x2fe bottom=0x300
qxl_render_blit: stride=0xfffffffffffff000 left=0x31e right=0x354 top=0x300 bottom=0x320

Next screen update cycle. The third dirty rectangle returned by spice-server is out of bounds (bottom=0x320 > y_res=0x300).
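
As a sanity check on the numbers in this trace, here is a small sketch in Python (not QEMU code; the helper names are made up for illustration). It assumes the stride value printed by qxl_render_blit is a signed 64-bit quantity shown in hex, and takes the resolution and depth from the qxl_set_mode line.

```python
# Values copied from the trace above; helper names are illustrative only.
X_RES, Y_RES, BITS = 0x400, 0x300, 0x20  # qxl_set_mode: 1024x768, 32 bpp


def as_signed64(value):
    """Reinterpret an unsigned 64-bit trace value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


def out_of_bounds(rect, width=X_RES, height=Y_RES):
    """Return True if (left, right, top, bottom) leaves the primary surface."""
    left, right, top, bottom = rect
    return (
        left < 0 or top < 0
        or right > width or bottom > height
        or left > right or top > bottom
    )


# The three qxl_render_blit rectangles from the second update cycle.
rects = [
    (0x135, 0x2CA, 0x123, 0x1DC),
    (0x000, 0x400, 0x2FE, 0x300),
    (0x31E, 0x354, 0x300, 0x320),
]

stride = as_signed64(0xFFFFFFFFFFFFF000)      # -4096
bytes_per_pixel = BITS // 8                   # 4
bad = [r for r in rects if out_of_bounds(r)]  # only the third rectangle
row_bytes = (bad[0][1] - bad[0][0]) * bytes_per_pixel  # 54 px * 4 = 216

print(stride, [tuple(hex(v) for v in r) for r in bad], row_bytes)
```

Under these assumptions the stride magnitude is 4096 bytes, i.e. one 1024-pixel row at 4 bytes per pixel, and one row of the third rectangle is 216 bytes wide. That happens to equal the __len=216 passed to memcpy in the backtrace from the description, which is consistent with (though not proof of) the crash occurring while copying this rectangle.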

Comment 4 Marc-Andre Lureau 2014-03-05 16:05:32 UTC

(In reply to Gerd Hoffmann from comment #3)
> qxl_spice_update_area: qid=0x0 surface_id=0x0 left=0x0 right=0x400 top=0x0
> bottom=0x300
> qxl_spice_update_area_rest: qid=0x0 num_dirty_rects=0x0
> clear_dirty_region=0x1
> qxl_interface_update_area_complete: qid=0x0 surface_id=0x0 dirty_left=0x135
> dirty_right=0x2ca dirty_top=0x123 dirty_bottom=0x1dc
> qxl_interface_update_area_complete_rest: qid=0x0 num_updated_rects=0x5
> qxl_interface_update_area_complete_schedule_bh: qid=0x0 num_dirty=0x5
> qxl_render_update_area_done: cookie=0x7fdacfce8800
> qxl_render_blit: stride=0xfffffffffffff000 left=0x135 right=0x2ca top=0x123
> bottom=0x1dc
> qxl_render_blit: stride=0xfffffffffffff000 left=0x0 right=0x400 top=0x2fe
> bottom=0x300
> qxl_render_blit: stride=0xfffffffffffff000 left=0x31e right=0x354 top=0x300
> bottom=0x320
> 
> Next screen update cycle.  Third dirty rectangle returned by spice-server
> has out-of-bounds rectangle (bottom=0x320 > y_res=0x300).

Where did you get the y_res from? Did you manage to reproduce it?

Comment 5 Marc-Andre Lureau 2014-03-05 16:09:19 UTC

I am using a slightly modified command line, and I can't reproduce:
qemu-kvm-1.5.3-47.el7.x86_64
spice-server-0.12.4-5.el7.x86_64

Can you reproduce with the following command line?
thanks

/usr/libexec/qemu-kvm \
    -snapshot \
    -name 'virt-tests-vm1'\
    -sandbox off \
    -M rhel6.0.0 \
    -nodefaults \
    -vga qxl \
    -global qxl-vga.vram_size=33554432 \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20140112-142311-ZB032Q8J,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20140112-142311-ZB032Q8J,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -chardev socket,id=seabioslog_id_20140112-142311-ZB032Q8J,path=/tmp/seabios-20140112-142311-ZB032Q8J,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20140112-142311-ZB032Q8J,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
    -device ahci,id=ahci0,bus=pci.0,addr=04 \
    -drive id=drive_image1,if=none,cache=unsafe,snapshot=off,aio=native,file=/var/lib/libvirt/images/rhel6 \
    -device ide-hd,id=image1,drive=drive_image1,bus=ahci0.0,unit=0 \
    -m 2048 \
    -smp 1,maxcpus=1,cores=1,threads=1,sockets=2 \
    -cpu 'Opteron_G2',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off \
    -no-kvm-pit-reinjection \
    -bios /usr/share/seabios/bios.bin \
    -enable-kvm

Comment 6 xhan 2014-03-20 09:24:21 UTC

Using the command from comment #5, I also hit this crash on a host with:

qemu-kvm-rhev-1.5.3-52.el7.x86_64
kernel-3.10.0-107.el7.x86_64



(gdb) bt
#0  0x00007f2fe5f28ef9 in __memcpy_ssse3_back ()
   from /lib64/libc.so.6
#1  0x00007f2feb2b6c3e in qxl_render_update_area_unlocked ()
#2  0x00007f2feb2b6f30 in qxl_render_update_area_bh ()
#3  0x00007f2feb22fae7 in aio_bh_poll ()
#4  0x00007f2feb22f738 in aio_poll ()
#5  0x00007f2feb22f9f0 in aio_ctx_dispatch ()
#6  0x00007f2fea66bac6 in g_main_context_dispatch ()
   from /lib64/libglib-2.0.so.0
#7  0x00007f2feb307d1a in main_loop_wait ()
#8  0x00007f2feb22b460 in main ()

To reproduce this problem, you need to wait around 10 minutes after launching the guest with the qemu-kvm command line, then enter a monitor command such as "info status" to check whether it is still running.

I suggest adding -S and -monitor stdio to the command line to monitor the VM status.
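
For anyone who would rather script this check than type into the monitor, here is a rough Python sketch that asks for the VM status over the QMP socket the command line already creates (-chardev socket,...,server,nowait with -mon mode=control). query-status is the QMP counterpart of the "info status" monitor command. The function names are hypothetical and error handling is minimal.

```python
import json
import socket


def qmp_message(command, arguments=None):
    """Encode one QMP command as a newline-terminated JSON line."""
    msg = {"execute": command}
    if arguments:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\n").encode()


def read_reply(stream):
    """Read lines until a command reply arrives, skipping async events."""
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP socket closed (qemu may have exited)")
        reply = json.loads(line)
        if "event" not in reply:
            return reply


def query_status(socket_path):
    """Return the payload of query-status, e.g. {'running': True, ...}."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        stream = sock.makefile("rwb")
        json.loads(stream.readline())  # greeting banner sent by qemu
        reply = {}
        # Capabilities negotiation must come before any other command.
        for command in ("qmp_capabilities", "query-status"):
            stream.write(qmp_message(command))
            stream.flush()
            reply = read_reply(stream)
            if "error" in reply:
                raise RuntimeError(reply["error"])
        return reply["return"]
```

With the command line from the description, the call would be query_status("/tmp/monitor-qmpmonitor1-20140112-142311-ZB032Q8J"). A ConnectionError or a refused connection after the roughly 10 minute wait would suggest that qemu has gone away.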

Comment 7 Christophe Fergeau 2014-03-24 13:22:27 UTC

I haven't been able to reproduce this either, though X would fail to start with -M rhel6.0.0 on the f20 livecd I tried.

Couple of questions:
- which guest OS are you testing with?
- do you connect a client to the VM after starting it, or does it happen even without a client connection?
- in comment #6, you mention using -monitor stdio and typing 'info status' in order to reproduce, is it required to type 'info status' in the QEMU monitor to trigger the crash?
- in comment #6 you suggest using -S, but I'm not sure what next step should be followed to reproduce the bug after starting qemu with -S?

Could you also try reducing the command line size?
/usr/libexec/qemu-kvm \
    -snapshot \
    -name 'virt-tests-vm1'\
    -sandbox off \
    -M rhel6.0.0 \
    -nodefaults \
    -vga qxl \
    -global qxl-vga.vram_size=33554432 \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20140112-142311-ZB032Q8J,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20140112-142311-ZB032Q8J,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -chardev socket,id=seabioslog_id_20140112-142311-ZB032Q8J,path=/tmp/seabios-20140112-142311-ZB032Q8J,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20140112-142311-ZB032Q8J,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
    -device ahci,id=ahci0,bus=pci.0,addr=04 \
    -drive id=drive_image1,if=none,cache=unsafe,snapshot=off,aio=native,file=/var/lib/libvirt/images/rhel6 \
    -device ide-hd,id=image1,drive=drive_image1,bus=ahci0.0,unit=0 \
    -m 2048 \
    -smp 1,maxcpus=1,cores=1,threads=1,sockets=2 \
    -cpu 'Opteron_G2',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off \
    -no-kvm-pit-reinjection \
    -bios /usr/share/seabios/bios.bin \
    -enable-kvm

E.g., is the bug still happening if you remove -sandbox off? What if you only keep something like
/usr/libexec/qemu-kvm \
    -snapshot \
    -sandbox off \
    -M rhel6.0.0 \
    -nodefaults \
    -vga qxl \
    -global qxl-vga.vram_size=33554432 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
    -device ahci,id=ahci0,bus=pci.0,addr=04 \
    -drive id=drive_image1,if=none,cache=unsafe,snapshot=off,aio=native,file=/var/lib/libvirt/images/rhel6 \
    -device ide-hd,id=image1,drive=drive_image1,bus=ahci0.0,unit=0 \
    -m 2048 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -boot order=cdn,once=c,menu=off \
    -enable-kvm
(assuming qemu starts at all)?

If it no longer happens with this command line, can you find out which option is required? If the bug is still reproducible this way, can you try removing more things (-global qxl-vga.vram_size=33554432, -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03, -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1, ...)?

You could also try to replace
    -device ahci,id=ahci0,bus=pci.0,addr=04 \
    -drive id=drive_image1,if=none,cache=unsafe,snapshot=off,aio=native,file=/var/lib/libvirt/images/rhel6 \
    -device ide-hd,id=image1,drive=drive_image1,bus=ahci0.0,unit=0 \
with -drive file=/var/lib/libvirt/images/rhel6

In short, the shorter you can make the qemu line needed to reproduce, the better ;)
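
If doing this by hand gets tedious, the reduction can be scripted. The following is only a sketch in Python of a greedy one-option-at-a-time reduction (simpler than full delta debugging). minimize_options and the reproduces callback are hypothetical names, and the predicate in the demo is a stand-in that makes no claim about which options actually matter for this bug.

```python
def minimize_options(options, reproduces):
    """Greedily drop options one at a time while the bug still reproduces.

    `options` is a list of option groups (each a list of argv words).
    `reproduces` is a caller-supplied callback that would launch qemu with
    the given groups and report whether the crash occurred.
    """
    kept = list(options)
    changed = True
    while changed:
        changed = False
        for candidate in list(kept):
            trial = [group for group in kept if group is not candidate]
            if reproduces(trial):
                kept = trial
                changed = True
    return kept


options = [
    ["-M", "rhel6.0.0"],
    ["-nodefaults"],
    ["-vga", "qxl"],
    ["-sandbox", "off"],
    ["-vnc", ":0"],
]


def fake_reproduces(trial):
    # Stand-in for a real test run: pretends only -M and -vga matter.
    flags = {group[0] for group in trial}
    return {"-M", "-vga"} <= flags


minimal = minimize_options(options, fake_reproduces)
print(minimal)
```

A real reproduces callback would have to start qemu and wait long enough for the crash to show up (around 10 minutes according to comment #6), so each trial is slow. If the crash is not 100% reproducible, a single clean run does not prove that an option is unnecessary.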

Comment 9 Christophe Fergeau 2014-03-25 13:42:13 UTC

Fwiw, the shortest qemu command line I could use to reproduce is
/usr/libexec/qemu-kvm \
    -M rhel6.0.0 \
    -nodefaults \
    -vga qxl \
    -drive file=rhel6 \
    -vnc :0 \
    -m 2048 \
    -enable-kvm
(I haven't checked whether I could also get rid of -nodefaults or -enable-kvm.) The -m 2048 seems to be required to get the crash, as I could not reproduce it without that option or with -m 512.

Comment 10 Marc-Andre Lureau 2014-03-25 14:49:12 UTC

And it doesn't crash with -M rhel6.1.0.

Comment 11 xhan 2014-03-26 09:04:39 UTC

The -M argument is crucial for the crash. So the question is why -M rhel6.0.0 causes the crash.

Comment 12 RHEL Program Management 2014-04-03 05:48:29 UTC

This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 15 Marc-Andre Lureau 2014-07-10 22:41:11 UTC

Although I have not identified the root cause yet, this is due to the qxl device being set to revision 1, and thus to the use of the COMPAT flag, although the xorg-qxl guest driver has not supported it since 0.1.1-9 in rhel6. I am not sure this regression was intentional (bug 1078390).

What is the version of qxl driver in your guest?


It could be that guest compat structure is different (layout), since xorg driver doesn't use the definition from spice-protocol.

It could also be that the qemu ring isn't flushed, and some qemu QXLDrawable are wrongly cast to QXLCompatDrawable.

Qemu should probably do surface bounds checking before calling qxl_blit().

still investigating this
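
A minimal sketch of the kind of bounds check suggested above, written in Python rather than QEMU's C. clip_rect is a hypothetical name, and this is not the actual QEMU or spice-server code. The idea is to intersect each dirty rectangle with the primary surface and skip the copy when nothing is left.

```python
def clip_rect(rect, width, height):
    """Intersect (left, right, top, bottom) with a width x height surface.

    Returns the clipped rectangle, or None if nothing remains to copy.
    """
    left, right, top, bottom = rect
    left, top = max(left, 0), max(top, 0)
    right, bottom = min(right, width), min(bottom, height)
    if left >= right or top >= bottom:
        return None
    return (left, right, top, bottom)


# Rectangles taken from the trace in comment #3, on a 1024x768 surface.
inside = clip_rect((0x000, 0x400, 0x2FE, 0x300), 0x400, 0x300)
outside = clip_rect((0x31E, 0x354, 0x300, 0x320), 0x400, 0x300)
print(inside, outside)
```

With the values from comment #3, the second rectangle passes through unchanged, while the third one (top=0x300 on a surface that is 0x300 rows high) clips to nothing and would be skipped instead of being handed to memcpy.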

Comment 16 Xu Han 2014-07-11 08:30:18 UTC

(In reply to Marc-Andre Lureau from comment #15)
> Although I have not identified the root cause yet, this is due to qxl device
> being set to revision 1, and thus usage of COMPAT flag, although xorg-qxl
> guest driver does not support it since 0.1.1-9 in rhel6. I am not sure this
> regression was intentional (bug 1078390)
> 
> What is the version of qxl driver in your guest?

I checked this issue with two guests (rhel 6.5 and rhel 7.0).

Booting the rhel 6.5 guest with the following command line produced the segfault described in the comments above:
/usr/libexec/qemu-kvm \
    -M rhel6.0.0 \
    -nodefaults \
    -vga qxl \
    -device ahci,id=ahci0,bus=pci.0,addr=04 \
    -drive id=drive_image1,if=none,cache=unsafe,snapshot=off,aio=native,file=/home/RHEL-Server-6.5-64-virtio.qcow2 \
    -device ide-hd,id=image1,drive=drive_image1,bus=ahci0.0,unit=0 \
    -vnc :0 \
    -m 2048 \
    -enable-kvm \
    -monitor stdio

However, it seems to work well when using the spice protocol.

The rhel 7 guest hits Bug 1043851 no matter which protocol is used.

xorg-qxl version in each guest:
    rhel 6.5 - xorg-x11-drv-qxl-0.1.0-7.el6.x86_64
    rhel 7.0 - xorg-x11-drv-qxl-0.1.1-9.el7.x86_64

Comment 17 Marc-Andre Lureau 2014-07-11 15:15:17 UTC

(In reply to Xu Han from comment #16)
> (In reply to Marc-Andre Lureau from comment #15)
> > Although I have not identified the root cause yet, this is due to qxl device
> > being set to revision 1, and thus usage of COMPAT flag, although xorg-qxl
> > guest driver does not support it since 0.1.1-9 in rhel6. I am not sure this
> > regression was intentional (bug 1078390)
> > 
> > What is the version of qxl driver in your guest?

>     rhel 6.5 - xorg-x11-drv-qxl-0.1.0-7.el6.x86_64

that version should support qxlpci version 1

>     rhel 7.0 - xorg-x11-drv-qxl-0.1.1-9.el7.x86_64

that version no longer supports qxlpci version 1.

I.e., we have the same bug with rhel 6.6 due to the rebase in bug 1078390.

Comment 18 Marc-Andre Lureau 2014-07-11 15:35:42 UTC

I can't reproduce the crash with 6.6 and xorg-x11-drv-qxl-0.1.0-7.el6.x86_64, but the display doesn't work either (it resizes 3 times with a gray gdm rectangle)

Comment 19 Marc-Andre Lureau 2014-07-18 14:38:28 UTC

* With xorg-x11-drv-qxl-0.1.0-7

The gray gdm screen comes from X crashing under gdm (see /var/log/gdm/:0.log: fbCopyRegion is missing). Even though starting Xorg manually works, the symbol no longer exists, as can also be seen in the compilation warnings.

Also, interestingly, I haven't been able to reproduce the crash as easily lately, only <5% of the time...

I think the compat code has long been unmaintained and untested; we should declare it officially deprecated.

* With xorg-x11-drv-qxl-0.1.1-12 (from rebase 1078390)

Black screen, no crash or X exit.

Is there really any interest in maintaining the rhel6.0 machine type with spice?

Comment 20 juzhang 2014-07-21 01:35:24 UTC

Hi Xu,

Can you have a look at comment 18 and comment 19?

Best Regards,
Junyi

Comment 22 David Blechter 2014-07-22 17:25:20 UTC

Closing as WONTFIX, as no customers have reported using a rhel 6.0 guest.

Comment 23 Marc-Andre Lureau 2014-09-01 16:16:07 UTC

Reopening, as there is a similar bug (bug 1135372) in rhel6 and this is potentially a security issue.

Comment 24 Christophe Fergeau 2014-09-24 10:53:46 UTC

Upstream commit is http://cgit.freedesktop.org/spice/spice/commit/?id=e270edcbfd958d764e84cdbca6d403ff24fef610

Comment 30 errata-xmlrpc 2015-03-05 07:56:06 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0335.html