Bug 865767 - qemu crashed when rhel6.3 64 bit guest reboots
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Gerd Hoffmann
QA Contact: Virtualization Bugs
Keywords: Regression
Duplicates: 865718 873214
Depends On:
Blocks: 867403 869982
Reported: 2012-10-12 07:44 EDT by Xiaoqing Wei
Modified: 2013-02-21 02:40 EST

See Also:
Fixed In Version: qemu-kvm-0.12.1.2-2.336.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 867403 869982 (view as bug list)
Environment:
Last Closed: 2013-02-21 02:40:13 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments
thread apply all bt full (21.36 KB, text/plain)
2012-10-18 00:24 EDT, Xiaoqing Wei
backtrace info (19.90 KB, text/plain)
2012-10-29 03:48 EDT, xu
qemu-kvm backtrace (6.26 KB, text/plain)
2012-10-30 01:45 EDT, Qingtang Zhou
glibc backtrace report (41.39 KB, text/plain)
2012-10-30 01:47 EDT, Qingtang Zhou

Description Xiaoqing Wei 2012-10-12 07:44:51 EDT
Description of problem:

qemu crashes when a RHEL 6.3 x86_64 guest reboots.

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.322.el6.x86_64
spice-server-0.12.0-1.el6.x86_64

How reproducible:
1 / 4

Steps to Reproduce:
1. Reboot a RHEL 6.3 64-bit guest on a RHEL 6.4 host with qemu-kvm.*-322
  
Actual results:
qemu crashes

Expected results:
The guest reboots normally and qemu does not crash.

Additional info:

*NOTE* I don't know whether this is a dup of bz #865718; the gdb bt info looks different, so I am reporting it as a separate bug.
If it is a dup, please feel free to close.

(gdb) bt
#0  0x00007f4c64dc87e2 in _int_malloc (av=0x7f4c650dde80, bytes=<value optimized out>) at malloc.c:4512
#1  0x00007f4c64dc9b91 in __libc_malloc (bytes=288) at malloc.c:3664
#2  0x00007f4c6744cbb5 in qemu_malloc (size=288) at qemu-malloc.c:57
#3  0x00007f4c6744cca6 in qemu_mallocz (size=288) at qemu-malloc.c:76
#4  0x00007f4c674a49db in qemu_spice_create_one_update (ssd=0x7f4c69f9bab0, rect=0x7fff7c0d26e0) at ui/spice-display.c:184
#5  0x00007f4c674a4e97 in qemu_spice_create_update (ssd=<value optimized out>) at ui/spice-display.c:304
#6  0x00007f4c674a50b8 in qemu_spice_display_refresh (ssd=0x7f4c69f9bab0) at ui/spice-display.c:455
#7  0x00007f4c6741af7e in dpy_refresh (opaque=0x7f4c6959de60) at /usr/src/debug/qemu-kvm-0.12.1.2/console.h:268
#8  gui_update (opaque=0x7f4c6959de60) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3155
#9  0x00007f4c6741a8d0 in qemu_run_timers (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:1323
#10 main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4028
#11 0x00007f4c6743c31a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2244
#12 0x00007f4c6741d315 in main_loop (argc=20, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4206
#13 main (argc=20, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6443
Comment 4 Alon Levy 2012-10-14 05:32:19 EDT
Just to be clear, is the order of events to reproduce:
1. boot RHEL 6.3
2. launch X with qxl driver
3. reboot

?
Comment 6 Alon Levy 2012-10-15 03:24:29 EDT
Hi Gerd,

 Assigning this to you as it seems to be in code you know best,

Thanks,
Alon
Comment 7 Gerd Hoffmann 2012-10-15 03:43:17 EDT
*** Bug 865718 has been marked as a duplicate of this bug. ***
Comment 8 Gerd Hoffmann 2012-10-15 04:26:30 EDT
Looks like use-after-free or buffer overflow killed malloc data structures.
Does it reproduce outside autotest?
Comment 9 Gerd Hoffmann 2012-10-15 06:08:30 EDT
Hmm, didn't reproduce locally so far, even with a screendump loop like autotest does (stack traces look like this could be involved).

Any chance you can run the autotest job with the electricfence malloc debugger?

(1) install ElectricFence
(2) make sure EF_ALLOW_MALLOC_0 environment variable is set to 1
    dunno how to do that with autotest
(3) use ef wrapper script to start qemu
    qemu_binary = /usr/bin/ef /usr/libexec/qemu-kvm

This way we should get a stacktrace showing the place where the memory corruption actually happens rather than the place where malloc is tripped up by the corruption.
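
Steps (2) and (3) can be sketched as a small shell function, assuming the ef wrapper does little more than preload libefence before exec'ing the target. The function name run_with_efence and the EF_LIB override are illustrative only and not part of ElectricFence; the library name libefence.so.0.0 may differ per install.

```shell
# Hypothetical helper: run a command under ElectricFence.
# EF_LIB is an illustrative override for the preloaded library name.
run_with_efence() {
    (
        # By default Electric Fence traps malloc(0); this turns that check off.
        export EF_ALLOW_MALLOC_0=1
        # Preload the Electric Fence allocator in place of glibc malloc.
        export LD_PRELOAD="${EF_LIB-libefence.so.0.0}"
        exec "$@"
    )
}

# Example (not executed here):
#   run_with_efence /usr/libexec/qemu-kvm -name vm1 ...
```

The subshell keeps both variables out of the calling environment, so only the wrapped process runs with the debugging allocator.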
Comment 10 Xiaoqing Wei 2012-10-15 07:31:44 EDT
(In reply to comment #8)
> Looks like use-after-free or buffer overflow killed malloc data structures.
> Does it reproduce outside autotest?

Hmmm, I booted a VM manually and rebooted it 100 times;
the VM stayed alive and qemu didn't core dump.
This was on qemu-kvm-rhev-0.12.1.2-2.323.el6.x86_64.
cmd:

qemu-kvm -name RHEL.6.3.64.REBOOT -nodefaults -monitor stdio -chardev socket,id=serial_id_20120913-134744-NKxy,path=/tmp/serial-20120913-134744-NKxy,server,nowait -device isa-serial,chardev=serial_id_20120913-134744-NKxy -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x4 -drive file=/root/staf-kvm/autotest/client/tests/kvm/images/RHEL-Server-6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk1,media=disk,cache=none,boot=off,snapshot=off,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idUbqACX,mac=9a:ef:9d:77:de:06,id=ndev00idUbqACX,bus=pci.0,addr=0x3 -netdev tap,id=idUbqACX,vhost=on -m 4096 -smp 2,cores=1,threads=1,sockets=2 -cpu SandyBridge -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -spice port=3000,password=123456,addr=0,image-compression=auto_glz,jpeg-wan-compression=auto,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4 -vga qxl -global qxl-vga.vram_size=33554432 -rtc base=utc,clock=host,driftfix=slew -M rhel6.4.0 -boot order=cdn,once=c,menu=off -no-kvm-pit-reinjection -bios /usr/share/seabios/bios-pm.bin -enable-kvm
Comment 11 Xiaoqing Wei 2012-10-17 05:46:01 EDT
(In reply to comment #9)
> Hmm, didn't reproduce locally so far, even with a screendump loop like
> autotest does (stack traces look like this could be involved).
> 
> Any chance you can run the autotest job with the electricfence malloc
> debugger?
> 
> (1) install ElectricFence
> (2) make sure EF_ALLOW_MALLOC_0 environment variable is set to 1
>     dunno how to do that with autotest
> (3) use ef wrapper script to start qemu
>     qemu_binary = /usr/bin/ef /usr/libexec/qemu-kvm
> 
> This way we should get a stacktrace showing the place where the memory
> corruption actually happens rather than the place where malloc is tripped up
> by the corruption.

Hi Gerd,

Not sure whether this meets your requirement:
I appended this line to tests.cfg
qemu_binary = ";EF_ALLOW_MALLOC_0=1 /usr/bin/ef `which qemu-kvm`"

then autotest will set that variable to 1 before launching the vm:
EF_ALLOW_MALLOC_0=1 /usr/bin/ef `which qemu-kvm` -name 'vm1' -nodefaults -chardev socket,id=qmp_monitor_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20121017-173104-PFVf,server,nowait -mon chardev=qmp_monitor_id_qmpmonitor1,mode=control -chardev socket,id=serial_id_20121017-173104-PFVf,path=/tmp/serial-20121017-173104-PFVf,server,nowait -device isa-serial,chardev=serial_id_20121017-173104-PFVf -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x4 -drive file='/root/staf-kvm/autotest/client/tests/kvm/images/RHEL-Server-6.3-64-virtio.qcow2',if=none,id=drive-virtio-disk1,media=disk,cache=none,boot=off,snapshot=off,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idLUd7lP,mac=9a:a2:ad:6e:91:ca,id=ndev00idLUd7lP,bus=pci.0,addr=0x3 -netdev tap,id=idLUd7lP,vhost=on,fd=19 -m 4096 -smp 2,cores=1,threads=1,sockets=2 -cpu 'Penryn' -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -spice port=3001,password=123456,addr=0,image-compression=auto_glz,jpeg-wan-compression=auto,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4 -vga qxl -global qxl-vga.vram_size=33554432 -rtc base=utc,clock=host,driftfix=slew -M rhel6.4.0 -boot order=cdn,once=c,menu=off    -no-kvm-pit-reinjection -bios /usr/share/seabios/bios.bin -enable-kvm

and when the core dump happens, it prints:

17:43:13 INFO | [qemu output] /usr/bin/ef: line 20:  3835 Segmentation fault      (core dumped) ( export LD_PRELOAD=libefence.so.0.0; exec "$@" )
17:43:13 INFO | [qemu output] (Process terminated with status 139)
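
As a side note on the status value: a shell reports a process killed by a signal as 128 plus the signal number. A minimal sketch of decoding it follows (describe_status is a made-up helper name, not something autotest provides):

```shell
# Hypothetical helper: explain a shell-style exit status.
describe_status() {
    rc=$1
    if [ "$rc" -gt 128 ]; then
        # Statuses above 128 mean "terminated by signal (rc - 128)".
        sig=$((rc - 128))
        echo "killed by signal $sig ($(kill -l "$sig"))"
    else
        echo "exited with code $rc"
    fi
}

describe_status 139   # on Linux prints: killed by signal 11 (SEGV)
```

139 - 128 = 11, which is SIGSEGV on Linux, consistent with the "Segmentation fault" line printed by the wrapper.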
Comment 12 Gerd Hoffmann 2012-10-17 07:41:40 EDT
Looks good, try "thread apply all bt" on the core dump produced (autotest collects it, right?).
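
One way to collect this non-interactively, as a rough sketch: gdb's -batch and -ex options run a command against the binary and core file and then exit. The function name dump_backtraces, the GDB override variable, and the paths in the example are placeholders, not anything autotest defines.

```shell
# Hypothetical helper: print backtraces of all threads from a core file.
# GDB may be overridden (e.g. for a dry run); it defaults to gdb on PATH.
dump_backtraces() {
    binary=$1
    core=$2
    "${GDB:-gdb}" -batch -ex 'thread apply all bt' "$binary" "$core"
}

# Example (paths are placeholders, not executed here):
#   dump_backtraces /usr/libexec/qemu-kvm /path/to/core > bt.txt
```

The matching debuginfo packages need to be installed for the frames to resolve to file and line.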
Comment 13 Xiaoqing Wei 2012-10-18 00:24:44 EDT
Created attachment 629143 [details]
thread apply all bt full
Comment 14 xu 2012-10-29 03:48:04 EDT
Created attachment 634904 [details]
backtrace info

Hit the same issue with the package qemu-kvm-0.12.1.2-2.331.el6.x86_64 when running an autotest job.
Comment 15 Qingtang Zhou 2012-10-30 01:45:51 EDT
Created attachment 635335 [details]
qemu-kvm backtrace

Hi, I think I hit this issue too: qemu-kvm crashed when I booted a RHEL 5.8 guest.

qemu version: qemu-kvm-0.12.1.2-2.331.el6.x86_64
Comment 16 Qingtang Zhou 2012-10-30 01:47:31 EDT
Created attachment 635336 [details]
glibc backtrace report

Attaching the glibc backtrace log in case someone finds it useful.
Comment 17 Gerd Hoffmann 2012-10-30 05:27:39 EDT
(In reply to comment #14)
> Created attachment 634904 [details]
> backtrace info
> 
> Hit the same issue with the package qemu-kvm-0.12.1.2-2.331.el6.x86_64 when
> running an autotest job.

Looks helpful.  Do you still have the core dump?  Can you upload it somewhere?
Comment 18 Gerd Hoffmann 2012-10-30 05:39:10 EDT
Given this happens in autotest I guess you don't need a spice client connected to trigger it, correct?
Comment 19 xu 2012-10-30 08:06:36 EDT
(In reply to comment #17)
> (In reply to comment #14)
> > Created attachment 634904 [details]
> > backtrace info
> > 
> > Hit the same issue with the package qemu-kvm-0.12.1.2-2.331.el6.x86_64 when
> > running an autotest job.
> 
> Looks helpful.  Do you still have the core dump?  Can you upload it
> somewhere?

You can download it from http://fileshare.englab.nay.redhat.com/pub/section2/kvm/xu/bz865767/core
Comment 21 Gerd Hoffmann 2012-11-01 08:25:56 EDT
http://patchwork.ozlabs.org/patch/196184/
Comment 23 Gerd Hoffmann 2012-11-16 10:07:15 EST
Patch posted.
Comment 25 Gerd Hoffmann 2012-11-19 03:31:01 EST
*** Bug 873214 has been marked as a duplicate of this bug. ***
Comment 31 errata-xmlrpc 2013-02-21 02:40:13 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0527.html
