Bug 865767

Summary: qemu crashed when rhel6.3 64 bit guest reboots
Product: Red Hat Enterprise Linux 6
Reporter: Xiaoqing Wei <xwei>
Component: qemu-kvm
Assignee: Gerd Hoffmann <kraxel>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: high
Version: 6.3
CC: acathrow, alevy, bsarathy, dyasny, juzhang, knoel, michen, mkenneth, shuang, virt-maint, xutian
Target Milestone: rc
Keywords: Regression
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-0.12.1.2-2.336.el6
Doc Type: Bug Fix
Clones: 867403 869982
Last Closed: 2013-02-21 07:40:13 UTC
Type: Bug
Bug Blocks: 867403, 869982    
Attachments:
  thread apply all bt full (flags: none)
  backtrace info (flags: none)
  qemu-kvm backtrace (flags: none)
  glibc backtrace report (flags: none)

Description Xiaoqing Wei 2012-10-12 11:44:51 UTC
Description of problem:

qemu crashed when a RHEL 6.3 x86_64 guest rebooted.

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.322.el6.x86_64
spice-server-0.12.0-1.el6.x86_64

How reproducible:
1 / 4

Steps to Reproduce:
1. Reboot a RHEL 6.3 64-bit guest on a RHEL 6.4 host with qemu-kvm-0.12.1.2-2.322.el6.
Actual results:
qemu crashes

Expected results:
The guest keeps working and qemu does not crash.

Additional info:

*NOTE* I do not know whether this is a duplicate of bug 865718. The gdb backtrace looks different, so I am reporting it as a separate bug.
If it is a duplicate, please feel free to close it.

(gdb) bt
#0  0x00007f4c64dc87e2 in _int_malloc (av=0x7f4c650dde80, bytes=<value optimized out>) at malloc.c:4512
#1  0x00007f4c64dc9b91 in __libc_malloc (bytes=288) at malloc.c:3664
#2  0x00007f4c6744cbb5 in qemu_malloc (size=288) at qemu-malloc.c:57
#3  0x00007f4c6744cca6 in qemu_mallocz (size=288) at qemu-malloc.c:76
#4  0x00007f4c674a49db in qemu_spice_create_one_update (ssd=0x7f4c69f9bab0, rect=0x7fff7c0d26e0) at ui/spice-display.c:184
#5  0x00007f4c674a4e97 in qemu_spice_create_update (ssd=<value optimized out>) at ui/spice-display.c:304
#6  0x00007f4c674a50b8 in qemu_spice_display_refresh (ssd=0x7f4c69f9bab0) at ui/spice-display.c:455
#7  0x00007f4c6741af7e in dpy_refresh (opaque=0x7f4c6959de60) at /usr/src/debug/qemu-kvm-0.12.1.2/console.h:268
#8  gui_update (opaque=0x7f4c6959de60) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3155
#9  0x00007f4c6741a8d0 in qemu_run_timers (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:1323
#10 main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4028
#11 0x00007f4c6743c31a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2244
#12 0x00007f4c6741d315 in main_loop (argc=20, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4206
#13 main (argc=20, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6443

Comment 4 Alon Levy 2012-10-14 09:32:19 UTC
Just to be clear, is the order of events to reproduce:
1. boot RHEL 6.3
2. launch X with qxl driver
3. reboot

?

Comment 6 Alon Levy 2012-10-15 07:24:29 UTC
Hi Gerd,

Assigning this to you, as it seems to be in code you know best.

Thanks,
Alon

Comment 7 Gerd Hoffmann 2012-10-15 07:43:17 UTC
*** Bug 865718 has been marked as a duplicate of this bug. ***

Comment 8 Gerd Hoffmann 2012-10-15 08:26:30 UTC
Looks like use-after-free or buffer overflow killed malloc data structures.
Does it reproduce outside autotest?

Comment 9 Gerd Hoffmann 2012-10-15 10:08:30 UTC
Hmm, it hasn't reproduced locally so far, even with a screendump loop like the one autotest runs (the stack traces suggest this could be involved).

Any chance you can run the autotest job with the electricfence malloc debugger?

(1) install ElectricFence
(2) make sure EF_ALLOW_MALLOC_0 environment variable is set to 1
    dunno how to do that with autotest
(3) use ef wrapper script to start qemu
    qemu_binary = /usr/bin/ef /usr/libexec/qemu-kvm

This way we should get a stacktrace showing the place where the memory corruption actually happens rather than the place where malloc is tripped up by the corruption.
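Why a malloc debugger helps here can be sketched in C. The idea behind ElectricFence-style debugging is to place each allocation so that it ends exactly at an inaccessible (PROT_NONE) page. An out-of-bounds write then faults at the offending instruction instead of silently corrupting heap metadata that malloc only trips over later, as in frames #0 and #1 of the backtrace above. The following is a minimal sketch of that guard-page idea, not ElectricFence's actual code. It assumes Linux (mmap, mprotect, and SIGSEGV on a write to a protected page), ignores the alignment handling a real malloc debugger needs, and the names guarded_alloc, guarded_free and overflow_faults are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t page_size(void)
{
    return (size_t)sysconf(_SC_PAGESIZE);
}

/* Number of readable/writable pages needed to hold `size` bytes. */
static size_t data_pages_for(size_t size)
{
    size_t page = page_size();
    size_t n = (size + page - 1) / page;
    return n ? n : 1; /* a zero-byte request still gets one data page */
}

/* Hypothetical helper: returns `size` bytes whose last byte sits directly
 * before an inaccessible guard page. Alignment is deliberately ignored. */
void *guarded_alloc(size_t size)
{
    size_t page = page_size();
    size_t data = data_pages_for(size) * page;
    unsigned char *base = mmap(NULL, data + page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    /* Turn the page after the data area into the guard page. */
    if (mprotect(base + data, page, PROT_NONE) != 0) {
        munmap(base, data + page);
        return NULL;
    }
    /* Right-align the buffer so that buf[size] is the guard page. */
    return base + data - size;
}

/* Hypothetical helper: releases a buffer from guarded_alloc(size). */
void guarded_free(void *ptr, size_t size)
{
    assert(ptr != NULL);
    size_t page = page_size();
    size_t data = data_pages_for(size) * page;
    unsigned char *base = (unsigned char *)ptr + size - data;
    munmap(base, data + page);
}

static sigjmp_buf fault_jmp;

static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(fault_jmp, 1); /* jump back out of the faulting write */
}

/* Hypothetical helper: returns 1 if writing buf[size] faults, else 0. */
int overflow_faults(unsigned char *buf, size_t size)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    int faulted;
    if (sigsetjmp(fault_jmp, 1) == 0) {
        ((volatile unsigned char *)buf)[size] = 0; /* the probed write */
        faulted = 0;
    } else {
        faulted = 1;
    }
    sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
    return faulted;
}
```

For example, a 288-byte buffer (the size requested in frame #2 of the backtrace, used here purely for illustration) can be filled normally, while a write to byte 288 lands on the guard page and overflow_faults() reports it. In this sketch every allocation costs at least two pages, which is why this style of debugging is used to reproduce a bug rather than for normal runs.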

Comment 10 Xiaoqing Wei 2012-10-15 11:31:44 UTC
(In reply to comment #8)
> Looks like use-after-free or buffer overflow killed malloc data structures.
> Does it reproduce outside autotest?

Hmm, I booted a VM manually on qemu-kvm-rhev-0.12.1.2-2.323.el6.x86_64 and rebooted it 100 times;
the VM is still alive and qemu did not dump core.
Command line:

qemu-kvm -name RHEL.6.3.64.REBOOT -nodefaults -monitor stdio -chardev socket,id=serial_id_20120913-134744-NKxy,path=/tmp/serial-20120913-134744-NKxy,server,nowait -device isa-serial,chardev=serial_id_20120913-134744-NKxy -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x4 -drive file=/root/staf-kvm/autotest/client/tests/kvm/images/RHEL-Server-6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk1,media=disk,cache=none,boot=off,snapshot=off,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idUbqACX,mac=9a:ef:9d:77:de:06,id=ndev00idUbqACX,bus=pci.0,addr=0x3 -netdev tap,id=idUbqACX,vhost=on -m 4096 -smp 2,cores=1,threads=1,sockets=2 -cpu SandyBridge -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -spice port=3000,password=123456,addr=0,image-compression=auto_glz,jpeg-wan-compression=auto,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4 -vga qxl -global qxl-vga.vram_size=33554432 -rtc base=utc,clock=host,driftfix=slew -M rhel6.4.0 -boot order=cdn,once=c,menu=off -no-kvm-pit-reinjection -bios /usr/share/seabios/bios-pm.bin -enable-kvm

Comment 11 Xiaoqing Wei 2012-10-17 09:46:01 UTC
(In reply to comment #9)
> Hmm, didn't reproduce locally so far, even with a screendump loop like
> autotest does (stack traces look like this could be involved).
> 
> Any chance you can run the autotest job with the electricfence malloc
> debugger?
> 
> (1) install ElectricFence
> (2) make sure EF_ALLOW_MALLOC_0 environment variable is set to 1
>     dunno how to do that with autotest
> (3) use ef wrapper script to start qemu
>     qemu_binary = /usr/bin/ef /usr/libexec/qemu-kvm
> 
> This way we should get a stacktrace showing the place where the memory
> corruption actually happens rather than the place where malloc is tripped up
> by the corruption.

Hi Gerd,

I'm not sure whether this meets your requirement.
I appended this line to tests.cfg:
qemu_binary = ";EF_ALLOW_MALLOC_0=1 /usr/bin/ef `which qemu-kvm`"

Autotest then sets that variable to 1 before launching the VM:
EF_ALLOW_MALLOC_0=1 /usr/bin/ef `which qemu-kvm` -name 'vm1' -nodefaults -chardev socket,id=qmp_monitor_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20121017-173104-PFVf,server,nowait -mon chardev=qmp_monitor_id_qmpmonitor1,mode=control -chardev socket,id=serial_id_20121017-173104-PFVf,path=/tmp/serial-20121017-173104-PFVf,server,nowait -device isa-serial,chardev=serial_id_20121017-173104-PFVf -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x4 -drive file='/root/staf-kvm/autotest/client/tests/kvm/images/RHEL-Server-6.3-64-virtio.qcow2',if=none,id=drive-virtio-disk1,media=disk,cache=none,boot=off,snapshot=off,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idLUd7lP,mac=9a:a2:ad:6e:91:ca,id=ndev00idLUd7lP,bus=pci.0,addr=0x3 -netdev tap,id=idLUd7lP,vhost=on,fd=19 -m 4096 -smp 2,cores=1,threads=1,sockets=2 -cpu 'Penryn' -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -spice port=3001,password=123456,addr=0,image-compression=auto_glz,jpeg-wan-compression=auto,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4 -vga qxl -global qxl-vga.vram_size=33554432 -rtc base=utc,clock=host,driftfix=slew -M rhel6.4.0 -boot order=cdn,once=c,menu=off    -no-kvm-pit-reinjection -bios /usr/share/seabios/bios.bin -enable-kvm

and when the core dump happens, it prints:

17:43:13 INFO | [qemu output] /usr/bin/ef: line 20:  3835 Segmentation fault      (core dumped) ( export LD_PRELOAD=libefence.so.0.0; exec "$@" )
17:43:13 INFO | [qemu output] (Process terminated with status 139)

Comment 12 Gerd Hoffmann 2012-10-17 11:41:40 UTC
Looks good. Try "thread apply all bt" on the core dump produced (autotest collects it, right?).

Comment 13 Xiaoqing Wei 2012-10-18 04:24:44 UTC
Created attachment 629143 [details]
thread apply all bt full

Comment 14 Xu Tian 2012-10-29 07:48:04 UTC
Created attachment 634904 [details]
backtrace info

Hit the same issue with qemu-kvm-0.12.1.2-2.331.el6.x86_64 while running an autotest job.

Comment 15 Qingtang Zhou 2012-10-30 05:45:51 UTC
Created attachment 635335 [details]
qemu-kvm backtrace

Hi, I think I hit this issue too: qemu-kvm crashed when I booted a RHEL 5.8 guest.

qemu version: qemu-kvm-0.12.1.2-2.331.el6.x86_64

Comment 16 Qingtang Zhou 2012-10-30 05:47:31 UTC
Created attachment 635336 [details]
glibc backtrace report

Attaching the glibc backtrace log in case anyone needs it.

Comment 17 Gerd Hoffmann 2012-10-30 09:27:39 UTC
(In reply to comment #14)
> Created attachment 634904 [details]
> backtrace info
> 
> Meet the same issue with the package qemu-kvm-0.12.1.2-2.331.el6.x86_64 in
> run autotest job;

Looks helpful.  Do you still have the core dump?  Can you upload it somewhere?

Comment 18 Gerd Hoffmann 2012-10-30 09:39:10 UTC
Given this happens in autotest I guess you don't need a spice client connected to trigger it, correct?

Comment 19 Xu Tian 2012-10-30 12:06:36 UTC
(In reply to comment #17)
> (In reply to comment #14)
> > Created attachment 634904 [details]
> > backtrace info
> > 
> > Meet the same issue with the package qemu-kvm-0.12.1.2-2.331.el6.x86_64 in
> > run autotest job;
> 
> Looks helpful.  Do you still have the core dump?  Can you upload it
> somewhere?

you can download it from http://fileshare.englab.nay.redhat.com/pub/section2/kvm/xu/bz865767/core

Comment 21 Gerd Hoffmann 2012-11-01 12:25:56 UTC
http://patchwork.ozlabs.org/patch/196184/

Comment 23 Gerd Hoffmann 2012-11-16 15:07:15 UTC
Patch posted.

Comment 25 Gerd Hoffmann 2012-11-19 08:31:01 UTC
*** Bug 873214 has been marked as a duplicate of this bug. ***

Comment 31 errata-xmlrpc 2013-02-21 07:40:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0527.html