865767 – qemu crashed when rhel6.3 64 bit guest reboots

Bug 865767 - qemu crashed when rhel6.3 64 bit guest reboots

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	qemu-kvm
Sub Component:
Version:	6.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Gerd Hoffmann
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	865718 873214 (view as bug list)
Depends On:
Blocks:	867403 869982

Reported:	2012-10-12 11:44 UTC by Xiaoqing Wei
Modified:	2013-02-21 07:40 UTC (History)
CC List:	11 users (show)
Fixed In Version:	qemu-kvm-0.12.1.2-2.336.el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	867403 869982 (view as bug list)
Environment:
Last Closed:	2013-02-21 07:40:13 UTC
Target Upstream Version:
Embargoed:

Attachments
thread apply all bt full (21.36 KB, text/plain) 2012-10-18 04:24 UTC, Xiaoqing Wei	no flags	Details
backtrace info (19.90 KB, text/plain) 2012-10-29 07:48 UTC, Xu Tian	no flags	Details
qemu-kvm backtrace (6.26 KB, text/plain) 2012-10-30 05:45 UTC, Qingtang Zhou	no flags	Details
glibc backtrace report (41.39 KB, text/plain) 2012-10-30 05:47 UTC, Qingtang Zhou	no flags	Details

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2013:0527	0	normal	SHIPPED_LIVE	qemu-kvm bug fix and enhancement update	2013-02-20 21:51:08 UTC

Description Xiaoqing Wei 2012-10-12 11:44:51 UTC

Description of problem:

qemu crashes when a RHEL 6.3 x86_64 guest reboots.

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.322.el6.x86_64
spice-server-0.12.0-1.el6.x86_64

How reproducible:
1 / 4

Steps to Reproduce:
1. Reboot a RHEL 6.3 64-bit guest on a RHEL 6.4 host with qemu-kvm.*-322
  
Actual results:
qemu crashes

Expected results:
The guest works well and qemu does not crash.

Additional info:

*NOTE* I don't know whether this is a duplicate of bz #865718. The gdb backtrace looks different, so I am reporting it as a separate bug.
If it is a duplicate, please feel free to close it.

(gdb) bt
#0  0x00007f4c64dc87e2 in _int_malloc (av=0x7f4c650dde80, bytes=<value optimized out>) at malloc.c:4512
#1  0x00007f4c64dc9b91 in __libc_malloc (bytes=288) at malloc.c:3664
#2  0x00007f4c6744cbb5 in qemu_malloc (size=288) at qemu-malloc.c:57
#3  0x00007f4c6744cca6 in qemu_mallocz (size=288) at qemu-malloc.c:76
#4  0x00007f4c674a49db in qemu_spice_create_one_update (ssd=0x7f4c69f9bab0, rect=0x7fff7c0d26e0) at ui/spice-display.c:184
#5  0x00007f4c674a4e97 in qemu_spice_create_update (ssd=<value optimized out>) at ui/spice-display.c:304
#6  0x00007f4c674a50b8 in qemu_spice_display_refresh (ssd=0x7f4c69f9bab0) at ui/spice-display.c:455
#7  0x00007f4c6741af7e in dpy_refresh (opaque=0x7f4c6959de60) at /usr/src/debug/qemu-kvm-0.12.1.2/console.h:268
#8  gui_update (opaque=0x7f4c6959de60) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3155
#9  0x00007f4c6741a8d0 in qemu_run_timers (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:1323
#10 main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4028
#11 0x00007f4c6743c31a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2244
#12 0x00007f4c6741d315 in main_loop (argc=20, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4206
#13 main (argc=20, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6443

Comment 4 Alon Levy 2012-10-14 09:32:19 UTC

Just to be clear, is the order of events to reproduce:
1. boot RHEL 6.3
2. launch X with qxl driver
3. reboot

?

Comment 6 Alon Levy 2012-10-15 07:24:29 UTC

Hi Gerd,

 Assigning this to you as it seems to be in code you know best,

Thanks,
Alon

Comment 7 Gerd Hoffmann 2012-10-15 07:43:17 UTC

*** Bug 865718 has been marked as a duplicate of this bug. ***

Comment 8 Gerd Hoffmann 2012-10-15 08:26:30 UTC

Looks like use-after-free or buffer overflow killed malloc data structures.
Does it reproduce outside autotest?

Comment 9 Gerd Hoffmann 2012-10-15 10:08:30 UTC

Hmm, didn't reproduce locally so far, even with a screendump loop like autotest does (stack traces look like this could be involved).

Any chance you can run the autotest job with the electricfence malloc debugger?

(1) install ElectricFence
(2) make sure EF_ALLOW_MALLOC_0 environment variable is set to 1
    dunno how to do that with autotest
(3) use ef wrapper script to start qemu
    qemu_binary = /usr/bin/ef /usr/libexec/qemu-kvm

This way we should get a stacktrace showing the place where the memory corruption actually happens rather than the place where malloc is tripped up by the corruption.
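
For reference, a minimal sketch of what steps (2) and (3) amount to when done by hand. It assumes the `ef` wrapper does little more than preload `libefence.so.0.0` before exec'ing its arguments (the library name may differ between distributions). The function name `run_under_efence` and the `DRY_RUN` switch are made up for this illustration and are not part of ElectricFence or autotest.

```shell
#!/bin/sh
# Hypothetical helper: run a command with ElectricFence preloaded.
# EF_ALLOW_MALLOC_0=1 asks ElectricFence to tolerate malloc(0) calls
# instead of trapping on them.
# DRY_RUN is an invented switch for this sketch: when it is 1, the
# command line is printed rather than executed.
run_under_efence() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "EF_ALLOW_MALLOC_0=1 LD_PRELOAD=libefence.so.0.0 $*"
        return 0
    fi
    # Subshell so the exported variables do not leak into the caller.
    ( export EF_ALLOW_MALLOC_0=1 LD_PRELOAD=libefence.so.0.0; exec "$@" )
}

# Show what would be run for a qemu-kvm binary with a qxl display device.
DRY_RUN=1 run_under_efence /usr/libexec/qemu-kvm -vga qxl
```

With `DRY_RUN` unset, the same call would start the binary under ElectricFence, so an out-of-bounds access or use-after-free should fault at the offending instruction instead of later inside malloc.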

Comment 10 Xiaoqing Wei 2012-10-15 11:31:44 UTC

(In reply to comment #8)
> Looks like use-after-free or buffer overflow killed malloc data structures.
> Does it reproduce outside autotest?

Hmm, I booted a VM manually and rebooted it 100 times; the VM stayed alive and did not dump core.
This was on qemu-kvm-rhev-0.12.1.2-2.323.el6.x86_64.
Command line:

qemu-kvm -name RHEL.6.3.64.REBOOT -nodefaults -monitor stdio -chardev socket,id=serial_id_20120913-134744-NKxy,path=/tmp/serial-20120913-134744-NKxy,server,nowait -device isa-serial,chardev=serial_id_20120913-134744-NKxy -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x4 -drive file=/root/staf-kvm/autotest/client/tests/kvm/images/RHEL-Server-6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk1,media=disk,cache=none,boot=off,snapshot=off,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idUbqACX,mac=9a:ef:9d:77:de:06,id=ndev00idUbqACX,bus=pci.0,addr=0x3 -netdev tap,id=idUbqACX,vhost=on -m 4096 -smp 2,cores=1,threads=1,sockets=2 -cpu SandyBridge -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -spice port=3000,password=123456,addr=0,image-compression=auto_glz,jpeg-wan-compression=auto,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4 -vga qxl -global qxl-vga.vram_size=33554432 -rtc base=utc,clock=host,driftfix=slew -M rhel6.4.0 -boot order=cdn,once=c,menu=off -no-kvm-pit-reinjection -bios /usr/share/seabios/bios-pm.bin -enable-kvm

Comment 11 Xiaoqing Wei 2012-10-17 09:46:01 UTC

(In reply to comment #9)
> Hmm, didn't reproduce locally so far, even with a screendump loop like
> autotest does (stack traces look like this could be involved).
> 
> Any chance you can run the autotest job with the electricfence malloc
> debugger?
> 
> (1) install ElectricFence
> (2) make sure EF_ALLOW_MALLOC_0 environment variable is set to 1
>     dunno how to do that with autotest
> (3) use ef wrapper script to start qemu
>     qemu_binary = /usr/bin/ef /usr/libexec/qemu-kvm
> 
> This way we should get a stacktrace showing the place where the memory
> corruption actually happens rather than the place where malloc is tripped up
> by the corruption.

Hi Gerd,

I'm not sure whether this meets your requirement:
I appended this line to tests.cfg:
qemu_binary = ";EF_ALLOW_MALLOC_0=1 /usr/bin/ef `which qemu-kvm`"

With that, autotest sets the variable to 1 before launching the VM:
EF_ALLOW_MALLOC_0=1 /usr/bin/ef `which qemu-kvm` -name 'vm1' -nodefaults -chardev socket,id=qmp_monitor_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20121017-173104-PFVf,server,nowait -mon chardev=qmp_monitor_id_qmpmonitor1,mode=control -chardev socket,id=serial_id_20121017-173104-PFVf,path=/tmp/serial-20121017-173104-PFVf,server,nowait -device isa-serial,chardev=serial_id_20121017-173104-PFVf -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x4 -drive file='/root/staf-kvm/autotest/client/tests/kvm/images/RHEL-Server-6.3-64-virtio.qcow2',if=none,id=drive-virtio-disk1,media=disk,cache=none,boot=off,snapshot=off,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idLUd7lP,mac=9a:a2:ad:6e:91:ca,id=ndev00idLUd7lP,bus=pci.0,addr=0x3 -netdev tap,id=idLUd7lP,vhost=on,fd=19 -m 4096 -smp 2,cores=1,threads=1,sockets=2 -cpu 'Penryn' -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -spice port=3001,password=123456,addr=0,image-compression=auto_glz,jpeg-wan-compression=auto,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4 -vga qxl -global qxl-vga.vram_size=33554432 -rtc base=utc,clock=host,driftfix=slew -M rhel6.4.0 -boot order=cdn,once=c,menu=off    -no-kvm-pit-reinjection -bios /usr/share/seabios/bios.bin -enable-kvm

and when the core dump happens, it prints:

17:43:13 INFO | [qemu output] /usr/bin/ef: line 20:  3835 Segmentation fault      (core dumped) ( export LD_PRELOAD=libefence.so.0.0; exec "$@" )
17:43:13 INFO | [qemu output] (Process terminated with status 139)
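
As a side note on reading the status above: shells report a process killed by a signal as 128 plus the signal number, so 139 corresponds to signal 11, which matches the "Segmentation fault" message. A small sketch of that arithmetic follows; the function name `describe_status` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: interpret a shell-style exit status.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l <number> prints the signal name without the SIG prefix.
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited with code $status"
    fi
}

describe_status 139
```

On Linux, where signal 11 is SIGSEGV, this should report signal 11 for status 139.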

Comment 12 Gerd Hoffmann 2012-10-17 11:41:40 UTC

Looks good, try "thread apply all bt" on the core dump produced (autotest collects it, right?).
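
One way to collect these backtraces non-interactively is gdb's batch mode. This is only a sketch: the function name `dump_backtraces`, the default output file name, and the paths in the usage line are placeholders, not something autotest provides. It uses `thread apply all bt full`, which adds local variables to each frame; drop `full` for the shorter form.

```shell
#!/bin/sh
# Hypothetical helper: write backtraces of all threads in a core dump to a file.
# Arguments: <binary> <core file> [output file]
dump_backtraces() {
    binary=$1
    core=$2
    out=${3:-backtrace.txt}
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found in PATH" >&2
        return 127
    fi
    # --batch exits after running the -ex commands, so no interactive prompt appears.
    gdb --batch -ex "thread apply all bt full" "$binary" "$core" > "$out" 2>&1
}

# Example call (placeholder paths):
#   dump_backtraces /usr/libexec/qemu-kvm ./core qemu-bt.txt
```

Installing the matching debuginfo packages beforehand should make the frames resolve to function names and source lines.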

Comment 13 Xiaoqing Wei 2012-10-18 04:24:44 UTC

Created attachment 629143 [details]
thread apply all bt full

Comment 14 Xu Tian 2012-10-29 07:48:04 UTC

Created attachment 634904 [details]
backtrace info

Hit the same issue with package qemu-kvm-0.12.1.2-2.331.el6.x86_64 when running an autotest job.

Comment 15 Qingtang Zhou 2012-10-30 05:45:51 UTC

Created attachment 635335 [details]
qemu-kvm backtrace

Hi, I think I hit this issue as well: qemu-kvm crashed when I booted a RHEL 5.8 guest.

qemu version: qemu-kvm-0.12.1.2-2.331.el6.x86_64

Comment 16 Qingtang Zhou 2012-10-30 05:47:31 UTC

Created attachment 635336 [details]
glibc backtrace report

Attaching the glibc backtrace log in case it is useful to anyone.

Comment 17 Gerd Hoffmann 2012-10-30 09:27:39 UTC

(In reply to comment #14)
> Created attachment 634904 [details]
> backtrace info
> 
> Meet the same issue with the package qemu-kvm-0.12.1.2-2.331.el6.x86_64 in
> run autotest job;

Looks helpful.  Do you still have the core dump?  Can you upload it somewhere?

Comment 18 Gerd Hoffmann 2012-10-30 09:39:10 UTC

Given this happens in autotest I guess you don't need a spice client connected to trigger it, correct?

Comment 19 Xu Tian 2012-10-30 12:06:36 UTC

(In reply to comment #17)
> (In reply to comment #14)
> > Created attachment 634904 [details]
> > backtrace info
> > 
> > Meet the same issue with the package qemu-kvm-0.12.1.2-2.331.el6.x86_64 in
> > run autotest job;
> 
> Looks helpful.  Do you still have the core dump?  Can you upload it
> somewhere?

You can download it from http://fileshare.englab.nay.redhat.com/pub/section2/kvm/xu/bz865767/core

Comment 21 Gerd Hoffmann 2012-11-01 12:25:56 UTC

http://patchwork.ozlabs.org/patch/196184/

Comment 23 Gerd Hoffmann 2012-11-16 15:07:15 UTC

Patch posted.

Comment 25 Gerd Hoffmann 2012-11-19 08:31:01 UTC

*** Bug 873214 has been marked as a duplicate of this bug. ***

Comment 31 errata-xmlrpc 2013-02-21 07:40:13 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0527.html
