240009 – qemu-dm segfault installing FreeBSD 32 bit FV on heavily loaded machine

Summary: qemu-dm segfault installing FreeBSD 32 bit FV on heavily loaded machine

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	xen
Sub Component:
Version:	9
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Richard W.M. Jones
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-05-14 11:29 UTC by Richard W.M. Jones
Modified:	2009-03-24 18:27 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-03-24 18:27:24 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
FreeBSD installation notes (593 bytes, text/plain) 2007-05-14 11:29 UTC, Richard W.M. Jones
Core dump from qemu-dm (59.09 KB, application/octet-stream) 2007-05-15 12:10 UTC, Richard W.M. Jones
Patch to pass structure instead of pointers to the IDE DMA thread. (1.66 KB, patch) 2007-05-15 19:05 UTC, Richard W.M. Jones
Screenshot of FreeBSD install failing. (22.56 KB, image/png) 2007-05-16 15:34 UTC, Richard W.M. Jones

Description Richard W.M. Jones 2007-05-14 11:29:12 UTC

Description of problem:

Installing a FreeBSD 6.2 32-bit kernel as a fullvirt guest on an x86_64 dom0
(i.e. FV 32-on-64) while the machine is under heavy load causes qemu-dm to
segfault with an error such as:

qemu-dm[10011]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000041400c18 error 14
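The `error 14` field is the x86 page-fault error code. Assuming it is printed in hexadecimal (the x86_64 fault handler in kernels of this era appears to format it with `%lx`), 0x14 decodes as a user-mode instruction fetch from a page that is not present. That fits `rip 0000000000000000`: the thread jumped to address 0, which is what happens when code calls through a NULL function pointer. The following minimal C sketch decodes the architectural error-code bits; the function name `pf_describe` is hypothetical and not part of any kernel or Xen API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* x86 page-fault error-code bits, as defined by the architecture. */
#define PF_PROT  0x01ul /* 0 = page not present, 1 = protection violation */
#define PF_WRITE 0x02ul /* 0 = read access,      1 = write access         */
#define PF_USER  0x04ul /* 0 = kernel mode,      1 = user mode            */
#define PF_RSVD  0x08ul /* reserved bit set in a paging-structure entry   */
#define PF_INSTR 0x10ul /* fault was caused by an instruction fetch       */

/* Hypothetical helper: describe a page-fault error code in words.
 * The result lives in a static buffer, so this sketch is not thread-safe. */
static const char *pf_describe(unsigned long code)
{
    static char buf[128];
    const char *mode = (code & PF_USER) ? "user mode" : "kernel mode";
    const char *access = (code & PF_INSTR) ? "instruction fetch"
                       : (code & PF_WRITE) ? "write"
                                           : "read";
    const char *cause = (code & PF_PROT) ? "protection violation"
                                         : "page not present";
    const char *rsvd = (code & PF_RSVD) ? ", reserved bit set" : "";

    int n = snprintf(buf, sizeof buf, "%s, %s, %s%s", mode, access, cause, rsvd);
    assert(n >= 0 && (size_t)n < sizeof buf); /* the description was not truncated */
    (void)n; /* silence unused-variable warnings when NDEBUG is defined */
    return buf;
}
```

With this sketch, `pf_describe(0x14)` returns `"user mode, instruction fetch, page not present"`.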

Version-Release number of selected component (if applicable):

xen-3.1.0-0.rc7.1.fc7
Linux lambda 2.6.20-2925.8.fc7xen #1 SMP Thu May 10 17:47:43 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

Other Fedora 7 components are fully up to date as of the time of posting.

How reproducible:

Intermittently on my test machine. Much easier to reproduce under heavy load,
as described below.

Steps to Reproduce:
1. Download
ftp://ftp.uk.freebsd.org/pub/FreeBSD/releases/i386/ISO-IMAGES/6.2/6.2-RELEASE-i386-bootonly.iso
2. Begin the install as in the attached instructions.
3. During the install, build a kernel with 'make -j 4' and, at the same time,
boot and shut down other guests.

Actual results:

qemu-dm segfaults. The visible symptom in virt-manager is that the FreeBSD
console is suddenly lost with the message "The console is currently
unavailable", although virt-manager continues (incorrectly, I think) to show
the FreeBSD guest as running.

Expected results:

qemu-dm shouldn't segfault.

Additional info:

I was starting up and shutting down two other guests, both PV. No FV guests
other than the FreeBSD installer were running.

It's not certain, but this upstream bug may be a manifestation of the same
problem:

http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=542

Comment 1 Richard W.M. Jones 2007-05-14 11:29:12 UTC

Created attachment 154641
FreeBSD installation notes

Comment 2 Richard W.M. Jones 2007-05-14 12:42:23 UTC

I also reproduced this bug with just load, no other guests running.

On Dom0 (a 4-core Athlon) I am running:
  cd linux-2.6.21.1; while true; do make -j 4; make clean; done

No guests are running except a FreeBSD 6.2 FV 32-on-64 install. After a little
while the install stops, and the following appears in Dom0's dmesg:

qemu-dm[3075]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000041400c18 error 14

Comment 3 Richard W.M. Jones 2007-05-14 14:30:55 UTC

This bug also happens with an updated Xen hypervisor. [Background: Dan pointed
out that changeset 15038,
http://xenbits.xensource.com/xen-3.1-testing.hg?rev/c00b2ab8af2c, looked like it
might be related, but the segfault still occurs with that change applied.]

Comment 4 Richard W.M. Jones 2007-05-15 12:10:15 UTC

Created attachment 154722
Core dump from qemu-dm

Core dump from qemu-dm.

Corresponding binary:
$ rpm -qf /usr/lib64/xen/bin/qemu-dm
xen-3.1.0-0.rc7.1.fc7

Stack trace (from gdb):

Core was generated by `/usr/lib64/xen/bin/qemu-dm -d 2 -vcpus 1 -boot d -serial pty -acpi -domain-name'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x000000000042c085 in dma_thread_func (opaque=<value optimized out>)
    at /usr/src/debug/xen-3.1.0-testing.hg-rc7/tools/ioemu/hw/ide.c:2402
#2  0x00000030310061b5 in start_thread () from /lib64/libpthread.so.0
#3  0x00000030304d043d in clone () from /lib64/libc.so.6

[Remarkably, this 60 KB attachment expands to the full 39 MB core dump, with
md5sum 189a904867814006d199f4d92c2f642c.]

Comment 5 Richard W.M. Jones 2007-05-15 12:22:37 UTC

Stack trace from each thread:

(gdb) thread apply all bt

Thread 3 (process 29295):
#0  0x00000030304c9952 in select () from /lib64/libc.so.6
#1  0x0000000000409555 in main_loop_wait (timeout=10)
    at /usr/src/debug/xen-3.1.0-testing.hg-rc7/tools/ioemu/vl.c:5216
#2  0x000000000046d251 in main_loop ()
    at /usr/src/debug/xen-3.1.0-testing.hg-rc7/tools/ioemu/target-i386-dm/helper2.c:628
#3  0x000000000040b206 in main (argc=19, argv=0x7fff0c4f2e08)
    at /usr/src/debug/xen-3.1.0-testing.hg-rc7/tools/ioemu/vl.c:6903
#4  0x000000303041da54 in __libc_start_main () from /lib64/libc.so.6
#5  0x0000000000404809 in _start ()

Thread 2 (process 29306):
#0  0x000000303100cabb in read () from /lib64/libpthread.so.0
#1  0x000000303180197a in read_all (fd=5, data=0xc9d2f0, len=16)
    at /usr/include/bits/unistd.h:35
#2  0x00000030318019f2 in read_message (h=0xc9b6b0) at xs.c:768
#3  0x0000003031801b4c in read_thread (arg=<value optimized out>) at xs.c:821
#4  0x00000030310061b5 in start_thread () from /lib64/libpthread.so.0
#5  0x00000030304d043d in clone () from /lib64/libc.so.6

Thread 1 (process 29426):
#0  0x0000000000000000 in ?? ()
#1  0x000000000042c085 in dma_thread_func (opaque=<value optimized out>)
    at /usr/src/debug/xen-3.1.0-testing.hg-rc7/tools/ioemu/hw/ide.c:2402
#2  0x00000030310061b5 in start_thread () from /lib64/libpthread.so.0
#3  0x00000030304d043d in clone () from /lib64/libc.so.6

Comment 6 Richard W.M. Jones 2007-05-15 17:49:03 UTC

I compiled qemu-dm with -O0 -g and generated another core dump:

http://annexia.org/tmp/qemu-dm.bz2
http://annexia.org/tmp/core.qemu-dm.10152.1179249168.bz2

Comment 7 Richard W.M. Jones 2007-05-15 19:05:40 UTC

Created attachment 154763
Patch to pass structure instead of pointers to the IDE DMA thread.

This patch is currently looking solid. The FreeBSD install has got much
further than before. If it stays up overnight I'll send it upstream.

Comment 8 Richard W.M. Jones 2007-05-15 21:28:07 UTC

FreeBSD install finished successfully for the first time under load.  Patch sent
upstream.

Comment 9 Richard W.M. Jones 2007-05-16 15:34:17 UTC

Created attachment 154836
Screenshot of FreeBSD install failing.

Unfortunately this patch hasn't corrected the problem. I'm still seeing
FreeBSD fail during the install at the same place as before, although with a
different error. This time qemu-dm isn't segfaulting; instead FreeBSD itself
reports an error, as shown in the screenshot.

The error is:

panic: initiate_write_inodeblock_ufs2: already started

Comment 10 Chris Lalancette 2008-01-30 17:58:04 UTC

FYI regarding this bug:

There was a recent xen-devel thread in which someone reported IDE
multi-threading problems. Keir has checked in a patch to 3.2/3.1 that fixes
that particular problem; it may also be relevant here:

http://lists.xensource.com/archives/html/xen-devel/2008-01/msg01151.html

Chris Lalancette

Comment 11 Bug Zapper 2008-05-14 02:53:58 UTC

Changing version to '9' as part of the upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
