+++ This bug was initially created as a clone of Bug #240009 +++

Description of problem:
Install of FreeBSD 6.2 32 bit kernel in fullvirt on an x86_64 dom0 (i.e. FV 32-on-64), with heavy load on the machine, causes qemu-dm to segfault with an error such as:

qemu-dm[10011]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000041400c18 error 14

Version-Release number of selected component (if applicable):
xen-3.1.0-0.rc7.1.fc7
Linux lambda 2.6.20-2925.8.fc7xen #1 SMP Thu May 10 17:47:43 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

Other Fedora 7 components are fully up to date as of the moment this was posted.

How reproducible:
Intermittently on my test machine. Much easier to reproduce under heavy load, as described below.

Steps to Reproduce:
1. Download ftp://ftp.uk.freebsd.org/pub/FreeBSD/releases/i386/ISO-IMAGES/6.2/6.2-RELEASE-i386-bootonly.iso
2. Begin the install as in the attached instructions.
3. During the install, run 'make -j 4' of a kernel and at the same time boot and shut down other guests.

Actual results:
qemu-dm segfaults. (The visual indication of this in virt-manager is that the FreeBSD console is suddenly lost with the message "The console is currently unavailable", although virt-manager continues, incorrectly I think, to display the FreeBSD guest as running.)

Expected results:
qemu-dm shouldn't segfault.

Additional info:
I was starting up and shutting down two other guests, both PV. No other FV guests were running apart from the FreeBSD installer.

It's not very clear, but this upstream bug might be a manifestation of the same thing:
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=542

-- Additional comment from rjones on 2007-05-14 07:29 EST --
Created an attachment (id=154641)
FreeBSD installation notes

-- Additional comment from rjones on 2007-05-14 08:42 EST --
I also reproduced this bug with just load, no other guests running. On Dom0 (a 4-core Athlon) I am running:

cd linux-2.6.21.1; while true; do make -j 4; make clean; done

No guests are running except a FreeBSD 6.2 FV 32-on-64 install. After a little while the install stops, and Dom0's dmesg shows:

qemu-dm[3075]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000041400c18 error 14

-- Additional comment from rjones on 2007-05-14 10:30 EST --
This bug also happens with an updated Xen hypervisor.

[Background: Dan pointed out that cset 15038, http://xenbits.xensource.com/xen-3.1-testing.hg?rev/c00b2ab8af2c, looked like it might have had something to do with this, but even with this change the segfault still happens.]

-- Additional comment from rjones on 2007-05-15 08:10 EST --
Created an attachment (id=154722)
Core dump from qemu-dm

Core dump from qemu-dm. Corresponding binary:

$ rpm -qf /usr/lib64/xen/bin/qemu-dm
xen-3.1.0-0.rc7.1.fc7

Stack trace (from gdb):

Core was generated by `/usr/lib64/xen/bin/qemu-dm -d 2 -vcpus 1 -boot d -serial pty -acpi -domain-name'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x000000000042c085 in dma_thread_func (opaque=<value optimized out>) at /usr/src/debug/xen-3.1.0-testing.hg-rc7/tools/ioemu/hw/ide.c:2402
#2 0x00000030310061b5 in start_thread () from /lib64/libpthread.so.0
#3 0x00000030304d043d in clone () from /lib64/libc.so.6

[Quite amazingly, this 60K file expands to the full 39MB core dump with md5sum 189a904867814006d199f4d92c2f642c.]

-- Additional comment from rjones on 2007-05-15 08:22 EST --
Stack trace from each thread:

(gdb) thread apply all bt

Thread 3 (process 29295):
#0 0x00000030304c9952 in select () from /lib64/libc.so.6
#1 0x0000000000409555 in main_loop_wait (timeout=10) at /usr/src/debug/xen-3.1.0-testing.hg-rc7/tools/ioemu/vl.c:5216
#2 0x000000000046d251 in main_loop () at /usr/src/debug/xen-3.1.0-testing.hg-rc7/tools/ioemu/target-i386-dm/helper2.c:628
#3 0x000000000040b206 in main (argc=19, argv=0x7fff0c4f2e08) at /usr/src/debug/xen-3.1.0-testing.hg-rc7/tools/ioemu/vl.c:6903
#4 0x000000303041da54 in __libc_start_main () from /lib64/libc.so.6
#5 0x0000000000404809 in _start ()

Thread 2 (process 29306):
#0 0x000000303100cabb in read () from /lib64/libpthread.so.0
#1 0x000000303180197a in read_all (fd=5, data=0xc9d2f0, len=16) at /usr/include/bits/unistd.h:35
#2 0x00000030318019f2 in read_message (h=0xc9b6b0) at xs.c:768
#3 0x0000003031801b4c in read_thread (arg=<value optimized out>) at xs.c:821
#4 0x00000030310061b5 in start_thread () from /lib64/libpthread.so.0
#5 0x00000030304d043d in clone () from /lib64/libc.so.6

Thread 1 (process 29426):
#0 0x0000000000000000 in ?? ()
#1 0x000000000042c085 in dma_thread_func (opaque=<value optimized out>) at /usr/src/debug/xen-3.1.0-testing.hg-rc7/tools/ioemu/hw/ide.c:2402
#2 0x00000030310061b5 in start_thread () from /lib64/libpthread.so.0
#3 0x00000030304d043d in clone () from /lib64/libc.so.6

-- Additional comment from rjones on 2007-05-15 13:49 EST --
I compiled qemu-dm with -O0 -g and generated another core dump:
http://annexia.org/tmp/qemu-dm.bz2
http://annexia.org/tmp/core.qemu-dm.10152.1179249168.bz2

-- Additional comment from rjones on 2007-05-15 15:05 EST --
Created an attachment (id=154763)
Patch to pass structure instead of pointers to the IDE DMA thread.

This patch is currently looking solid. The FreeBSD install has got much further than before. If it stays up overnight I'll feed it upstream.

-- Additional comment from rjones on 2007-05-15 17:28 EST --
FreeBSD install finished successfully for the first time under load. Patch sent upstream.

-- Additional comment from rjones on 2007-05-16 11:34 EST --
Created an attachment (id=154836)
Screenshot of FreeBSD install failing.

Unfortunately this patch hasn't corrected the problem. I'm still seeing FreeBSD fail during the install at the same place as before, although with a different error. This time qemu-dm isn't segfaulting, but FreeBSD itself is giving an error, as shown in the screenshot. The error is:

panic: initiate_write_inodeblock_ufs2: already started

-- Additional comment from clalance on 2008-01-30 12:58 EST --
FYI regarding this bug: there was a recent exchange with someone complaining about IDE multi-threading problems. Keir has checked in a patch to 3.2/3.1 that fixes that particular problem; it may also be relevant here:

http://lists.xensource.com/archives/html/xen-devel/2008-01/msg01151.html

Chris Lalancette

----

This happens on xen 3.0.3-41 as well.
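To illustrate the idea behind the patch above (passing a structure instead of pointers to the IDE DMA thread), here is a minimal, hypothetical sketch in C using pthreads and a POSIX pipe. It is not the qemu-dm code: dma_request, dma_worker, submit, count_cb and run_demo are made-up names. It assumes, as the backtrace suggests (frame #1 is dma_thread_func, frame #0 is address 0), that the DMA thread ends up calling a callback pointer that has been cleared by the time it runs. In the sketch the submitter writes a complete copy of each request into the pipe, so clearing or reusing its own copy afterwards cannot affect a queued request, and the worker skips a NULL callback rather than jumping through it.

```c
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical request: a self-contained snapshot of what the worker needs. */
typedef struct {
    void (*cb)(void *opaque); /* completion callback (may be NULL) */
    void *opaque;             /* argument passed to cb */
} dma_request;

static int req_pipe[2];       /* [0] = worker reads, [1] = submitter writes */

/* Read exactly len bytes; returns 0 on success, -1 on EOF or error. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Worker thread: receives each request by value, so later changes to the
 * submitter's own struct cannot reach it. Exits when the pipe hits EOF. */
static void *dma_worker(void *unused)
{
    dma_request req;

    (void)unused;
    while (read_full(req_pipe[0], &req, sizeof(req)) == 0) {
        if (req.cb != NULL)   /* never call through a NULL pointer */
            req.cb(req.opaque);
    }
    return NULL;
}

/* Queue a copy of *req; the caller may clear or reuse *req immediately. */
static int submit(const dma_request *req)
{
    ssize_t n = write(req_pipe[1], req, sizeof(*req));
    return n == (ssize_t)sizeof(*req) ? 0 : -1;
}

/* Example callback: increments the int that opaque points to. */
static void count_cb(void *opaque)
{
    int *counter = opaque;
    (*counter)++;
}

/* Demo: queue n real requests plus one with a NULL callback, shut down,
 * and return how many callbacks actually ran (-1 on setup failure). */
static int run_demo(int n)
{
    pthread_t tid;
    int counter = 0;
    dma_request req = { NULL, NULL };
    int i;

    if (pipe(req_pipe) != 0)
        return -1;
    if (pthread_create(&tid, NULL, dma_worker, NULL) != 0) {
        close(req_pipe[0]);
        close(req_pipe[1]);
        return -1;
    }

    for (i = 0; i < n; i++) {
        req.cb = count_cb;
        req.opaque = &counter;
        submit(&req);
        /* Clearing our copy is harmless: the worker got its own. */
        req.cb = NULL;
        req.opaque = NULL;
    }
    submit(&req);             /* cb == NULL: the worker skips it */

    close(req_pipe[1]);       /* EOF makes the worker exit */
    pthread_join(tid, NULL);  /* after join, reading counter is safe */
    close(req_pipe[0]);
    return counter;
}
```

A write of this size (well under PIPE_BUF) to a pipe is atomic under POSIX, so requests from several submitters cannot interleave. Copying the request only protects the fields that are copied, though: whatever the callback reaches through opaque still needs its own locking. Calling run_demo(5) should run the callback five times and skip the NULL request.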
Created attachment 294119
core from segfault
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Comment #10 in bug 240009 refers to this mailing list thread about QEMU crashes under high load:

http://lists.xensource.com/archives/html/xen-devel/2008-01/msg01147.html

This resulted in the following patch, which fixes one known race condition:

http://xenbits.xen.org/xen-3.1-testing.hg?rev/df56245d48f5

I can't say for certain whether this bug reporter is hitting this particular race condition, but it is certainly a likely candidate, and a race which should be fixed. In the absence of any other explanation for the crashes, I'd recommend we apply the upstream patch referenced above and see whether QEMU remains crash-free under load.
This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.
This bug can probably be closed as a DUP of https://bugzilla.redhat.com/show_bug.cgi?id=250988
Yes, most likely. However, I would like to leave it open just until I get another report of whether this patch works for a customer. Assuming that is successful, we can then close this out as a dup. Chris Lalancette
In the past I've hit this bug quite frequently, and I've been able to reproduce it simply by restarting multiple Windows DomUs at the same time on a machine with many other DomUs (24 of them). After installing xen-3.0.3-68.el5 I'm no longer able to reproduce the problem.
On the basis of the confirmation in comment #21, I'm closing this as a dup of bug 250988. If someone manages to get this problem to recur with xen >= 3.0.3-68.el5, please open a new bug with a reproducible test case.

*** This bug has been marked as a duplicate of bug 250988 ***