Bug 249292 - qemu-dm segfaulting for fully-virtualized guest
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.0
Hardware: x86_64 Linux
Priority: low Severity: medium
Assigned To: john cooper
QA Contact: Gurhan Ozen
Depends On: 250988
Blocks: 448899
Reported: 2007-07-23 11:27 EDT by The Written Word
Modified: 2014-07-24 23:45 EDT

Doc Type: Bug Fix
Last Closed: 2009-01-20 16:15:29 EST


Attachments
Coredump from RHEL3 FV guest (94.60 KB, application/octet-stream)
2007-07-23 14:36 EDT, The Written Word

Description The Written Word 2007-07-23 11:27:37 EDT
Description of problem:
We have a Sun X4100M2 with 16G running RHEL5/Xen and xen-3.0.3-25.el5. We have 6
guests running on this system, with 3 of them fully-virtualized guests, and 3
para-virtualized guests. One of the fully-virtualized guests (RHEL3/x86 with
latest updates) seems to die. /var/log/messages has:
  Jul 23 14:25:39 nami kernel: qemu-dm[11628]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000040a000d8 error 14

Where can I find a core file to mail to you so that you can determine the root
cause of this problem? It is very annoying!

Version-Release number of selected component (if applicable):
  # uname -a
Linux nami.il.thewrittenword.com 2.6.18-8.1.8.el5xen #1 SMP Mon Jun 25 17:19:38 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

  # rpm -qa | grep xen
xen-3.0.3-25.el5
kernel-xen-2.6.18-8.1.8.el5
kernel-xen-2.6.18-8.1.6.el5
xen-libs-3.0.3-25.el5
xen-libs-3.0.3-25.el5

How reproducible:
  Cannot reproduce on-demand.

Comment 1 Richard W.M. Jones 2007-07-23 11:51:35 EDT
Can you take a look at bug 240009 and see if it looks similar to you?

As for capturing coredumps, please see the instructions here:
http://et.redhat.com/~rjones/xen-stress-tests/
Comment 2 The Written Word 2007-07-23 12:06:08 EDT
I'll need to reboot the server tonight to enable coredumps.

Yes, this bug looks like a dup of bug 240009. I think one of the segfaults today
was when the guest was idle, but I'm not sure. We routinely have problems with
fully-virtualized guests under heavy IO load and slow response from the NFS
server where the guest disk images are stored.
Comment 3 Richard W.M. Jones 2007-07-23 12:14:27 EDT
I'm going to take this one and try it with RHEL3 FV guests on RHEL 5 tomorrow.
Comment 4 The Written Word 2007-07-23 14:35:33 EDT
(In reply to comment #2)
> I'll need to reboot the server tonight to enable coredumps.

Ok, we had some slight downtime so I enabled coredumps and ran some heavy I/O
operations on the RHEL3 FV guest. I'll attach a coredump.
Comment 5 The Written Word 2007-07-23 14:36:57 EDT
Created attachment 159801
Coredump from RHEL3 FV guest

Segfault error in /var/log/messages:
  qemu-dm[4452]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000040a000d8 error 14
Comment 6 Daniel Berrange 2007-07-23 14:44:56 EDT
We have a winner! It's the same IDE DMA thread issue as bug 240009:

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000000000429cfd in dma_thread_func (opaque=<value optimized out>) at /usr/src/debug/xen-3.0.3_0-src/tools/ioemu/hw/ide.c:2238
#2  0x00000031000061b5 in __deallocate_stack (pd=0x60) at ../nptl/sysdeps/pthread/list.h:72
#3  0x0000000000000000 in ?? ()

So we should be able to apply the same patch from that Fedora bug to RHEL-5.
Comment 7 The Written Word 2007-07-23 15:05:50 EDT
(In reply to comment #6)
> We have a winner! It's the same IDE DMA thread issue as bug 240009:
> 
> (gdb) bt
> #0  0x0000000000000000 in ?? ()
> #1  0x0000000000429cfd in dma_thread_func (opaque=<value optimized out>) at /usr/src/debug/xen-3.0.3_0-src/tools/ioemu/hw/ide.c:2238
> #2  0x00000031000061b5 in __deallocate_stack (pd=0x60) at ../nptl/sysdeps/pthread/list.h:72
> #3  0x0000000000000000 in ?? ()
>
> So we should be able to apply the same patch from that Fedora bug to RHEL-5.

Would you like us to apply patch #154763 from bug #240009 to
xen-3.0.3-25.0.3.el5.src.rpm and test?
Comment 8 The Written Word 2007-07-23 15:10:48 EDT
(In reply to comment #7)
> 
> Would you like us to apply patch #154763 from bug #240009 to
> xen-3.0.3-25.0.3.el5.src.rpm and test?

Well, the patch doesn't apply cleanly but if you have something you want us to
test, let me know.
Comment 10 Erwan Velu 2008-05-23 04:32:21 EDT
I'm experiencing this bug with xen 3.0.3-41.el5_1.5. According to the comments
behind the links referenced in comment #7, a patch solves this issue.

Could we expect an update?
Comment 11 RHEL Product and Program Management 2008-06-02 16:35:20 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 12 Bill Burns 2008-06-13 08:14:19 EDT
Re: comment #10, this is being looked at.
Comment 13 Daniel Berrange 2008-07-22 06:25:34 EDT
The patch from bug 240009 is no longer thought to have fixed the issue. Instead,
we believe the problem is a race condition that is addressed by this changeset:

http://xenbits.xensource.com/xen-3.1-testing.hg?rev/df56245d48f5

Addition of this patch is already scheduled for 5.3, via another bug. The patch
will be available in xen-3.0.3-68.el5 or later.
Comment 18 errata-xmlrpc 2009-01-20 16:15:29 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0118.html
