Bug 249292 - qemu-dm segfaulting for fully-virtualized guest
Summary: qemu-dm segfaulting for fully-virtualized guest
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen   
Version: 5.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: john cooper
QA Contact: Gurhan Ozen
URL:
Whiteboard:
Keywords:
Depends On: 250988
Blocks: 448899
 
Reported: 2007-07-23 15:27 UTC by The Written Word
Modified: 2018-10-20 00:35 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-01-20 21:15:29 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Coredump from RHEL3 FV guest (94.60 KB, application/octet-stream)
2007-07-23 18:36 UTC, The Written Word


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2009:0118 normal SHIPPED_LIVE xen bug fix and enhancement update 2009-01-20 16:04:49 UTC

Description The Written Word 2007-07-23 15:27:37 UTC
Description of problem:
We have a Sun X4100M2 with 16G running RHEL5/Xen and xen-3.0.3-25.el5. We have 6
guests running on this system, with 3 of them fully-virtualized guests, and 3
para-virtualized guests. One of the fully-virtualized guests (RHEL3/x86 with
latest updates) seems to die. /var/log/messages has:
  Jul 23 14:25:39 nami kernel: qemu-dm[11628]: segfault at 0000000000000000 rip
0000000000000000 rsp 0000000040a000d8 error 14

Where can I find a core file to send you so that the root cause of this
problem can be determined? It is very, very annoying!

Version-Release number of selected component (if applicable):
  # uname -a
Linux nami.il.thewrittenword.com 2.6.18-8.1.8.el5xen #1 SMP Mon Jun 25 17:19:38
EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

  # rpm -qa | grep xen
xen-3.0.3-25.el5
kernel-xen-2.6.18-8.1.8.el5
kernel-xen-2.6.18-8.1.6.el5
xen-libs-3.0.3-25.el5
xen-libs-3.0.3-25.el5

How reproducible:
  Cannot reproduce on-demand.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Richard W.M. Jones 2007-07-23 15:51:35 UTC
Can you take a look at bug 240009 and see if it looks similar to you?

As for capturing coredumps, please see the instructions here:
http://et.redhat.com/~rjones/xen-stress-tests/

Comment 2 The Written Word 2007-07-23 16:06:08 UTC
I'll need to reboot the server tonight to enable coredumps.

Yes, this bug looks like a dup of bug 240009. I think one of the segfaults today
was when the guest was idle, but I'm not sure. We routinely have problems with
fully-virtualized guests under heavy IO load and slow response from the NFS
server where the guest disk images are stored.

Comment 3 Richard W.M. Jones 2007-07-23 16:14:27 UTC
I'm going to take this one and try it with RHEL3 FV guests on RHEL 5 tomorrow.

Comment 4 The Written Word 2007-07-23 18:35:33 UTC
(In reply to comment #2)
> I'll need to reboot the server tonight to enable coredumps.

Ok, we had some slight downtime so I enabled coredumps and ran some heavy I/O
operations on the RHEL3 FV guest. I'll attach a coredump.

Comment 5 The Written Word 2007-07-23 18:36:57 UTC
Created attachment 159801 [details]
Coredump from RHEL3 FV guest

Segfault error in /var/log/messages:
  qemu-dm[4452]: segfault at 0000000000000000 rip 0000000000000000 rsp
0000000040a000d8 error 14
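
For readers decoding the log line above, here is a minimal sketch of how the "error" field can be read as an x86 page-fault error code. It assumes, and this report does not confirm it, that the kernel prints the field in hexadecimal, so "14" is 0x14. The bit meanings are the architectural ones; the helper names pf_parse_error and pf_describe are hypothetical and not part of any kernel or Xen interface.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Architectural x86 page-fault error-code bits. */
enum {
    PF_PROT  = 1 << 0, /* 0: page not present, 1: protection violation */
    PF_WRITE = 1 << 1, /* 0: read access, 1: write access */
    PF_USER  = 1 << 2, /* 0: kernel mode, 1: user mode */
    PF_RSVD  = 1 << 3, /* reserved bit set in a page-table entry */
    PF_INSTR = 1 << 4  /* fault happened on an instruction fetch */
};

/* Parse the "error" field. ASSUMPTION: the field is hexadecimal. */
static unsigned long pf_parse_error(const char *field)
{
    return strtoul(field, NULL, 16);
}

/* Write a one-line description of the error code into buf; returns buf. */
static char *pf_describe(unsigned long code, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s-mode %s, %s%s",
             (code & PF_USER) ? "user" : "kernel",
             (code & PF_INSTR) ? "instruction fetch"
                               : (code & PF_WRITE) ? "write" : "read",
             (code & PF_PROT) ? "protection violation" : "page not present",
             (code & PF_RSVD) ? ", reserved bit set" : "");
    return buf;
}
```

Under the hexadecimal assumption, pf_describe(pf_parse_error("14"), buf, sizeof buf) describes a user-mode instruction fetch from a page that is not present. Together with rip 0000000000000000, that would be consistent with a call through a null function pointer.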

Comment 6 Daniel Berrange 2007-07-23 18:44:56 UTC
We have a winner! It's the same IDE DMA thread issue as bug 240009:

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000000000429cfd in dma_thread_func (opaque=<value optimized out>) at
/usr/src/debug/xen-3.0.3_0-src/tools/ioemu/hw/ide.c:2238
#2  0x00000031000061b5 in __deallocate_stack (pd=0x60) at
../nptl/sysdeps/pthread/list.h:72
#3  0x0000000000000000 in ?? ()

So we should be able to apply the same patch from that Fedora bug to RHEL-5.


Comment 7 The Written Word 2007-07-23 19:05:50 UTC
(In reply to comment #6)
> We have winner! Its the same IDE DMA thread issue as bug 240009:
> 
> (gdb) bt
> #0  0x0000000000000000 in ?? ()
> #1  0x0000000000429cfd in dma_thread_func (opaque=<value optimized out>) at
> /usr/src/debug/xen-3.0.3_0-src/tools/ioemu/hw/ide.c:2238
> #2  0x00000031000061b5 in __deallocate_stack (pd=0x60) at
> ../nptl/sysdeps/pthread/list.h:72
> #3  0x0000000000000000 in ?? ()
>
> So we should be able to apply same patch from that Fedora bug to RHEL-5.

Would you like us to apply patch #154763 from bug #240009 to
xen-3.0.3-25.0.3.el5.src.rpm and test?

Comment 8 The Written Word 2007-07-23 19:10:48 UTC
(In reply to comment #7)
> 
> Would you like us to apply patch #154763 from bug #240009 to
> xen-3.0.3-25.0.3.el5.src.rpm and test?

Well, the patch doesn't apply cleanly but if you have something you want us to
test, let me know.

Comment 10 Erwan Velu 2008-05-23 08:32:21 UTC
I'm experiencing this bug with xen 3.0.3-41.el5_1.5. According to the
comments behind the links referenced in comment #7, a patch solves this issue.

Could we expect an update?

Comment 11 RHEL Product and Program Management 2008-06-02 20:35:20 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 12 Bill Burns 2008-06-13 12:14:19 UTC
Re: comment #10, this is being looked at.


Comment 13 Daniel Berrange 2008-07-22 10:25:34 UTC
The patch from bug 240009 is now not thought to have fixed the issue. Instead, we
believe the problem is a race condition, which is fixed by this changeset:

http://xenbits.xensource.com/xen-3.1-testing.hg?rev/df56245d48f5

Addition of this patch is already scheduled for 5.3, via another bug. The patch
will be available in xen-3.0.3-68.el5 or later.
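
As a generic illustration of the kind of race that can end in a jump to address 0, the sketch below shows a worker that takes its completion callback under a lock and checks it for NULL before calling it. This is not the code from the changeset above or from ide.c; the struct and function names (dma_request, dma_request_run, count_cb, and so on) are hypothetical. Without the lock and the NULL check, a second thread clearing the callback between the worker's check and its call would reproduce a backtrace with frame #0 at 0x0.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef void (*dma_cb_t)(void *opaque);

/* Hypothetical request slot shared between an I/O thread and a worker. */
struct dma_request {
    pthread_mutex_t lock;
    dma_cb_t cb;      /* NULL when no request is pending */
    void *opaque;
};

static void dma_request_init(struct dma_request *r)
{
    assert(r != NULL);
    pthread_mutex_init(&r->lock, NULL);
    r->cb = NULL;
    r->opaque = NULL;
}

/* Publish a request: callback and argument change together under the lock. */
static void dma_request_set(struct dma_request *r, dma_cb_t cb, void *opaque)
{
    pthread_mutex_lock(&r->lock);
    r->cb = cb;
    r->opaque = opaque;
    pthread_mutex_unlock(&r->lock);
}

/* Cancel a pending request, for example on device reset. */
static void dma_request_cancel(struct dma_request *r)
{
    dma_request_set(r, NULL, NULL);
}

/*
 * Worker side: take a private snapshot of the callback under the lock,
 * clear the slot, and only call the snapshot if it is not NULL.
 * Returns 1 if a callback ran, 0 if there was nothing to run.
 */
static int dma_request_run(struct dma_request *r)
{
    dma_cb_t cb;
    void *opaque;

    pthread_mutex_lock(&r->lock);
    cb = r->cb;
    opaque = r->opaque;
    r->cb = NULL;
    r->opaque = NULL;
    pthread_mutex_unlock(&r->lock);

    if (cb == NULL)
        return 0;
    cb(opaque);
    return 1;
}

/* Sample callback for experiments: counts how often it was invoked. */
static void count_cb(void *opaque)
{
    ++*(int *)opaque;
}
```

In this sketch a cancelled or already consumed request makes dma_request_run return 0 instead of calling through a stale or null pointer. The callback is invoked outside the lock so that it can safely queue a new request.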


Comment 18 errata-xmlrpc 2009-01-20 21:15:29 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0118.html

