Bug 249292

Summary:

qemu-dm segfaulting for fully-virtualized guest

Product:

Red Hat Enterprise Linux 5

Reporter:

The Written Word <bugzilla>

Component:

xen

Assignee:

john cooper <john.cooper>

Status:

CLOSED ERRATA

QA Contact:

Gurhan Ozen <gozen>

Severity:

medium

Priority:

low

Version:

5.0

CC:

gozen, jburke, nobody, sputhenp, tao, xen-maint

Hardware:

x86_64

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2009-01-20 21:15:29 UTC

Bug Depends On:

250988

Bug Blocks:

448899

Attachments:

Description	Flags
Coredump from RHEL3 FV guest	none

Description The Written Word 2007-07-23 15:27:37 UTC

Description of problem:
We have a Sun X4100M2 with 16G running RHEL5/Xen and xen-3.0.3-25.el5. We have 6
guests running on this system, with 3 of them fully-virtualized guests, and 3
para-virtualized guests. One of the fully-virtualized guests (RHEL3/x86 with
latest updates) seems to die. /var/log/messages has:
  Jul 23 14:25:39 nami kernel: qemu-dm[11628]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000040a000d8 error 14

Where can I find a core file to mail to you to determine the root cause of this
problem? It is very, very annoying!

Version-Release number of selected component (if applicable):
  # uname -a
Linux nami.il.thewrittenword.com 2.6.18-8.1.8.el5xen #1 SMP Mon Jun 25 17:19:38 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

  # rpm -qa | grep xen
xen-3.0.3-25.el5
kernel-xen-2.6.18-8.1.8.el5
kernel-xen-2.6.18-8.1.6.el5
xen-libs-3.0.3-25.el5
xen-libs-3.0.3-25.el5

How reproducible:
  Cannot reproduce on-demand.


Comment 1 Richard W.M. Jones 2007-07-23 15:51:35 UTC

Can you take a look at bug 240009 and see if it looks similar to you?

As for capturing coredumps, please see the instructions here:
http://et.redhat.com/~rjones/xen-stress-tests/

Comment 2 The Written Word 2007-07-23 16:06:08 UTC

I'll need to reboot the server tonight to enable coredumps.

Yes, this bug looks like a dup of bug 240009. I think one of the segfaults today
was when the guest was idle, but I'm not sure. We routinely have problems with
fully-virtualized guests under heavy IO load and slow response from the NFS
server where the guest disk images are stored.

Comment 3 Richard W.M. Jones 2007-07-23 16:14:27 UTC

I'm going to take this one and try it with RHEL3 FV guests on RHEL 5 tomorrow.

Comment 4 The Written Word 2007-07-23 18:35:33 UTC

(In reply to comment #2)
> I'll need to reboot the server tonight to enable coredumps.

Ok, we had some slight downtime so I enabled coredumps and ran some heavy I/O
operations on the RHEL3 FV guest. I'll attach a coredump.

Comment 5 The Written Word 2007-07-23 18:36:57 UTC

Created attachment 159801 [details]
Coredump from RHEL3 FV guest

Segfault error in /var/log/messages:
  qemu-dm[4452]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000040a000d8 error 14

Comment 6 Daniel Berrangé 2007-07-23 18:44:56 UTC

We have a winner! It's the same IDE DMA thread issue as bug 240009:

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000000000429cfd in dma_thread_func (opaque=<value optimized out>) at /usr/src/debug/xen-3.0.3_0-src/tools/ioemu/hw/ide.c:2238
#2  0x00000031000061b5 in __deallocate_stack (pd=0x60) at ../nptl/sysdeps/pthread/list.h:72
#3  0x0000000000000000 in ?? ()

So we should be able to apply the same patch from that Fedora bug to RHEL-5.
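
The kernel log line from comment 5 agrees with this backtrace: the faulting address and rip are both zero, which is what a call through a null function pointer looks like. As a rough sketch that is not part of the original report, the line can be decoded in Python. The helper name decode_segfault is made up for illustration, and the sketch assumes that this kernel prints the page-fault error code in hexadecimal and that its bits follow the x86 page-fault error code layout.

```python
import re

# Matches the "segfault at ... rip ... rsp ... error ..." kernel message
# format seen in this report.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Parse one segfault log line and decode its page-fault error code."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    # Assumption: the error code is printed in hex, so "14" means 0x14.
    err = int(m.group("err"), 16)
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
        # x86 page-fault error code bits
        "page_present": bool(err & 0x01),
        "write": bool(err & 0x02),
        "user_mode": bool(err & 0x04),
        "reserved_bit": bool(err & 0x08),
        "instruction_fetch": bool(err & 0x10),
    }


line = ("qemu-dm[4452]: segfault at 0000000000000000 rip 0000000000000000 "
        "rsp 0000000040a000d8 error 14")
info = decode_segfault(line)
for key, value in info.items():
    print(key, value)
```

Under those assumptions, error 14 (0x14) reads as a user-mode instruction fetch from a non-present page: the process tried to execute code at address 0, matching frame #0 above.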

Comment 7 The Written Word 2007-07-23 19:05:50 UTC

(In reply to comment #6)
> We have a winner! It's the same IDE DMA thread issue as bug 240009:
> 
> (gdb) bt
> #0  0x0000000000000000 in ?? ()
> #1  0x0000000000429cfd in dma_thread_func (opaque=<value optimized out>) at /usr/src/debug/xen-3.0.3_0-src/tools/ioemu/hw/ide.c:2238
> #2  0x00000031000061b5 in __deallocate_stack (pd=0x60) at ../nptl/sysdeps/pthread/list.h:72
> #3  0x0000000000000000 in ?? ()
>
> So we should be able to apply the same patch from that Fedora bug to RHEL-5.

Would you like us to apply patch #154763 from bug #240009 to
xen-3.0.3-25.0.3.el5.src.rpm and test?

Comment 8 The Written Word 2007-07-23 19:10:48 UTC

(In reply to comment #7)
> 
> Would you like us to apply patch #154763 from bug #240009 to
> xen-3.0.3-25.0.3.el5.src.rpm and test?

Well, the patch doesn't apply cleanly but if you have something you want us to
test, let me know.

Comment 10 Erwan Velu 2008-05-23 08:32:21 UTC

I'm experiencing this bug with xen 3.0.3-41.el5_1.5, and according to the
comments at the links pointed to in comment #7, a patch solves this issue.

Could we expect an update?

Comment 11 RHEL Program Management 2008-06-02 20:35:20 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 12 Bill Burns 2008-06-13 12:14:19 UTC

Re: comment #10, this is being looked at.

Comment 13 Daniel Berrangé 2008-07-22 10:25:34 UTC

The patch from bug 240009 is no longer thought to have fixed the issue. Instead,
we believe the problem is a race condition, addressed by this fix:

http://xenbits.xensource.com/xen-3.1-testing.hg?rev/df56245d48f5

Addition of this patch is already scheduled for 5.3, via another bug. The patch
will be available in xen-3.0.3-68.el5 or later.
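
To check whether a host already carries the fix, the installed xen version-release (as printed by rpm -qa | grep xen) can be compared against 3.0.3-68.el5. The following Python sketch is a simplified comparison, only loosely modeled on how RPM orders versions; it is not RPM's own code, and the function names are hypothetical.

```python
import re


def split_segments(text):
    """Split into runs of digits and runs of letters; separators are dropped."""
    return re.findall(r"\d+|[A-Za-z]+", text)


def compare_segments(a, b):
    """Return -1, 0 or 1 for two segment lists."""
    for x, y in zip(a, b):
        if x.isdigit() and y.isdigit():
            xi, yi = int(x), int(y)
            if xi != yi:
                return -1 if xi < yi else 1
        elif x.isdigit() != y.isdigit():
            # Simplification: a numeric segment sorts after an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return -1 if x < y else 1
    # All shared segments equal: the longer list is treated as newer.
    return (len(a) > len(b)) - (len(a) < len(b))


def compare_version_release(a, b):
    """Compare 'version-release' strings such as '3.0.3-25.el5'."""
    a_ver, a_rel = a.split("-", 1)
    b_ver, b_rel = b.split("-", 1)
    return (compare_segments(split_segments(a_ver), split_segments(b_ver))
            or compare_segments(split_segments(a_rel), split_segments(b_rel)))


def has_fix(installed, fixed="3.0.3-68.el5"):
    """True if the installed version-release is at or beyond the fixed one."""
    return compare_version_release(installed, fixed) >= 0


# Versions mentioned in this report, plus the fixed version itself.
for vr in ("3.0.3-25.el5", "3.0.3-41.el5_1.5", "3.0.3-68.el5"):
    print(vr, has_fix(vr))
```

Both versions reported as affected in this bug (3.0.3-25.el5 and 3.0.3-41.el5_1.5) compare as older than the fixed release under this sketch.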

Comment 18 errata-xmlrpc 2009-01-20 21:15:29 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0118.html