Description of problem:
We have a Sun X4100M2 with 16G running RHEL5/Xen and xen-3.0.3-25.el5. The system hosts 6 guests: 3 fully-virtualized and 3 para-virtualized. One of the fully-virtualized guests (RHEL3/x86 with the latest updates) seems to die. /var/log/messages has:

Jul 23 14:25:39 nami kernel: qemu-dm[11628]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000040a000d8 error 14

Where can I find a core file to mail to you so that the root cause of this problem can be determined? It is very, very annoying!

Version-Release number of selected component (if applicable):
# uname -a
Linux nami.il.thewrittenword.com 2.6.18-8.1.8.el5xen #1 SMP Mon Jun 25 17:19:38 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
# rpm -qa | grep xen
xen-3.0.3-25.el5
kernel-xen-2.6.18-8.1.8.el5
kernel-xen-2.6.18-8.1.6.el5
xen-libs-3.0.3-25.el5
xen-libs-3.0.3-25.el5

How reproducible:
Cannot reproduce on demand.
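For readers triaging similar log lines, here is a minimal sketch that splits the segfault message above into its fields and decodes the page-fault error code. It assumes the kernel prints the error code in hexadecimal (so "error 14" is 0x14), which is my understanding of x86_64 kernels of this era but is not confirmed anywhere in this report. The function and field names are illustrative only.

```python
import re

# Matches the kernel's user-space segfault message as it appears in the log above.
SEGFAULT_RE = re.compile(
    r"(?P<prog>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-fA-F]+) "
    r"rip (?P<rip>[0-9a-fA-F]+) rsp (?P<rsp>[0-9a-fA-F]+) error (?P<err>[0-9a-fA-F]+)"
)

# x86 page-fault error-code bits: (mask, meaning if set, meaning if clear).
ERROR_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_segfault(line):
    """Return the parsed fields of a segfault log line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # assumption: the error code is printed in hex
    flags = []
    for mask, if_set, if_clear in ERROR_BITS:
        if err & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return {
        "prog": m.group("prog"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
        "flags": flags,
    }
```

Under that assumption, the line from this report decodes as a user-mode instruction fetch from a non-present page at address 0, which is what a jump through a null pointer would produce.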
Can you take a look at bug 240009 and see if it looks similar to you? As for capturing coredumps, please see the instructions here: http://et.redhat.com/~rjones/xen-stress-tests/
I'll need to reboot the server tonight to enable coredumps. Yes, this bug looks like a dup of bug 240009. I think one of the segfaults today was when the guest was idle, but I'm not sure. We routinely have problems with fully-virtualized guests under heavy IO load and slow response from the NFS server where the guest disk images are stored.
I'm going to take this one and try it with RHEL3 FV guests on RHEL 5 tomorrow.
(In reply to comment #2)
> I'll need to reboot the server tonight to enable coredumps.

Ok, we had some slight downtime so I enabled coredumps and ran some heavy I/O operations on the RHEL3 FV guest. I'll attach a coredump.
Created attachment 159801
Coredump from RHEL3 FV guest

Segfault error in /var/log/messages:
qemu-dm[4452]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000040a000d8 error 14
We have a winner! It's the same IDE DMA thread issue as bug 240009:

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000000000429cfd in dma_thread_func (opaque=<value optimized out>) at /usr/src/debug/xen-3.0.3_0-src/tools/ioemu/hw/ide.c:2238
#2  0x00000031000061b5 in __deallocate_stack (pd=0x60) at ../nptl/sysdeps/pthread/list.h:72
#3  0x0000000000000000 in ?? ()

So we should be able to apply the same patch from that Fedora bug to RHEL-5.
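Frame #0 at address 0 directly beneath dma_thread_func is the usual signature of a call through a null function pointer. As a generic illustration only (this is not the Xen ide.c code, and every name below is hypothetical), here is a sketch of how a worker thread can end up calling a callback that another thread has just cleared, and one common way to avoid it.

```python
import threading


class DmaChannel:
    """Hypothetical stand-in for an emulated device; not the Xen ide.c code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._dma_cb = None  # completion callback, None when no transfer is pending

    def start(self, callback):
        with self._lock:
            self._dma_cb = callback

    def cancel(self):
        # Would run on another thread, e.g. when the guest resets the device.
        with self._lock:
            self._dma_cb = None

    def complete_racy(self, nbytes):
        # Check-then-call on shared state without the lock: cancel() can run
        # between the test and the call, so the call may land on None.
        # The C equivalent is a jump to address 0.
        if self._dma_cb is not None:
            self._dma_cb(nbytes)
            return True
        return False

    def complete_safe(self, nbytes):
        # Snapshot and clear the callback under the lock, then call the snapshot.
        with self._lock:
            cb = self._dma_cb
            self._dma_cb = None  # each transfer completes at most once
        if cb is None:
            return False
        cb(nbytes)
        return True
```

A worker thread would call complete_safe() when a transfer finishes. If cancel() won the race, the call returns False instead of invoking a missing callback.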
(In reply to comment #6)
> We have a winner! It's the same IDE DMA thread issue as bug 240009:
>
> (gdb) bt
> #0  0x0000000000000000 in ?? ()
> #1  0x0000000000429cfd in dma_thread_func (opaque=<value optimized out>) at /usr/src/debug/xen-3.0.3_0-src/tools/ioemu/hw/ide.c:2238
> #2  0x00000031000061b5 in __deallocate_stack (pd=0x60) at ../nptl/sysdeps/pthread/list.h:72
> #3  0x0000000000000000 in ?? ()
>
> So we should be able to apply the same patch from that Fedora bug to RHEL-5.

Would you like us to apply patch #154763 from bug #240009 to xen-3.0.3-25.0.3.el5.src.rpm and test?
(In reply to comment #7)
> Would you like us to apply patch #154763 from bug #240009 to
> xen-3.0.3-25.0.3.el5.src.rpm and test?

Well, the patch doesn't apply cleanly, but if you have something you want us to test, let me know.
I'm experiencing this bug with xen 3.0.3-41.el5_1.5. Judging by the comments reached through the links in comment #7, a patch solves this issue. Could we expect an update?
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Re: comment #10, this is being looked at.
The patch from bug 240009 is no longer thought to have fixed the issue. Instead, we believe the problem is a race condition, which is addressed by this fix:

http://xenbits.xensource.com/xen-3.1-testing.hg?rev/df56245d48f5

The addition of this patch is already scheduled for 5.3, via another bug. The patch will be available in xen-3.0.3-68.el5 or later.
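To check whether a host already carries a build at or past the one named above, a rough sketch like the following can be run against the package names printed by rpm -qa. It is a simplification: it only compares the leading integer of the release field for xen-3.0.3 packages and does not implement RPM's full version comparison. The helper names are made up for illustration.

```python
import re

# From the comment above: the patch is expected in xen-3.0.3-68.el5 or later.
FIXED_RELEASE = 68


def xen_release(package):
    """Return the leading release number of a 'xen-3.0.3-N...' package name, or None."""
    m = re.match(r"^xen-3\.0\.3-(\d+)", package)
    return int(m.group(1)) if m else None


def has_fix(package):
    """True if the package is a xen-3.0.3 build whose release is at least FIXED_RELEASE."""
    release = xen_release(package)
    return release is not None and release >= FIXED_RELEASE
```

For the versions mentioned in this report, has_fix("xen-3.0.3-25.el5") and has_fix("xen-3.0.3-41.el5_1.5") are both False, while has_fix("xen-3.0.3-68.el5") is True.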
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0118.html