Description of problem:
We have a Sun X4100M2 with 16G running RHEL5/Xen and xen-3.0.3-25.el5. We have 6
guests running on this system, with 3 of them fully-virtualized guests, and 3
para-virtualized guests. One of the fully-virtualized guests (RHEL3/x86 with
latest updates) seems to die. /var/log/messages has:
Jul 23 14:25:39 nami kernel: qemu-dm: segfault at 0000000000000000 rip
0000000000000000 rsp 0000000040a000d8 error 14
Where can I find a core file to mail to you to determine the root cause of this
problem? It is very annoying!
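The segfault line above already carries some diagnostic information before any core file is available. The following is a minimal sketch, not part of the original report, that parses such a line and decodes the page-fault error code. It assumes the kernel prints the error code in hexadecimal, as the x86_64 fault handler of that era appears to do; the function and field names are hypothetical.

```python
import re

# Matches the x86_64 kernel segfault message format seen in this report.
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+): segfault at (?P<addr>[0-9a-f]+) rip (?P<rip>[0-9a-f]+) "
    r"rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

# x86 page-fault error-code bits: (mask, meaning if set, meaning if clear).
ERROR_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write", "not a write"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_segfault(line):
    """Parse a kernel segfault line; return a dict, or None if it does not match."""
    # Collapse whitespace so a message wrapped across two lines still matches.
    m = SEGFAULT_RE.search(" ".join(line.split()))
    if m is None:
        return None
    err = int(m.group("err"), 16)  # assumption: the kernel prints this in hex
    flags = []
    for mask, when_set, when_clear in ERROR_BITS:
        text = when_set if err & mask else when_clear
        if text:
            flags.append(text)
    return {
        "program": m.group("prog"),
        "address": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
        "error_code": err,
        "flags": flags,
    }
```

Under that assumption, "error 14" reads as 0x14: a user-mode instruction fetch from a page that is not present. Together with a rip of 0, this would be consistent with a jump through a null function pointer.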
Version-Release number of selected component (if applicable):
# uname -a
Linux nami.il.thewrittenword.com 2.6.18-8.1.8.el5xen #1 SMP Mon Jun 25 17:19:38
EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
# rpm -qa | grep xen
How reproducible:
Cannot reproduce on-demand.
Steps to Reproduce:
Can you take a look at bug 240009 and see if it looks similar to you?
As for capturing coredumps, please see the instructions here:
I'll need to reboot the server tonight to enable coredumps.
Yes, this bug looks like a dup of bug 240009. I think one of the segfaults today
was when the guest was idle, but I'm not sure. We routinely have problems with
fully-virtualized guests under heavy IO load and slow response from the NFS
server where the guest disk images are stored.
I'm going to take this one and try it with RHEL3 FV guests on RHEL 5 tomorrow.
(In reply to comment #2)
> I'll need to reboot the server tonight to enable coredumps.
Ok, we had some slight downtime so I enabled coredumps and ran some heavy I/O
operations on the RHEL3 FV guest. I'll attach a coredump.
Created attachment 159801 [details]
Coredump from RHEL3 FV guest
Segfault error in /var/log/messages:
qemu-dm: segfault at 0000000000000000 rip 0000000000000000 rsp
0000000040a000d8 error 14
We have a winner! It's the same IDE DMA thread issue as bug 240009:
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000000000429cfd in dma_thread_func (opaque=<value optimized out>) at
#2 0x00000031000061b5 in __deallocate_stack (pd=0x60) at
#3 0x0000000000000000 in ?? ()
So we should be able to apply the same patch from that Fedora bug to RHEL-5.
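To compare backtraces like the one above across several core files, the frames can be extracted mechanically. The following is a minimal sketch, not taken from the bug report: it parses gdb "bt" output and reports which function appears to have jumped to address 0. The helper names are hypothetical, and the pattern only covers the frame layout shown here.

```python
import re

# Matches frames such as "#1 0x0000000000429cfd in dma_thread_func (opaque=...) at".
FRAME_RE = re.compile(r"#(?P<idx>\d+)\s+0x(?P<pc>[0-9a-f]+) in (?P<func>\S+) \(")


def parse_backtrace(text):
    """Return a list of (frame index, program counter, function name) tuples."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)  # search() tolerates "> " quote prefixes
        if m:
            frames.append((int(m.group("idx")), int(m.group("pc"), 16), m.group("func")))
    return frames


def null_call_site(frames):
    """If frame #0 sits at address 0, return the name of the calling function."""
    if len(frames) >= 2 and frames[0][1] == 0:
        return frames[1][2]
    return None
```

For the backtrace in this report, null_call_site returns "dma_thread_func", which matches the reading that the DMA thread called through a null pointer.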
(In reply to comment #6)
> We have a winner! It's the same IDE DMA thread issue as bug 240009:
> (gdb) bt
> #0 0x0000000000000000 in ?? ()
> #1 0x0000000000429cfd in dma_thread_func (opaque=<value optimized out>) at
> #2 0x00000031000061b5 in __deallocate_stack (pd=0x60) at
> #3 0x0000000000000000 in ?? ()
> So we should be able to apply the same patch from that Fedora bug to RHEL-5.
Would you like us to apply patch #154763 from bug #240009 to
xen-3.0.3-25.0.3.el5.src.rpm and test?
(In reply to comment #7)
> Would you like us to apply patch #154763 from bug #240009 to
> xen-3.0.3-25.0.3.el5.src.rpm and test?
Well, the patch doesn't apply cleanly but if you have something you want us to
test, let me know.
I'm experiencing this bug with xen 3.0.3-41.el5_1.5, and according to the
comments at the links pointed to in comment #7, a patch solves this issue.
Could we expect an update?
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
Re: comment #10, this is being looked at.
The patch from bug 240009 is no longer thought to have fixed the issue. Instead, we
believe the problem is a race condition with this fix
Addition of this patch is already scheduled for 5.3, via another bug. The patch
will be available in xen-3.0.3-68.el5 or later.
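To check whether a given host already has a build at or past that release, the version strings can be compared. The following is a minimal sketch using a simplified segment comparison. It is not rpm's own version-comparison algorithm, and the function names are hypothetical. It only aims to order the xen builds mentioned in this report.

```python
import re

FIXED_IN = "3.0.3-68.el5"  # first build stated above to carry the patch


def _segments(s):
    """Split into runs of digits (as ints) and runs of letters; drop separators."""
    return [int(t) if t.isdigit() else t for t in re.findall(r"\d+|[A-Za-z]+", s)]


def _compare(a, b):
    """Return -1, 0 or 1 using a simplified, rpm-like segment comparison."""
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if x == y:
            continue
        x_num, y_num = isinstance(x, int), isinstance(y, int)
        if x_num != y_num:
            return 1 if x_num else -1  # treat a numeric segment as newer than letters
        return -1 if x < y else 1
    # All shared segments are equal: the string with more segments is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def has_fix(version_release):
    """True if a 'version-release' string is at or past FIXED_IN."""
    version, _, release = version_release.partition("-")
    fixed_version, _, fixed_release = FIXED_IN.partition("-")
    result = _compare(version, fixed_version)
    if result == 0:
        result = _compare(release, fixed_release)
    return result >= 0
```

The input string could come from something like rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' xen. With the builds named in this thread, has_fix("3.0.3-25.el5") and has_fix("3.0.3-41.el5_1.5") are both False.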
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.