Bug 446530 - qemu-dm[19348]: segfault at 0000000000000000 rip 0000000000000000
Summary: qemu-dm[19348]: segfault at 0000000000000000 rip 0000000000000000
Keywords:
Status: CLOSED DUPLICATE of bug 250988
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.1
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Xen Maintainance List
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-05-14 22:59 UTC by Dinesh Surpur
Modified: 2011-06-08 11:46 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-12-17 13:44:37 UTC
Target Upstream Version:
Embargoed:




Links
System     ID    Private  Priority  Status  Summary  Last Updated
XenSource  1038  0        None      None    None     Never

Description Dinesh Surpur 2008-05-14 22:59:53 UTC
Description of problem:

qemu-dm crashes with a segmentation fault:

"qemu-dm[19348]: segfault at 0000000000000000 rip 0000000000000000 rsp
0000000041400c18 error 4"

Version-Release number of selected component (if applicable):

Red Hat Enterprise Linux 5 Update 1 with the Xen kernel (2.6.18-53.el5xen)


How reproducible:

I am running the built-in Emulex lpfc driver, and after prolonged I/O runs I lose
my virtual hosts due to "qemu-dm[19348]: segfault at 0000000000000000".
Sometimes, but not always, the segfault happens when the SCSI layer returns
0x000d0000 (requeue command). Here is the output from the message log.

1. Message log showing the segfault after receiving 0x000d0000

-------------------------
May 13 20:21:51 sqa-dl380g5-02 multipathd: 350002ac021300088: remaining active
paths: 5
May 13 23:20:27 sqa-dl380g5-02 kernel: lpfc 0000:12:02.0: 2:0336 Rsp Ring 0
error: IOCB Data: x40000024 xaa61c48 x0 x0 x16 x0 x1f80a1a x229b36
May 13 23:20:27 sqa-dl380g5-02 kernel: lpfc 0000:12:02.0: 2:0729 FCP cmd x28
failed <0/0> status: x3 result: x16 Data: x1f8 xa1a
May 13 23:20:27 sqa-dl380g5-02 kernel: lpfc 0000:12:02.0: 2:0710 Iodone <0/0>
cmd ffff8800509c4500, error xd0000 SNS x0 x0 Data: x0 x0
May 13 23:20:27 sqa-dl380g5-02 kernel: lpfc 0000:12:02.0: 2:0336 Rsp Ring 0
error: IOCB Data: x0 x0 x0 x1 x16 x50a1a x1f80895 x20f32
May 13 23:20:27 sqa-dl380g5-02 kernel: lpfc 0000:12:02.0: 2:(0):0749 SCSI Layer
I/O Abort Request Status x2002 ID 1 LUN 0 snum 0x43320c
May 13 23:20:27 sqa-dl380g5-02 kernel: sd 2:0:1:0: SCSI error: return code =
0x000d0000
May 13 23:20:27 sqa-dl380g5-02 kernel: end_request: I/O error, dev sdf, sector
66151511
May 13 23:20:27 sqa-dl380g5-02 kernel: device-mapper: multipath: Failing path 8:80.
May 13 23:20:27 sqa-dl380g5-02 multipathd: 8:80: mark as failed
May 13 23:20:27 sqa-dl380g5-02 multipathd: 350002ac021310088: remaining active
paths: 4
May 13 23:20:27 sqa-dl380g5-02 kernel: qemu-dm[19348]: segfault at
0000000000000000 rip 0000000000000000 rsp 0000000041400c18 error 4
May 13 23:20:28 sqa-dl380g5-02 kernel: xenbr0: port 6(tap1) entering disabled state
May 13 23:20:28 sqa-dl380g5-02 kernel: device tap1 left promiscuous mode
May 13 23:20:28 sqa-dl380g5-02 kernel: xenbr0: port 6(tap1) entering disabled state

--------------------------------
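
For reference, the return code 0x000d0000 in the log above can be split into its component bytes. The snippet below is only a minimal sketch, assuming the usual Linux SCSI result layout (status byte in bits 0-7, message byte in bits 8-15, host byte in bits 16-23, driver byte in bits 24-31) and the host-byte names from the kernel's include/scsi/scsi.h. The function name decode_scsi_result is hypothetical and not part of any tool.

```python
# Subset of host-byte codes as named in the Linux kernel's include/scsi/scsi.h.
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
    0x0A: "DID_PASSTHROUGH",
    0x0B: "DID_SOFT_ERROR",
    0x0C: "DID_IMM_RETRY",
    0x0D: "DID_REQUEUE",
}


def decode_scsi_result(result: int) -> dict:
    """Split a 32-bit SCSI result value into its four bytes (hypothetical helper)."""
    host = (result >> 16) & 0xFF
    return {
        "status_byte": result & 0xFF,
        "msg_byte": (result >> 8) & 0xFF,
        "host_byte": host,
        "driver_byte": (result >> 24) & 0xFF,
        "host_name": HOST_BYTE_NAMES.get(host, "unknown"),
    }


# The value reported by "SCSI error: return code = 0x000d0000".
print(decode_scsi_result(0x000D0000))
```

Under these assumptions the host byte is 0x0d, which matches the "requeue command" reading given above; the other three bytes are zero.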


2. Message log showing the segfault when multipath loses a path.

May 12 15:58:25 sqa-dl380g5-02 multipathd: sdg: tur checker reports path is up
May 12 15:58:25 sqa-dl380g5-02 multipathd: 8:96: reinstated
May 12 15:58:25 sqa-dl380g5-02 multipathd: 350002ac0212d0088: remaining active
paths: 2
May 12 15:58:25 sqa-dl380g5-02 kernel: qemu-dm[8034]: segfault at
0000000000000000 rip 0000000000000000 rsp 0000000041400c18 error 4
May 12 15:58:25 sqa-dl380g5-02 avahi-daemon[5946]: Interface tap1.IPv6 no longer
relevant for mDNS.
May 12 15:58:26 sqa-dl380g5-02 kernel: xenbr0: port 5(tap1) entering disabled state
May 12 15:58:26 sqa-dl380g5-02 avahi-daemon[5946]: Leaving mDNS multicast group
on interface tap1.IPv6 with address fe80::88fd:3bff:feb8:8b4f.
May 12 15:58:26 sqa-dl380g5-02 kernel: device tap1 left promiscuous mode
May 12 15:58:26 sqa-dl380g5-02 multipathd: sdk: tur checker reports path is down
May 12 15:58:26 sqa-dl380g5-02 avahi-daemon[5946]: Withdrawing address record
for fe80::88fd:3bff:feb8:8b4f on tap1.
May 12 15:58:26 sqa-dl380g5-02 kernel: xenbr0: port 5(tap1) entering disabled state
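
To make the segfault lines above easier to compare, here is a rough sketch that extracts the fields and decodes the trailing "error" value as an x86 page-fault error code. It assumes the kernel prints that value in hexadecimal and that the bits mean: bit 0 protection violation (clear means page not present), bit 1 write access, bit 2 user mode, bit 3 reserved bit set, bit 4 instruction fetch. The names parse_segfault and decode_error_code are hypothetical.

```python
import re

# \s+ between fields tolerates the line wrapping seen in pasted syslog output.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]:\s+segfault\s+at\s+(?P<addr>[0-9a-f]+)\s+"
    r"rip\s+(?P<rip>[0-9a-f]+)\s+rsp\s+(?P<rsp>[0-9a-f]+)\s+"
    r"error\s+(?P<err>[0-9a-f]+)"
)


def decode_error_code(code: int) -> dict:
    """Decode x86 page-fault error code bits (assumed layout, see lead-in)."""
    return {
        "protection_violation": bool(code & 0x01),
        "write": bool(code & 0x02),
        "user_mode": bool(code & 0x04),
        "reserved_bit": bool(code & 0x08),
        "instruction_fetch": bool(code & 0x10),
    }


def parse_segfault(text: str):
    """Return the parsed fields of the first segfault message in text, or None."""
    m = SEGFAULT_RE.search(text)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # assumption: the kernel prints this in hex
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
        "error": err,
        "flags": decode_error_code(err),
    }


sample = (
    "May 12 15:58:25 sqa-dl380g5-02 kernel: qemu-dm[8034]: segfault at\n"
    "0000000000000000 rip 0000000000000000 rsp 0000000041400c18 error 4"
)
print(parse_segfault(sample))
```

Under these assumptions, "error 4" reads as a user-mode read of a page that is not present. Together with a fault address and rip of zero, that is consistent with a call or jump through a null pointer, although the log alone does not prove it.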

Steps to Reproduce:
1. Run I/O for a prolonged period of time.
  

Additional info:
The OS is RHEL 5 U1, and the installed Xen packages are:

# rpm -qa | grep -i xen
xen-libs-3.0.3-41.el5
kmod-gfs-xen-0.1.19-7.el5
kernel-xen-devel-2.6.18-53.el5
xen-libs-3.0.3-41.el5
kmod-gnbd-xen-0.1.4-12.el5
xen-3.0.3-41.el5
kernel-xen-2.6.18-53.el5
kmod-gfs2-xen-1.52-1.16.el5
kmod-gfs-xen-0.1.19-7.el5_1.3

My boot device is an HP Smart Array device, and the SAN volumes are from 3PAR storage.

# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      53770332   3495308  47499576   7% /
/dev/cciss/c0d0p1       988088     38136    898948   5% /boot

Comment 1 Erwan Velu 2008-05-23 08:23:06 UTC
I have the same kind of problem: under I/O load, qemu-dm segfaults like this:

"qemu-dm[23002]: segfault at 0000000000000000 rip 0000000000000000 rsp
0000000041400c18 error 14"

kernel : 2.6.18-53.1.19.el5xen
xen : 3.0.3-41.el5_1.5

This problem seems to have existed for a while:
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1038
http://lists.xensource.com/archives/html/xen-users/2007-08/msg00074.html


Comment 2 Kirby Zhou 2008-11-28 06:12:19 UTC
I have encountered the same problem with 2.6.18-92.1.13.el5xen.

Windows 2003 HVM domU.
RHEL 4.7 AS HVM domU

Is there any workaround?

Comment 3 Kirby Zhou 2008-11-29 14:09:40 UTC
Would this patch work?

http://lists.xensource.com/archives/html/xen-devel/2008-01/msg01151.html

Comment 4 Chris Lalancette 2008-12-01 09:03:15 UTC
(In reply to comment #3)
> Would this patch work?
> 
> http://lists.xensource.com/archives/html/xen-devel/2008-01/msg01151.html

We have this patch already in the RHEL 5.3 beta version of the xen package.  Once it's available, can you try that out?

Thanks,
Chris Lalancette

Comment 5 Chris Lalancette 2008-12-01 10:57:32 UTC
The patch is in the xen package, so you would need to upgrade to xen-3.0.3-76.el5 and xen-libs-3.0.3-76.el5

Chris Lalancette
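
As a quick way to check package strings such as those in the `rpm -qa` output from the description against the versions named above, something like the following could be used. This is only a sketch: it assumes the package name has the form xen-3.0.3-N.el5 or xen-libs-3.0.3-N.el5 (optionally with a suffix such as _1.5) and compares only the release number N against 76. It is not a full RPM version comparison, and the name has_fix is hypothetical.

```python
import re

# Release number of the fixed packages named above (xen-3.0.3-76.el5).
FIXED_RELEASE = 76

# Matches e.g. "xen-3.0.3-41.el5", "xen-libs-3.0.3-76.el5", "xen-3.0.3-41.el5_1.5".
PKG_RE = re.compile(r"(?:xen|xen-libs)-3\.0\.3-(\d+)\.el5\S*")


def has_fix(package: str) -> bool:
    """Return True if the release number is at least FIXED_RELEASE.

    Simplified comparison for 3.0.3-based RHEL 5 packages only.
    """
    m = PKG_RE.fullmatch(package.strip())
    if m is None:
        raise ValueError(f"unrecognized package string: {package!r}")
    return int(m.group(1)) >= FIXED_RELEASE


for pkg in ("xen-3.0.3-41.el5", "xen-libs-3.0.3-76.el5"):
    print(pkg, has_fix(pkg))
```

Packages that do not match the assumed naming pattern raise ValueError rather than being guessed at.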

Comment 6 Kirby Zhou 2008-12-17 13:26:35 UTC
After days of testing, it actually works :-)

Comment 7 Chris Lalancette 2008-12-17 13:44:37 UTC
OK, great, thanks for the testing.  I'm going to mark this as a duplicate of the other bug for tracking purposes.

Chris Lalancette

*** This bug has been marked as a duplicate of bug 250988 ***

Comment 8 Farrukh Hamid 2011-06-08 11:46:06 UTC
Hi,

I have been getting similar messages on my RHEL 5.3 system (2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux):

tcpserver.exe[30715]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff8289b128 error 14
tcpserver.exe[22190]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff0db2d3b8 error 14
tcpserver.exe[4560]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffb8bba448 error 14
tcpserver.exe[19546]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffbf7789b8 error 14
tcpserver.exe[3695]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff106f3f78 error 14
tcpserver.exe[12536]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffe08520d8 error 14
tcpserver.exe[19990]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff7991c1a8 error 14
tcpserver.exe[31457]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff3c6aef38 error 14
tcpserver.exe[23156]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff51363be8 error 14
tcpserver.exe[22250]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffae76dff8 error 14
tcpserver.exe[12762]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff2dd2b5b8 error 14
tcpserver.exe[4991]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffe9c3e4c8 error 14
tcpserver.exe[27543]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff03e386b8 error 14
tcpserver.exe[8936]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffe02adb38 error 14
tcpserver.exe[864]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffdaee9778 error 14
tcpserver.exe[9462]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff31780008 error 14
tcpserver.exe[15510]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffce3c5c48 error 14
tcpserver.exe[1135]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff53642ec8 error 14
tcpserver.exe[18934]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffab300668 error 14
tcpserver.exe[16453]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff8e9c3248 error 14
tcpserver.exe[31498]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff3d75efe8 error 14
tcpserver.exe[18160]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff93115998 error 14
tcpserver.exe[21848]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff5b03d8c8 error 14
tcpserver.exe[17056]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffbe226ab8 error 14
tcpserver.exe[10398]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffa026eaf8 error 14
tcpserver.exe[29820]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff3dd93618 error 14
tcpserver.exe[18944]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffd832dbb8 error 14
tcpserver.exe[17595]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff8d4a5d28 error 14
tcpserver.exe[11006]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff48392c18 error 14
tcpserver.exe[13669]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff64105998 error 14
tcpserver.exe[11389]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff9d2b1b38 error 14
tcpserver.exe[12248]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007ffff76e7f68 error 14
tcpserver.exe[13351] general protection rip:460fba rsp:7fffa69b1240 error:0
tcpserver.exe[11275]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffb95eae78 error 14
tcpserver.exe[10396]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffba907188 error 14
tcpserver.exe[11057]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffbc5bde48 error 14
tcpserver.exe[11196]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff52c9e528 error 14
tcpserver.exe[30716]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff5148ed18 error 14
tcpserver.exe[1949]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffba0c9958 error 14
tcpserver.exe[2923]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff80d675f8 error 14
tcpserver.exe[32032]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff45a15298 error 14
tcpserver.exe[14929]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff9ac634e8 error 14
tcpserver.exe[21093]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffc74b1f08 error 14
tcpserver.exe[19444]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffed46ccf8 error 14
tcpserver.exe[23396]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffa78e9338 error 14
tcpserver.exe[22388]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff4417f9b8 error 14
tcpserver.exe[24477]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff961259a8 error 14
tcpserver.exe[21453]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff94cbd548 error 14
tcpserver.exe[31983]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffc1eac738 error 14
tcpserver.exe[19998]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff295eae78 error 14
tcpserver.exe[22752]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff5e0328b8 error 14
tcpserver.exe[2817]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fff56e98728 error 14

Please advise.

Farrukh
[root@atmphx log]#

