426234 – NULL pointer deference in sctp_get_port

Bug 426234 - NULL pointer deference in sctp_get_port

Summary: NULL pointer deference in sctp_get_port

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.1
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Neil Horman
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2007-12-19 14:13 UTC by Nate Straz
Modified:	2008-05-21 15:04 UTC (History)
CC List:	5 users (show)
Fixed In Version:	RHBA-2008-0314
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-05-21 15:04:46 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2008:0314	0	normal	SHIPPED_LIVE	Updated kernel packages for Red Hat Enterprise Linux 5.2	2008-05-20 18:43:34 UTC

Description Nate Straz 2007-12-19 14:13:45 UTC

Description of problem:

While running a distributed test tool which utilizes SCTP for communication,
two of my nodes paniced inside SCTP kernel code.

tank-02:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
 printing eip:
f8e12ffb
*pde = 55add067
Oops: 0000 [#1]
SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Modules linked in: sctp gfs(U) lock_dlm gfs2 dlm configfs autofs4 hidp rfcomm
l2cap bluetooth sunrpc ipv6 dm_multipath video sbs backlight i2c_ec button
battery asus_acpi ac lp intel_rng floppy e7xxx_edac e1000 edac_mc i2c_i801
ide_cd parport_pc pcspkr i2c_core parport cdrom sg dm_snapshot dm_zero dm_mirror
dm_mod qla2xxx scsi_transport_fc ata_piix libata sd_mod scsi_mod ext3 jbd
ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<f8e12ffb>]    Not tainted VLI
EFLAGS: 00010286   (2.6.18-53.el5 #1) 
EIP is at sctp_get_port+0x12/0x44 [sctp]
eax: 00000000   ebx: ef935b80   ecx: f8e22ae0   edx: ef935b80
esi: ef935b80   edi: 00000000   ebp: 00000010   esp: ee797ea0
ds: 007b   es: 007b   ss: 0068
Process d_doio (pid: 9256, ti=ee797000 task=e4db8aa0 task.ti=ee797000)
Stack: c05a5a4a f133e500 00000040 f7bb7d80 ee797ebc c05a4ad2 ef935b80 ef935b80 
       ef935b80 ee797eec c05e6d1f f8e17bc0 c05e75f1 f8e17bc0 f5a8eb00 ee797eec 
       00000010 c05a46d8 00000002 2b810002 62590f0a 00000000 00000000 00000000 
Call Trace:
 [<c05a5a4a>] lock_sock+0x8e/0x96
 [<c05a4ad2>] sys_recvfrom+0x101/0x137
 [<c05e6d1f>] inet_autobind+0x1c/0x51
 [<c05e75f1>] inet_dgram_connect+0x30/0x4e
 [<c05a46d8>] sys_connect+0x7d/0xa9
 [<c0492369>] inotify_d_instantiate+0x3c/0x5f
 [<c0483308>] d_rehash+0x1c/0x2b
 [<c05a3e90>] sock_attach_fd+0x6c/0xcc
 [<c06058c7>] _spin_lock_bh+0x8/0x18
 [<c05a5a4a>] lock_sock+0x8e/0x96
 [<c05a5a4a>] lock_sock+0x8e/0x96
 [<c06058c7>] _spin_lock_bh+0x8/0x18
 [<c05a4ec1>] sys_socketcall+0x8c/0x19e
 [<c0407ef7>] do_syscall_trace+0xab/0xb1
 [<c0404eff>] syscall_call+0x7/0xb
 =======================
Code: ff ff bd f4 ff ff ff e9 5f fb ff ff 81 c4 24 02 00 00 89 e8 5b 5e 5f 5d c3
57 89 d7 56 53 89 c3 83 ec 1c 8b 80 f4 01 00 00 89 da <8b> 48 24 89 e0 ff 51 30
0f b7 d7 89 d0 c1 e8 08 c1 e2 08 09 c2 
EIP: [<f8e12ffb>] sctp_get_port+0x12/0x44 [sctp] SS:ESP 0068:ee797ea0
 <0>Kernel panic - not syncing: Fatal exception

tank-04:
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
 printing eip:
f8e19ffb
*pde = 506e5067
Oops: 0000 [#1]
SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Modules linked in: sctp gfs(U) lock_dlm gfs2 dlm configfs autofs4 hidp rfcomm
l2cap bluetooth sunrpc ipv6 dm_multipath video sbs backlight i2c_ec button
battery asus_acpi ac lp sg parport_pc floppy ide_cd intel_rng parport i2c_i801
e7xxx_edac cdrom e1000 i2c_core edac_mc pcspkr dm_snapshot dm_zero dm_mirror
dm_mod qla2xxx scsi_transport_fc ata_piix libata sd_mod scsi_mod ext3 jbd
ehci_hcd ohci_hcd uhci_hcd
CPU:    1
EIP:    0060:[<f8e19ffb>]    Not tainted VLI
EFLAGS: 00010286   (2.6.18-53.el5 #1) 
EIP is at sctp_get_port+0x12/0x44 [sctp]
eax: 00000000   ebx: d5af1b00   ecx: f8e29ae0   edx: d5af1b00
esi: d5af1b00   edi: 00000000   ebp: 00000010   esp: e8ebcea0
ds: 007b   es: 007b   ss: 0068
Process d_doio (pid: 9120, ti=e8ebc000 task=f7fc6aa0 task.ti=e8ebc000)
Stack: c05a5a4a f1094680 00000040 f74e7d80 e8ebcebc c05a4ad2 d5af1b00 d5af1b00 
       d5af1b00 e8ebceec c05e6d1f f8e1ebc0 c05e75f1 f8e1ebc0 f17fab00 e8ebceec 
       00000010 c05a46d8 00000002 2b810002 62590f0a 00000000 00000000 00000000 
Call Trace:
 [<c05a5a4a>] lock_sock+0x8e/0x96
 [<c05a4ad2>] sys_recvfrom+0x101/0x137
 [<c05e6d1f>] inet_autobind+0x1c/0x51
 [<c05e75f1>] inet_dgram_connect+0x30/0x4e
 [<c05a46d8>] sys_connect+0x7d/0xa9
 [<c0492369>] inotify_d_instantiate+0x3c/0x5f
 [<c0483308>] d_rehash+0x1c/0x2b
 [<c05a3e90>] sock_attach_fd+0x6c/0xcc
 [<c06058c7>] _spin_lock_bh+0x8/0x18
 [<c0604141>] schedule+0x90d/0x9ba
 [<c05a4ec1>] sys_socketcall+0x8c/0x19e
 [<c0407ef7>] do_syscall_trace+0xab/0xb1
 [<c0404eff>] syscall_call+0x7/0xb
 =======================
Code: ff ff bd f4 ff ff ff e9 5f fb ff ff 81 c4 24 02 00 00 89 e8 5b 5e 5f 5d c3
57 89 d7 56 53 89 c3 83 ec 1c 8b 80 f4 01 00 00 89 da <8b> 48 24 89 e0 ff 51 30
0f b7 d7 89 d0 c1 e8 08 c1 e2 08 09 c2 
EIP: [<f8e19ffb>] sctp_get_port+0x12/0x44 [sctp] SS:ESP 0068:e8ebcea0
 <0>Kernel panic - not syncing: Fatal exception


Version-Release number of selected component (if applicable):
kernel-2.6.18-53.el5

How reproducible:
Unknown
  
Actual results:
See panic messages above

Expected results:
Don't Panic.

Additional info:

Comment 1 Nate Straz 2007-12-19 17:54:22 UTC

Vlad Yasevich suggested on lksctp-developers that this patch might fix the issue:

http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff_plain;h=f26f7c480555812ca7c4037e0a50fa54afe2cb4a

Comment 3 Zdenek Prikryl 2008-01-24 13:12:54 UTC

Hello,
this patch isn't for lksctp-tools but for a kernel. Furtermore, it seems that
this bug isn't caused by lksctp-tools at all. It seems that it is an issue in
the kernel.

So, I'm reassigning this bug to the kernel.

Comment 4 Abhijith Das 2008-02-07 21:41:20 UTC

I hand-patched the patch pointed to by comment #1 into the -77 RHEL5 kernel and
I don't see the panic in the few iterations of tests that I've run. I used to
see this panic within the first few seconds of starting the dd_io test.

I still see this message on the console though: Module sctp cannot be unloaded
due to unsafe usage in net/sctp/protocol.c:1218

Comment 5 Dean Jansa 2008-02-07 21:59:11 UTC

This patch needs to be in to be able to run distributed IO tests on cluster fs
(GFS and GFS2).  Without this fix QE will be blocked as desribed in comment #4.

Comment 6 RHEL Program Management 2008-02-08 05:18:32 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 8 Don Zickus 2008-02-08 20:30:25 UTC

in 2.6.18-79.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 9 Abhijith Das 2008-02-08 22:25:32 UTC

I still see this panic running the dd_io test on the -79 kernel. I was wrong in
comment #4 to say that the patch worked for me. The kernel I built by hand seems
to not hit the panic with or without this patch and I attributed it to the patch
working. Sorry :-(

Comment 10 Linda Wang 2008-02-08 22:55:01 UTC

did it work  for you on -78.el5?

Comment 11 Abhijith Das 2008-02-08 23:19:09 UTC

I just tried the -78 kernel rpm and it fails as well. I see this panic only on
the i686 nodes in my cluster and not on x86_64.

Comment 13 Neil Horman 2008-02-11 00:11:05 UTC

Ok, I've been looking at this, and I've concluded that if sctp_sk(sk)->pf is
NULL, its most likely due to one of 2 cases:

1) someone, when calling sctp_init_sock passed in an invalid address family,
which resulted in ->pf for the socket getting set to NULL, which in turn
resulted in the bad dereference in sctp_get_port.

2) someone is overwriting data in a neigboring struct in the sctp_sock
structure, and is overrunning its bounadary.  As a result, the  ->pf member of
the sctp_sock struct is getting overwritten with a bogus value, in this case, zero.

I think both are possible, but (1) is the more likely candidate, as we never
check for a NULL return code and only assign ->pf in one place during socket
initalization.

I've put together a debug kernel in brew, available under:
.../brewroot/packages/kernel/2.6.18/79.el5.bz426234.debug1/
It contains extra code to bug halt in the event that someone passes a bad
address family value in during socket initalization, and it moves the pf member
of the socket structure to the bottom of the struct (which is an abi breaker,
but will raise our chances of protection in the event that this is a memory
corruptor from a neighboring struct, since this is a debug kernel).  None of
this will fix the problem, but it should give us the info we need to track down
the culprit here.  If you could please run this kernel and attempt to recreate
the problem, I would appreciate it.  Also, it would be a big help to me if you
could setup kdump on these cluster nodes so as to catch a vmcore for me to
analyze if the problem recurrs.  Note that if the problem does not re-occur, it
likely points to a corruption of a neighboring data member, and you may well see
a completely different oops as a result (which is also good information to
have).  Thanks!

Comment 14 Abhijith Das 2008-02-11 04:40:28 UTC

This is the console output with the 79.el5.bz426234.debug1 kernel. I haven't had
the chance to set up kdump. Hope this helps.

Red Hat Enterprise Linux Server release 5.1 (Tikanga)
Kernel 2.6.18-79.el5.bz426234.debug1 on an i686

merit login: ------------[ cut here ]------------
kernel BUG at net/sctp/socket.c:3128!
invalid opcode: 0000 [#1]
SMP 
last sysfs file: /devices/pci0000:00/0000:00:1e.0/0000:01:00.0/host0/rport-0:0
-2/target0:0:0/0:0:0:1/vendor
Modules linked in: sctp autofs4 hidp rfcomm l2cap bluetooth lock_dlm gfs2(U) d
lm configfs sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath video sbs backlight
 i2c_ec button battery asus_acpi ac lp ide_cd floppy cdrom sg parport_pc i2c_i
801 parport i2c_i810 i2c_algo_bit i2c_core pcspkr e1000 dm_snapshot dm_zero dm
_mirror dm_mod lpfc scsi_transport_fc ata_piix libata sd_mod scsi_mod ext3 jbd
 ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<e04abd99>]    Tainted: G      VLI
EFLAGS: 00010246   (2.6.18-79.el5.bz426234.debug1 #1) 
EIP is at sctp_init_sock+0x173/0x1d2 [sctp]
eax: 00000000   ebx: cdaa2b00   ecx: 00000000   edx: e04a0001
esi: 00000000   edi: cdaa2de0   ebp: 00000084   esp: cdacef24
ds: 007b   es: 007b   ss: 0068
Process d_doio (pid: 2841, ti=cdace000 task=cd96f000 task.ti=cdace000)
Stack: cdaa2b00 00000000 c05e9fc0 cdaa3e00 00013e28 c07d37c8 00000000 ffffff9f
 
       00000002 cdaa3e00 00000001 c05a638a 00000084 00000003 0041eca0 00000000
 
       cdace000 c05a6483 cdacef7c 00000000 c05a66b1 cdacef7c cdace000 00000003
 
Call Trace:
 [<c05e9fc0>] inet_create+0x21a/0x231
 [<c05a638a>] __sock_create+0x133/0x213
 [<c05a6483>] sock_create+0xb/0xe
 [<c05a66b1>] sys_socket+0x18/0x38
 [<c05a70aa>] sys_socketcall+0x6c/0x19e
 [<c0407efa>] do_syscall_trace+0xab/0xb1
 [<c0404eff>] syscall_call+0x7/0xb
 =======================
Code: f8 02 00 00 00 00 00 00 c7 83 f4 02 00 00 00 00 00 00 c7 83 00 03 00 00 
00 00 00 00 e8 80 5e ff ff 85 c0 89 83 04 03 00 00 75 08 <0f> 0b 38 0c e0 30 4
b e0 8d 83 08 03 00 00 ba d0 00 00 00 c6 83 
EIP: [<e04abd99>] sctp_init_sock+0x173/0x1d2 [sctp] SS:ESP 0068:cdacef24
 <0>Kernel panic - not syncing: Fatal exception

Comment 15 Neil Horman 2008-02-11 12:35:17 UTC

Ok, so that tells me alot, thanks.  Can you point me to the source of d_doio? 
It would appear that this program is passing down domain parameter that is
something other than PF_INET or PF_INET6.  We probably need to :
1) correct d_doio to not do that
2) harden sctp to not create sockets for address families that it doesn't support

Comment 16 Nate Straz 2008-02-11 14:00:24 UTC

Here's a link to the latest version:

http://sts.lab.msp.redhat.com/svn/viewvc.py/sistina-test/devel/rhel5/src/d_doio/d_doio.c?view=co

As I look at the socket call in init_connection, line 2138, I get this sinking
feeling.

    if ((sd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP)) < 0) {

Where I think a typo caused this problem.  s/AF/PF/.

Comment 17 Nate Straz 2008-02-11 14:03:07 UTC

No, after looking at the header files, AF_INET is defined at PF_INET.

Comment 18 Neil Horman 2008-02-11 14:37:14 UTC

Actually, looking at this further, everything should be fine in the kernel.  The
call trace indicates we're initalizing through inet_create, whcih means our
family must be AF_INET, so I'm not sure how we wen't from a family of AF_INET in
sctp_init_sock, to a NULL return from get_pf_specific.  I don't suppose you
captured a vmocre of this failure, did you?  If not, could you please using this
same kernel?  Thanks!

Comment 20 Neil Horman 2008-02-11 19:27:05 UTC

Thanks, I think I see whats going on here.  We managed to hit the BUG_ON without
the BUG first, because the correct address family  is being passed in.  inside
sctp_get_pf_specific we're returning the right value (sctp_pf_inet_specific). 
Unfortunately this is a pointer which was initialized at module initalization
and then subsequently trampled over.  If you look at the vmcore , at the address
where this pointer is stored, we can see that everything arround it has been NULLED:
crash> rd 0xe05e2a00 256
e05e2a00:  e05e2a00 e05e2a00 d3b95000 e05d30b9   .*^..*^..P...0].
e05e2a10:  e05d89e0 000002ea e05db880 d343e000   ..].......]...C.
e05e2a20:  00000000 d5c47f60 00000000 00000000   ....`...........
e05e2a30:  00000000 00000000 00000000 00000000   ................
e05e2a40:  00000000 00000000 00000000 00000000   ................
e05e2a50:  00000000 00000000 00000000 00000000   ................
e05e2a60:  00000000 00000000 00000000 00000000   ................
e05e2a70:  00000000 00000000 00000000 00000000   ................
e05e2a80:  00000000 00000000 00000000 00000000   ................
e05e2a90:  00000000 00000000 00000000 00000000   ................
e05e2aa0:  00000000 00000000 00000000 00000000   ................
e05e2ab0:  00000000 00000000 00000000 00000000   ................
e05e2ac0:  00000000 00000000 00000000 00000000   ................
e05e2ad0:  00000000 00000000 00000000 00000000   ................
e05e2ae0:  00000000 00000000 00000000 00000000   ................
e05e2af0:  00000000 00000000 00000000 00000000   ................
e05e2b00:  00000000 00000000 00000000 00000000   ................
e05e2b10:  00000000 00000000 00000000 00000000   ................
e05e2b20:  00000000 00000000 00000000 00000000   ................
e05e2b30:  00000000 00000000 00000000 00000000   ................
e05e2b40:  00000000 00000000 00000000 00000000   ................
e05e2b50:  00000000 00000000 05033f88 45af4700   .........?...G.E
e05e2b60:  52702fd7 087757b1 cb0211b6 3d9e00ba   ./pR.Ww........=


Note sctp_pf_inet_specific is located at e05e2b14 in this core.  This region
also hold several other pointer that must have not been NULL previously, lest
the kernel panic much sooner during this run. (the sctp_globals settings
template is also stored here and is also all zeros)

So I'm inclined to think we either have a memory corruptoror we have a race in
the use othe module with its initalization.  Given that dumpting the contents of
__this_module for sctp yeilds this:
crash> print __this_module
$53 = {
  state = MODULE_STATE_COMING, 
  list = {
    next = 0xe04a2d84, 
    prev = 0xc06787b0
  }, 

I think state shows that its becomming clear that we're trying to create an sctp
socket  prior to the complete initalization of the module.  What we should
probably do is initialize all the components first, and then, the last stage
should be to register the protocol, making it visible to the systemonly once the
rest of the module is in place

Comment 21 Neil Horman 2008-02-11 19:31:00 UTC

It may goes without saying, but I should perhaps also mention that I expect I'll
have a potential fix to test out for this soon :)

Comment 22 Neil Horman 2008-02-11 20:02:05 UTC

Just as a note, after having figured out what this is, it appears its already
fixed upstream.  I'm backporting commit 827bf12236fbafc02bc899aec1b37c342c8cf4e5
to fix this.  Brew build will be available for test later tonight.  I'll post
when its ready

Comment 23 Neil Horman 2008-02-12 01:00:58 UTC

new build available in brew under:
.../brewroot/packages/kernel/2.6.18/79.el5.bz425234.debug2

It should fix this problem straight away.

Please test & confirm.  Thanks!

Comment 24 Abhijith Das 2008-02-12 17:33:05 UTC

I hit this soft lockup with the above kernel the first time I tried it, but
haven't seen it in 5 successive tries (I'm hitting a gfs2 bug instead). I'm
gonna try this a few times with the gfs2 bug fixed (should have fix today) to
see if I can hit this soft lockup again.

BUG: soft lockup - CPU#0 stuck for 10s! [d_doio:4495]

Pid: 4495, comm:               d_doio
EIP: 0060:[<c06093ab>] CPU: 0
EIP is at _spin_lock_irqsave+0x13/0x27
 EFLAGS: 00000286    Tainted: G       (2.6.18-79.el5.bz426234.debug2 #1)
EAX: e05e3ae8 EBX: d9616900 ECX: 000000d0 EDX: 00000296
ESI: d8d6481c EDI: 00000000 EBP: 00000000 DS: 007b ES: 007b
CR0: 8005003b CR2: b7feb000 CR3: 1195a000 CR4: 000006d0
 [<e05c4ec5>] sctp_copy_local_addr_list+0x1a/0x9a [sctp]
 [<e05cc764>] sctp_bind_addr_copy+0x30/0x9f [sctp]
 [<e05c57e1>] sctp_assoc_set_bind_addr_from_ep+0x53/0x58 [sctp]
 [<e05ccfdc>] __sctp_connect+0x1ad/0x3b1 [sctp]
 [<c06093d7>] _spin_lock_bh+0x8/0x18
 [<c05a7dce>] lock_sock+0x8e/0x96
 [<e05d01ee>] sctp_get_port+0x35/0x44 [sctp]
 [<e05cd20a>] sctp_connect+0x2a/0x40 [sctp]
 [<c05ea1fa>] inet_dgram_connect+0x45/0x4e
 [<c05a68e8>] sys_connect+0x7d/0xa9
 [<c049313d>] inotify_d_instantiate+0x3c/0x5f
 [<c0484040>] d_rehash+0x1c/0x2b
 [<c05a60a0>] sock_attach_fd+0x6c/0xcc
 [<c06093d7>] _spin_lock_bh+0x8/0x18
 [<c05a7dce>] lock_sock+0x8e/0x96
 [<c05a7dce>] lock_sock+0x8e/0x96
 [<c06093d7>] _spin_lock_bh+0x8/0x18
 [<c05a70ca>] sys_socketcall+0x8c/0x19e
 [<c0407efa>] do_syscall_trace+0xab/0xb1
 [<c0404eff>] syscall_call+0x7/0xb
 =======================
BUG: soft lockup - CPU#0 stuck for 10s! [d_doio:4495]

Pid: 4495, comm:               d_doio
EIP: 0060:[<c06093ab>] CPU: 0
EIP is at _spin_lock_irqsave+0x13/0x27
 EFLAGS: 00000286    Tainted: G       (2.6.18-79.el5.bz426234.debug2 #1)
EAX: e05e3ae8 EBX: d9616900 ECX: 000000d0 EDX: 00000296
ESI: d8d6481c EDI: 00000000 EBP: 00000000 DS: 007b ES: 007b
CR0: 8005003b CR2: b7feb000 CR3: 1195a000 CR4: 000006d0
 [<e05c4ec5>] sctp_copy_local_addr_list+0x1a/0x9a [sctp]
 [<e05cc764>] sctp_bind_addr_copy+0x30/0x9f [sctp]
 [<e05c57e1>] sctp_assoc_set_bind_addr_from_ep+0x53/0x58 [sctp]
 [<e05ccfdc>] __sctp_connect+0x1ad/0x3b1 [sctp]
 [<c06093d7>] _spin_lock_bh+0x8/0x18
 [<c05a7dce>] lock_sock+0x8e/0x96
 [<e05d01ee>] sctp_get_port+0x35/0x44 [sctp]
 [<e05cd20a>] sctp_connect+0x2a/0x40 [sctp]
 [<c05ea1fa>] inet_dgram_connect+0x45/0x4e
 [<c05a68e8>] sys_connect+0x7d/0xa9
 [<c049313d>] inotify_d_instantiate+0x3c/0x5f
 [<c0484040>] d_rehash+0x1c/0x2b
 [<c05a60a0>] sock_attach_fd+0x6c/0xcc
 [<c06093d7>] _spin_lock_bh+0x8/0x18
 [<c05a7dce>] lock_sock+0x8e/0x96
 [<c05a7dce>] lock_sock+0x8e/0x96
 [<c06093d7>] _spin_lock_bh+0x8/0x18
 [<c05a70ca>] sys_socketcall+0x8c/0x19e
 [<c0407efa>] do_syscall_trace+0xab/0xb1
 [<c0404eff>] syscall_call+0x7/0xb
 =======================

Comment 25 Neil Horman 2008-02-12 19:12:53 UTC

Thats definately going to be a separate bug, and from taking a quick look,
probably not serious.  I think we may make a user space callout somewhere down
that path, that under load, could result in us waiting awhile with a lock held.
Either way, let me know if your oops is cleared up, and I'll get this posted. 
Thanks!

Comment 26 Abhijith Das 2008-02-13 19:19:43 UTC

I haven't seen the panic with the 2.6.18-79.el5.bz426234.debug2 kernel and I
believe it's fixed. I also haven't seen the soft-lockup in comment #24 since
that first time.

Comment 30 Nate Straz 2008-02-22 15:55:29 UTC

This is already in the Beta kernel, but I have hit it once with the -81 kernel.
 I think Abhi has hit it with the above patch too.

Comment 31 Neil Horman 2008-02-22 18:55:20 UTC

The correct patch isn't in the -81 kernel yet.

Comment 35 Don Zickus 2008-03-12 19:40:59 UTC

in kernel-$NEW_VER
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 36 Don Zickus 2008-03-12 19:59:31 UTC

in kernel-2.6.18-85.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 39 errata-xmlrpc 2008-05-21 15:04:46 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html

Note You need to log in before you can comment on or make changes to this bug.