Description of problem: While running a distributed test tool which utilizes SCTP for communication, two of my nodes paniced inside SCTP kernel code. tank-02: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024 BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024 printing eip: f8e12ffb *pde = 55add067 Oops: 0000 [#1] SMP last sysfs file: /devices/pci0000:00/0000:00:00.0/irq Modules linked in: sctp gfs(U) lock_dlm gfs2 dlm configfs autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_multipath video sbs backlight i2c_ec button battery asus_acpi ac lp intel_rng floppy e7xxx_edac e1000 edac_mc i2c_i801 ide_cd parport_pc pcspkr i2c_core parport cdrom sg dm_snapshot dm_zero dm_mirror dm_mod qla2xxx scsi_transport_fc ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd CPU: 0 EIP: 0060:[<f8e12ffb>] Not tainted VLI EFLAGS: 00010286 (2.6.18-53.el5 #1) EIP is at sctp_get_port+0x12/0x44 [sctp] eax: 00000000 ebx: ef935b80 ecx: f8e22ae0 edx: ef935b80 esi: ef935b80 edi: 00000000 ebp: 00000010 esp: ee797ea0 ds: 007b es: 007b ss: 0068 Process d_doio (pid: 9256, ti=ee797000 task=e4db8aa0 task.ti=ee797000) Stack: c05a5a4a f133e500 00000040 f7bb7d80 ee797ebc c05a4ad2 ef935b80 ef935b80 ef935b80 ee797eec c05e6d1f f8e17bc0 c05e75f1 f8e17bc0 f5a8eb00 ee797eec 00000010 c05a46d8 00000002 2b810002 62590f0a 00000000 00000000 00000000 Call Trace: [<c05a5a4a>] lock_sock+0x8e/0x96 [<c05a4ad2>] sys_recvfrom+0x101/0x137 [<c05e6d1f>] inet_autobind+0x1c/0x51 [<c05e75f1>] inet_dgram_connect+0x30/0x4e [<c05a46d8>] sys_connect+0x7d/0xa9 [<c0492369>] inotify_d_instantiate+0x3c/0x5f [<c0483308>] d_rehash+0x1c/0x2b [<c05a3e90>] sock_attach_fd+0x6c/0xcc [<c06058c7>] _spin_lock_bh+0x8/0x18 [<c05a5a4a>] lock_sock+0x8e/0x96 [<c05a5a4a>] lock_sock+0x8e/0x96 [<c06058c7>] _spin_lock_bh+0x8/0x18 [<c05a4ec1>] sys_socketcall+0x8c/0x19e [<c0407ef7>] do_syscall_trace+0xab/0xb1 [<c0404eff>] syscall_call+0x7/0xb ======================= Code: ff ff bd f4 ff ff ff e9 5f fb ff ff 81 c4 24 02 00 00 89 e8 5b 5e 5f 5d c3 57 89 d7 56 53 89 c3 83 ec 1c 8b 80 f4 01 00 00 89 da <8b> 48 24 89 e0 ff 51 30 0f b7 d7 89 d0 c1 e8 08 c1 e2 08 09 c2 EIP: [<f8e12ffb>] sctp_get_port+0x12/0x44 [sctp] SS:ESP 0068:ee797ea0 <0>Kernel panic - not syncing: Fatal exception tank-04: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024 printing eip: f8e19ffb *pde = 506e5067 Oops: 0000 [#1] SMP last sysfs file: /devices/pci0000:00/0000:00:00.0/irq Modules linked in: sctp gfs(U) lock_dlm gfs2 dlm configfs autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_multipath video sbs backlight i2c_ec button battery asus_acpi ac lp sg parport_pc floppy ide_cd intel_rng parport i2c_i801 e7xxx_edac cdrom e1000 i2c_core edac_mc pcspkr dm_snapshot dm_zero dm_mirror dm_mod qla2xxx scsi_transport_fc ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd CPU: 1 EIP: 0060:[<f8e19ffb>] Not tainted VLI EFLAGS: 00010286 (2.6.18-53.el5 #1) EIP is at sctp_get_port+0x12/0x44 [sctp] eax: 00000000 ebx: d5af1b00 ecx: f8e29ae0 edx: d5af1b00 esi: d5af1b00 edi: 00000000 ebp: 00000010 esp: e8ebcea0 ds: 007b es: 007b ss: 0068 Process d_doio (pid: 9120, ti=e8ebc000 task=f7fc6aa0 task.ti=e8ebc000) Stack: c05a5a4a f1094680 00000040 f74e7d80 e8ebcebc c05a4ad2 d5af1b00 d5af1b00 d5af1b00 e8ebceec c05e6d1f f8e1ebc0 c05e75f1 f8e1ebc0 f17fab00 e8ebceec 00000010 c05a46d8 00000002 2b810002 62590f0a 00000000 00000000 00000000 Call Trace: [<c05a5a4a>] lock_sock+0x8e/0x96 [<c05a4ad2>] sys_recvfrom+0x101/0x137 [<c05e6d1f>] inet_autobind+0x1c/0x51 [<c05e75f1>] inet_dgram_connect+0x30/0x4e [<c05a46d8>] sys_connect+0x7d/0xa9 [<c0492369>] inotify_d_instantiate+0x3c/0x5f [<c0483308>] d_rehash+0x1c/0x2b [<c05a3e90>] sock_attach_fd+0x6c/0xcc [<c06058c7>] _spin_lock_bh+0x8/0x18 [<c0604141>] schedule+0x90d/0x9ba [<c05a4ec1>] sys_socketcall+0x8c/0x19e [<c0407ef7>] do_syscall_trace+0xab/0xb1 [<c0404eff>] syscall_call+0x7/0xb ======================= Code: ff ff bd f4 ff ff ff e9 5f fb ff ff 81 c4 24 02 00 00 89 e8 5b 5e 5f 5d c3 57 89 d7 56 53 89 c3 83 ec 1c 8b 80 f4 01 00 00 89 da <8b> 48 24 89 e0 ff 51 30 0f b7 d7 89 d0 c1 e8 08 c1 e2 08 09 c2 EIP: [<f8e19ffb>] sctp_get_port+0x12/0x44 [sctp] SS:ESP 0068:e8ebcea0 <0>Kernel panic - not syncing: Fatal exception Version-Release number of selected component (if applicable): kernel-2.6.18-53.el5 How reproducible: Unknown Actual results: See panic messages above Expected results: Don't Panic. Additional info:
Vlad Yasevich suggested on lksctp-developers that this patch might fix the issue: http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff_plain;h=f26f7c480555812ca7c4037e0a50fa54afe2cb4a
Hello, this patch isn't for lksctp-tools but for a kernel. Furtermore, it seems that this bug isn't caused by lksctp-tools at all. It seems that it is an issue in the kernel. So, I'm reassigning this bug to the kernel.
I hand-patched the patch pointed to by comment #1 into the -77 RHEL5 kernel and I don't see the panic in the few iterations of tests that I've run. I used to see this panic within the first few seconds of starting the dd_io test. I still see this message on the console though: Module sctp cannot be unloaded due to unsafe usage in net/sctp/protocol.c:1218
This patch needs to be in to be able to run distributed IO tests on cluster fs (GFS and GFS2). Without this fix QE will be blocked as desribed in comment #4.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in 2.6.18-79.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5
I still see this panic running the dd_io test on the -79 kernel. I was wrong in comment #4 to say that the patch worked for me. The kernel I built by hand seems to not hit the panic with or without this patch and I attributed it to the patch working. Sorry :-(
did it work for you on -78.el5?
I just tried the -78 kernel rpm and it fails as well. I see this panic only on the i686 nodes in my cluster and not on x86_64.
Ok, I've been looking at this, and I've concluded that if sctp_sk(sk)->pf is NULL, its most likely due to one of 2 cases: 1) someone, when calling sctp_init_sock passed in an invalid address family, which resulted in ->pf for the socket getting set to NULL, which in turn resulted in the bad dereference in sctp_get_port. 2) someone is overwriting data in a neigboring struct in the sctp_sock structure, and is overrunning its bounadary. As a result, the ->pf member of the sctp_sock struct is getting overwritten with a bogus value, in this case, zero. I think both are possible, but (1) is the more likely candidate, as we never check for a NULL return code and only assign ->pf in one place during socket initalization. I've put together a debug kernel in brew, available under: .../brewroot/packages/kernel/2.6.18/79.el5.bz426234.debug1/ It contains extra code to bug halt in the event that someone passes a bad address family value in during socket initalization, and it moves the pf member of the socket structure to the bottom of the struct (which is an abi breaker, but will raise our chances of protection in the event that this is a memory corruptor from a neighboring struct, since this is a debug kernel). None of this will fix the problem, but it should give us the info we need to track down the culprit here. If you could please run this kernel and attempt to recreate the problem, I would appreciate it. Also, it would be a big help to me if you could setup kdump on these cluster nodes so as to catch a vmcore for me to analyze if the problem recurrs. Note that if the problem does not re-occur, it likely points to a corruption of a neighboring data member, and you may well see a completely different oops as a result (which is also good information to have). Thanks!
This is the console output with the 79.el5.bz426234.debug1 kernel. I haven't had the chance to set up kdump. Hope this helps. Red Hat Enterprise Linux Server release 5.1 (Tikanga) Kernel 2.6.18-79.el5.bz426234.debug1 on an i686 merit login: ------------[ cut here ]------------ kernel BUG at net/sctp/socket.c:3128! invalid opcode: 0000 [#1] SMP last sysfs file: /devices/pci0000:00/0000:00:1e.0/0000:01:00.0/host0/rport-0:0 -2/target0:0:0/0:0:0:1/vendor Modules linked in: sctp autofs4 hidp rfcomm l2cap bluetooth lock_dlm gfs2(U) d lm configfs sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath video sbs backlight i2c_ec button battery asus_acpi ac lp ide_cd floppy cdrom sg parport_pc i2c_i 801 parport i2c_i810 i2c_algo_bit i2c_core pcspkr e1000 dm_snapshot dm_zero dm _mirror dm_mod lpfc scsi_transport_fc ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd CPU: 0 EIP: 0060:[<e04abd99>] Tainted: G VLI EFLAGS: 00010246 (2.6.18-79.el5.bz426234.debug1 #1) EIP is at sctp_init_sock+0x173/0x1d2 [sctp] eax: 00000000 ebx: cdaa2b00 ecx: 00000000 edx: e04a0001 esi: 00000000 edi: cdaa2de0 ebp: 00000084 esp: cdacef24 ds: 007b es: 007b ss: 0068 Process d_doio (pid: 2841, ti=cdace000 task=cd96f000 task.ti=cdace000) Stack: cdaa2b00 00000000 c05e9fc0 cdaa3e00 00013e28 c07d37c8 00000000 ffffff9f 00000002 cdaa3e00 00000001 c05a638a 00000084 00000003 0041eca0 00000000 cdace000 c05a6483 cdacef7c 00000000 c05a66b1 cdacef7c cdace000 00000003 Call Trace: [<c05e9fc0>] inet_create+0x21a/0x231 [<c05a638a>] __sock_create+0x133/0x213 [<c05a6483>] sock_create+0xb/0xe [<c05a66b1>] sys_socket+0x18/0x38 [<c05a70aa>] sys_socketcall+0x6c/0x19e [<c0407efa>] do_syscall_trace+0xab/0xb1 [<c0404eff>] syscall_call+0x7/0xb ======================= Code: f8 02 00 00 00 00 00 00 c7 83 f4 02 00 00 00 00 00 00 c7 83 00 03 00 00 00 00 00 00 e8 80 5e ff ff 85 c0 89 83 04 03 00 00 75 08 <0f> 0b 38 0c e0 30 4 b e0 8d 83 08 03 00 00 ba d0 00 00 00 c6 83 EIP: [<e04abd99>] sctp_init_sock+0x173/0x1d2 [sctp] SS:ESP 0068:cdacef24 <0>Kernel panic - not syncing: Fatal exception
Ok, so that tells me alot, thanks. Can you point me to the source of d_doio? It would appear that this program is passing down domain parameter that is something other than PF_INET or PF_INET6. We probably need to : 1) correct d_doio to not do that 2) harden sctp to not create sockets for address families that it doesn't support
Here's a link to the latest version: http://sts.lab.msp.redhat.com/svn/viewvc.py/sistina-test/devel/rhel5/src/d_doio/d_doio.c?view=co As I look at the socket call in init_connection, line 2138, I get this sinking feeling. if ((sd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP)) < 0) { Where I think a typo caused this problem. s/AF/PF/.
No, after looking at the header files, AF_INET is defined at PF_INET.
Actually, looking at this further, everything should be fine in the kernel. The call trace indicates we're initalizing through inet_create, whcih means our family must be AF_INET, so I'm not sure how we wen't from a family of AF_INET in sctp_init_sock, to a NULL return from get_pf_specific. I don't suppose you captured a vmocre of this failure, did you? If not, could you please using this same kernel? Thanks!
Thanks, I think I see whats going on here. We managed to hit the BUG_ON without the BUG first, because the correct address family is being passed in. inside sctp_get_pf_specific we're returning the right value (sctp_pf_inet_specific). Unfortunately this is a pointer which was initialized at module initalization and then subsequently trampled over. If you look at the vmcore , at the address where this pointer is stored, we can see that everything arround it has been NULLED: crash> rd 0xe05e2a00 256 e05e2a00: e05e2a00 e05e2a00 d3b95000 e05d30b9 .*^..*^..P...0]. e05e2a10: e05d89e0 000002ea e05db880 d343e000 ..].......]...C. e05e2a20: 00000000 d5c47f60 00000000 00000000 ....`........... e05e2a30: 00000000 00000000 00000000 00000000 ................ e05e2a40: 00000000 00000000 00000000 00000000 ................ e05e2a50: 00000000 00000000 00000000 00000000 ................ e05e2a60: 00000000 00000000 00000000 00000000 ................ e05e2a70: 00000000 00000000 00000000 00000000 ................ e05e2a80: 00000000 00000000 00000000 00000000 ................ e05e2a90: 00000000 00000000 00000000 00000000 ................ e05e2aa0: 00000000 00000000 00000000 00000000 ................ e05e2ab0: 00000000 00000000 00000000 00000000 ................ e05e2ac0: 00000000 00000000 00000000 00000000 ................ e05e2ad0: 00000000 00000000 00000000 00000000 ................ e05e2ae0: 00000000 00000000 00000000 00000000 ................ e05e2af0: 00000000 00000000 00000000 00000000 ................ e05e2b00: 00000000 00000000 00000000 00000000 ................ e05e2b10: 00000000 00000000 00000000 00000000 ................ e05e2b20: 00000000 00000000 00000000 00000000 ................ e05e2b30: 00000000 00000000 00000000 00000000 ................ e05e2b40: 00000000 00000000 00000000 00000000 ................ e05e2b50: 00000000 00000000 05033f88 45af4700 .........?...G.E e05e2b60: 52702fd7 087757b1 cb0211b6 3d9e00ba ./pR.Ww........= Note sctp_pf_inet_specific is located at e05e2b14 in this core. This region also hold several other pointer that must have not been NULL previously, lest the kernel panic much sooner during this run. (the sctp_globals settings template is also stored here and is also all zeros) So I'm inclined to think we either have a memory corruptoror we have a race in the use othe module with its initalization. Given that dumpting the contents of __this_module for sctp yeilds this: crash> print __this_module $53 = { state = MODULE_STATE_COMING, list = { next = 0xe04a2d84, prev = 0xc06787b0 }, I think state shows that its becomming clear that we're trying to create an sctp socket prior to the complete initalization of the module. What we should probably do is initialize all the components first, and then, the last stage should be to register the protocol, making it visible to the systemonly once the rest of the module is in place
It may goes without saying, but I should perhaps also mention that I expect I'll have a potential fix to test out for this soon :)
Just as a note, after having figured out what this is, it appears its already fixed upstream. I'm backporting commit 827bf12236fbafc02bc899aec1b37c342c8cf4e5 to fix this. Brew build will be available for test later tonight. I'll post when its ready
new build available in brew under: .../brewroot/packages/kernel/2.6.18/79.el5.bz425234.debug2 It should fix this problem straight away. Please test & confirm. Thanks!
I hit this soft lockup with the above kernel the first time I tried it, but haven't seen it in 5 successive tries (I'm hitting a gfs2 bug instead). I'm gonna try this a few times with the gfs2 bug fixed (should have fix today) to see if I can hit this soft lockup again. BUG: soft lockup - CPU#0 stuck for 10s! [d_doio:4495] Pid: 4495, comm: d_doio EIP: 0060:[<c06093ab>] CPU: 0 EIP is at _spin_lock_irqsave+0x13/0x27 EFLAGS: 00000286 Tainted: G (2.6.18-79.el5.bz426234.debug2 #1) EAX: e05e3ae8 EBX: d9616900 ECX: 000000d0 EDX: 00000296 ESI: d8d6481c EDI: 00000000 EBP: 00000000 DS: 007b ES: 007b CR0: 8005003b CR2: b7feb000 CR3: 1195a000 CR4: 000006d0 [<e05c4ec5>] sctp_copy_local_addr_list+0x1a/0x9a [sctp] [<e05cc764>] sctp_bind_addr_copy+0x30/0x9f [sctp] [<e05c57e1>] sctp_assoc_set_bind_addr_from_ep+0x53/0x58 [sctp] [<e05ccfdc>] __sctp_connect+0x1ad/0x3b1 [sctp] [<c06093d7>] _spin_lock_bh+0x8/0x18 [<c05a7dce>] lock_sock+0x8e/0x96 [<e05d01ee>] sctp_get_port+0x35/0x44 [sctp] [<e05cd20a>] sctp_connect+0x2a/0x40 [sctp] [<c05ea1fa>] inet_dgram_connect+0x45/0x4e [<c05a68e8>] sys_connect+0x7d/0xa9 [<c049313d>] inotify_d_instantiate+0x3c/0x5f [<c0484040>] d_rehash+0x1c/0x2b [<c05a60a0>] sock_attach_fd+0x6c/0xcc [<c06093d7>] _spin_lock_bh+0x8/0x18 [<c05a7dce>] lock_sock+0x8e/0x96 [<c05a7dce>] lock_sock+0x8e/0x96 [<c06093d7>] _spin_lock_bh+0x8/0x18 [<c05a70ca>] sys_socketcall+0x8c/0x19e [<c0407efa>] do_syscall_trace+0xab/0xb1 [<c0404eff>] syscall_call+0x7/0xb ======================= BUG: soft lockup - CPU#0 stuck for 10s! [d_doio:4495] Pid: 4495, comm: d_doio EIP: 0060:[<c06093ab>] CPU: 0 EIP is at _spin_lock_irqsave+0x13/0x27 EFLAGS: 00000286 Tainted: G (2.6.18-79.el5.bz426234.debug2 #1) EAX: e05e3ae8 EBX: d9616900 ECX: 000000d0 EDX: 00000296 ESI: d8d6481c EDI: 00000000 EBP: 00000000 DS: 007b ES: 007b CR0: 8005003b CR2: b7feb000 CR3: 1195a000 CR4: 000006d0 [<e05c4ec5>] sctp_copy_local_addr_list+0x1a/0x9a [sctp] [<e05cc764>] sctp_bind_addr_copy+0x30/0x9f [sctp] [<e05c57e1>] sctp_assoc_set_bind_addr_from_ep+0x53/0x58 [sctp] [<e05ccfdc>] __sctp_connect+0x1ad/0x3b1 [sctp] [<c06093d7>] _spin_lock_bh+0x8/0x18 [<c05a7dce>] lock_sock+0x8e/0x96 [<e05d01ee>] sctp_get_port+0x35/0x44 [sctp] [<e05cd20a>] sctp_connect+0x2a/0x40 [sctp] [<c05ea1fa>] inet_dgram_connect+0x45/0x4e [<c05a68e8>] sys_connect+0x7d/0xa9 [<c049313d>] inotify_d_instantiate+0x3c/0x5f [<c0484040>] d_rehash+0x1c/0x2b [<c05a60a0>] sock_attach_fd+0x6c/0xcc [<c06093d7>] _spin_lock_bh+0x8/0x18 [<c05a7dce>] lock_sock+0x8e/0x96 [<c05a7dce>] lock_sock+0x8e/0x96 [<c06093d7>] _spin_lock_bh+0x8/0x18 [<c05a70ca>] sys_socketcall+0x8c/0x19e [<c0407efa>] do_syscall_trace+0xab/0xb1 [<c0404eff>] syscall_call+0x7/0xb =======================
Thats definately going to be a separate bug, and from taking a quick look, probably not serious. I think we may make a user space callout somewhere down that path, that under load, could result in us waiting awhile with a lock held. Either way, let me know if your oops is cleared up, and I'll get this posted. Thanks!
I haven't seen the panic with the 2.6.18-79.el5.bz426234.debug2 kernel and I believe it's fixed. I also haven't seen the soft-lockup in comment #24 since that first time.
This is already in the Beta kernel, but I have hit it once with the -81 kernel. I think Abhi has hit it with the above patch too.
The correct patch isn't in the -81 kernel yet.
in kernel-$NEW_VER You can download this test kernel from http://people.redhat.com/dzickus/el5
in kernel-2.6.18-85.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html