Bug 223818 - kernel panic in sctp module
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.4
Hardware: i386 Linux
Priority: high
Severity: high
Assigned To: Neil Horman
QA Contact: Brian Brock
Reported: 2007-01-22 11:58 EST by Fred Guerber
Modified: 2013-01-18 05:31 EST (History)

See Also:
Fixed In Version: RHSA-2007-0085
Doc Type: Bug Fix
Last Closed: 2007-02-27 02:55:52 EST

Attachments
linux-2.6.9-net-sctp-icmp-cleanup.patch (4.72 KB, patch)
2007-02-02 09:55 EST, Chris Williams
rediffed patch (4.83 KB, patch)
2007-02-02 15:47 EST, Neil Horman

Description Fred Guerber 2007-01-22 11:58:28 EST
Description of problem:
Kernel panic during sctp traffic, when the destination is unreachable.
The stack trace always shows the following (about 5 occurrences, always the same stack):

Module sctp cannot be unloaded due to unsafe usage in net/sctp/protocol.c:1171
Unable to handle kernel NULL pointer dereference at virtual address 00000018
 printing eip:
f8e9430b
*pde = 37de7001
Oops: 0002 [#1]
SMP 
Modules linked in: sctp 8021q sg cpqci(U) i2c_dev i2c_core md5 ipv6 oct9721(U)
ptiproxy(U) mps(U) tmc(U) joydev button battery ac ehci_hcd uhci_hcd e1000(U)
bnx2(U) dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod cciss sd_mod scsi_mod
CPU:    0
EIP:    0060:[<f8e9430b>]    Tainted: PF     VLI
EFLAGS: 00010202   (2.6.9-42.ELsmp) 
EIP is at sctp_err_lookup+0x12d/0x153 [sctp]
eax: f7d4c080   ebx: c03ceed8   ecx: 0000002d   edx: c382f9c0
esi: f382e000   edi: c03ceeb8   ebp: f4fb14c0   esp: c03ceeac
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c03ce000 task=c0322a80)
Stack: 00000000 f5830040 c3ba1200 0b5a0002 4fada8c1 f95b1010 c3ba0e34 00000000 
       00000000 bbada8c1 bbada8c1 0b590002 bbada8c1 00000000 00000000 00000000 
       00000000 00000000 00000001 00000001 f4fb14c0 f4fb14c0 f5830024 f8e943d0 
Call Trace:
 [<f8e943d0>] sctp_v4_err+0x5f/0x11f [sctp]
 [<c02b7ede>] icmp_unreach+0x242/0x263
 [<c02b8319>] icmp_rcv+0x141/0x18c
 [<c0297084>] ip_local_deliver+0xfe/0x1e8
 [<c0297664>] ip_rcv+0x366/0x417
 [<c0280b81>] netif_receive_skb+0x2ac/0x2ec
 [<c0280c3c>] process_backlog+0x7b/0xfd
 [<c0280d6c>] net_rx_action+0xae/0x160
 [<c01269b8>] __do_softirq+0x4c/0xb1
 [<c010819f>] do_softirq+0x4f/0x56
 =======================
 [<c011749e>] smp_apic_timer_interrupt+0x9a/0x9c
 [<c02d5142>] apic_timer_interrupt+0x1a/0x20
 [<c01040e8>] mwait_idle+0x33/0x42
 [<c01040a0>] cpu_idle+0x26/0x3b
 [<c0395786>] start_kernel+0x199/0x19d
Code: 44 24 64 89 30 8b 44 24 08 8b 54 24 68 89 02 89 d8 eb 36 8b 15 28 c5 46 c0
b8 00 f0 ff ff 21 e0 8b 40 10 f7 d2 8b 04 82 ff 40 08 <f0> ff 0d 18 00 00 00 0f
94 c0 84 c0 74 07 31 c0 e8 e7 62 3e c7 
 <0>Kernel panic - not syncing: Fatal exception in interrupt


Version-Release number of selected component (if applicable):
RHEL4 U4 - kernel 2.6.9-42.ELsmp
only sctp rpm present: lksctp-tools-1.0.2-6.4E.1

How reproducible:
Perform M3UA traffic (over SCTP) between two boxes.
Poweroff one box => the other one panics
[boxes = HP Proliant DL380G5 servers - 2 dual-core xeons - 4GB RAM]

Steps to Reproduce:
1. fresh reboot => no backlog 
2. start bidirectional M3UA traffic -- not necessarily high -- about 500msg/s
3. reset one of the machines (power off + on)
 
Actual results:
Kernel panic of the remaining machine in the sctp module
(on icmp reception)

Expected results:
Errors only

Additional info:
Reproduced several times (about 5-6 occurrences), always with the same stack.
Different machines had the problem (all proliant DL servers): 
DL380G5--2xdual core Xeon and DL385G1--2xdual core Opteron
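
A note on reading the oops above: the bytes at the fault marker in the "Code:" line are `<f0> ff 0d 18 00 00 00`. The sketch below is a minimal, hypothetical decoder for that single instruction form (LOCK prefix, opcode FF /1 with a ModRM byte selecting a 32-bit absolute address). The struct and function names are made up for illustration and are not part of any kernel or disassembler API. Decoded this way, the instruction is a locked decrement of the 32-bit word at absolute address 0x18, which matches the reported fault address 00000018. That is consistent with a counter at offset 0x18 being decremented through a NULL pointer, although the thread itself does not state which structure or field is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes at the fault marker in the oops "Code:" line. */
const uint8_t oops_fault_bytes[7] = { 0xf0, 0xff, 0x0d, 0x18, 0x00, 0x00, 0x00 };

/* Hypothetical result type for the one instruction form handled below. */
struct dec_abs32 {
	int has_lock;      /* 1 if a LOCK (0xF0) prefix precedes the opcode */
	uint32_t address;  /* absolute 32-bit address being decremented */
};

/*
 * Decode "[lock] dec dword [disp32]" from 32-bit x86 machine code:
 * opcode 0xFF, ModRM with mod=00, reg=001 (DEC), rm=101 (disp32 only),
 * followed by a little-endian 32-bit displacement.
 * Returns 0 on success, -1 if the bytes are anything else or too short.
 */
int decode_dec_abs32(const uint8_t *code, size_t len, struct dec_abs32 *out)
{
	size_t i = 0;
	uint8_t modrm;

	assert(code != NULL && out != NULL);
	out->has_lock = 0;
	out->address = 0;

	if (len > 0 && code[0] == 0xf0) {
		out->has_lock = 1;
		i = 1;
	}
	if (len - i < 6)          /* opcode + ModRM + 4 displacement bytes */
		return -1;
	if (code[i] != 0xff)
		return -1;

	modrm = code[i + 1];
	if ((modrm >> 6) != 0 || ((modrm >> 3) & 7) != 1 || (modrm & 7) != 5)
		return -1;

	out->address = (uint32_t)code[i + 2] |
		       ((uint32_t)code[i + 3] << 8) |
		       ((uint32_t)code[i + 4] << 16) |
		       ((uint32_t)code[i + 5] << 24);
	return 0;
}
```

Calling `decode_dec_abs32(oops_fault_bytes, sizeof oops_fault_bytes, &d)` fills `d` with the LOCK flag set and the address 0x18.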
Comment 1 Francois-Xavier 'FiX' KOWALSKI 2007-01-25 04:50:10 EST
The error condition seems close to the problem fixed by this patch:
<http://www.linux.sgi.com/archives/netdev/2005-07/msg00142.html>. Is this patch
applied on RHEL4?
Comment 2 Francois-Xavier 'FiX' KOWALSKI 2007-01-25 05:59:27 EST
Looking at the kernel code (after an "rpmbuild -bp --target i686" on the
.src.rpm), it looks like the patch is not in RHEL4 U4.  Could it be included in
a U4 erratum?

Example code:

input.c:
/* Common cleanup code for icmp/icmpv6 error handler. */
void sctp_err_finish(struct sock *sk, struct sctp_endpoint *ep,
		     struct sctp_association *asoc)
{
	sctp_bh_unlock_sock(sk);

patch on input.c:

 /* Common cleanup code for icmp/icmpv6 error handler. */
-void sctp_err_finish(struct sock *sk, struct sctp_endpoint *ep,
-                    struct sctp_association *asoc)
+void sctp_err_finish(struct sock *sk, struct sctp_association *asoc)
 {
Comment 3 Neil Horman 2007-01-29 14:44:34 EST
Have you already tested this patch to confirm that it fixes the problem?
Comment 4 Francois-Xavier 'FiX' KOWALSKI 2007-01-30 03:33:12 EST
No, as we do not know whether the lksctp source version in RHEL4 U4 matches the
one required by the patch.  This is just a guess based on the problem described
by Sridhar.  Additionally, my understanding is that replacing a Red Hat-provided
module with another one breaks Red Hat support, so we did not want to go down
that path.  Please correct me if I am wrong.
Comment 5 Marie-Antoinette de Bonis-Hamelin (so-called Marian) 2007-02-02 08:46:19 EST
Built a new kernel based on:
- the latest errata kernel as of 2007-01-25
- with the patch from
http://www.linux.sgi.com/archives/netdev/2005-07/msg00142.html
applied manually on top. The patch touches 3 files:
#   linux-2.6.9/include/net/sctp/sctp.h
#   linux-2.6.9/net/sctp/input.c
#   linux-2.6.9/net/sctp/ipv6.c
This new kernel fixes the kernel panic in the sctp module.
Comment 7 Chris Williams 2007-02-02 09:55:24 EST
Created attachment 147230
linux-2.6.9-net-sctp-icmp-cleanup.patch
Comment 8 Neil Horman 2007-02-02 15:47:58 EST
Created attachment 147252
rediffed patch

Apparently that patch was taken against the latest errata kernel for RHEL4 and
it needed some fixup.  This rediffed patch applies cleanly.
Comment 10 RHEL Product and Program Management 2007-02-06 10:44:25 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 11 Marie-Antoinette de Bonis-Hamelin (so-called Marian) 2007-02-15 06:14:27 EST
A new kernel with the fix was tested successfully:
- this time based on the RHEL4 U4 kernel-2.6.9-42.EL
- with the sctp patch found here applied on top:
http://www.linux.sgi.com/archives/netdev/2005-07/msg00142.html
Comment 13 Ronald Pacheco 2007-02-19 08:48:13 EST
From the HP Open Call team via e-mail on 2/16/07:

here is my proposition to Red Hat for their support of our sctp patch.

I believe the cleanest way for the moment is to add our sctp patch
(attached) to the stock Rhel4 U4 kernel src rpm found here:

ftp://updates.redhat.com/enterprise/4AS/en/os/SRPMS/kernel-2.6.9-42.EL.src.rpm

This sctp patch is based on the one found here:

http://www.linux.sgi.com/archives/netdev/2005-07/msg00142.html

This patch is in the current mainline kernel tree and is also delivered
in RHEL5 RC kernels.


We then rebuild re-versioned binary kernel rpms with our back-ported sctp
patch.

The advantages of rebuilding from a modified src rpm are as follows:

    *  We have uniquely identifiable kernel rpms tracked by the rpm packaging system
    *  We have clear separation from the stock Rhel4 U4 kernel rpms
    *  We build the sctp module in the Red Hat kernel tree with the correct
       compile options
    *  We have a deliverable which a client can install and later remove without
       hacking configs
    *  Red Hat can easily control the src rpm for quality and changelog


The following is the diff of our changes in the rpm spec file which
controls the build of the src rpm.  This diff is WRT the
kernel-2.6.9-42.EL.src.rpm spec file.

[root@repoman LKSCTP2]# diff -u kernel-2.6.spec.old kernel-2.6.spec
--- kernel-2.6.spec.old 2007-02-09 11:10:35.000000000 +0100
+++ kernel-2.6.spec     2007-02-13 16:48:31.000000000 +0100
@@ -22,7 +22,7 @@
 # that the kernel isn't the stock distribution kernel, for example by
 # adding some text to the end of the version number.
 #
-%define release 42.EL
+%define release 42.HP_OpenCall_sctp_fix.EL
 %define sublevel 9
 %define kversion 2.6.%{sublevel}
 %define rpmversion 2.6.%{sublevel}
@@ -693,6 +693,8 @@
 Patch1333: linux-2.6.9-net-sctp-shutdown.patch
 Patch1334: linux-2.6.9-net-sctp-receive-buffer.patch
 Patch1335: linux-2.6.9-net-sctp.patch
+Patch1336: linux-2.6.9-net-sctp-icmp-cleanup.patch

 # NIC driver updates
 Patch1350: linux-2.6.9-net-b44-4g4g.patch
@@ -2343,6 +2345,8 @@
 %patch1334 -p1
 # various sctp fixes
 %patch1335 -p1
+# sctp ICMP cleanup patch
+%patch1336 -p1

 # NIC driver fixes.
 # Fix problems with b44 & 4g/4g
@@ -3648,6 +3652,12 @@
 %endif

 %changelog
+* Wed Feb 9 2007 Gareth Armstrong <gareth.armstrong@hp.com> - 2.6.9-42.HP_OpenCall_sctp_fix.EL
+- Add lksctp icmp cleanup patch described here
+- https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=223818
+- and backported from here
+- http://www.linux.sgi.com/archives/netdev/2005-07/msg00142.html
+
 * Wed Jul 12 2006 Jason Baron <jbaron@redhat.com> [2.6.9-42]
 -s390: qeth IP address parsing fix (Jan Glauber/Pete Zaitcev) [195604]



 



Index: linux-2.6.9/net/sctp/input.c
===================================================================
--- linux-2.6.9/net/sctp/input.c
+++ linux-2.6.9/net/sctp/input.c	2007-02-01 10:48:38.000000000 +0100
@@ -310,7 +310,6 @@
 /* Common lookup code for icmp/icmpv6 error handler. */
 struct sock *sctp_err_lookup(int family, struct sk_buff *skb,
 			     struct sctphdr *sctphdr,
-			     struct sctp_endpoint **epp,
 			     struct sctp_association **app,
 			     struct sctp_transport **tpp)
 {
@@ -318,11 +317,10 @@
 	union sctp_addr daddr;
 	struct sctp_af *af;
 	struct sock *sk = NULL;
-	struct sctp_endpoint *ep = NULL;
 	struct sctp_association *asoc = NULL;
 	struct sctp_transport *transport = NULL;
 
-	*app = NULL; *epp = NULL; *tpp = NULL;
+	*app = NULL; *tpp = NULL;
 
 	af = sctp_get_af_specific(family);
 	if (unlikely(!af)) {
@@ -337,25 +335,15 @@
 	 * packet.
 	 */
 	asoc = __sctp_lookup_association(&saddr, &daddr, &transport);
-	if (!asoc) {
-		/* If there is no matching association, see if it matches any
-		 * endpoint. This may happen for an ICMP error generated in
-		 * response to an INIT_ACK.
-		 */
-		ep = __sctp_rcv_lookup_endpoint(&daddr);
-		if (!ep) {
-			return NULL;
-		}
-	}
+	if (!asoc)
+		return NULL;
 
-	if (asoc) {
-		if (ntohl(sctphdr->vtag) != asoc->c.peer_vtag) {
-			ICMP_INC_STATS_BH(ICMP_MIB_INERRORS);
-			goto out;
-		}
-		sk = asoc->base.sk;
-	} else
-		sk = ep->base.sk;
+	sk = asoc->base.sk;
+
+	if (ntohl(sctphdr->vtag) != asoc->c.peer_vtag) {
+		ICMP_INC_STATS_BH(ICMP_MIB_INERRORS);
+		goto out;
+	}
 
 	sctp_bh_lock_sock(sk);
 
@@ -365,7 +353,6 @@
 	if (sock_owned_by_user(sk))
 		NET_INC_STATS_BH(LINUX_MIB_LOCKDROPPEDICMPS);
 
-	*epp = ep;
 	*app = asoc;
 	*tpp = transport;
 	return sk;
@@ -374,21 +361,16 @@
 	sock_put(sk);
 	if (asoc)
 		sctp_association_put(asoc);
-	if (ep)
-		sctp_endpoint_put(ep);
 	return NULL;
 }
 
 /* Common cleanup code for icmp/icmpv6 error handler. */
-void sctp_err_finish(struct sock *sk, struct sctp_endpoint *ep,
-		     struct sctp_association *asoc)
+void sctp_err_finish(struct sock *sk, struct sctp_association *asoc)
 {
 	sctp_bh_unlock_sock(sk);
 	sock_put(sk);
 	if (asoc)
 		sctp_association_put(asoc);
-	if (ep)
-		sctp_endpoint_put(ep);
 }
 
 /*
@@ -413,7 +395,6 @@
 	int type = skb->h.icmph->type;
 	int code = skb->h.icmph->code;
 	struct sock *sk;
-	struct sctp_endpoint *ep;
 	struct sctp_association *asoc;
 	struct sctp_transport *transport;
 	struct inet_opt *inet;
@@ -430,7 +411,7 @@
 	savesctp  = skb->h.raw;
 	skb->nh.iph = iph;
 	skb->h.raw = (char *)sh;
-	sk = sctp_err_lookup(AF_INET, skb, sh, &ep, &asoc, &transport);
+	sk = sctp_err_lookup(AF_INET, skb, sh, &asoc, &transport);
 	/* Put back, the original pointers. */
 	skb->nh.raw = saveip;
 	skb->h.raw = savesctp;
@@ -480,7 +461,7 @@
 	}
 
 out_unlock:
-	sctp_err_finish(sk, ep, asoc);
+	sctp_err_finish(sk, asoc);
 }
 
 /*
Index: linux-2.6.9/net/sctp/ipv6.c
===================================================================
--- linux-2.6.9/net/sctp/ipv6.c
+++ linux-2.6.9/net/sctp/ipv6.c	2007-02-01 10:44:01.000000000 +0100
@@ -88,7 +88,6 @@
 	struct ipv6hdr *iph = (struct ipv6hdr *)skb->data;
 	struct sctphdr *sh = (struct sctphdr *)(skb->data + offset);
 	struct sock *sk;
-	struct sctp_endpoint *ep;
 	struct sctp_association *asoc;
 	struct sctp_transport *transport;
 	struct ipv6_pinfo *np;
@@ -102,7 +101,7 @@
 	savesctp  = skb->h.raw;
 	skb->nh.ipv6h = iph;
 	skb->h.raw = (char *)sh;
-	sk = sctp_err_lookup(AF_INET6, skb, sh, &ep, &asoc, &transport);
+	sk = sctp_err_lookup(AF_INET6, skb, sh, &asoc, &transport);
 	/* Put back, the original pointers. */
 	skb->nh.raw = saveip;
 	skb->h.raw = savesctp;
@@ -133,7 +132,7 @@
 	}
 
 out_unlock:
-	sctp_err_finish(sk, ep, asoc);
+	sctp_err_finish(sk, asoc);
 out:
 	if (likely(idev != NULL))
 		in6_dev_put(idev);
Index: linux-2.6.9/include/net/sctp/sctp.h
===================================================================
--- linux-2.6.9/include/net/sctp/sctp.h
+++ linux-2.6.9/include/net/sctp/sctp.h	2007-02-01 10:26:49.000000000 +0100
@@ -174,11 +174,9 @@
 	const union sctp_addr *,
 	struct sctp_transport **);
 struct sock *sctp_err_lookup(int family, struct sk_buff *,
-			     struct sctphdr *, struct sctp_endpoint **,
-			     struct sctp_association **,
+			     struct sctphdr *, struct sctp_association **,
 			     struct sctp_transport **);
-void sctp_err_finish(struct sock *, struct sctp_endpoint *,
-			    struct sctp_association *);
+void sctp_err_finish(struct sock *, struct sctp_association *);
 void sctp_icmp_frag_needed(struct sock *, struct sctp_association *,
 			   struct sctp_transport *t, __u32 pmtu);
 
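
One way to read the input.c hunk above: in the pre-patch sctp_err_lookup(), a vtag mismatch jumps to the "out" label before sk has been assigned, so the cleanup path calls sock_put() on a NULL pointer. The patched version assigns sk from the association before the vtag check. This is an interpretation of the diff and the oops, not something stated explicitly in this report. The user-space sketch below models only that ordering. All types and helpers in it (model_sock, model_assoc, model_sock_put and the two lookup functions) are hypothetical stand-ins, and the stand-in for sock_put() reports a NULL socket instead of dereferencing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct model_sock {
	int refcnt;
};

struct model_assoc {
	struct model_sock *sk;
	unsigned int peer_vtag;
};

struct lookup_result {
	struct model_sock *sk;  /* socket returned to the caller, or NULL */
	int null_put;           /* 1 if cleanup tried to put a NULL socket */
};

/* Stand-in for sock_put(): flags a NULL socket rather than crashing. */
static int model_sock_put(struct model_sock *sk)
{
	if (sk == NULL)
		return 1;
	sk->refcnt--;
	return 0;
}

/* Pre-patch ordering: the vtag check runs before sk is assigned. */
struct lookup_result lookup_old_order(struct model_assoc *asoc, unsigned int vtag)
{
	struct lookup_result r = { NULL, 0 };
	struct model_sock *sk = NULL;

	assert(asoc != NULL);
	if (vtag != asoc->peer_vtag)
		goto out;               /* sk is still NULL here */
	sk = asoc->sk;
	r.sk = sk;
	return r;
out:
	r.null_put = model_sock_put(sk);
	return r;
}

/* Patched ordering: sk is assigned before any jump to the cleanup path. */
struct lookup_result lookup_new_order(struct model_assoc *asoc, unsigned int vtag)
{
	struct lookup_result r = { NULL, 0 };
	struct model_sock *sk;

	assert(asoc != NULL);
	sk = asoc->sk;
	if (vtag != asoc->peer_vtag)
		goto out;
	r.sk = sk;
	return r;
out:
	r.null_put = model_sock_put(sk);
	return r;
}
```

With a mismatching vtag, lookup_old_order() reports a put on a NULL socket, while lookup_new_order() drops the reference on the real socket. With a matching vtag, both return the socket untouched.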
Comment 14 Jason Baron 2007-02-19 17:52:36 EST
committed in stream E5 build 42.0.9 
Comment 15 Joshua Giles 2007-02-21 23:16:24 EST
I was able to reproduce a hang on one of the machines after some time. The new
kernel did not show this behavior after many attempts.
Comment 18 Red Hat Bugzilla 2007-02-27 02:55:53 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2007-0085.html
Comment 19 Jason Baron 2007-02-27 15:28:42 EST
committed in stream U5 build 49. A test kernel with this patch is available from
http://people.redhat.com/~jbaron/rhel4/
