Bug 15178 - Panic in tcp stack
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 6.2
Hardware: i686 Linux
Priority: medium Severity: high
Assigned To: Michael K. Johnson
Reported: 2000-08-02 16:43 EDT by Brian Brock
Modified: 2008-05-01 11:37 EDT (History)
Doc Type: Bug Fix
Last Closed: 2000-08-22 12:59:41 EDT
Description Brian Brock 2000-08-02 16:43:41 EDT
A customer is getting a kernel panic 2-3 times daily under the following
circumstances:

About 300 TCP connections (100 megabit) through a custom proxy server.

Panics always occur in the TCP stack, sometimes in tcp_ack(), sometimes in
tcp_retrans_try_collapse().

No dynamic allocation (apart from the stack itself) in the program.

Enough free memory that swap isn't used.

Currently running 2.2.12-20; more recent kernels have been more unstable.
Both logs of the panic (some copied by hand) and an upgrade to 2.2.16-3 are
forthcoming.


Hardware:

Compaq 1850 server
- RAID 1, NCR RAID controller
- 512 MB ECC RAM
- TLAN NIC
- SMP (2 processors): does not crash as often, once a week at most, closer
  to every two weeks
- an otherwise identical (apart from the number of CPUs) UP machine (vanilla
  UP kernel)

AP 206
- IDE drive
- 256 MB RAM
- Intel NIC, pro 100
- same version


lsmod output (from the Compaq 1850):

tlan                   19892   1 (autoclean)
ncr53c8xx              52264   0 (unused)
cpqarray               15200   6

I also have the System.map and ksyms, not included here because of their
length; I'll email them to whoever asks.  Let me know if any other output
is required.

Traces from customer follow:

---------------
        We've got the 2.2.12-20 kernel.  These dumps are representative of
the ones we've been getting.  One always crashes in
tcp_retrans_try_collapse and the other in tcp_ack.  The crash in tcp_ack
is always on the same instruction.  The crash in tcp_retrans_try_collapse
is always within an instruction or two (it will sometimes crash on the
83 38 01).  Just for variety, I guess.

NOTE: I believe these dumps are reliable because I can use gdb to
disassemble vmlinuz and I can find the code byte-for-byte where it should
be.



WARNING: This version of ksymoops is obsolete.
WARNING: The current version can be obtained from ftp://ftp.ocs.com.au/pub/ksymoops
Options used: -V (default)
              -o /lib/modules/2.2.12-20/ (default)
              -k /proc/ksyms (default)
              -l /proc/modules (default)
              -m System.map (specified)
              -c 1 (default)

EIP: 0010:[<c01639b1>]
eax: 0031eef1 ebx: d2c29470 ecx: 00000000 edx: 0000000
esi: 00000000 edi: 00000006 ebp: d2c293c0 esp: c0225e24
ds: 0018 es: 0018 ss: 0018
Call Trace: [<c0164cf5>] [<c0169d43>] [<c0169eb2>] [<c016a14b>] [<c015cc33>]
[<c015cf31>] [<c014f2bd>] [<c014f2bd>] [<c01184f5>] [<c010ae6b>] [<c101ab38>]
[<c010b5fd>] [<c0106000>] [<c0108620>] [<c0109d08>] [<c0106000>] [<c010607b>]
[<c0106000>] [<c0100176>]
Code: 2b 42 44 8b 4b 50 29 c1 89 c8 85 c0 7d 05 b8 01 00 00 00 50

>>EIP: c01639b1 <tcp_ack+2a1/370>
Trace: c0164cf5 <tcp_rcv_established+449/5e8>
Trace: c0169d43 <tcp_v4_do_rcv+6f/178>
Trace: c0169eb2 <tcp_v4_rcv+66/384>
Trace: c016a14b <tcp_v4_rcv+2ff/384>
Trace: c015cc33 <ip_local_deliver+223/27c>
Trace: c015cf31 <ip_rcv+2a5/2d4>
Trace: c014f2bd <net_bh+179/1d4>
Trace: c014f2bd <net_bh+179/1d4>
Trace: c0109d08 <system_call+34/38>
Code:  c01639b1 <tcp_ack+2a1/370>   00000000 <_EIP>: <===
Code:  c01639b1 <tcp_ack+2a1/370>   0:    2b 42 44          subl   0x44(%edx),%eax <===
Code:  c01639b4 <tcp_ack+2a4/370>   3:    8b 4b 50          movl   0x50(%ebx),%ecx
Code:  c01639b7 <tcp_ack+2a7/370>   6:    29 c1             subl   %eax,%ecx
Code:  c01639b9 <tcp_ack+2a9/370>   8:    89 c8             movl   %ecx,%eax
Code:  c01639bb <tcp_ack+2ab/370>   a:    85 c0             testl  %eax,%eax
Code:  c01639bd <tcp_ack+2ad/370>   c:    7d 05             jnl    c01639c4 <tcp_ack+2b4/370>
Code:  c01639bf <tcp_ack+2af/370>   e:    b8 01 00 00 00    movl   $0x1,%eax
Code:  c01639c4 <tcp_ack+2b4/370>  13:    50                pushl  %eax


5 warnings issued.  Results may not be reliable.
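
For anyone reading the decoded trace above: in ksymoops output such as
<tcp_ack+2a1/370>, the first hex number is the offset of the address from the
start of the symbol and the second is the symbol's length. The lookup can be
sketched in Python as below. The symbol table here is an assumption made for
illustration: the start of tcp_ack (c01639b1 - 0x2a1 = c0163710) and its end
(start + 0x370) are derived from the trace, while the name next_symbol is a
placeholder, because the customer's real System.map is not attached to this
report.

```python
import bisect

# Hypothetical System.map excerpt, sorted by address. Only the tcp_ack start
# and the address where it ends are derived from the oops; "next_symbol" is a
# placeholder name, not a real kernel symbol.
SYMBOLS = [
    (0xC0163710, "tcp_ack"),
    (0xC0163A80, "next_symbol"),
]


def resolve(addr, symbols=SYMBOLS):
    """Return 'name+offset/size' for addr, in the style ksymoops prints."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0 or i + 1 >= len(symbols):
        raise ValueError("address %#x is outside the known symbols" % addr)
    start, name = symbols[i]
    size = symbols[i + 1][0] - start  # size = distance to the next symbol
    return "%s+%x/%x" % (name, addr - start, size)


print(resolve(0xC01639B1))  # the EIP from the first dump
```

With the table above, the EIP value resolves to the same tcp_ack+2a1/370
string that ksymoops reported.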


-----



WARNING: This version of ksymoops is obsolete.
WARNING: The current version can be obtained from ftp://ftp.ocs.com.au/pub/ksymoops
Options used: -V (default)
              -o /lib/modules/2.2.12-20/ (default)
              -k /proc/ksyms (default)
              -l /proc/modules (default)
              -m System.map (specified)
              -c 1 (default)

EIP: 0010:[<c016612e>]
EFLAGS: 00010246
eax: 00000000 ebx: c8742c20 ecx: 00000000 edx: 00000000
esi: c895d7f0 edi c8742c20 ebp: c0225dd8 esp: c0225dc4
Call Trace: [<c014d4c5>] [<c0166483>] [<c015cc33>] [<c015cf31>]
[<c014f2bd>]
[<c01184f5>] [<c010ae6b>] [<c010ab38>] [<c01085fd>] [<c0106000>]
[<c0108620>]
[<c0109d08>] [<c0106000>] [<c010607b>] [<c0106000>] [<c0100176>]
Code: 80 79 66 00 74 11 8b 81 88 00 00 00 83 38 01 0f 95 c0 25 ff

>>EIP: c016612e <tcp_retrans_try_collapse+3a/208>
Trace: c014d4c5 <__kfree_skb+a1/a8>
Trace: c0166483 <tcp_retransmit_skb+a3/164>
Trace: c015cc33 <ip_local_deliver+223/27c>
Trace: c015cf31 <ip_rcv+2a5/2d4>
Trace: c014f2bd <net_bh+179/1d4>
Trace: c01184f5 <do_bottom_half+45/64>
Trace: c0109d08 <system_call+34/38>
Code:  c016612e <tcp_retrans_try_collapse+3a/208>   00000000 <_EIP>: <===
Code:  c016612e <tcp_retrans_try_collapse+3a/208>   0:    80 79 66 00          cmpb   $0x0,0x66(%ecx) <===
Code:  c0166132 <tcp_retrans_try_collapse+3e/208>   4:    74 11                je     c0166145 <tcp_retrans_try_collapse+51/208>
Code:  c0166134 <tcp_retrans_try_collapse+40/208>   6:    8b 81 88 00 00 00    movl   0x88(%ecx),%eax
Code:  c016613a <tcp_retrans_try_collapse+46/208>   c:    83 38 01             cmpl   $0x1,(%eax)
Code:  c016613d <tcp_retrans_try_collapse+49/208>   f:    0f 95 c0             setne  %al
Code:  c0166140 <tcp_retrans_try_collapse+4c/208>  12:    25 ff 00 00 00       andl   $0xff,%eax


4 warnings issued.  Results may not be reliable.
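
Both faulting instructions read memory through a register that the dumps show
as zero (edx in the tcp_ack dump, ecx in the tcp_retrans_try_collapse dump).
That would be consistent with a NULL structure pointer being dereferenced at a
small field offset, though the dumps alone do not prove it. The Python sketch
below only does the address arithmetic. The helper name and the 4096-byte page
size check are assumptions added for illustration, and the register values are
copied from the hand-transcribed dumps (edx is printed with seven digits in
the first dump and is taken to be zero here), so they may carry transcription
errors.

```python
PAGE_SIZE = 4096  # assumed i386 page size


def effective_address(base_reg_value, displacement):
    """Address accessed by a disp(%reg) operand on 32-bit x86."""
    return (base_reg_value + displacement) & 0xFFFFFFFF


# subl 0x44(%edx),%eax with edx = 0 (first dump)
tcp_ack_fault = effective_address(0x00000000, 0x44)
# cmpb $0x0,0x66(%ecx) with ecx = 0 (second dump)
collapse_fault = effective_address(0x00000000, 0x66)

for name, addr in (("tcp_ack", tcp_ack_fault),
                   ("tcp_retrans_try_collapse", collapse_fault)):
    in_first_page = addr < PAGE_SIZE
    print("%s: access at %#x, inside first page: %s"
          % (name, addr, in_first_page))
```

Under these assumptions both accesses land at addresses 0x44 and 0x66, inside
the first page, which is the usual signature of a NULL pointer dereference.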
Comment 1 Bill Nottingham 2000-08-02 17:12:42 EDT
They *really* need to try a 2.2.16-based kernel; many
TCP bugs were fixed in those.
Comment 2 Brian Brock 2000-08-04 14:12:42 EDT
Initially, stability has improved with 2.2.16-3.

Waiting for a few days of production use to see if the problem is truly gone, or
if it's just occurring less often now.
Comment 3 Matt Novi 2000-08-04 14:52:12 EDT
After running with 2.2.16 for one day there was no panic. However, this kernel
runs too slowly for our production environment (the 2.2.16 box was several
packets behind our 2.2.12 box all day).  I've backed off to 2.2.14, which
appears to be just as fast as the 2.2.12 kernel.
Comment 4 Brian Brock 2001-06-25 10:26:02 EDT
Problem appears resolved with no change in many months.  Changing status to
"closed".
