Bug 119519

Summary: (NET 83815) Kernel panic on overwhelming number of TCP requests
Product: [Fedora] Fedora
Reporter: Bojan Smojver <bojan>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 1
CC: alan, sahil.verma
Target Milestone: ---
Target Release: ---
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-09-29 20:16:05 UTC

Description Bojan Smojver 2004-03-31 01:38:16 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1)
Gecko/20031114 Galeon/1.3.14

Description of problem:
When running a large number (1,000,000) of tests using ab from Apache
2.0.46 (newer versions of ab have a few problems, therefore this one)
to test Apache 2.0.49, the kernel panics intermittently. It has
happened at least 5 times so far on my system.

The command I used to test Apache 2.0.49 (compiled from source with
only --enable-shared option) was:

ab -c 10 -k -n 1000000 http://localhost/inquiry.sm

The options mean:
-c: concurrency level
-k: keep-alive requests on
-n: the total number of requests to perform

Sidenote: The file being fetched, inquiry.sm, is a mod_spin macro file
(mod_spin is an Apache 2 module concocted by yours truly). This means
that a shared library is linked into Apache 2, some code in it
executed and the result then served out to the world. Now, even if
mod_spin has a gazillion bugs (probably true), the kernel should not
panic. Therefore, the problem is with the kernel.

Version-Release number of selected component (if applicable):
kernel-2.4.22-1.2174.nptl

How reproducible:
Sometimes

Steps to Reproduce:
1. Download Apache 2.0.49, compile with --enable-shared, install.
Alternatively, use some other software for the same test.
2. Overwhelm the system with a huge number of concurrent TCP requests.

Actual Results:  Kernel panics. I will provide some more details once
the machine dies again and I catch its death messages somehow (if
anyone can tell me what's the easiest way to do this, that would be
really nice).

Expected Results:  The kernel should not panic.

Additional info:  The test was done on an HP Pavilion ZE4201 notebook.
More info is here:

http://www.rexursive.com/articles/linuxonhpze4201.html.

Comment 1 Bojan Smojver 2004-03-31 03:31:17 UTC
OK, the thing finally crashed (jeez - I can't believe I'm actually
saying this :-). Here is what I see on the screen (hand typed, sorry
if there are errors - my vision is getting blurry from all the hex):

-------------------------------------------------
[<c020f01e>] netif_receive_skb [kernel] 0x13e (0xc2f01d1c)
[<c020f15d>] process_backlog [kernel] 0x6d (0xc2f01d3c)
[<c020f26a>] net_rx_action [kernel] 0x6a (0xc2f01d54)
[<c0121e45>] do_softirq [kernel] 0x95 (0xc2f01d70)
[<c02173e5>] .txt.lock.netfilter [kernel] 0xb6 (0xc2f01d88)
[<c02297c0>] ip_queue_xmit2 [kernel] 0x0 (0xc2f01db0)
[<c02284b3>] ip_queue_xmit [kernel] 0x483 (0xc2f01dc8)
[<c02297c0>] ip_queue_xmit2 [kernel] 0x0 (0xc2f01de0)
[<c023df8a>] tcp_v4_send_check [kernel] 0x4a (0xc2f01dfc)
[<c0238928>] tcp_transmit_skb [kernel] 0x3b8 (0xc2f01e1c)
[<c0239604>] tcp_write_xmit [kernel] 0x184 (0xc2f01e1c)
[<c022dfce>] tcp_sendmsg [kernel] 0x5de (0xc2f01e84)
[<c01188a0>] recalc_task_prio [kernel] 0x90 (0xc2f01ea4)
[<c024bb02>] inet_recvmsg [kernel] 0x52 (0xc2f01ee0)
[<c024bb62>] inet_sendmsg [kernel] 0x42 (0xc2f01efc)
[<c0206f9b>] sock_sendmsg [kernel] 0x6b (0xc2f01f10)
[<c020722e>] sock_write [kernel] 0xae (0xc2f01f54)
[<c0144103>] sys_write [kernel] 0xa3 (0xc2f01f94)
[<c0109747>] system_call [kernel] 0x33 (0xc2f01fc0)

Code: 0f 0b 62 00 03 0c 28 c0 e9 6c fd ff ff 8d 74 26 00 55 57 56
 <0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
-------------------------------------------------

Hope this helps.
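
The "Code:" line in the trace above starts with 0f 0b, which is the x86 ud2
opcode. The following is a minimal sketch of how those bytes could be decoded,
ASSUMING this kernel was built with the verbose i386 BUG() macro of the 2.4
series (ud2, then a 16-bit line number, then a 32-bit pointer to the file name
string, both little-endian). That assumption is not confirmed anywhere in this
report, and decode_bug is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Sketch only: decode the bytes after "Code:" under the ASSUMPTION that
# they are a 2.4-era i386 BUG() sequence:
#   0f 0b        ud2
#   LL HH        16-bit source line number (little-endian)
#   P0 P1 P2 P3  32-bit pointer to the file name string (little-endian)
# decode_bug is a hypothetical helper, not part of any package.
decode_bug() {
    # Split the single argument into one positional parameter per byte.
    set -- $1
    if [ "$#" -lt 8 ] || [ "$1" != "0f" ] || [ "$2" != "0b" ]; then
        echo "not a ud2/BUG() sequence"
        return 1
    fi
    # Reassemble the little-endian fields, most significant byte first.
    line=$((0x$4$3))
    printf 'ud2 line=%d file_ptr=0x%s%s%s%s\n' "$line" "$8" "$7" "$6" "$5"
    return 0
}

# Bytes copied from the hand-typed trace above.
decode_bug "0f 0b 62 00 03 0c 28 c0 e9 6c fd ff ff 8d 74 26 00 55 57 56"
# prints: ud2 line=98 file_ptr=0xc0280c03
```

If the assumption holds, the panic would come from a BUG() check at line 98 of
whichever source file the pointer refers to. The file name itself cannot be
recovered from the bytes alone, and the part of the oops that would normally
name it had scrolled off the screen.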

Comment 2 Sahil Verma 2004-04-01 00:14:40 UTC
Well, the panic message and register dump have scrolled off.

When it dies, is the keyboard still responsive? If so, enable the sysrq
key and capture thread and register dumps, then sync and reboot. The
traces should show up in /var/log/messages.

Or else use a serial console to the box.

How long does it take to crash? Please run slabtop every so many
seconds and save the output.
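
The suggestions in this comment could be scripted roughly as follows. This is
a minimal sketch, not a procedure tested on this machine: snapshot_loop is a
hypothetical helper name, the log path, interval and count in the example are
arbitrary choices, and it assumes the installed slabtop supports the -o (print
once and exit) option.

```shell
#!/bin/sh
# Sketch only: append timestamped snapshots of a command's output to a
# log file, flushing to disk after each one so that the last snapshot
# has a chance of surviving a panic.
# Usage: snapshot_loop CMD INTERVAL_SECONDS COUNT LOGFILE
# (snapshot_loop is a hypothetical helper, not an existing tool.)
snapshot_loop() {
    cmd=$1
    interval=$2
    count=$3
    logfile=$4
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$logfile"
        sh -c "$cmd" >> "$logfile" 2>&1
        sync
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}

# Example, to be run as root in a second terminal while the test runs
# (left commented out here; path, interval and count are illustrative):
#
#   echo 1 > /proc/sys/kernel/sysrq        # enable the magic SysRq key
#   snapshot_loop 'slabtop -o' 10 360 /var/tmp/slabtop.log
#
# After a crash, if the keyboard still responds:
#   Alt+SysRq+T  dump task list      Alt+SysRq+P  dump registers
#   Alt+SysRq+S  sync disks          Alt+SysRq+B  reboot
```

Because the log is synced after every snapshot, the final entries before a
crash should show whether any slab cache was growing steadily during the run.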

Comment 3 Bojan Smojver 2004-04-01 01:32:15 UTC
OK, I'll try what you suggested. BTW, I had sysrq key support in and I
tried to sync, but nothing showed up in /var/log/messages. Not sure
why. I'll be more careful next time I crash the box and I'll try to
catch more info.

As for the time needed to crash the box - that varies. Sometimes it'll
go down after a few minutes or so, sometimes it needs half an hour.
Sometimes it'll run through all the tests just fine.

Thanks for the hints.

Comment 4 Alan Cox 2004-05-03 15:08:10 UTC
83815 ethernet.


Comment 5 Bojan Smojver 2004-05-06 06:57:45 UTC
Switched to FC2. We'll see what 2.6 does with it.

Comment 6 Bojan Smojver 2004-06-04 01:12:40 UTC
Not sure if the new problem is related to this bug or not, but I'll put
the info here anyway. It may be useful.

The platform this time is FC2, kernel 2.6.5-1.358. I'm using Apache
2.0.49 (compiled from source) with libapreq2 (from current CVS) to
upload some files through an HTML form (all to/from localhost - I'm
not connected to the network at all). When I do that with a relatively
big file (around 9.5 MB), on occasion the machine will hang. There is
nothing on the screen or log files that would suggest the error type -
everything just freezes.

Not sure if this is some kind of hardware problem (I've bumped the
BIOS up on this notebook to the latest available from HP) or if it is
kernel related like before.

Comment 7 David Lawrence 2004-09-29 20:16:05 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/