Bug 74539 - Kernel 2.4.18-10 - network nearly unusable

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	7.3
Hardware:	i686
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2002-09-26 11:07 UTC by Markus Doehr
Modified:	2008-08-01 16:22 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-09-30 15:39:57 UTC
Embargoed:

Description Markus Doehr 2002-09-26 11:07:23 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2b) Gecko/20020828

Description of problem:
I installed kernel 2.4.18-10 on a Red Hat 7.3 system serving a large website
(Apache 1.3.23-14, php-4.1.2-7.3.4). The system isn't able to serve more than 5
requests per second. I tried tuning many kernel parameters, but it didn't help.

Going back to 2.4.18-3bigmem, it serves about 1000 requests per second.

I saw many select() timeouts when strace'ing the Apache root process.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Install 2.4.18-10 on an i686 system

Additional info:

Comment 1 Arjan van de Ven 2002-09-26 11:09:09 UTC

What kind of network card is this?

Comment 2 Markus Doehr 2002-09-26 11:12:16 UTC

[root@www ~]# lsmod
Module                  Size  Used by    Not tainted
ipchains               46184   0
autofs                 12740   0  (autoclean) (unused)
e100                   77524   1
usb-ohci               22688   0  (unused)
usbcore                77024   1  [usb-ohci]
ext3                   70720   6
jbd                    53504   6  [ext3]
aic7xxx               125728   7
sd_mod                 12896  14
scsi_mod              115120   2  [aic7xxx sd_mod]

Comment 3 Markus Doehr 2002-09-26 20:20:49 UTC

This now also happens with the 'old' kernel:

Here are the results of

# http_load -parallel 50 -rate 50 -seconds 60 urls

where it fetches the index.html page.

1977 fetches, 1038 max parallel, 1.07245e+07 bytes, in 60.0051 seconds
5424.66 mean bytes/connection
32.9472 fetches/sec, 178727 bytes/sec
msecs/connect: 6857.95 mean, 45008.7 max, 3.437 min
msecs/first-response: 5345.73 mean, 32055 max, 5.173 min
1111 bad byte counts
HTTP response codes:
  code 200 -- 866

Only 32 fetches per second. Is this normal?

[root@www ~]# uname -a
Linux www.edonkey2000.com 2.4.18-3bigmem #1 SMP Thu Apr 18 07:17:10 EDT 2002 i686 unknown

Is there anything I can do?
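
For readers who want to compare runs like the one above, here is a minimal Python sketch that pulls the counters out of the http_load summary and derives a few ratios. The helper name parse_http_load and its regular expressions are assumptions made for illustration; they are not part of http_load and were written only against the output pasted in this comment.

```python
import re

# Summary text copied verbatim from the http_load run in this comment.
SUMMARY = """\
1977 fetches, 1038 max parallel, 1.07245e+07 bytes, in 60.0051 seconds
5424.66 mean bytes/connection
32.9472 fetches/sec, 178727 bytes/sec
msecs/connect: 6857.95 mean, 45008.7 max, 3.437 min
msecs/first-response: 5345.73 mean, 32055 max, 5.173 min
1111 bad byte counts
HTTP response codes:
  code 200 -- 866
"""


def parse_http_load(text):
    """Extract a few counters from an http_load summary (hypothetical helper)."""
    fetches = int(re.search(r"^(\d+) fetches,", text, re.M).group(1))
    seconds = float(re.search(r"in ([\d.]+) seconds", text).group(1))
    bad = re.search(r"^(\d+) bad byte counts", text, re.M)
    # Map each reported HTTP status code to its count.
    codes = {int(c): int(n) for c, n in re.findall(r"code (\d+) -- (\d+)", text)}
    return {
        "fetches": fetches,
        "seconds": seconds,
        "bad_byte_counts": int(bad.group(1)) if bad else 0,
        "codes": codes,
    }


stats = parse_http_load(SUMMARY)
ok = stats["codes"].get(200, 0)
print(f"fetches/sec:           {stats['fetches'] / stats['seconds']:.2f}")
print(f"HTTP 200 share:        {ok / stats['fetches']:.1%}")
print(f"bad byte counts share: {stats['bad_byte_counts'] / stats['fetches']:.1%}")
```

In this run the 866 responses with code 200 and the 1111 bad byte counts add up to the 1977 fetches, so fewer than half of the fetches returned a code 200 response.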

Comment 4 Markus Doehr 2002-09-26 20:22:54 UTC

I forgot to mention that the system http_load is running on is connected via a
100 Mbit switch and uses the eepro100 driver.

Comment 5 Markus Doehr 2002-09-26 20:23:50 UTC

[root@www ~]# strace -p 846
select(0, NULL, NULL, NULL, {0, 280000}) = 0 (Timeout)
time(NULL)                              = 1033070689
kill(2253, SIGALRM)                     = 0
kill(2320, SIGALRM)                     = 0
kill(2341, SIGALRM)                     = 0
kill(2367, SIGALRM)                     = 0
wait4(-1, 0xbffff6d8, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 1033070690
kill(2215, SIGALRM)                     = 0
kill(2219, SIGALRM)                     = 0
kill(2246, SIGALRM)                     = 0
wait4(-1, 0xbffff6d8, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 1033070691
kill(2131, SIGALRM)                     = 0
kill(2214, SIGALRM)                     = 0
kill(2366, SIGALRM)                     = 0
wait4(-1, 0xbffff6d8, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 1033070692
kill(2302, SIGALRM)                     = 0
kill(2310, SIGALRM)                     = 0
kill(2312, SIGALRM)                     = 0
kill(2374, SIGALRM)                     = 0
kill(2390, SIGALRM)                     = 0
wait4(-1, 0xbffff6d8, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 1033070693
kill(2129, SIGALRM)                     = 0
kill(2154, SIGALRM)                     = 0
kill(2193, SIGALRM)                     = 0
wait4(-1, 0xbffff6d8, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0}
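
A note on reading this trace: select() called with no file descriptors and only a timeout is a common way to sleep, so "= 0 (Timeout)" is what such a call returns when the sleep ends normally. Whether that is all that is happening here is an interpretation, not something confirmed in this report. The Python sketch below tallies the calls in an excerpt of the trace and counts how many SIGALRM signals were sent after each time() call. The function name summarize and the regular expression are assumptions made for illustration.

```python
import re
from collections import Counter

# First two cycles of the strace output above, copied verbatim.
TRACE = """\
select(0, NULL, NULL, NULL, {0, 280000}) = 0 (Timeout)
time(NULL)                              = 1033070689
kill(2253, SIGALRM)                     = 0
kill(2320, SIGALRM)                     = 0
kill(2341, SIGALRM)                     = 0
kill(2367, SIGALRM)                     = 0
wait4(-1, 0xbffff6d8, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 1033070690
kill(2215, SIGALRM)                     = 0
kill(2219, SIGALRM)                     = 0
kill(2246, SIGALRM)                     = 0
wait4(-1, 0xbffff6d8, WNOHANG, NULL)    = 0
"""

# Matches "name(args) = retval"; lines cut off mid-call do not match.
CALL = re.compile(r"^(\w+)\((.*)\)\s*=\s*(-?\d+)")


def summarize(trace):
    """Count syscalls and SIGALRM kills per time() tick (hypothetical helper)."""
    calls = Counter()
    alarms_per_tick = {}
    tick = None
    for line in trace.splitlines():
        m = CALL.match(line.strip())
        if not m:
            continue  # skip truncated or unrelated lines
        name, args, ret = m.group(1), m.group(2), int(m.group(3))
        calls[name] += 1
        if name == "time":
            tick = ret  # time() returns the current epoch second
            alarms_per_tick[tick] = 0
        elif name == "kill" and "SIGALRM" in args and tick is not None:
            alarms_per_tick[tick] += 1
    return calls, alarms_per_tick


calls, alarms_per_tick = summarize(TRACE)
print(dict(calls))
print(alarms_per_tick)
```

Attaching strace to one of the child processes rather than PID 846 might show more about the request handling itself, though that was not tried in this report.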

Comment 6 Arjan van de Ven 2002-09-26 20:40:46 UTC

Any idea whether this gets fixed when using the "e100" driver instead?

Comment 7 Markus Doehr 2002-09-26 20:50:23 UTC

There is a misunderstanding:
www is running the e100 driver
sda is running the eepro100 driver

www is SLOW
sda is normal (running Red Hat 7.2 - 2.4.7-10)

So is the www server suffering because it uses e100? Should we try switching
to eepro100?

Comment 8 Markus Doehr 2002-09-26 20:59:45 UTC

Could this be related to bug 73185 (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=73185)?

Comment 9 Arjan van de Ven 2002-09-26 21:02:51 UTC

Not really; the card in that bug is new and not quite on the market yet.

Comment 10 Bugzilla owner 2004-09-30 15:39:57 UTC

Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
