Bug 10225 - Stuck TCP sockets on SMP Netfinity
Summary: Stuck TCP sockets on SMP Netfinity
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 6.1
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Michael K. Johnson
QA Contact:
URL: http://www.zkey.com
Whiteboard:
Depends On:
Blocks:
 
Reported: 2000-03-17 23:49 UTC by Marc Provitt
Modified: 2008-05-01 15:37 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2000-03-21 14:15:12 UTC
Embargoed:



Description Marc Provitt 2000-03-17 23:49:01 UTC
IBM Netfinity 5000 and 5500 servers, all SMP machines. The 5000s are dual P3;
the 5500 is a quad Xeon.

Periodically, TCP sockets get stuck in the CLOSE_WAIT or TIME_WAIT states,
whether they are IMAP (:143) or LDAP (:389). This happens on one of the 3 Web
servers (the 5000s) at a time. The stuck sockets pile up, exhausting the
resources on the file/mail server (the 5500) and making the site unreachable.
We have roughly 250,000 email accounts and are expecting to reach over 1M
email accounts in the very near future.

Kernel version: 2.2.12-20smp. We've also tried .14 and 15pre10, even going so
far as to modify the header file to allow a greater number of TCP connections.

The email server we're using is Cyrus imapd-v1.6.22-2 and sasl-v1.5.15-2.

We've also raised /proc/sys/fs/file-max to 65535, but this is more to
accommodate the amount of traffic the servers experience.
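
To judge whether the raised limit is actually being approached, one option is to compare current file handle usage against file-max. The sketch below is only an illustration: the function names are hypothetical, and it assumes /proc/sys/fs/file-nr reports three numbers (allocated handles, free handles, maximum), which may differ between kernel versions.

```python
from pathlib import Path


def parse_file_nr(text):
    """Parse the contents of /proc/sys/fs/file-nr.

    Assumed layout: allocated handles, free handles, maximum (file-max).
    """
    allocated, free, maximum = (int(field) for field in text.split()[:3])
    return allocated, free, maximum


def handle_usage(text):
    """Return the fraction of the file-max limit that is currently in use."""
    allocated, free, maximum = parse_file_nr(text)
    return (allocated - free) / maximum


if __name__ == "__main__":
    path = Path("/proc/sys/fs/file-nr")
    if path.exists():
        # Only meaningful on a Linux host that exposes this file.
        print(f"file handle usage: {handle_usage(path.read_text()):.1%}")
```

A usage fraction that stays far below 1.0 while sockets pile up would suggest the file handle limit is not the bottleneck.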


Please help, there are tens of millions of dollars at stake, and there are
people here considering scrapping everything Linux!

Comment 1 Alan Cox 2000-03-17 23:59:59 UTC
Firstly, TIME_WAIT is not an error. It's a protocol requirement. TCP requires
this to avoid the risk of data corruption with other sessions. I'm sure you'd
prefer intact email.

TIME_WAIT lasts 120 seconds. Thus with thousands of mail clients polling
heavily you would expect to see a lot of TIME_WAIT sockets, especially if you
have people using silly (e.g. 5 second) poll rates.

How many connections/minute is your IMAP running at?
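
One rough way to approach that question is to tally `netstat -tan` output by TCP state and work backwards from the TIME_WAIT count. The sketch below assumes the usual six-column netstat layout with the state in the last column, the 120 second TIME_WAIT duration quoted above, and a roughly steady load, so that count = rate x duration. The function names and the sample addresses are made up for illustration, and TIME_WAIT only accumulates on the side that closes the connection first, so the figure is an estimate at best.

```python
from collections import Counter


def count_states(netstat_text):
    """Count TCP sockets per state in `netstat -tan` style output."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


def estimated_closes_per_minute(time_wait_count, time_wait_seconds=120):
    """Steady-state estimate: sockets in TIME_WAIT = close rate x duration."""
    return time_wait_count * 60 / time_wait_seconds


# Hypothetical sample, not taken from the reporter's machines.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:143             0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:143            10.0.0.9:40112          TIME_WAIT
tcp        0      0 10.0.0.5:143            10.0.0.9:40113          TIME_WAIT
tcp        1      0 10.0.0.5:389            10.0.0.8:51200          CLOSE_WAIT
"""

if __name__ == "__main__":
    states = count_states(SAMPLE)
    for state, number in sorted(states.items()):
        print(state, number)
    print("closes/minute:", estimated_closes_per_minute(states["TIME_WAIT"]))
```

A large TIME_WAIT count with few CLOSE_WAIT entries would point at plain connection churn. A growing CLOSE_WAIT count is different: it means the local application has not yet closed sockets the peer already shut down.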

Comment 2 Alan Cox 2000-03-21 14:15:59 UTC
OK, the system info you sent shows nothing at all out of the
ordinary in either configuration or setup. I see no obvious
reasons for problems.

I see two possible issues here:

1.	Someone is, despite your claims otherwise, polling very
	fast, running you out of resources

2.	You have a lot of large mailboxes and Cyrus is not using
	maildir format. That can cause a huge amount of I/O and
	memory usage reformatting mailboxes after changes.

Both are speculation. I'd need to look at netstat output to
judge further. Can you do

	echo "1" >/proc/sys/net/ipv4/tcp_syncookies

and when the box gets loaded tell me if 'dmesg' and the logs show
any messages about sending syn cookies.
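
The check requested above could be scripted roughly as follows. This is a sketch under assumptions: the exact wording of the kernel message varies by version, so the pattern simply looks for "syn" followed by "cookies" on one line, and the function names and sample log lines are hypothetical.

```python
import re
from pathlib import Path

# Loose pattern; the real kernel message text is an assumption here.
COOKIE_PATTERN = re.compile(r"syn.*cookies", re.IGNORECASE)


def find_syncookie_lines(log_text):
    """Return log lines that appear to mention SYN cookies being sent."""
    return [line for line in log_text.splitlines()
            if COOKIE_PATTERN.search(line)]


def syncookies_enabled(proc_path="/proc/sys/net/ipv4/tcp_syncookies"):
    """Return True/False for the sysctl, or None if the file is absent."""
    path = Path(proc_path)
    if not path.exists():
        return None
    return path.read_text().strip() != "0"


if __name__ == "__main__":
    # Hypothetical log text; on a real box, feed in dmesg output instead.
    sample_log = (
        "eth0: link up\n"
        "possible SYN flooding on port 143. Sending cookies.\n"
    )
    print("syncookies enabled:", syncookies_enabled())
    for line in find_syncookie_lines(sample_log):
        print(line)
```

Reading the log text from saved `dmesg` output or a syslog file keeps the matching logic separate from how the messages are collected.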

You appear to be running cyrus imapd as a standalone daemon - is this
a correct assumption on my part?

Comment 3 Alan Cox 2000-08-08 20:29:42 UTC
No response since March: closing


