Bug 10225

Summary: Stuck TCP sockets on SMP Netfinity
Product: [Retired] Red Hat Linux
Reporter: Marc Provitt <mprov>
Component: kernel
Assignee: Michael K. Johnson <johnsonm>
Status: CLOSED WORKSFORME
Severity: high
Priority: medium
Version: 6.1
CC: linux, wil
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
URL: http://www.zkey.com
Last Closed: 2000-03-21 14:15:12 UTC Type: ---

Description Marc Provitt 2000-03-17 23:49:01 UTC
IBM Netfinity 5000 and 5500 servers, all SMP machines. The 5000s are dual P3;
the 5500 is a quad Xeon.

Periodically TCP sockets get stuck, be they IMAP (:143) or LDAP (:389), in
the CLOSE_WAIT or TIME_WAIT states. This happens on one of the 3 Web
servers (the 5000s) at a time. They pile up, exhausting the resources on the
file/mail server (the 5500) and making the site unreachable. We have roughly
250,000 email accounts and are expecting to reach over 1M email accounts in
the very near future.

Kernel Ver: 2.2.12-20smp. We've also tried 2.2.14 and 2.2.15pre10, even going
so far as to modify the header file to allow a greater number of TCP connections.

The email server we're using is Cyrus imapd-v1.6.22-2 and sasl-v1.5.15-2.

We've also upped /proc/sys/fs/file-max to 65535, but this is more to
accommodate the amount of traffic the servers experience.
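
To see how close the system is to that limit, the allocated handle count in /proc/sys/fs/file-nr can be compared with file-max. The following is only a sketch: the function name file_handle_usage is made up for illustration, and it assumes the first field of file-nr is the allocated count and the third is the maximum.

```shell
#!/bin/sh
# Report what share of the system-wide file handle limit is allocated.
# Reads one line in /proc/sys/fs/file-nr format: "<allocated> <second> <max>".
# The meaning of the second field differs between kernel versions, so it
# is read and ignored here.
file_handle_usage() {
    read -r allocated unused max
    echo "$allocated of $max file handles allocated ($(( allocated * 100 / max ))%)"
}

# Typical use on a live system (commented out so the sketch has no side effects):
#   file_handle_usage < /proc/sys/fs/file-nr
```

If the percentage stays low while sockets still pile up, the file handle limit is probably not the bottleneck.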


Please help, there are tens of millions of dollars at stake, and there are
people here considering scrapping everything Linux!!

Comment 1 Alan Cox 2000-03-17 23:59:59 UTC
Firstly, TIME_WAIT is not an error. It's a protocol requirement. TCP requires
this to avoid the risk of data corruption with other sessions. I'm sure you'd
prefer intact email.

TIME_WAIT lasts 120 seconds. Thus with a heavy polling rate from thousands of
mail clients you would expect to see a lot of TIME_WAIT sockets,
especially if you have people using silly (e.g. 5 second) poll rates.
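
As a rough back-of-the-envelope check of that expectation: if every closed connection sits in TIME_WAIT for 120 seconds, the steady-state number of TIME_WAIT sockets is about the connection rate per second times 120. The sketch below does that arithmetic. The function name and the sample rate are hypothetical, and it assumes every connection ends in TIME_WAIT on the host being measured.

```shell
#!/bin/sh
# Estimate steady-state TIME_WAIT sockets from a connection rate.
# Assumes 120 s in TIME_WAIT (the figure quoted above) unless a second
# argument overrides it.
time_wait_estimate() {
    conns_per_minute=$1
    time_wait_seconds=${2:-120}
    # sockets = (connections per second) * (seconds spent in TIME_WAIT)
    echo $(( conns_per_minute * time_wait_seconds / 60 ))
}

# Hypothetical example: 3000 IMAP connections per minute
time_wait_estimate 3000   # prints 6000
```

So a few thousand connections per minute would already account for several thousand TIME_WAIT sockets without anything being wrong.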

How many connections/minute is your IMAP running at?

Comment 2 Alan Cox 2000-03-21 14:15:59 UTC
OK, the system info you sent shows nothing at all out of the
ordinary either in configuration or setup. I see no obvious
reasons for problems.

I see two possible issues here:

1.	Someone is, despite your claims otherwise, polling very
	fast and running you out of resources

2.	You have a lot of large mailboxes and Cyrus is not using
	maildir format. That can cause a huge amount of I/O and
	memory usage reformatting mailboxes after changes.

Both are speculation. I'd need to look at netstat output to
judge further. Can you do

	echo "1" >/proc/sys/net/ipv4/tcp_syncookies

and when the box gets loaded tell me if 'dmesg' and the logs show
any messages about sending syn cookies.
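
One way to gather the netstat information asked for above is to summarize the socket states rather than send the raw listing. This is only a sketch: the function name count_tcp_states is made up, and it assumes net-tools style `netstat -tan` output, where the state is the sixth column of each tcp line.

```shell
#!/bin/sh
# Count TCP sockets per state from `netstat -tan`-style input on stdin.
# Header lines are skipped because their first field does not start
# with "tcp".
count_tcp_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print s, n[s] }' | sort
}

# Typical use on the loaded box (commented out so the sketch has no side effects):
#   netstat -tan | count_tcp_states
#   dmesg | grep -i 'cookies'   # loose match; the exact kernel wording varies by version
```

Running the summary a few times while the load builds shows whether CLOSE_WAIT or TIME_WAIT is the state that keeps growing.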

You appear to be running Cyrus imapd as a standalone daemon. Is this
a correct assumption on my part?

Comment 3 Alan Cox 2000-08-08 20:29:42 UTC
No response since March: closing