Bug 11320 - kernel-2.2.14-12 crashes after multiple failed connect() calls on same socket
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 6.2
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Alan Cox
QA Contact:
URL: http://www.cpons.com/linux-bug/bomb_k...
Whiteboard:
Depends On:
Blocks:
Reported: 2000-05-09 04:55 UTC by bgunter
Modified: 2008-05-01 15:37 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2000-07-31 21:12:44 UTC
Embargoed:


Attachments
Test program to show the bug. (1.23 KB, text/plain)
2000-05-09 04:56 UTC, bgunter

Description bgunter 2000-05-09 04:55:15 UTC
This is a severe bug that crashes kernel-2.2.14-12 (SMP and UP), as well as
its predecessors.  It shows up when a program makes multiple calls to
connect(), using the same socket for each call.  If the connect() calls
fail repeatedly with "Connection refused", the kernel will crash out after
just a few calls.

The problem is aggravated if other system calls are made between calls to
connect().  I use a call to sleep(1) in my example program.  In this case
the kernel will usually fail after three to six cycles.
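
The call pattern described above can be sketched roughly as follows. This is not the attached bomb_kernel.c, only an illustration of the pattern under stated assumptions: the helper name retry_connect_same_socket and its parameters are hypothetical, and the target address should be one where nothing is listening so that connect() is refused.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from the attachment): call connect()
 * up to `trials` times on ONE socket, sleeping `delay_sec` seconds after
 * each failure. Returns the number of failed connect() calls, or -1 if
 * the socket could not be created.
 */
int retry_connect_same_socket(const struct sockaddr_in *addr,
                              int trials, unsigned delay_sec)
{
    int failures = 0;

    assert(addr != NULL && trials > 0);

    /* The socket is created once and reused for every attempt. */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    for (int i = 1; i <= trials; i++) {
        if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) == 0)
            break;                      /* connected; stop retrying */

        failures++;
        printf("Trial: %d\tconnect: %s\n", i, strerror(errno));

        /* Another system call between connect() attempts. */
        if (delay_sec > 0)
            sleep(delay_sec);
    }

    close(fd);
    return failures;
}
```

Calling it with an address on 127.0.0.1 whose port has no listener and delay_sec set to 1 approximates the sequence described in this report. On an affected kernel, sync your disks first.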

I also tested this on a kernel 2.2.14 I built myself from the tarball, and
it did not crash.  It returned "Connection refused" the first time, and
"Invalid argument" after that.  (The connect() man page insinuates that
this is incorrect as well, but that's a different story.)

For example code, see http://www.cpons.com/linux-bug/bomb_kernel.c

And sync your disks before running it...

Comment 1 bgunter 2000-05-09 04:56:59 UTC
Created attachment 230
Test program to show the bug.

Comment 2 SB 2000-05-15 18:34:59 UTC
This is a duplicate of bug #10986, reported 4/22 against rawhide. FYI, example
code is included there as well.

-Stan Bubrouski

Comment 3 Alan Cox 2000-05-26 22:45:59 UTC
Verified this is the case. It's also not the known bug we found with
daddr=0. There is something more fundamentally [word not allowed on a public web
site] with this. The trace shows the daddr-based check doesn't seem to be working
at all. Might be a connect-in-progress code path bug?

Alan

Comment 4 Alan Cox 2000-05-26 23:40:59 UTC
Cause identified. I'm working on what needs to get fixed, and checking with
people so I don't break something else in the process.

Comment 5 Jonathan Lewis 2000-06-20 15:11:29 UTC
I have been having occasional (roughly weekly) system crashes with the
2.2.14-5.0 kernel which seem to match the above description. Not surprisingly,
no info about the crash could be found in any of the system logs. 

I decided to test the kernel bug reported here, and sure enough the test program
bomb_kernel.c kills the 2.2.14-5.0 kernel after 3-6 failed connect() calls.

I also tried bomb_kernel.c on my custom built 2.2.16 kernel and everything seems
fine now. I get

Trial: 1        connect: Connection refused
Trial: 2        connect: Connection refused

up to the 411th time it tries to make the connection, whereafter I get

Trial: 411      connect: No buffer space available
Trial: 412      connect: No buffer space available

and so on.

In any case the kernel didn't crash, which was nice, and hasn't so far. I'm just
waiting to see if this has cleared up my almost weekly system crashes....

	Jon

Comment 6 Need Real Name 2000-07-14 22:07:18 UTC
Hello,

I ran into this truly annoying problem and found a way around it.

You can work around this bug in user-level code by doing this:

0. Open a socket.
1. Call connect() on that newly opened fd ONCE.
2. If it fails for whatever reason, close the socket.
3. Go back to step 0, as many times as you want to try connecting to the service.

Normally, people open a socket, keep it alive across every connect attempt,
and close it only if all of the attempts fail. With this workaround, you close
the fd and open a new one for every connect attempt.
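
A minimal sketch of this workaround in C, assuming an IPv4 TCP target. The helper name connect_with_fresh_socket and its parameters are made up for illustration and do not come from any attachment on this bug.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to connect to `addr` up to `max_attempts`
 * times, opening a brand-new socket for every attempt and closing it
 * as soon as connect() fails. Returns a connected fd on success, or -1
 * with errno set from the last failure.
 */
int connect_with_fresh_socket(const struct sockaddr_in *addr,
                              int max_attempts, unsigned delay_sec)
{
    int saved_errno = 0;

    assert(addr != NULL && max_attempts > 0);

    for (int i = 0; i < max_attempts; i++) {
        /* Step 0: open a new socket for this attempt. */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        /* Step 1: connect exactly once on this fd. */
        if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) == 0)
            return fd;

        /* Step 2: on failure, close the fd and never reuse it. */
        saved_errno = errno;
        close(fd);

        /* Step 3: optionally wait, then start over with a new socket. */
        if (i + 1 < max_attempts && delay_sec > 0)
            sleep(delay_sec);
    }

    errno = saved_errno;
    return -1;
}
```

The caller owns the returned fd and should close() it when done. Because no fd ever sees a second connect() call, the failing code path described in this report should not be exercised.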

Hope this helps.....

Thank you.

-Peter Keller

Comment 7 Michael K. Johnson 2000-07-31 21:12:42 UTC
Alan, was this fixed in our 2.2.16-3 kernel as I suspect?

Comment 8 Alan Cox 2000-07-31 22:32:15 UTC
DaveM fixed it, yes.


