Bug 11320 - kernel-2.2.14-12 crashes after multiple failed connect() calls on same socket
Status: CLOSED ERRATA
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 6.2
Hardware: i386 Linux
Priority: medium Severity: high
Assigned To: Alan Cox
http://www.cpons.com/linux-bug/bomb_k...
Reported: 2000-05-09 00:55 EDT by bgunter
Modified: 2008-05-01 11:37 EDT (History)
Doc Type: Bug Fix
Last Closed: 2000-07-31 17:12:44 EDT


Attachments
Test program to show the bug. (1.23 KB, text/plain)
2000-05-09 00:56 EDT, bgunter

Description bgunter 2000-05-09 00:55:15 EDT
This is a severe bug that crashes kernel-2.2.14-12 (SMP and UP), as well as
its predecessors.  It shows up when a program makes multiple calls to
connect(), using the same socket for each call.  If the connect() calls
fail repeatedly with "Connection refused", the kernel will crash out after
just a few calls.

The problem is aggravated if other system calls are made between calls to
connect().  I use a call to sleep(1) in my example program.  In this case
the kernel will usually fail after three to six cycles.

I also tested this on a kernel 2.2.14 I built myself from the tarball, and
it did not crash.  It returned "Connection refused" the first time, and
"Invalid argument" after that.  (The connect() man page insinuates that
this is incorrect as well, but that's a different story.)
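
The failing pattern described above can be sketched roughly as follows. This is not the attached bomb_kernel.c, only an illustration of the same idea: one socket, reused for every connect() call. The helper name retry_on_same_socket, the loopback target, and the choice of port are all assumptions made for this sketch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from bomb_kernel.c): call connect() up to
 * `attempts` times on ONE socket aimed at 127.0.0.1:`port`, sleeping
 * `delay_sec` seconds between calls.  errs[i] receives the errno of
 * attempt i (0 if it succeeded or was never made).
 * Returns the number of failed attempts, or -1 if socket() fails. */
int retry_on_same_socket(unsigned short port, int attempts,
                         unsigned int delay_sec, int *errs)
{
    struct sockaddr_in addr;
    int fd, i, failures = 0;

    assert(attempts > 0 && errs != NULL);
    memset(errs, 0, (size_t)attempts * sizeof(*errs));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    for (i = 0; i < attempts; i++) {
        /* Same fd every time: this reuse is what the report is about. */
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
            printf("Trial: %d\tconnect: succeeded\n", i + 1);
            break;
        }
        errs[i] = errno;
        failures++;
        printf("Trial: %d\tconnect: %s\n", i + 1, strerror(errs[i]));
        if (delay_sec > 0 && i + 1 < attempts)
            sleep(delay_sec);   /* another system call between connects */
    }

    close(fd);
    return failures;
}
```

Calling it with a port nothing listens on, for example retry_on_same_socket(1, 6, 1, errs), should print one "Trial" line per attempt. Which errno the later attempts report depends on the kernel.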

For example code, see http://www.cpons.com/linux-bug/bomb_kernel.c

And sync your disks before running it...
Comment 1 bgunter 2000-05-09 00:56:59 EDT
Created attachment 230
Test program to show the bug.
Comment 2 SB 2000-05-15 14:34:59 EDT
This is a duplicate of bug #10986, reported 4/22 against rawhide. FYI, example
code is included there as well.

-Stan Bubrouski
Comment 3 Alan Cox 2000-05-26 18:45:59 EDT
Verified this is the case. It's also not the known bug we found with
daddr=0. There is something more fundamentally [word not allowed on a public
web site] with this. The trace shows the daddr-based check doesn't seem to be
working at all. Might be a connect-in-progress code path bug?

Alan
Comment 4 Alan Cox 2000-05-26 19:40:59 EDT
Cause identified. I'm working on what needs to get fixed, and checking with
people so I don't break something else in the process.
Comment 5 Jonathan Lewis 2000-06-20 11:11:29 EDT
I have been having occasional (roughly weekly) system crashes with the
2.2.14-5.0 kernel which seem to match the above description. Not surprisingly,
no info about the crash could be found in any of the system logs.

I decided to test the kernel bug reported here, and of course the test program
bomb_kernel.c does the job of killing off the 2.2.14-5.0 kernel after 3-6
failed connect() calls.

I also tried bomb_kernel.c on my custom built 2.2.16 kernel and everything seems
fine now. I get

Trial: 1        connect: Connection refused
Trial: 2        connect: Connection refused

up to the 411th time it tries to make the connection, whereafter I get

Trial: 411      connect: No buffer space available
Trial: 412      connect: No buffer space available

and so on.

In any case the kernel didn't crash, which was nice, and hasn't so far. I'm just
waiting to see if this has cleared up my almost weekly system crashes....

	Jon
Comment 6 Need Real Name 2000-07-14 18:07:18 EDT
Hello,

I ran into this truly annoying problem and found a way around it.

You can work around this bug in user-level code by doing this:

0. Open a socket.
1. Call connect() on that newly opened fd ONCE.
2. If it fails for whatever reason, close the socket.
3. Repeat from step 0 as many times as you want to try connecting to the service.

Normally, people open a socket, keep it alive across every connect
attempt, and close it only if all of the attempts fail. In this case, you close
the fd and reopen it anew for every connect attempt.
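
A minimal sketch of that workaround in C, assuming an IPv4 TCP target. The function name connect_with_fresh_socket and its parameters are made up for illustration and do not come from the original report.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to connect to *addr up to `max_attempts` times,
 * using a brand-new socket for every attempt and sleeping `delay_sec`
 * seconds between attempts.
 * Returns a connected fd on success, or -1 if every attempt failed. */
int connect_with_fresh_socket(const struct sockaddr_in *addr,
                              int max_attempts, unsigned int delay_sec)
{
    int attempt;

    assert(addr != NULL && max_attempts > 0);

    for (attempt = 1; attempt <= max_attempts; attempt++) {
        /* Step 0: open a new socket for this attempt. */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) {
            perror("socket");
            return -1;
        }

        /* Step 1: call connect() exactly once on this fd. */
        if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) == 0)
            return fd;

        /* Step 2: the attempt failed, so discard the socket. */
        perror("connect");
        close(fd);

        /* Step 3: wait, then loop back to step 0. */
        if (delay_sec > 0 && attempt < max_attempts)
            sleep(delay_sec);
    }
    return -1;
}
```

The caller owns the returned fd and must close() it when done. No fd ever sees more than one connect() call, so the repeated-connect path is never exercised.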

Hope this helps.....

Thank you.

-Peter Keller
Comment 7 Michael K. Johnson 2000-07-31 17:12:42 EDT
Alan, was this fixed in our 2.2.16-3 kernel as I suspect?
Comment 8 Alan Cox 2000-07-31 18:32:15 EDT
DaveM fixed it, yes.
