This is a severe bug that crashes kernel-2.2.14-12 (SMP and UP), as well as its predecessors. It shows up when a program makes multiple calls to connect(), using the same socket for each call. If the connect() calls fail repeatedly with "Connection refused", the kernel will crash after just a few calls. The problem is aggravated if other system calls are made between calls to connect(). I use a call to sleep(1) in my example program; in this case the kernel will usually fail after three to six cycles. I also tested this on a kernel 2.2.14 I built myself from the tarball, and it did not crash. It returned "Connection refused" the first time, and "Invalid argument" after that. (The connect() man page implies that this is incorrect as well, but that's a different story.) For example code, see http://www.cpons.com/linux-bug/bomb_kernel.c and sync your disks before running it...
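For readers who cannot fetch the example program, the pattern described above can be sketched roughly as follows. This is not the contents of bomb_kernel.c; the function name retry_connect_same_socket and its parameters are made up for illustration, and the output format is only guessed from the "Trial: N" / "connect: ..." lines quoted elsewhere in this report. On an affected kernel, point it at a port where nothing is listening, and sync your disks first.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not the original test program): call connect()
 * up to `trials` times on ONE socket, sleeping between attempts.
 * Returns the number of failed attempts, or -1 if setup fails. */
int retry_connect_same_socket(const char *ip, unsigned short port,
                              int trials, unsigned delay_seconds)
{
    struct sockaddr_in addr;
    int failures = 0;
    int fd;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                      /* not a valid IPv4 address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    for (int trial = 1; trial <= trials; trial++) {
        printf("Trial: %d\n", trial);
        /* The same fd is reused for every attempt; this reuse is the
         * pattern the report says triggers the crash. */
        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
            break;
        perror("connect");
        failures++;
        if (delay_seconds > 0)
            sleep(delay_seconds);       /* another syscall between attempts */
    }

    close(fd);
    return failures;
}
```

Something like retry_connect_same_socket("127.0.0.1", 9999, 10, 1), with 9999 standing in for any closed port, would correspond to the sleep(1) case described above.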
Created attachment 230: Test program to show the bug.
This is a duplicate of the bug reported 4/22 on rawhide, #10986. FYI, example code is included there as well. -Stan Bubrouski
Verified this is the case. It's also not the known bug we found with daddr=0. There is something more fundamentally [word not allowed on a public web site] with this. The trace shows the daddr-based check doesn't seem to be working at all. Might be a connect-in-progress code path bug? Alan
Cause identified. I'm working on what needs to get fixed, and checking with people so I don't break something else in the process.
I have been having occasional (roughly weekly) system crashes with the 2.2.14-5.0 kernel which seem to match the above description. Not surprisingly, no info about the crash could be found in any of the system logs. I decided to test the kernel bug reported here and of course the test program bomb_kernel.c does the job of killing off the 2.2.15-5.0 kernel after 3-6 connect() fails. I also tried bomb_kernel.c on my custom built 2.2.16 kernel and everything seems fine now. I get

    Trial: 1
    connect: Connection refused
    Trial: 2
    connect: Connection refused

up to the 411th time it tries to make the connection, whereafter I get

    Trial: 411
    connect: No buffer space available
    Trial: 412
    connect: No buffer space available

and so on. In any case the kernel didn't crash, which was nice, and hasn't so far. I'm just waiting to see if this has cleared up my almost weekly system crashes.... Jon
Hello, I ran into this truly annoying problem and found a way around it. You can work around this bug in user-level code by doing this:

0. Open a socket.
1. Call connect() on that newly opened fd ONCE.
2. If it fails for whatever reason, close the socket.
3. Repeat from step 0 as many times as you want to try connecting to the service.

Normally, people open a socket, keep it alive across each connect attempt, and then close it if all of the attempts fail. In this case, you close the fd and reopen it anew for every connect attempt. Hope this helps..... Thank you. -Peter Keller
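The workaround above might look something like the following in C. This is only a sketch: the helper name connect_with_fresh_socket, its parameters, and the choice of IPv4/TCP are assumptions made for illustration, not code from this report.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to connect to ip:port up to max_attempts
 * times, opening a brand-new socket for every attempt.
 * Returns a connected fd, or -1 with errno set on failure. */
int connect_with_fresh_socket(const char *ip, unsigned short port,
                              int max_attempts, unsigned delay_seconds)
{
    struct sockaddr_in addr;
    int saved_errno = EINVAL;   /* reported if no attempt is ever made */

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;         /* not a valid IPv4 address */
        return -1;
    }

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        /* Step 0: open a new socket for this attempt. */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        /* Step 1: connect exactly once on this fd. */
        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
            return fd;          /* caller owns the connected socket */

        /* Step 2: on failure, close the fd instead of reusing it. */
        saved_errno = errno;
        close(fd);

        /* Step 3: optionally wait, then loop back to step 0. */
        if (attempt + 1 < max_attempts && delay_seconds > 0)
            sleep(delay_seconds);
    }

    errno = saved_errno;
    return -1;
}
```

A caller would use the returned fd as usual and close() it when done. Each failed attempt costs one extra socket()/close() pair, which is negligible next to the connect() itself.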
Alan, was this fixed in our 2.2.16-3 kernel as I suspect?
DaveM fixed it, yes.