Red Hat Bugzilla – Bug 76785
Sockets fail to close
Last modified: 2008-08-01 12:22:52 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830
Description of problem:
With the 2.4.18-17.7.3 kernel, an application that uses many threads to connect
to other hosts (200) fails to recognize a typical "connection closed by peer"
on some of the sockets. The problem does not occur with the 2.4.18-10 kernel.
Steps to Reproduce:
1. Start the threaded parent application, which connects to a large number of remote
daemons (one thread per remote machine, each with its own socket). Each remote daemon
forks and execs another application.
2. All threads enter the recv() system call to receive data from the remote
3. Terminate the remotely executed application.
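
The pattern in the steps above can be sketched roughly as follows. This is not the MPI launcher's source (none is available); it is a minimal illustration in Python that uses local socketpair() connections in place of real remote daemons. The names wait_for_peer_close and run_launcher are hypothetical. In Python, recv() returning b"" corresponds to the C recv() call returning 0.

```python
import socket
import threading


def wait_for_peer_close(sock, received, index):
    """Block in recv() until the peer closes, then record what was read."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # b"" here is what a C caller sees as recv() == 0
            break
        chunks.append(data)
    sock.close()
    received[index] = b"".join(chunks)


def run_launcher(n_peers):
    """Start one receiver thread per connection, then close every peer."""
    # Assumption: socketpair() stands in for the TCP connections to the nodes.
    pairs = [socket.socketpair() for _ in range(n_peers)]
    received = [None] * n_peers
    threads = [
        threading.Thread(
            target=wait_for_peer_close, args=(local, received, i), daemon=True
        )
        for i, (local, _remote) in enumerate(pairs)
    ]
    for t in threads:
        t.start()
    # Step 3: the "remote" side finishes and closes its end of the socket.
    for i, (_local, remote) in enumerate(pairs):
        remote.sendall(b"node %d done" % i)
        remote.close()
    for t in threads:
        t.join(timeout=5)
    still_blocked = sum(t.is_alive() for t in threads)
    return received, still_blocked
```

With a correctly behaving network stack, run_launcher(200) should report zero threads still blocked; the bug described here corresponds to some threads never leaving recv().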
Actual Results: After some time, some of the threads are still blocked in the recv()
system call, and their sockets are still listed as ESTABLISHED (with netstat -a).
However, all remote applications have terminated, and no sockets to the parent machine
exist on the remote machines. The application therefore hangs, although Ctrl-C
terminates it successfully.
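
To list the sockets the kernel still considers ESTABLISHED without going through netstat, one option on Linux is to read /proc/net/tcp directly. The sketch below is only an illustration and was not part of the original report; the function names are hypothetical, and the address decoding assumes IPv4 on a little-endian host such as i686.

```python
import socket
import struct

TCP_ESTABLISHED = 0x01  # value of the "st" column for an established socket


def decode_endpoint(hex_endpoint):
    """Turn 'AABBCCDD:PPPP' into ('a.b.c.d', port); assumes a little-endian host."""
    ip_hex, port_hex = hex_endpoint.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def established_connections(proc_net_tcp_text):
    """Return (local, remote) endpoint pairs for every ESTABLISHED row."""
    rows = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        if int(fields[3], 16) == TCP_ESTABLISHED:
            rows.append((decode_endpoint(fields[1]), decode_endpoint(fields[2])))
    return rows
```

On the parent machine this could be called as established_connections(open("/proc/net/tcp").read()) and the result compared against the list of nodes whose applications have already exited.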
Expected Results: The recv() system call should return 0 to indicate
that the remote application has closed the connection. The thread then
terminates, and once all threads have exited the parent application should
terminate automatically, without the need to press Ctrl-C.
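
Until the underlying problem is resolved, an application could guard against blocking forever by putting an idle timeout on each socket. This is a possible defensive measure, not something described in the report, and it does not fix the kernel behaviour; recv_until_closed and its idle_timeout parameter are hypothetical names.

```python
import socket


def recv_until_closed(sock, idle_timeout):
    """Read until the peer closes or no data arrives for idle_timeout seconds.

    Returns ("closed", data) on a normal end of stream (recv() returned b""),
    or ("timeout", data) if the socket stayed silent for too long.
    """
    sock.settimeout(idle_timeout)
    chunks = []
    while True:
        try:
            data = sock.recv(4096)
        except socket.timeout:
            # The close was never seen; let the caller decide what to do.
            return "timeout", b"".join(chunks)
        if not data:
            return "closed", b"".join(chunks)
        chunks.append(data)
```

A thread that gets "timeout" back could then check whether the remote application is still expected to be running before deciding to give up on the connection.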
The failing kernel is the 2.4.18-17.7.xsmp i686 build; the 2.4.18-10smp i686 build works
fine. Glibc is glibc-2.2.5-40 (i686).
This is an MPI implementation, and the sequence described above is used to launch the
MPI application on the cluster nodes. Unfortunately, no example source is
available (although one could be created), and the problem is not easy to reproduce because it
requires a relatively high number of cluster nodes (i.e. 32 nodes work fine, 64
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases,
and if you believe this bug is of interest to them, please report the problem in
their bug tracker at: http://bugzilla.fedora.us/