From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830

Description of problem:
With the 2.4.18-17.7.3 kernel, an application that uses many threads to connect to other hosts (200 of them) fails to recognize a typical "connection closed by peer" on some of its sockets. There are no problems with the 2.4.18-10 kernel.

Version-Release number of selected component (if applicable):
2.4.18-17.7.3

How reproducible:
Always

Steps to Reproduce:
1. Start a threaded parent application that connects to a large number of remote daemons (one thread per remote machine, each with its own socket). Each remote daemon forks and execs another application.
2. All threads enter the recv() system call to receive data from the remotely executed application.
3. Terminate the remotely executed application.

Actual Results:
After some time, some of the threads are still blocked in recv(), and their sockets are still listed as ESTABLISHED (with netstat -a). However, all remote applications have terminated, and no sockets to the parent machine exist on the remote machines. The application therefore hangs, although Ctrl-C terminates it successfully.

Expected Results:
recv() should return 0 to indicate that the remote application has closed the connection. The thread then terminates, and once all threads have exited, the parent application should terminate on its own without needing Ctrl-C.

Additional info:
The kernel is the 2.4.18-17.7.xsmp i686 build; the 2.4.18-10smp i686 build works fine. Glibc is glibc-2.2.5-40 (i686). This is an MPI implementation, and the described sequence is used to launch the MPI application on the cluster nodes. Unfortunately, no example source is available (although one could be created), and the problem is not easy to reproduce because it requires a relatively large number of cluster nodes (32 nodes work fine; 64 do not).
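Since the report mentions that an example could be created, here is a minimal sketch of the pattern from the steps to reproduce, assuming a Linux/glibc system with POSIX threads: one thread per connection blocks in recv(), and each thread is expected to see recv() return 0 once its peer closes. It uses local AF_UNIX socket pairs in place of TCP connections to remote daemons, so it only illustrates the expected semantics and will not by itself reproduce the multi-node hang. The names `run_demo`, `worker` and `struct conn` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-connection state: one socket per remote daemon. */
struct conn {
    int fd;      /* local end of the connection */
    int saw_eof; /* set to 1 if recv() returned 0 */
};

/* Worker thread: block in recv() until the peer closes the connection. */
static void *worker(void *arg)
{
    struct conn *c = arg;
    char buf[256];

    for (;;) {
        ssize_t n = recv(c->fd, buf, sizeof buf, 0);
        if (n > 0)
            continue;        /* output from the remote application; discarded */
        if (n == 0) {
            c->saw_eof = 1;  /* orderly shutdown: peer closed the connection */
            break;
        }
        if (errno != EINTR)
            break;           /* real error; give up on this connection */
    }
    close(c->fd);
    return NULL;
}

/* Start nthreads workers, close every peer end to simulate the remote
 * applications exiting, and return how many workers saw recv() == 0.
 * Returns -1 if setup fails. */
int run_demo(int nthreads)
{
    struct conn *conns = NULL;
    pthread_t *tids = NULL;
    int *peers = NULL;
    int started = 0, eof_count = 0, i;

    if (nthreads <= 0)
        return -1;
    conns = calloc((size_t)nthreads, sizeof *conns);
    tids = calloc((size_t)nthreads, sizeof *tids);
    peers = calloc((size_t)nthreads, sizeof *peers);
    if (conns == NULL || tids == NULL || peers == NULL) {
        eof_count = -1;
        goto out;
    }

    for (started = 0; started < nthreads; started++) {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
            break;
        conns[started].fd = sv[0];
        peers[started] = sv[1];
        if (pthread_create(&tids[started], NULL, worker, &conns[started]) != 0) {
            close(sv[0]);
            close(sv[1]);
            break;
        }
    }

    for (i = 0; i < started; i++)
        close(peers[i]);             /* the "remote application" exits */
    for (i = 0; i < started; i++) {
        pthread_join(tids[i], NULL); /* a hang here would match the bug */
        eof_count += conns[i].saw_eof;
    }
    if (started != nthreads)
        eof_count = -1;

out:
    free(conns);
    free(tids);
    free(peers);
    return eof_count;
}
```

On a correctly behaving kernel, `run_demo(64)` should return 64. In the real MPI launcher, each `conns[i].fd` would instead be a connected TCP socket to a remote daemon, and a worker still blocked in recv() after the remote process has exited would correspond to the hang described above.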
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/