Bug 76785 - Sockets fail to close
Summary: Sockets fail to close
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
Reported: 2002-10-26 16:03 UTC by Steffen Persvold
Modified: 2008-08-01 16:22 UTC (History)
Last Closed: 2004-09-30 15:40:07 UTC

Description Steffen Persvold 2002-10-26 16:03:07 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830

Description of problem:
With the 2.4.18-17.7.3 kernel, an application that uses many threads to connect
to other hosts (200) fails to recognize a typical "connection closed by peer"
on some of the sockets. The problem does not occur with the 2.4.18-10 kernel.

Version-Release number of selected component (if applicable):
2.4.18-17.7.3


How reproducible:
Always

Steps to Reproduce:
1. Start the threaded parent application, which connects to a large number of
remote daemons (one thread per remote machine, each with its own socket). Each
remote daemon then forks and execs another application.
2. All threads enter the recv() system call to receive data from the remotely
executed application.
3. Terminate the remotely executed applications.


Actual Results:  After some time, some of the threads are still blocked in the
recv() system call, and their sockets are still listed as ESTABLISHED (with
netstat -a). However, all remote applications have terminated and no sockets to
the parent machine exist on the remote machines. The application therefore
hangs, but a ctrl-c terminates it successfully.

Expected Results:  The recv() system call should return 0 to indicate that the
remote application has closed the connection. The thread then terminates, and
once all threads have exited the parent application should terminate
automatically, without anyone needing to press ctrl-c.
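
The expected behaviour can be illustrated with a minimal sketch. This is not
the MPI launcher itself: socket.socketpair() stands in for one parent-to-remote
connection, and the names wait_for_peer_close and demo are hypothetical. It
only shows that a blocked recv() should return zero bytes once the peer closes,
which lets the thread exit on its own.

```python
import socket
import threading


def wait_for_peer_close(sock, results, index):
    """Block in recv() until the peer closes, then record the final length."""
    while True:
        data = sock.recv(4096)
        if not data:
            # An empty result is the "recv() returned 0" case:
            # the peer performed an orderly close.
            results[index] = len(data)
            break
    sock.close()


def demo():
    # Assumption: a local socket pair stands in for one TCP connection
    # between the parent and a remotely executed application.
    parent_end, remote_end = socket.socketpair()
    results = [None]
    worker = threading.Thread(
        target=wait_for_peer_close, args=(parent_end, results, 0)
    )
    worker.start()

    remote_end.sendall(b"output from remote application\n")
    remote_end.close()  # the remote application terminates

    worker.join(timeout=5)
    return results[0], worker.is_alive()


if __name__ == "__main__":
    final_len, still_blocked = demo()
    print("final recv() length:", final_len)
    print("thread still blocked:", still_blocked)
```

In the failing case described above, the equivalent of worker.is_alive() would
stay true for some threads because recv() never returns.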

Additional info:

Kernel is 2.4.18-17.7.xsmp i686 build. Kernel 2.4.18-10smp i686 build works
fine. Glibc is glibc-2.2.5-40 (i686).

This is an MPI implementation, and the described sequence is used to launch the
MPI application on the cluster nodes. Unfortunately, no example source is
available (although one could be created), and the problem is not easy to
reproduce since it requires a relatively high number of cluster nodes (e.g. 32
nodes work fine, 64 do not).
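
A reproducer along these lines might look like the following sketch. It is an
assumption about the shape of such a test, not the actual application: a single
loopback listener plays the role of all remote daemons, and the names
remote_daemon, reader and run are hypothetical. On loopback it is not expected
to trigger the problem; the host and port would have to point at real cluster
nodes for that.

```python
import socket
import threading


def remote_daemon(listener, n_conn):
    """Accept n_conn connections, send one line on each, then close them."""
    conns = []
    for _ in range(n_conn):
        conn, _addr = listener.accept()
        conns.append(conn)
    for conn in conns:
        conn.sendall(b"hello\n")
        conn.close()  # stands in for the remote application terminating


def reader(port, done, index):
    """One thread per connection: read until recv() reports a peer close."""
    sock = socket.create_connection(("127.0.0.1", port))
    try:
        while sock.recv(4096):
            pass
        done[index] = True  # recv() returned zero bytes
    finally:
        sock.close()


def run(n_conn=64, timeout=10.0):
    """Return the indices of reader threads that never saw the close."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(n_conn)
    port = listener.getsockname()[1]

    daemon = threading.Thread(target=remote_daemon, args=(listener, n_conn))
    daemon.start()

    done = [False] * n_conn
    # daemon=True so that hung readers do not keep the process alive.
    threads = [
        threading.Thread(target=reader, args=(port, done, i), daemon=True)
        for i in range(n_conn)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout)
    daemon.join(timeout)
    listener.close()
    return [i for i, ok in enumerate(done) if not ok]


if __name__ == "__main__":
    print("threads still blocked in recv():", run())
```

A non-empty list from run() would correspond to the hang reported here: threads
still waiting in recv() although every peer has closed its socket.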

Comment 1 Bugzilla owner 2004-09-30 15:40:07 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/


