Bug 76785 - Sockets fails to close
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Arjan van de Ven
QA Contact: Brian Brock
Reported: 2002-10-26 12:03 EDT by Steffen Persvold
Modified: 2008-08-01 12:22 EDT
Doc Type: Bug Fix
Last Closed: 2004-09-30 11:40:07 EDT

Description Steffen Persvold 2002-10-26 12:03:07 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830

Description of problem:
With the 2.4.18-17.7.3 kernel, an application that uses many threads to connect
to other hosts (200) fails to detect a typical "connection closed by peer"
condition on some of the sockets. The 2.4.18-10 kernel shows no such problem.

Version-Release number of selected component (if applicable):
2.4.18-17.7.3


How reproducible:
Always

Steps to Reproduce:
1. Start a threaded parent application that connects to a large number of remote
daemons (one thread and one separate socket per remote machine). Each remote
daemon forks and execs another application.
2. All threads enter the recv() system call to receive data from the remotely
executed application.
3. Terminate the remotely executed application.
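
The three steps above can be sketched locally as follows. This is only an illustration of the sequence, not the reporter's MPI launcher: it assumes socketpair() as a stand-in for the TCP connections to remote daemons, so it is not expected to trigger the kernel problem itself. The names struct peer, receiver() and run_sequence() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_PEERS 256

/* Hypothetical per-connection state; not taken from the reporter's code. */
struct peer {
    int local_fd;    /* end the receiver thread reads from */
    int remote_fd;   /* stand-in for the remote application's socket */
    int saw_close;   /* set to 1 once recv() returned 0 */
    pthread_t tid;
};

/* Step 2: each thread blocks in recv() until the peer goes away. */
static void *receiver(void *arg)
{
    struct peer *p = arg;
    char buf[1024];
    ssize_t n;

    assert(p != NULL);
    do {
        n = recv(p->local_fd, buf, sizeof buf, 0);
    } while (n > 0 || (n < 0 && errno == EINTR));
    p->saw_close = (n == 0);
    return NULL;
}

/* Runs the sequence for npeers connections. Returns the number of threads
 * that observed an orderly close, or -1 if setup failed. Uses a static
 * table, so it is not reentrant. */
int run_sequence(int npeers)
{
    static struct peer peers[MAX_PEERS];
    int started = 0, closed = 0, i;

    if (npeers < 1 || npeers > MAX_PEERS)
        return -1;

    /* Step 1: one socket and one thread per (simulated) remote machine. */
    for (i = 0; i < npeers; i++) {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
            break;
        peers[i].local_fd = sv[0];
        peers[i].remote_fd = sv[1];
        peers[i].saw_close = 0;
        if (pthread_create(&peers[i].tid, NULL, receiver, &peers[i]) != 0) {
            close(sv[0]);
            close(sv[1]);
            break;
        }
        started++;
    }

    /* Step 3: "terminate" every remote application by closing its end. */
    for (i = 0; i < started; i++)
        close(peers[i].remote_fd);

    /* Every receiver should now return from recv() and exit. */
    for (i = 0; i < started; i++) {
        pthread_join(peers[i].tid, NULL);
        closed += peers[i].saw_close;
        close(peers[i].local_fd);
    }
    return started == npeers ? closed : -1;
}
```

Calling run_sequence(200) mirrors the thread count in this report. If any receiver failed to notice the close, pthread_join() would block, which corresponds to the hang described below.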


Actual Results:  After some time, some of the threads are still blocked in the
recv() system call, and their sockets are still listed as ESTABLISHED (with
netstat -a). However, all remote applications have terminated and no sockets to
the parent machine exist on the remote machines. The application therefore
hangs, although ctrl-c terminates it successfully.

Expected Results:  The recv() system call should return 0 to indicate
that the remote application has closed the connection. Each thread then
terminates, and once all threads have exited the parent application should
terminate on its own, without the user having to press ctrl-c.
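
The expected behaviour relies on the standard convention that recv() on a stream socket returns 0 once the peer has closed the connection. A minimal sketch of a receive loop built on that convention is shown here; the helper name drain_until_peer_close() is hypothetical and is not taken from the application in this report.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: read from fd until the peer closes the connection.
 * Returns the total number of bytes received on an orderly shutdown
 * (recv() returned 0), or -1 on a real error (errno is left as set by
 * recv()). */
long drain_until_peer_close(int fd)
{
    char buf[4096];
    long total = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0) {
            total += n;          /* payload from the remote application */
            continue;
        }
        if (n == 0)
            return total;        /* peer closed: the thread may exit */
        if (errno == EINTR)
            continue;            /* interrupted by a signal, retry */
        return -1;               /* any other error */
    }
}
```

In the failing case described above, the call to recv() inside such a loop never returns for some sockets, so the n == 0 branch is never reached.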

Additional info:

The failing kernel is the 2.4.18-17.7.xsmp i686 build. The 2.4.18-10smp i686
build works fine. Glibc is glibc-2.2.5-40 (i686).

This is an MPI implementation, and the sequence described above is used to
launch the MPI application on the cluster nodes. Unfortunately, no example
source is available (although one could be created), and the problem is not
easy to reproduce because it requires a relatively high number of cluster nodes
(e.g. 32 nodes work fine, 64 do not).
Comment 1 Bugzilla owner 2004-09-30 11:40:07 EDT
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
