Bug 114447

Summary: exit system call does not exit though process in status end and parent waits
Product: [Retired] Red Hat Linux
Reporter: Albert Fluegel <tdsc.af>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 9
CC: riel
Hardware: i686   
OS: Linux   
Last Closed: 2004-09-30 15:41:49 UTC

Description Albert Fluegel 2004-01-28 11:10:31 UTC
Description of problem:
It sometimes happens that a process is in the exit() system
call and in status end (according to ps -o ...,wchan,...), and
the parent has called wait4(), but its child does not end.
This is a very annoying problem. How often it happens seems to
depend on the kernel version. With 2.4.18-27 and earlier it
happened occasionally, say for every 1000th program started on
200 machines (i.e. statistically every 200000th process). With
2.4.20-8 it happened about every 100th process on one machine,
i.e. every 500th process. With 2.4.20-20 we didn't see it for
quite some time. With 2.4.20-28 the problem is back at about
the same rate as with 2.4.18-27. The machines are all
dual-processor systems. The likelihood of the problem rises
strongly and nonlinearly with processor speed: on 2.8 GHz Xeons
or faster it happens MUCH more often than on slower machines.
Further observations, whose significance is unclear:
It seems to happen only when the processes are started as
sub-processes of rshd (with some shells in between). The
problem often occurred when installing RPMs with the rpm
command. I've seen a very interesting behaviour here: whether
the problem showed up depended on the current working directory
in which the rpm program was started. Invoked this way, the
exit hung nearly every time:
(pwd is e.g. /)
rpm -i <options> /path/to/some/NFS/directory/kernel-some-version.rpm
but this way it worked:
cd /path/to/some/NFS/directory && rpm -i <options> kernel-some-version.rpm
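
To spot affected processes, the ps output mentioned above can be filtered by wait channel. The following is only a minimal sketch, assuming a procps-style ps that accepts -eo pid,ppid,stat,wchan. The helper name find_by_wchan and the choice of columns are illustrative and not part of the original report:

```shell
#!/bin/sh
# find_by_wchan: hypothetical helper, not from the original report.
# Reads `ps -eo pid,ppid,stat,wchan` style output on stdin (header line
# first) and prints "PID PPID STAT" for every process whose wait channel
# equals the first argument.
find_by_wchan() {
    awk -v want="$1" 'NR > 1 && $4 == want { print $1, $2, $3 }'
}
```

It could be used as `ps -eo pid,ppid,stat,wchan | find_by_wchan end`, where "end" is the value named in this report; the string your ps prints for a process stuck in exit may differ. The PPID column then shows which parent to inspect for a pending wait4.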

Version-Release number of selected component (if applicable):
see above

How reproducible:
Spread lots of jobs across many machines using rsh.

Steps to Reproduce:
1. See above. Sorry that I don't provide the rsh-based scripts here.
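
Since the original scripts are not included, the following is only a rough sketch of how such a stress run could be counted. It does not reconstruct the reporter's setup. The helper name count_hangs, the polling interval, and the time limit are all assumptions. It starts a command repeatedly in the background and counts the runs that are still present in the process table after the limit:

```shell
#!/bin/sh
# count_hangs: hypothetical helper, not from the original report.
# Usage: count_hangs COUNT LIMIT command [args...]
# Runs the command COUNT times, allowing each run LIMIT seconds
# (LIMIT must be at least 1), and prints how many runs were still
# alive after that time. The body runs in a subshell so its variables
# do not leak into the caller.
count_hangs() (
    count=$1
    limit=$2
    shift 2
    hung=0
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" >/dev/null 2>&1 &
        pid=$!
        waited=0
        # Poll once per second until the child is gone or the limit is hit.
        while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$limit" ]; do
            sleep 1
            waited=$((waited + 1))
        done
        if kill -0 "$pid" 2>/dev/null; then
            hung=$((hung + 1))            # still present: candidate hang
        else
            wait "$pid" 2>/dev/null || true   # collect the exit status
        fi
        i=$((i + 1))
    done
    echo "$hung"
)
```

A run in the spirit of this report might look like `count_hangs 1000 60 rsh somehost /bin/true`, with somehost as a placeholder host name. Whether the local rsh client stays alive when the remote child hangs in exit is an assumption here; checking the remote process table with ps remains the more direct test.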
  
Actual results:
With a certain probability, exit does not complete and the process never ends.

Expected results:
exit completes and the parent gets the exit data through one of the wait calls.

Additional info:
see above.

Comment 1 Bugzilla owner 2004-09-30 15:41:49 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/