Bug 8266 - File descriptor loss: bash command substitution
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: bash
Version: 6.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Michael K. Johnson
QA Contact:
URL:
Whiteboard:
Duplicates: 12184
Depends On:
Blocks:
Reported: 2000-01-07 16:43 UTC by Harold Knudsen
Modified: 2008-05-01 15:37 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2001-10-26 06:12:24 UTC
Embargoed:


Attachments
System info and Addendum#1 (976 bytes, text/plain)
2000-01-07 16:48 UTC, Harold Knudsen
Possible work-around for bug in bash (or libc or ??). (1.77 KB, patch)
2001-10-24 22:37 UTC, Erling Jacobsen
Explanation of my previously attached patch (523 bytes, text/plain)
2001-10-25 09:52 UTC, Erling Jacobsen
More information indicating the underlying (actual) bug (439 bytes, text/plain)
2001-10-25 11:29 UTC, Erling Jacobsen

Description Harold Knudsen 2000-01-07 16:43:27 UTC
A repeatable, but relatively rare, error occurs on my system (Dell 410
Workstation, Pentium III, 450 MHz, TAG #QAQK) when executing a bash
command substitution, e.g., LN=$(/bin/echo "abc").  The error, reported
by bash (subst.c:2533), is: `Can't reopen pipe to command substitution
(fd 4): No child processes'.
The following scripts have been and are being used to gather statistics
on the frequency of occurrence of this error.
------------------------------------------------------------
#! /bin/sh
# nnx - script that generates/detects bash command substitution error
# $1 = error count file
COUNT=0
while true
do
   C1=0
   until [ $C1 -gt 999 ]
   do
      LN=$(/bin/echo "abc") # The command substitution
      if [ -z "$LN" ] # Empty string returned on command substitution error
      then
        echo "$COUNT$C1" >> $1
        exit 2
      fi
      C1=$[$C1 + 1]
   done
   COUNT=$[$COUNT + 1]
done

---------------------------------------------------------------------
#! /bin/sh
# nxxd - driver for nxx
# $1 = error_count file
# runs until terminated with ^C
while true
do
  nnx $1
done

---------------------------------------------------------------------
Typical use is: `nxxd error_count_file &'
When run under kernel-2.2.5-15 the average command substitution error
frequency was found to be 1 in 119383 (the average of the counts in the
error_count_file, 22 samples).
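
The averaging step can be sketched as follows. This is only an illustration, assuming the error count file holds one integer per line, each being the number of substitutions completed before a failure. The helper name mean_count is hypothetical and is not part of the scripts above.

```shell
# mean_count FILE: print the integer mean of the numeric lines in FILE.
# Assumption: one non-negative integer per line; any other line is ignored.
mean_count() {
    awk '/^[0-9]+$/ { sum += $1; n++ }
         END {
             if (n > 0) printf "%d\n", sum / n   # fractional part is dropped
             else print "no samples"
         }' "$1"
}

# Typical use (file name taken from the report):
#   mean_count error_count_file
```

The mean is truncated to an integer, which is adequate for figures of the form "1 in N".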

Variations on the experiment:
1. I have also installed kernel-2.2.12-20 (from RedHat 6.1) to see if
   the problem exists there.  It does, and with increased frequency (1 in
   52169, on 21 samples).
2. I have explored possible timing sensitivities by placing a delay loop
   in the nxx script, and rerunning it under kernel-2.2.12-20.  See
   ADDENDUM#1, below, for the code (d_nxx).  The frequency of error is
   reduced about ten-fold (from 1 in 52169 to 1 in 587466).
   Increasing the delay count (changing 19 to 49 in `until [ $j -gt 19 ]')
   appears to reduce the error frequency even further.
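
The delay in item 2 can be pictured as a counting loop run before each substitution. The actual d_nxx code is in the attached ADDENDUM#1 and is not reproduced in this report, so the following is only a hypothetical sketch built around the quoted `until [ $j -gt 19 ]' line. The function name busy_delay is an assumption.

```shell
# busy_delay LIMIT: count j from 0 until it exceeds LIMIT.
# Hypothetical reconstruction; it uses only shell builtins, so the delay
# itself creates no child process and no pipe.
busy_delay() {
    j=0
    until [ "$j" -gt "$1" ]
    do
        j=$((j + 1))
    done
}

busy_delay 19                 # the delay count quoted in item 2
LN=$(/bin/echo "abc")         # the command substitution under test
```

Passing 49 instead of 19 corresponds to the longer delay mentioned above.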
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CONJECTURES and CONCLUSION:
1. A software race (compiler induced?) exists in the control of
   /proc/self/fd.  This race is probably not in bash and is likely to be
   in the kernel.
2. The fact that the added delay (item 2. above) decreases the frequency
   of the error makes hardware induced failure less likely.

It seems important to test this on another machine running the same
software (something I don't have easy access to).

The problem, if not limited to my system, is serious--most bash (sh)
scripts use command substitution and correct system operation
depends on the correct operation of many scripts.

Please let me know if I can supply any other information to help
solve this problem.
Thank you,
Harold Knudsen, Emeritus Professor, Computer Science,
University of New Mexico

Comment 1 Harold Knudsen 2000-01-07 16:48:59 UTC
Created attachment 49
System info and Addendum#1

Comment 2 Damien Miller 2000-08-11 02:59:24 UTC
Here are some more data points:

Both are on RH6.2 with all errata updates

Kernel 2.2.16-3, P-III 700, 128 MB RAM
bash-1.14.7-22: 1 failure in 6,283,000 substitutions

Kernel 2.2.16-3 rebuilt with advanced routing enabled (but not used), Celeron
400, 128 MB RAM
bash-1.14.7-22: 22 failures in 3,600,000 substitutions
bash2-2.03-8: 0 failures in 4,640,000 tests
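
For comparison with the "1 in N" figures in the description, the counts above can be converted with integer division. A minimal sketch; one_in is a hypothetical helper name, not something from the scripts in this report.

```shell
# one_in FAILURES TOTAL: print N such that the rate is roughly 1 failure in N.
# Integer division only; zero failures is reported separately because it
# gives no finite rate.
one_in() {
    if [ "$1" -eq 0 ]; then
        echo "no failures observed"
    else
        echo $(( $2 / $1 ))
    fi
}

one_in 1 6283000     # prints 6283000
one_in 22 3600000    # prints 163636
one_in 0 4640000     # prints "no failures observed"
```

On these numbers the Celeron machine fails about once in 163636 substitutions with bash-1.14.7-22, the same order of magnitude as the rates in the original description.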

---------

This bug is *very* annoying when it occurs on long, unattended software builds.
It drives me nearly insane when an overnight build stalls during a kernel build,
or an autoconf run is messed up, resulting in miscompiled software.


Comment 3 Erling Jacobsen 2001-10-24 22:37:05 UTC
Created attachment 34962
Possible work-around for bug in bash (or libc or ??).

Comment 4 Erling Jacobsen 2001-10-25 09:52:44 UTC
Created attachment 35021
Explanation of my previously attached patch

Comment 5 Erling Jacobsen 2001-10-25 11:29:18 UTC
Created attachment 35041
More information indicating the underlying (actual) bug

Comment 6 paulh 2001-10-26 06:12:19 UTC
See bug 14781. I had similar problems with my nightly compile. However, this
error has disappeared since RedHat 7.0.



Comment 7 Phil Knirsch 2002-07-23 08:43:24 UTC
*** Bug 12184 has been marked as a duplicate of this bug. ***

