Red Hat Bugzilla – Bug 49911
bash 2.04 has runaway process bug
Last modified: 2007-04-18 12:35:07 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.72 [en] (X11; U; Linux 2.2.14-5.0 i586)
Description of problem:
bash 2.04 as shipped with RHL 7.1 has a bug which can cause scripts to
mysteriously spin out of control consuming large amounts of CPU time
(looping endlessly on wait4 system call).
A new version of bash (2.05) is available which fixes this problem.
This bug can render long-running scripts unusable. The unpredictable
nature and resource-hogging behavior make this bug rather annoying, to say
the least.
Steps to Reproduce:
1. Download and install 'autonet' utility from above URL.
2. Type 'autonet start'.
3. Leave running for several hours.
(this process can be sped up a bit by changing the INTERVAL settings in
/etc/sysconfig/autonet to "1", but it still takes over an hour usually)
Actual Results: After several hours of normal operation, the autonet
script will suddenly slip into a state where it consumes 99% of the CPU
time and becomes unresponsive. An strace will show bash is doing endless
calls to the wait4 system call with an ECHILD result.
I suspect that similar results occur with other long-running or continuous
shell scripts, but have not yet pursued this.
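For anyone unfamiliar with the symptom in the strace output: ECHILD is the error wait4 returns when the caller has no matching child left to wait for. The following is only a minimal sketch of that error condition, not a reproduction of the bash bug itself. It assumes a Unix-like system with Python 3, and the function name wait4_after_reap is made up for illustration.

```python
import errno
import os
import subprocess
import sys


def wait4_after_reap():
    """Reap a child, then wait4 on the same PID again; return the errno name."""
    # Spawn a trivial child process (this Python interpreter, exiting at once).
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()  # the first wait reaps the child normally
    try:
        # Second wait on the same PID: the process no longer has that child,
        # so the call is expected to fail instead of blocking.
        os.wait4(child.pid, 0)
    except ChildProcessError as exc:
        # Python maps ECHILD to ChildProcessError; report the symbolic name.
        return errno.errorcode[exc.errno]
    return None


if __name__ == "__main__":
    print(wait4_after_reap())
```

A shell that keeps retrying the call after this error, rather than giving up, would spin at full CPU, which matches the strace output described above.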
I have put updated bash RPMs up on http://www.foogod.com/software/autonet.
These include all of the fixes from the 2.04 RPM, so they should be good for
general distribution unless there are any known issues in 2.05 that weren't in
2.04.
(someone might want to check whether the "exclude" patch is still needed if
anybody knows how to create a test case. The bash code has been changing in
this area, so it may no longer be needed. I have included it in the 2.05 RPM
just to be safe.)
(err.. I meant "export" patch.. sigh.)
2.05 has been in rawhide for about 3 months.
Ok, I missed the RPMs in the rawhide FTP directory before, however:
RAWHIDE is not an appropriate resolution for this bug. This is not a new
feature request, or an issue with some part of the system that nobody uses; this
is a confirmed BUG in one of the core components of the system, a potentially
system-crashing bug if it happens to hit the wrong process at the wrong time
(which it could do randomly). I have encountered at least one other incident
with a runaway system script which I believe may be due to this bug (I was
unable to obtain enough info at the time to be sure, but it was similar).
How about an errata or _something_ to let people know that this problem exists
BEFORE it screws over their production systems without warning? (and that
there's a fix available).