From Bugzilla Helper:
User-Agent: Mozilla/4.72 [en] (X11; U; Linux 2.2.14-5.0 i586)

Description of problem:
bash 2.04 as shipped with RHL 7.1 has a bug which can cause scripts to mysteriously spin out of control, consuming large amounts of CPU time (looping endlessly on the wait4 system call). A new version of bash (2.05) is available which fixes this problem. This bug can render long-running scripts unusable. The unpredictable nature and resource-hogging behavior make this bug rather annoying, to say the least.

How reproducible:
Usually

Steps to Reproduce:
1. Download and install 'autonet' utility from above URL.
2. Type 'autonet start'.
3. Leave running for several hours. (This process can be sped up a bit by changing the INTERVAL settings in /etc/sysconfig/autonet to "1", but it still usually takes over an hour.)

Actual Results:
After several hours of normal operation, the autonet script will suddenly slip into a state where it consumes 99% of the CPU time and becomes unresponsive. An strace will show bash making endless calls to the wait4 system call, each returning ECHILD. I suspect that similar results occur with other long-running or continuous shell scripts, but have not yet pursued this.

Additional info:
I have put updated bash RPMs up on http://www.foogod.com/software/autonet. These include all of the fixes from the 2.04 RPM, so they should be good for general distribution unless there are any known issues in 2.05 that weren't in 2.04. (someone might want to check whether the "exclude" patch is still needed if anybody knows how to create a test case. The bash code has been changing in this area, so it may no longer be needed. I have included it in the 2.05 RPM just to be safe.) -alex
(err.. I meant "export" patch.. sigh.)
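For anyone unfamiliar with the symptom in the strace output: ECHILD is what wait4 returns when the calling process has no children left to wait for. The following is only a minimal sketch of that kernel behavior, not a reproduction of the bash bug itself. The helper name probe_wait4_without_children is made up for illustration, and the sketch assumes a Unix system with Python 3.

```python
import subprocess
import sys

# Script run in a fresh interpreter, which has no child processes of its own.
# With nothing to wait for, wait4 fails immediately with ECHILD instead of blocking.
PROBE = """
import errno, os
try:
    os.wait4(-1, 0)
except ChildProcessError as exc:
    print(errno.errorcode[exc.errno])
"""


def probe_wait4_without_children() -> str:
    """Hypothetical helper: return the errno name wait4 reports with no children."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(probe_wait4_without_children())
```

Because the call fails right away rather than blocking, a shell that keeps retrying it in a loop would burn CPU in the way described above. Whether that is exactly what bash 2.04 does internally is a guess on my part.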
2.05 has been in rawhide for about 3 months.
Ok, I missed the RPMs in the rawhide FTP directory before. However, RAWHIDE is not an appropriate resolution for this bug. This is not a new feature request, or an issue with some part of the system that nobody uses; this is a confirmed BUG in one of the core components of the system, and potentially a system-crashing bug if it happens to hit the wrong process at the wrong time (which it could do randomly). I have encountered at least one other incident with a runaway system script which I believe may be due to this bug (I was unable to obtain enough info at the time to be sure, but it was similar). How about an errata or _something_ to let people know that this problem exists (and that there's a fix available) BEFORE it screws over their production systems without warning?