Red Hat Bugzilla – Bug 229105
batch during high load causes pathological overload
Last modified: 2007-11-30 17:11:57 EST
Description of problem:
If one attempts to run a job with batch while the system load is above the
configured limit (0.8), then atd, when started with the -b0 option, may re-check
the system load continuously in a tight loop, driving atd cpu usage to nearly
100% and pushing the system load even higher. The batch job will never
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. ensure atd is running with "-b0" option
2. run some background task to keep system load at, say, 1.0 or higher
3. echo date | batch
4. use top to check atd hogging all remaining cpu
5. check that system load goes even higher
6. wait forever for batch job to complete
Actual results:
batch job never completes
Expected results:
atd should pause a bit between checks under such circumstances.
atd should also run the batch job regardless of system load after some period
(say 1 hour), otherwise a batch job will never run on a constantly loaded host
(which often happens in the real world).
The -b0 option should only affect the separation between batch jobs that
actually run. It is not supposed to be used as the interval at which atd checks
whether a job is runnable. So when a batch job is not runnable because of high
system load, atd should pause a bit regardless of the -b option setting;
otherwise it will consume cpu pointlessly and indefinitely.
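To make the requested behaviour concrete, here is a minimal shell sketch of the decision atd could take on each polling pass. It is not atd's actual code: the helper names batch_decision and poll_interval, the one-hour cap, and the minimum pause are assumptions chosen only to illustrate the two requests above (pause between checks, and run anyway after a long enough wait).

```shell
# Hypothetical helpers, not part of atd.

# Decide what to do with a pending batch job on one polling pass.
#   $1 = current load average
#   $2 = load limit (0.8 in this report)
#   $3 = seconds the job has already waited
#   $4 = maximum wait before running regardless (assumed, e.g. 3600)
batch_decision() {
    awk -v load="$1" -v limit="$2" -v waited="$3" -v max="$4" 'BEGIN {
        if (load < limit)        print "run"         # load is low enough
        else if (waited >= max)  print "run-anyway"  # waited too long
        else                     print "sleep"       # pause, then re-check
    }'
}

# Pick the pause between load checks so that -b0 cannot produce a busy loop.
#   $1 = configured -b interval in seconds
#   $2 = minimum pause in seconds (assumed value)
poll_interval() {
    if [ "$1" -lt "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}
```

With these helpers, a job on a host at load 1.5 would get "sleep" until its wait reaches the cap, and a -b0 setting would still yield a non-zero pause between checks.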
I still can't reproduce it. Could you write to me the precise task, which are
Did you run the precise instructions that I outlined in steps 1 through 6?
Can you please tell me the precise task you are trying?
Yes, I did.
I have trouble finding a suitable job that makes the system load that high. I
tried some makewhatis jobs, some personal scripts, etc., and saw nothing strange.
What was the system load ("uptime" load average) when running the test?
The current load average should have been greater than 1.0.
To create a high load all you need to do is something like "while :; do i=1;
done & while :; do i=1; done &" ... but these must run for several minutes to
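Before running step 3, it may help to confirm that the load average really is above the threshold. The following is a small sketch, assuming a Linux host where /proc/loadavg is available; the function name load_above is hypothetical, and the optional second argument exists only so the check can be tried on a sample line.

```shell
# Hypothetical helper: succeed if the 1-minute load average exceeds $1.
#   $1 = threshold (e.g. 1.0)
#   $2 = optional loadavg-format line; defaults to /proc/loadavg (Linux only)
load_above() {
    la_line=${2:-$(cat /proc/loadavg)}
    # The first field of the line is the 1-minute load average.
    echo "$la_line" | awk -v limit="$1" '{ exit !($1 > limit) }'
}
```

For example, `load_above 1.0 && echo date | batch` would submit the test job only once the load is high enough for the problem to show up.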
If you are having trouble doing something simple like getting the load average
up then maybe you should concentrate on becoming a CEO instead.
Can't reproduce in at-3.1.10. Won't write feature for at-3.1.8