Bug 229105 - batch during high load causes pathological overload
Alias: None
Product: Fedora
Classification: Fedora
Component: at
Version: 6
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Marcela Mašláňová
QA Contact:
Depends On:
Reported: 2007-02-17 04:31 UTC by JW
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2007-10-24 08:51:29 UTC
Type: ---

Description JW 2007-02-17 04:31:18 UTC
Description of problem:
If one attempts to run a job with batch when the system load is greater than 0.8
(the configured level), then atd, when started with option -b0, may re-check the
system load continuously in a tight loop, driving atd's CPU usage to nearly 100%
and pushing the system load even higher.  The batch job will never run.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. ensure atd is running with "-b0" option
2. run some background task to keep system load at, say, 1.0 or higher
3. echo date | batch
4. use top to check atd hogging all remaining cpu
5. check that system load goes even higher
6. wait forever for batch job to complete

Actual results:
batch job never completes

Expected results:
atd should pause a bit between checks under such circumstances.
atd should also run the batch job regardless of system load after some period
(say 1 hour), otherwise a batch job will never run on a constantly loaded host
(which often happens in the real world).
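
The behaviour requested above could be sketched roughly as follows. This is
only an illustration of the policy, not atd's actual code: the names
LOAD_LIMIT, CHECK_INTERVAL, MAX_WAIT, batch_decision and wait_for_batch_slot
are made up for the example, and the values are assumptions (0.8 and 1 hour
come from this report, the 60-second pause is arbitrary).

```shell
#!/bin/sh
# Hypothetical sketch of the requested polling policy; not taken from atd.
LOAD_LIMIT=${LOAD_LIMIT:-0.8}          # load threshold mentioned in this report
CHECK_INTERVAL=${CHECK_INTERVAL:-60}   # assumed pause between load checks (seconds)
MAX_WAIT=${MAX_WAIT:-3600}             # run regardless of load after this long

# batch_decision LOAD WAITED
# Prints "run" if the load is below the limit or the job has waited long
# enough, otherwise prints "wait". awk is used for the float comparison.
batch_decision() {
    awk -v load="$1" -v limit="$LOAD_LIMIT" -v waited="$2" -v max="$MAX_WAIT" '
        BEGIN {
            if (load + 0 < limit + 0 || waited + 0 >= max + 0)
                print "run"
            else
                print "wait"
        }'
}

# wait_for_batch_slot
# Polls the 1-minute load average (first field of /proc/loadavg on Linux)
# and sleeps between checks instead of spinning.
wait_for_batch_slot() {
    waited=0
    while :; do
        read -r load rest < /proc/loadavg
        if [ "$(batch_decision "$load" "$waited")" = run ]; then
            return 0
        fi
        sleep "$CHECK_INTERVAL"
        waited=$((waited + CHECK_INTERVAL))
    done
}
```

With these defaults, "batch_decision 1.2 0" prints "wait", while
"batch_decision 1.2 3600" prints "run" because the maximum wait has elapsed.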

Additional info:
The -b0 option should only affect the separation between batch jobs that
actually run.  It is not supposed to be used as the interval at which atd checks
for runnability.  So when a batch job is not runnable because of high system
load, atd should pause a bit regardless of the -b option setting; otherwise it
will consume cpu pointlessly and indefinitely.
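
To make the distinction concrete, here is a small sketch of how the sleep time
might be chosen. It is not based on atd's source: MIN_RECHECK and next_sleep
are hypothetical names, and the 5-second floor is an arbitrary assumption.

```shell
#!/bin/sh
# Illustrative only: MIN_RECHECK and next_sleep are made-up names, not atd options.
MIN_RECHECK=${MIN_RECHECK:-5}   # assumed floor, in seconds, between load checks

# next_sleep BATCH_INTERVAL BLOCKED_BY_LOAD
#   BATCH_INTERVAL:  the -b value, in seconds
#   BLOCKED_BY_LOAD: 1 if the batch job is held back by high load, else 0
# Prints how long to sleep before the next check.
next_sleep() {
    if [ "$2" -eq 1 ] && [ "$1" -lt "$MIN_RECHECK" ]; then
        # Blocked by load: never re-check faster than the floor, even with -b0.
        echo "$MIN_RECHECK"
    else
        # Otherwise the -b interval is used as given.
        echo "$1"
    fi
}
```

Under these assumptions "next_sleep 0 1" prints 5, while "next_sleep 0 0"
prints 0, so -b0 still means no extra separation between jobs that can run.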

Comment 1 Marcela Mašláňová 2007-03-07 14:27:12 UTC
I still can't reproduce it. Could you tell me the precise task that you are
running?

Comment 2 JW 2007-03-07 23:06:37 UTC
Did you run the precise instructions that I outlined in steps 1 through 6?

Can you please write to me the precise task, which you are trying?

Comment 3 Marcela Mašláňová 2007-03-13 11:19:23 UTC
Yes, I did.

I have trouble finding a suitable job that makes the system load that high. I
tried some makewhatis jobs, some personal scripts, etc., and saw nothing strange.

Comment 4 JW 2007-03-13 11:32:06 UTC
What was the system load ("uptime" load average) when running the test?
The current load average should have been greater than 1.0.

To create a high load all you need to do is something like "while :; do i=1;
done & while :; do i=1; done &" ... but these must run for several minutes to
take effect.
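
For anyone trying to reproduce this, the load generation could be wrapped up
along these lines. This is just a convenience sketch: start_busy_loops,
stop_busy_loops and load_above are illustrative helper names, and reading
/proc/loadavg assumes Linux.

```shell
#!/bin/sh
# Sketch only: start N busy loops and check the 1-minute load average.
BUSY_PIDS=""

# start_busy_loops N: launch N CPU-bound background loops and remember their PIDs.
start_busy_loops() {
    n=$1
    while [ "$n" -gt 0 ]; do
        ( while :; do :; done ) &
        BUSY_PIDS="$BUSY_PIDS $!"
        n=$((n - 1))
    done
}

# stop_busy_loops: kill every loop started by start_busy_loops.
stop_busy_loops() {
    if [ -n "$BUSY_PIDS" ]; then
        # Word splitting of the PID list is intentional here.
        kill $BUSY_PIDS 2>/dev/null || true
    fi
    BUSY_PIDS=""
}

# load_above THRESHOLD [FILE]: succeed if the 1-minute load average exceeds
# THRESHOLD. FILE defaults to /proc/loadavg, whose first field is that average.
load_above() {
    awk -v t="$1" '{ exit !($1 + 0 > t + 0) }' "${2:-/proc/loadavg}"
}
```

A possible sequence would be: "start_busy_loops 2", then "until load_above 1.0;
do sleep 10; done", then "echo date | batch", and "stop_busy_loops" once done.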

If you are having trouble doing something simple like getting the load average
up then maybe you should concentrate on becoming a CEO instead.

Comment 5 Marcela Mašláňová 2007-10-24 08:51:29 UTC
Can't reproduce in at-3.1.10. Won't write feature for at-3.1.8