Bug 88013

Summary: Scheduler doesn't background nice 19 jobs
Product: [Retired] Red Hat Raw Hide
Reporter: Jeremy Sanders <jss>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 1.0
CC: michael, pfrields
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-10-30 03:52:37 UTC

Description Jeremy Sanders 2003-04-04 16:41:17 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030312

Description of problem:
Using the Rawhide kernel listed below (and also the Red Hat 7.3 errata kernel
2.4.18-27.7.x), CPU-bound nice +19 jobs are given too much CPU relative to
CPU-bound nice 0 jobs.

For instance, on a 433MHz PII (also tried on a 1400MHz Athlon and a 2700MHz P4):

  5:29pm  up 35 min,  3 users,  load average: 2.14, 1.80, 0.99
69 processes: 65 sleeping, 4 running, 0 zombie, 0 stopped
CPU states: 90.6% user,  0.1% system,  9.1% nice,  0.0% idle
Mem:   384772K av,   76876K used,  307896K free,       0K shrd,    7428K buff
Swap: 1028152K av,       0K used, 1028152K free                   25948K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 1497 jss       25   0  1000 1000   380 R    90.4  0.2   9:49 cpufloattest
 1635 jss       39  19   956  956   340 R N   9.1  0.2   0:51 cpufloattest

So about 1/10 of the CPU goes to the nice 19 job, even though both jobs are
CPU-bound. There is no way to give the nice 19 job a lower priority (AFAIK).
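
As a quick check of the 1/10 figure, the share can be worked out from the two %CPU values in the top output above. This is only an illustrative calculation; the function name `cpu_share` is made up for this sketch.

```python
def cpu_share(niced_pct, normal_pct):
    """Return the niced job's fraction of the CPU used by the two jobs."""
    return niced_pct / (niced_pct + normal_pct)


# %CPU values copied from the top output above (PIDs 1635 and 1497).
share = cpu_share(9.1, 90.4)
print(f"nice 19 job share: {share:.1%}")
```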

On a Compaq Tru64 4.1B system:

load averages:  2.30,  2.23,  2.19                                     17:31:14
48 processes:  3 running, 14 sleeping, 31 idle
CPU states: 99.5% user,  0.0% nice,  0.4% system,  0.0% idle
Memory: Real: 77M/235M act/tot  Virtual: 9M/1223M use/tot  Free: 118M

  PID USERNAME PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
211277 jss       59    0 2120K  188K run     6:23 99.70% a.out
203876 jmb65     63   19   12M 9322K run   316:20  0.00% tree

The nice 0 job gets 99.7% of the CPU - a far better deal.

Can the scheduler be fixed to lower the priority of batch nice +19 jobs?
Otherwise, including something like the SCHED_BATCH patch would be useful.

(Also nice 10 jobs get 1/3 of the cpu - I'd suggest that's too much).


Version-Release number of selected component (if applicable):
kernel-2.4.20-8.1

How reproducible:
Always

Steps to Reproduce:
1. Run nice 0 foreground job
2. Run nice 19 background job
3. The nice 19 job gets 6-10% of the CPU
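
The steps above can be sketched as a small script. This is not the original cpufloattest program, whose source is not part of this report; `burn` and `run_pair` are hypothetical helpers written for illustration. The sketch assumes a Unix system with `os.fork`, and it pins both jobs to one CPU where `os.sched_setaffinity` is available, so that they actually compete on a multiprocessor machine.

```python
import os
import time


def burn(seconds):
    """Spin on floating-point work for about `seconds` of wall time and
    return the CPU time this process received during that interval."""
    start_cpu = time.process_time()
    deadline = time.monotonic() + seconds
    x = 0.0
    while time.monotonic() < deadline:
        x = x * 1.0000001 + 1.0
    return time.process_time() - start_cpu


def run_pair(seconds=10.0, niceness=19):
    """Run a nice 0 job and a niced job side by side.

    Returns (normal_cpu_seconds, niced_cpu_seconds)."""
    # Assumption: competing on a single CPU is what the report describes.
    cpu = None
    if hasattr(os, "sched_getaffinity"):
        cpu = min(os.sched_getaffinity(0))

    children = []
    for nice_inc in (0, niceness):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                os.close(r)
                if cpu is not None:
                    try:
                        os.sched_setaffinity(0, {cpu})
                    except OSError:
                        pass  # pinning refused; run unpinned
                if nice_inc:
                    os.nice(nice_inc)
                os.write(w, repr(burn(seconds)).encode())
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code
        os.close(w)
        children.append((pid, r))

    used = []
    for pid, r in children:
        data = os.read(r, 64)
        os.close(r)
        os.waitpid(pid, 0)
        used.append(float(data.decode()))
    return used[0], used[1]
```

Calling `run_pair(seconds=30)` and dividing the second value by the sum of the two gives the niced job's share, which can be compared with the 6-10% reported here. The result depends on the kernel and on the machine's other load.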
    

Additional info:

Comment 1 Michael Lee Yohe 2003-04-04 17:59:38 UTC
I tend to agree - processes niced at 19 are the ones that should run only "if
you can run, fine..."  I run several things at nice 19 (time updates, random
signature generation, etc.) that do not need any sort of priority (certainly not
9%).  This would probably involve Ingo Molnar (author of the O(1) scheduler for
the kernel, who also works for Red Hat).

Upstream?

It does raise a good point for the core kernel maintainers - it would be
interesting to benchmark scheduling performance against commercial UNIX kernels
and see how Linux fares.

Comment 2 Jeremy Sanders 2003-05-24 12:22:53 UTC
This situation is getting worse.  Using 2.4.20-13.7 on 7.3 gives 15% to the
niced-19 process on a P4.

It would be great if RH were to include Ingo Molnar's SCHED_BATCH feature to
allow background execution of jobs when there are no other jobs running.
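
For readers on later kernels: Linux exposes a policy named SCHED_BATCH through sched_setscheduler(2), and Python wraps it as `os.SCHED_BATCH`. Whether that policy behaves like the patch requested in this comment (running only when nothing else is runnable) is not established by this report, so treat the following as a sketch of how to request the policy, not as a fix. The helper name `try_sched_batch` is hypothetical.

```python
import os


def try_sched_batch(pid=0):
    """Try to switch `pid` (0 = the calling process) to SCHED_BATCH.

    Returns True on success, False if the policy is unavailable or refused."""
    if not hasattr(os, "SCHED_BATCH"):
        return False  # not Linux, or this Python build lacks the constant
    try:
        # Non-realtime policies take a static priority of 0.
        os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
    except (OSError, AttributeError):
        return False
    return os.sched_getscheduler(pid) == os.SCHED_BATCH
```

A background job could call `try_sched_batch()` once at startup, in addition to raising its nice value, and carry on unchanged if it returns False.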

Comment 3 Jeremy Sanders 2003-05-26 11:16:02 UTC
I've contacted Ingo Molnar and he says he'll look into the nice 19 issue.