Bug 84109

Summary:	scheduler priority bug
Product:	[Retired] Red Hat Linux	Reporter:	Marc Schmitt <marc.schmitt>
Component:	kernel	Assignee:	Arjan van de Ven <arjanv>
Status:	CLOSED ERRATA	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	7.3	CC:	jss, rmj
Target Milestone:	---
Target Release:	---
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:	2.4.20-13.7smp	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2003-05-27 12:53:31 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---

Description Marc Schmitt 2003-02-12 11:46:59 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2) Gecko/20021120 Netscape/7.01

Description of problem:
We are always running low-priority jobs in the background. The scheduler gets
confused and ends up running jobs at nice +19 almost at top priority.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. launch 2 jobs with default priority
2. launch 4 jobs with nice +19
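A minimal reproduction sketch in Python, assuming a Unix system and a parent process running at nice 0. The function names `reproduce` and `busy_loop` and their parameters are hypothetical. The harness forks two CPU-bound children at nice 0 and four at nice 19, lets each spin for the same wall-clock interval, and reports the CPU time each child actually received. On a two-CPU machine like the reporter's, honoured priorities would show the nice-0 children accumulating close to the full interval and the nice-19 children much less.

```python
import os
import time


def busy_loop(seconds):
    """Spin on the CPU until `seconds` of wall-clock time have passed."""
    deadline = time.monotonic() + seconds
    count = 0
    while time.monotonic() < deadline:
        count += 1
    return count


def reproduce(n_normal=2, n_niced=4, seconds=10.0):
    """Hypothetical harness: fork CPU-bound children at nice 0 and nice 19.

    Returns a list of (pid, nice, cpu_seconds) tuples, one per child.
    Assumes the parent runs at nice 0, because os.nice() is relative.
    """
    children = {}
    for niceness in [0] * n_normal + [19] * n_niced:
        pid = os.fork()
        if pid == 0:
            # Child: lower its own priority, burn CPU, then exit without
            # ever returning into the parent's code path.
            try:
                if niceness:
                    os.nice(niceness)
                busy_loop(seconds)
            finally:
                os._exit(0)
        children[pid] = niceness

    results = []
    for pid, niceness in children.items():
        # wait4() reaps the child and returns its resource usage.
        _, _status, usage = os.wait4(pid, 0)
        results.append((pid, niceness, usage.ru_utime + usage.ru_stime))
    return results
```

For example, `for pid, nice, cpu in reproduce(seconds=30): print(pid, nice, round(cpu, 1))` prints one line per child, so the CPU seconds of the nice-0 and nice-19 groups can be compared directly.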

    

Actual Results:
213 processes: 203 sleeping, 8 running, 2 zombie, 0 stopped
CPU0 states: 91.2% user,  8.4% system,  2.2% nice,  0.0% idle
CPU1 states: 99.0% user,  0.1% system, 99.4% nice,  0.0% idle
Mem:   513100K av,  504868K used,    8232K free,       0K shrd,   28092K buff
Swap: 2096472K av,  186560K used, 1909912K free                  223376K cached
PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
26495 darwin    25   0 48988  47M   436 R    47.1  9.5   8:30 darwin
26464 darwin    25   0 50524  49M   436 R    46.9  9.8   8:51 darwin
26614 gonnet    39  19  2416 2416   300 R N  33.4  0.4   4:08 Switch4.LI
26735 gonnet    39  19  2396 2396   300 R N  33.4  0.4   2:02 Switch4.LI
24154 gonnet    39  19  1548 1548   304 R N  33.2  0.3  21:50 Switch4.LI
25323 gonnet    39  19  2400 2400   304 R N   2.7  0.4   9:52 Switch4.LI
. . . . 

Notice that the first two jobs have nice 0 and together get 100% of one CPU.
The other four jobs all have nice 19 and also get 100% of the other CPU.
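To make that comparison explicit, the sketch below totals the `%CPU` column of the snapshot above by nice value. It assumes, as the reading above does, that `top` reports per-process `%CPU` relative to a single CPU, and that the COMMAND field contains no spaces (true for these rows). The names `cpu_by_nice` and `TOP_ROWS` are illustrative.

```python
from collections import defaultdict

# Process rows copied from the first `top` snapshot in this report.
TOP_ROWS = """\
26495 darwin    25   0 48988  47M   436 R    47.1  9.5   8:30 darwin
26464 darwin    25   0 50524  49M   436 R    46.9  9.8   8:51 darwin
26614 gonnet    39  19  2416 2416   300 R N  33.4  0.4   4:08 Switch4.LI
26735 gonnet    39  19  2396 2396   300 R N  33.4  0.4   2:02 Switch4.LI
24154 gonnet    39  19  1548 1548   304 R N  33.2  0.3  21:50 Switch4.LI
25323 gonnet    39  19  2400 2400   304 R N   2.7  0.4   9:52 Switch4.LI
"""


def cpu_by_nice(rows):
    """Sum the %CPU column of `top` process rows, grouped by nice value."""
    totals = defaultdict(float)
    for line in rows.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Columns: PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND.
        # STAT may be one token ("R") or two ("R N"), so %CPU is taken
        # as the fourth field from the end rather than by fixed position.
        nice = int(fields[3])
        totals[nice] += float(fields[-4])
    # Round to top's one-decimal precision to hide float summation noise.
    return {nice: round(total, 1) for nice, total in sorted(totals.items())}
```

On these six rows it returns `{0: 94.0, 19: 102.7}`: the two nice-0 jobs together hold roughly one CPU, and the four nice-19 jobs together hold roughly the other.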



Expected Results:  Jobs should run at assigned priority.

Additional info:
Kernel 2.4.18-19.7.xsmp (athlon) on a Tyan Tiger MPX dual athlon motherboard.

Comment 1 Roderick Johnstone 2003-03-17 15:05:05 UTC

We are seeing something similar.

 PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CO
 2175 maw       39  19 47864  46M  2276 R N  99.2  4.6   6:51 cloudy.exe
15198 derosa    25   0  7140 7140  3532 R    50.2  0.6 116:51 xspec
 1748 swa       25   0 21304  14M  1288 R    49.7  1.4   8:05 cosmomc

Here two nice-0 jobs share one CPU while a nice-19 job has the other
CPU to itself.

Surely this is a scheduling bug?

The kernel is 2.4.18-24.7.xsmp for Athlon on Red Hat 7.3; the machine is a
Tyan Tiger MP with 2x Athlon MP processors.

Comment 2 Marc Schmitt 2003-03-25 10:53:52 UTC

I did an upgrade to 2.4.18-27.7.xsmp, the problem remains:

 11:42am  up 2 days,  2:01, 33 users,  load average: 3.01, 3.29, 3.16
203 processes: 198 sleeping, 5 running, 0 zombie, 0 stopped
CPU0 states: 96.2% user,  3.3% system, 11.3% nice,  0.0% idle
CPU1 states: 95.0% user,  4.2% system, 94.1% nice,  0.0% idle
Mem:  2064336K av, 1933252K used,  131084K free,       0K shrd,  110096K buff
Swap: 2096472K av,       0K used, 2096472K free                 1236936K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
10364 gonnet    39  19  2428 2428   304 R N  95.0  0.1   1:29 Switch4.LI
10405 gonnet    25   0  322M 322M   596 R    84.3 15.9   2:18 mapleTTY
10685 gonnet    39  19  2416 2416   296 R N  12.2  0.1   0:02 Switch4.LI
 1917 root       5 -10  308M  51M  5116 S <   4.1  2.5 154:25 X
20918 gonnet    15   0 16604  16M 14804 R     2.1  0.8   0:14 kdeinit
10678 gonnet    15   0  1212 1212   916 R     1.1  0.0   0:00 top
20857 gonnet    15   0 13768  13M 13012 S     0.1  0.6   4:15 kdeinit
20891 gonnet    15   0 18060  17M 15264 S     0.1  0.8   1:44 kdeinit
20902 gonnet    15   0 16652  16M 14804 S     0.1  0.8   0:15 kdeinit
    1 root      15   0   480  480   420 S     0.0  0.0   0:07 init

You can see that the top process is at nice 19, while the one at nice
0 does not get as much CPU. Under normal circumstances, the nice-0 process
should get 100% of one CPU and the two nice-19 processes 50% each.
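The expectation above corresponds to an idealized strict-priority model: tasks with a lower nice value are served first, each task can use at most one CPU, and tasks at the same nice value split whatever CPU capacity remains. The sketch below computes those shares. The function name `strict_priority_shares` is hypothetical, and the model is a simplification of what the reporter expects, not the kernel's actual algorithm (the real scheduler does not starve nice-19 tasks completely).

```python
def strict_priority_shares(nice_values, n_cpus):
    """Expected %CPU per task under a strict-priority model (an assumption).

    Tasks are served in order of increasing nice value. Each task can use
    at most one CPU (100%); tasks sharing a nice value split the remaining
    capacity evenly. Returns shares in the same order as `nice_values`.
    """
    remaining = float(n_cpus)  # CPU capacity not yet handed out
    shares = {}
    for nice in sorted(set(nice_values)):
        group = [i for i, n in enumerate(nice_values) if n == nice]
        # Each task in this group gets a full CPU if enough are left,
        # otherwise an equal slice of what remains.
        per_task = min(1.0, remaining / len(group))
        for i in group:
            shares[i] = 100.0 * per_task
        remaining -= per_task * len(group)
    return [shares[i] for i in range(len(nice_values))]
```

For this comment's case, `strict_priority_shares([0, 19, 19], 2)` gives `[100.0, 50.0, 50.0]`, matching the stated expectation; for the original report's two nice-0 and four nice-19 jobs it gives 100% to each nice-0 job and 0% to each nice-19 job.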

Could someone look into this, please? Thanks.

Comment 3 Marc Schmitt 2003-05-27 12:53:31 UTC

The problem is gone in 2.4.20-13.7smp, I'm closing this bug. Thanks!