Description of problem:

With the EL3 (WS3) release the scheduler does not handle niced processes properly on an SMP system. Here is a snippet of the process state from top:

CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   99.8%  100.0%    0.0%   0.0%     0.0%    0.0%    0.0%
           cpu00    0.0%  100.0%    0.0%   0.0%     0.0%    0.0%    0.0%
           cpu01   99.9%    0.0%    0.0%   0.0%     0.0%    0.0%    0.0%

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
22143 darwin    39  19 54436  53M 42304 R N  50.0 10.6   7:36   0 darwin
21249 darwin    39  19 58460  57M 42300 R N  49.9 11.4  21:51   0 darwin
22431 darwin    25   0  147M 147M  104M R    49.9 29.6   3:51   1 darwin
21676 darwin    25   0 46220  45M   384 R    49.8  9.0   9:46   1 darwin

As you can see, the niced processes sit on one CPU and the non-niced processes on the other. This is wrong and can only be found in the 2.4 kernel of Red Hat. The vanilla kernel handles this correctly. With the above processes, the niced processes together should not get 50% of all CPU power.

Version-Release number of selected component (if applicable):

2.4.x

How reproducible:

This bug is hard to reproduce as it only appears sometimes. I wrote a small program to test it, but I cannot prove that it will show the bug:

    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        pid_t pid;

        /* Start two CPU-bound children at the lowest priority. */
        if ((pid = fork()) == 0) {
            nice(19);
            for (;;);
        }
        if ((pid = fork()) == 0) {
            nice(19);
            for (;;);
        }

        /* Let the niced children run alone for a while. */
        sleep(30);

        /* Start one CPU-bound child at normal priority ... */
        if ((pid = fork()) == 0) {
            for (;;);
        }

        /* ... and spin in the parent as the second normal-priority process. */
        for (;;);

        return 0;
    } // int main(int argc, char **argv...

I did some research and found that you apply the O(1) scheduler patch to your 2.4.x kernels. I think this patch is somewhat buggy and causes more problems than it solves. But I'm not absolutely sure that this is the cause.
Let me add some additional info to the above. The problem is difficult to reproduce because it requires two CPUs and 4 processes (2 normal and 2 niced) to show up. It then happens when the 2 niced processes are assigned to one CPU and the 2 non-niced processes are assigned to the other CPU. All four processes then get 50% of a CPU each. In all other kernels this situation is recognized, or the processes are rotated between CPUs sufficiently often, so that it does not become a problem: in the long run the niced processes get about 10% of the CPU. In the Red Hat 2.4 kernel they stay like that forever. This is serious in our setup, where we *always* have processes running in the background at low priority (niced).

Best wishes, Gaston Gonnet.
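In case it helps anyone confirm the stuck state without watching top, here is a rough sketch of how it could be detected from /proc. It assumes the /proc/<pid>/stat layout documented in proc(5), with the nice value in field 19 and the last-used CPU in field 39, and it treats any process with nice > 0 as "niced". The names struct sample, parse_stat_line, sample_pid and segregated_by_nice are made up for this illustration and are not part of any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: nice value and last-used CPU of one process. */
struct sample {
    int nice;
    int cpu;
};

/*
 * Parse one /proc/<pid>/stat line. The command name (field 2) is in
 * parentheses and may contain spaces, so counting starts after the
 * last ')'. Returns 0 on success, -1 if fields 19 or 39 are missing.
 */
int parse_stat_line(const char *line, struct sample *out)
{
    const char *p = strrchr(line, ')');
    int field = 2;              /* the next token read is field 3 */
    int have_nice = 0, have_cpu = 0;

    if (p == NULL)
        return -1;
    p++;
    while (*p != '\0') {
        while (*p == ' ' || *p == '\n')
            p++;
        if (*p == '\0')
            break;
        field++;
        if (field == 19) {
            out->nice = (int)strtol(p, NULL, 10);
            have_nice = 1;
        } else if (field == 39) {
            out->cpu = (int)strtol(p, NULL, 10);
            have_cpu = 1;
        }
        while (*p != '\0' && *p != ' ' && *p != '\n')
            p++;
    }
    return (have_nice && have_cpu) ? 0 : -1;
}

/* Read and parse /proc/<pid>/stat for one process. */
int sample_pid(long pid, struct sample *out)
{
    char path[64], line[1024];
    FILE *f;
    int rc;

    snprintf(path, sizeof path, "/proc/%ld/stat", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    rc = (fgets(line, sizeof line, f) != NULL) ? parse_stat_line(line, out) : -1;
    fclose(f);
    return rc;
}

/*
 * Return 1 if every niced process shares one CPU, every non-niced
 * process shares one CPU, and those two CPUs differ; otherwise 0.
 */
int segregated_by_nice(const struct sample *s, size_t n)
{
    int niced_cpu = -1, normal_cpu = -1;
    size_t i;

    assert(s != NULL || n == 0);
    for (i = 0; i < n; i++) {
        int *slot = (s[i].nice > 0) ? &niced_cpu : &normal_cpu;

        if (*slot == -1)
            *slot = s[i].cpu;
        else if (*slot != s[i].cpu)
            return 0;
    }
    return niced_cpu != -1 && normal_cpu != -1 && niced_cpu != normal_cpu;
}
```

The idea would be to call sample_pid() on the four PIDs every few seconds and pass the results to segregated_by_nice(). A single positive result means little, since any scheduler can place the processes this way for a moment; the behaviour described above corresponds to the check staying positive over many consecutive samples.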
This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/ If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.