Description of problem:
Load averages can creep up in a stair-step fashion, representing a phantom minimum load on a system. Given the appropriate process and I/O load, the lowest load value occasionally rises from the healthy 0 to very high numbers. This scheduler accounting issue has no effect outside of the reported load, so only applications that are aware of the OS load are impacted by this bug.

This issue was discovered and resolved in 2.6.10 kernels by Ingo Molnar. I've backported that fix to RH's 2.4.21-37* kernels in the attached diff, which I apply to the kernel RPMs as patch52. The URL field for this bug references Ingo's 2.6 fix from late 2004.

Version-Release number of selected component (if applicable):
2.4.21-37.0.1.EL and earlier

How reproducible:
Readily, on SMP systems with a high process count and considerable blocking disk and LAN I/O.

Steps to Reproduce:
1. Run many processes that regularly become uninterruptible on an SMP system.
2. Watch the load averages, or peek at rq->nr_uninterruptible, and watch it increment sporadically.

Actual results:
Load averages increase in a stair-step fashion under load, over a period of anywhere from no time at all to several weeks. Disabling production applications on an affected system returns its load averages to a baseline integer value >0.

Expected results:
Load averages should always return to near zero on an idle system.

Additional info:
Numerous Proofpoint Inc. customers have encountered this issue on production RHEL3 U5 and U6 servers running Sendmail, Proofpoint, and MySQL. Sendmail implementations typically run 600 to 2000 children. Network utilization is generally quite low, but disk I/O blocks processes often.

Please consider integrating my patch or passing feedback to me, thanks.

Will DeHaan <will>
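For anyone trying to picture the "baseline integer value >0" symptom, here is a rough sketch, not the kernel code and not the attached patch. It assumes the load sample is the count of runnable plus uninterruptible tasks, and it reuses the fixed-point constants from the kernel's load-average calculation (FSHIFT, FIXED_1, EXP_1, with one update per 5-second interval). The names `calc_load`, `settle`, and the `phantom` count are made up for illustration; `phantom` stands in for uninterruptible counts that were incremented but never decremented.

```python
FSHIFT = 11              # bits of fixed-point precision
FIXED_1 = 1 << FSHIFT    # 1.0 in fixed point (2048)
EXP_1 = 1884             # 1-minute decay factor per 5-second sample


def calc_load(load, exp, active):
    """Apply one exponentially damped update to a fixed-point load value."""
    return (load * exp + active * (FIXED_1 - exp)) >> FSHIFT


def settle(load, task_count, samples):
    """Feed a constant task count into the average for a number of samples."""
    active = task_count * FIXED_1
    for _ in range(samples):
        load = calc_load(load, EXP_1, active)
    return load


# Hypothetical scenario: 10 tasks are counted for a while, then the box goes idle.
busy = settle(0, 10, 200)

# Healthy accounting: nothing is counted once idle, so the average decays to zero.
healthy_idle = settle(busy, 0, 1000)

# Leaked accounting: 2 phantom uninterruptible counts remain after going idle.
phantom = 2
leaked_idle = settle(busy, phantom, 1000)

print(healthy_idle / FIXED_1, leaked_idle / FIXED_1)
```

In this sketch the idle average settles at the phantom count rather than at zero, which matches the integer floor seen on affected systems after production applications are stopped.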
Created attachment 124779: scheduler patch to fix escalating load issue
A fix for this problem was committed to the RHEL3 U8 patch pool on 9-Jun-2006 (in kernel version 2.4.21-44.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0437.html