Red Hat Bugzilla – Bug 181815
Phantom escalating load due to flawed rq->nr_uninterruptible increment
Last modified: 2007-11-30 17:07:09 EST
Description of problem:
Load averages can creep upward in a stair-step fashion, leaving a phantom
minimum load on the system. Given the right process and I/O load, the lowest
observed load value rises from a healthy 0 to very high numbers.
This scheduler accounting error affects nothing except the reported load, so
only applications that act on the OS load average are impacted by this bug.
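To illustrate why the minimum only ever moves upward, here is a toy model of per-CPU uninterruptible counters. It is not kernel code and does not reproduce the actual flaw, which is in the attached patch. It only assumes that a task may go to sleep on one CPU and be woken on another, so that only the sum of the counters is meaningful, and that a hypothetical stray increment with no matching decrement happens every `stray_every` cycles. The names `RunQueues`, `run` and `stray_every` are made up for this sketch.

```python
import random


class RunQueues:
    """Toy model of per-CPU uninterruptible-sleep counters (not kernel code)."""

    def __init__(self, ncpus):
        self.nr_uninterruptible = [0] * ncpus

    def sleep(self, cpu):
        # A task enters uninterruptible sleep; count it on this CPU.
        self.nr_uninterruptible[cpu] += 1

    def wake(self, cpu):
        # The task is woken; the decrement may land on another CPU's counter,
        # so individual counters can go negative.
        self.nr_uninterruptible[cpu] -= 1

    def total(self):
        # Only the sum across CPUs is meaningful; never report below zero.
        return max(0, sum(self.nr_uninterruptible))


def run(cycles, ncpus=4, stray_every=0, seed=0):
    """Run balanced sleep/wake cycles and return the summed counter.

    If stray_every > 0, every stray_every-th cycle adds one extra increment
    with no matching decrement (a hypothetical accounting error).
    """
    rng = random.Random(seed)
    rqs = RunQueues(ncpus)
    for i in range(1, cycles + 1):
        rqs.sleep(rng.randrange(ncpus))
        rqs.wake(rng.randrange(ncpus))
        if stray_every and i % stray_every == 0:
            rqs.sleep(rng.randrange(ncpus))  # stray, never undone
    return rqs.total()
```

With balanced accounting `run(10000)` ends at 0 however the wake-ups are spread across CPUs. With `stray_every=2500` the same run ends at 4, and nothing in the model ever brings that value back down.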
Ingo Molnar discovered and resolved this issue in the 2.6.10 kernels.
I've backported that fix to Red Hat's 2.4.21-37* kernels in the attached diff,
which I apply to the kernel RPMs as patch52. The URL field for this bug
references Ingo's 2.6 fix from late 2004.
Version-Release number of selected component (if applicable):
2.4.21-37.0.1.EL and earlier
How reproducible:
Readily, on high-process-count SMP systems with considerable blocked disk and
LAN I/O
Steps to Reproduce:
1. Run many processes that regularly become uninterruptible on an SMP system
2. Watch the load averages, or inspect rq->nr_uninterruptible and watch it
   increment
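As a rough aid for step 2, the load averages can be sampled from /proc/loadavg and reduced to a per-window minimum, which makes a rising floor easier to spot than eyeballing uptime output. This is only a sketch: the helper names and the windowing approach are my own assumptions, and it watches the load averages only, not rq->nr_uninterruptible itself.

```python
def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the current load averages (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())


def window_floors(samples, window):
    """Return the minimum of each consecutive full window of samples.

    A trailing partial window is ignored. On a healthy system the floors
    keep returning to near zero; a sequence of floors that only steps
    upward matches the behaviour described in this report.
    """
    floors = []
    for start in range(0, len(samples) - window + 1, window):
        floors.append(min(samples[start:start + window]))
    return floors
```

For example, collecting `read_loadavg()[0]` once a minute and calling `window_floors(samples, 1440)` gives one floor value per day.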
Actual results:
Load averages increase in a stair-step fashion, starting anywhere from
immediately to several weeks after the system comes under load. Stopping
production applications on an affected system returns its load averages to an
integer baseline greater than 0.
Expected results:
Load averages should always return to near zero on an idle system.
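The integer baseline follows from how the load average is computed. The kernel samples the number of active tasks (running plus uninterruptible) every 5 seconds and folds it into a fixed-point exponential average. The sketch below mirrors that arithmetic in user space, using the kernel's FSHIFT and EXP_1 constants. The `simulate` helper and its parameters are hypothetical: it models leaked nr_uninterruptible counts as a constant `phantom` number of tasks that are always counted as active.

```python
FSHIFT = 11              # bits of fractional precision
FIXED_1 = 1 << FSHIFT    # 1.0 in fixed point
EXP_1 = 1884             # 1-minute decay factor per 5 s sample, fixed point
EXP_5 = 2014             # 5-minute decay factor
EXP_15 = 2037            # 15-minute decay factor


def calc_load(load, exp, active):
    """One 5-second update step; load and active are in fixed point."""
    load = load * exp + active * (FIXED_1 - exp)
    return load >> FSHIFT


def simulate(phantom, busy_tasks, busy_ticks, idle_ticks, exp=EXP_1):
    """Return the load average after a busy period followed by an idle one.

    phantom is the number of leaked uninterruptible counts (hypothetical),
    which stay in the active count even when the system is idle.
    """
    load = 0
    for _ in range(busy_ticks):
        load = calc_load(load, exp, (busy_tasks + phantom) * FIXED_1)
    for _ in range(idle_ticks):
        load = calc_load(load, exp, phantom * FIXED_1)
    return load / FIXED_1
```

With no leaked counts, `simulate(0, 20, 720, 720)` decays to 0.0 after an hour of idle time. With three leaked counts, `simulate(3, 20, 720, 720)` settles at 3.0 and stays there, which matches the integer baseline seen on affected systems.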
Additional info:
Numerous Proofpoint Inc. customers have encountered this issue on production
RHEL3 U5 and U6 servers running Sendmail, Proofpoint and MySQL. Sendmail
implementations typically run 600 to 2000 children. Network utilization is
generally quite low, but disk I/O blocks processes often.
Please consider integrating my patch or passing feedback to me, thanks.
Will DeHaan <email@example.com>
Created attachment 124779
scheduler patch to fix escalating load issue
A fix for this problem was committed to the RHEL3 U8
patch pool on 9-Jun-2006 (in kernel version 2.4.21-44.EL).
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.