Bug 181815 - Phantom escalating load due to flawed rq->nr_uninterruptible increment
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware/OS: All / Linux
Priority: medium
Severity: medium
Assigned To: Ingo Molnar
QA Contact: Brian Brock
URL: http://lkml.org/lkml/2004/11/16/78
Blocks: RHEL3U8CanFix 186960
Reported: 2006-02-16 14:49 EST by Will DeHaan
Modified: 2007-11-30 17:07 EST
CC List: 8 users

Fixed In Version: RHSA-2006-0437
Doc Type: Bug Fix
Last Closed: 2006-07-20 09:49:58 EDT


Attachments
scheduler patch to fix escalating load issue (857 bytes, patch)
2006-02-16 14:49 EST, Will DeHaan

Description Will DeHaan 2006-02-16 14:49:21 EST
Description of problem:

Load averages can creep upward in a stair-step fashion, representing a phantom
minimum load on the system. Given the right mix of process and I/O load, the
lowest load value rises from a healthy 0 to very high numbers.

This scheduler accounting issue has no effect outside the reported load, so
only applications that act on the OS load average are impacted by this bug.
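The report names a flawed update of rq->nr_uninterruptible but does not spell out the race itself. As a generic illustration only (not the actual RHEL3 scheduler code), the Python model below interleaves two unsynchronized read-modify-write decrements of a shared counter so that one update is lost. Each lost update leaves the counter one higher than the real number of blocked tasks, which is one way a permanent, step-wise floor like the one described above can build up. `RunQueue`, `unlocked_double_decrement`, `serialized_double_decrement` and `simulate` are hypothetical names; only `nr_uninterruptible` comes from the report.

```python
from dataclasses import dataclass


@dataclass
class RunQueue:
    """Hypothetical stand-in for a per-CPU runqueue; only the counter is modelled."""
    nr_uninterruptible: int = 0


def unlocked_double_decrement(rq: RunQueue) -> None:
    """Two CPUs each decrement the counter with a plain load/modify/store and
    no lock. Both loads happen before either store, so one update is lost."""
    seen_by_cpu0 = rq.nr_uninterruptible      # CPU0 loads the counter
    seen_by_cpu1 = rq.nr_uninterruptible      # CPU1 loads the same value
    rq.nr_uninterruptible = seen_by_cpu0 - 1  # CPU0 stores its result
    rq.nr_uninterruptible = seen_by_cpu1 - 1  # CPU1 overwrites CPU0's store


def serialized_double_decrement(rq: RunQueue) -> None:
    """The same two decrements applied one after the other, as if under a lock."""
    rq.nr_uninterruptible -= 1
    rq.nr_uninterruptible -= 1


def simulate(cycles: int, racy: bool) -> int:
    """Each cycle, two tasks block uninterruptibly and are later woken.
    Returns the counter value once every task is running again."""
    rq = RunQueue()
    for _ in range(cycles):
        rq.nr_uninterruptible += 2  # two tasks enter TASK_UNINTERRUPTIBLE
        if racy:
            unlocked_double_decrement(rq)
        else:
            serialized_double_decrement(rq)
    return rq.nr_uninterruptible


if __name__ == "__main__":
    print("serialized:", simulate(5, racy=False))
    print("racy:", simulate(5, racy=True))
```

After five sleep/wake cycles the serialized version returns to 0, while the racy version is left at 5 even though no task is blocked: one phantom task per lost update.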

This issue was discovered and resolved in 2.6.10 kernels by Ingo Molnar.
I have backported that fix to Red Hat's 2.4.21-37* kernels in the attached diff,
which I apply to the kernel RPMs as patch52. The URL field of this bug references
Ingo's 2.6 fix from late 2004.

Version-Release number of selected component (if applicable):

2.4.21-37.0.1.EL and earlier

How reproducible:

Readily, on SMP systems with a high process count and considerable blocked disk
and LAN I/O.

Steps to Reproduce:
1. Run many processes that regularly become uninterruptible on an SMP system
2. Watch the load averages (or inspect rq->nr_uninterruptible) and observe it
increment sporadically.
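To follow step 2 from user space without kernel debugging tools, one rough heuristic (an assumption, not part of the original report) is to compare the 1-minute load average in /proc/loadavg with the number of tasks that /proc/<pid>/stat shows in state R (running) or D (uninterruptible sleep), since those are the states the Linux load average counts. Because the load average is time-smoothed, a single sample proves nothing; on an otherwise idle system, however, an unexplained remainder that persists across many samples points at phantom load. All function names below are hypothetical, and each sample also counts the script itself as one R task.

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def task_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    ')', so locate the last ')' and take the character after the space.
    """
    return stat_line[stat_line.rindex(")") + 2]


def count_load_contributors(states):
    """Count tasks in the states the load average includes: R and D."""
    return sum(1 for s in states if s in ("R", "D"))


def unexplained_load(load1, observed):
    """Load not accounted for by the R/D tasks seen in this sample."""
    return max(0.0, load1 - observed)


def sample(proc="/proc"):
    """Read the current 1-minute load and count R/D tasks under /proc."""
    with open(os.path.join(proc, "loadavg")) as f:
        load1 = parse_loadavg(f.read())[0]
    states = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                states.append(task_state(f.read()))
        except (OSError, ValueError):
            continue  # process exited or its stat line could not be parsed
    return load1, count_load_contributors(states)


if __name__ == "__main__":
    load1, observed = sample()
    print(f"load1={load1:.2f} R/D tasks={observed} "
          f"unexplained={unexplained_load(load1, observed):.2f}")
```

Run it periodically (for example from cron) on an idle affected host; a healthy system's unexplained value should hover near zero.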
  
Actual results:

Under load, the load averages increase in a stair-step fashion over a period
ranging from less than a week to several weeks. Stopping the production
applications on an affected system returns its load averages to a baseline
integer value greater than 0.
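The integer baseline follows from how Linux computes the load average: roughly every 5 seconds the kernel folds the count of runnable plus uninterruptible tasks into an exponentially decaying fixed-point average. The Python sketch below mirrors the arithmetic of the kernel's CALC_LOAD macro (FSHIFT = 11, EXP_1 = 1884 for the 1-minute average) and feeds it a hypothetical workload: 40 active tasks for 100 samples, then an idle period in which the only contribution is a leaked nr_uninterruptible count. The workload numbers are illustrative, not measurements from the affected servers.

```python
FSHIFT = 11                # bits of fractional precision
FIXED_1 = 1 << FSHIFT      # 1.0 in fixed point
EXP_1 = 1884               # FIXED_1 / exp(5 s / 60 s): 1-minute decay factor


def calc_load(load, exp, n):
    """One update step, following the kernel's CALC_LOAD arithmetic."""
    load *= exp
    load += n * (FIXED_1 - exp)
    return load >> FSHIFT


def one_minute_load(active_counts):
    """Feed one active-task count per ~5-second sample; return the final average."""
    load = 0
    for active in active_counts:
        load = calc_load(load, EXP_1, active * FIXED_1)
    return load / FIXED_1


busy = [40] * 100         # hypothetical busy period, about 8 minutes
idle_clean = [0] * 200    # idle for about 17 minutes, counter correct
idle_leaked = [3] * 200   # idle, but 3 phantom uninterruptible tasks remain

if __name__ == "__main__":
    print(one_minute_load(busy + idle_clean))
    print(one_minute_load(busy + idle_leaked))
```

This prints 0.0 and then 3.0: with a leaked count of 3 the 1-minute average settles exactly on 3 instead of decaying to 0, because 3 * FIXED_1 is a fixed point of the update. Each further lost update would raise that floor by one.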

Expected results:

Load averages should always return to near zero on an idle system.

Additional info:

Numerous Proofpoint Inc. customers have encountered this issue on production
RHEL3 U5 and U6 servers running Sendmail, Proofpoint, and MySQL. These Sendmail
deployments typically run 600 to 2000 child processes. Network utilization is
generally quite low, but disk I/O frequently blocks processes.

Please consider integrating my patch, or send me feedback. Thanks.
Will DeHaan <will@aggro.us>
Comment 1 Will DeHaan 2006-02-16 14:49:21 EST
Created attachment 124779
scheduler patch to fix escalating load issue
Comment 30 Ernie Petrides 2006-06-14 19:32:35 EDT
A fix for this problem was committed to the RHEL3 U8
patch pool on 9-Jun-2006 (in kernel version 2.4.21-44.EL).
Comment 38 Red Hat Bugzilla 2006-07-20 09:49:58 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0437.html
