Bug 124450

Summary: RHEL kernel fails to maintain scheduling and throughput under load
Product: Red Hat Enterprise Linux 3
Reporter: James Bottomley <james.bottomley>
Component: kernel
Assignee: Tom Coughlan <coughlan>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: medium
Version: 3.0
CC: ask, barryn, buckh, bugs-redhat, dledford, k.georgiou, ksnider, petrides, richard.cunningham, riel, sct, tao, van.okamura
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-10-19 15:25:24 EDT
Type: ---
Attachments:
Description Flags
readprofile output none

Description James Bottomley 2004-05-26 13:17:43 EDT
Description of problem:

As part of the release tests for LifeKeeper (the SteelEye HA product),
we perform stress acceptance tests of the operating system.

These tests involve taking a heavy-duty array (MSA100/EVA/MA8000) with
a large number of LUNs and running a stress test simultaneously on
each of the LUNs.

The stress tests are simple tar, untar and checksum of a fixed pool of
data (all residing on the LUN).
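
The per-LUN workload described above can be sketched roughly as follows. This is not the actual LifeKeeper test script, which is not attached to this bug; the function names (`checksum_tree`, `stress_pass`, `run_on_luns`), the `pool`/`work` directory layout on each mount point, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import os
import shutil
import tarfile
from concurrent.futures import ThreadPoolExecutor


def checksum_tree(root):
    """Return one SHA-256 digest over every file under root, in sorted order."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
    return digest.hexdigest()


def stress_pass(pool_dir, work_dir):
    """Tar the pool, untar it, and compare checksums; True if they match."""
    archive = os.path.join(work_dir, "pool.tar")
    extract_dir = os.path.join(work_dir, "extracted")
    os.makedirs(extract_dir)
    try:
        with tarfile.open(archive, "w") as tar:
            tar.add(pool_dir, arcname="pool")
        with tarfile.open(archive, "r") as tar:
            tar.extractall(extract_dir)
        copy_dir = os.path.join(extract_dir, "pool")
        return checksum_tree(pool_dir) == checksum_tree(copy_dir)
    finally:
        # Leave the work area empty so the next pass starts clean.
        shutil.rmtree(extract_dir, ignore_errors=True)
        if os.path.exists(archive):
            os.remove(archive)


def run_on_luns(mount_points, passes=1):
    """Run stress passes on every LUN mount point at the same time."""
    def worker(mount):
        pool_dir = os.path.join(mount, "pool")  # hypothetical layout
        work_dir = os.path.join(mount, "work")  # hypothetical layout
        os.makedirs(work_dir, exist_ok=True)
        return all(stress_pass(pool_dir, work_dir) for _ in range(passes))

    workers = max(1, len(mount_points))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return dict(zip(mount_points, executor.map(worker, mount_points)))
```

Calling `run_on_luns` with a list of mount points (one per LUN) returns a dictionary mapping each mount point to whether every pass verified. Threads are used here only to keep the sketch short; a real stress harness would more likely start one process per LUN.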

Previously, RHAS 2.1 (a sparse set of kernels up to 2.4.9-e.40 were
tested) was able to withstand the full 32-LUN load.

No version of RHEL 3.0 has managed to get beyond about 8 LUNs without
failing.

Our failure criteria are defined by the LifeKeeper product.  Either
the communications fail, which on a running system means that the
communications process (which is installed reniced to realtime
priority and mlocked into memory) was not scheduled for a period of 15
seconds, or LifeKeeper believes it has lost contact with one of
its discs (each disc is pinged every few seconds using an INQUIRY; a
failure occurs if there's no response to the INQUIRY after 120 seconds).

With RHEL, both of these failures have been observed.
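
As a minimal sketch of the two failure criteria, the check below takes recorded samples and reports which criteria they trip. Only the 15-second and 120-second limits come from this report; the function names, the sample format (a list of heartbeat timestamps and a list of INQUIRY latencies, with `None` for "never answered"), and the failure labels are assumptions, not LifeKeeper's actual implementation.

```python
COMM_STALL_LIMIT_S = 15.0   # communications process not scheduled for 15 s
INQUIRY_TIMEOUT_S = 120.0   # no INQUIRY response after 120 s


def longest_gap(timestamps):
    """Largest interval between consecutive heartbeat timestamps (seconds)."""
    return max((b - a for a, b in zip(timestamps, timestamps[1:])), default=0.0)


def check_failures(heartbeats, inquiry_latencies):
    """Return the list of failure criteria that the samples trip.

    heartbeats: list of monotonic timestamps at which the communications
        process was observed running (hypothetical sample format).
    inquiry_latencies: seconds each INQUIRY took; None means no response.
    """
    failures = []
    if longest_gap(heartbeats) >= COMM_STALL_LIMIT_S:
        failures.append("communications")
    if any(lat is None or lat > INQUIRY_TIMEOUT_S for lat in inquiry_latencies):
        failures.append("disc")
    return failures
```

Keeping the two limits as named constants makes it easy to see that the scheduling criterion is far stricter than the disc criterion, which is why a scheduling stall can fail a run long before any INQUIRY times out.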

We've tried starting out with one LUN of stress and gradually
adding one LUN at a time.  Under this type of load increase, we see
the time taken for I/Os to complete rise dramatically after about
2 LUNs (measured using both iostat and the simple INQUIRY timing that
LifeKeeper does).  We also observe the system go down to less than
one MB of free memory (the remainder all residing in the cache), the
system time rise to around 70% and the iowait time fall to about
zero.
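
For anyone wanting to reproduce the system-time and iowait figures without sar or iostat, the sketch below derives the same kind of percentages from two samples of the aggregate `cpu` line in `/proc/stat`. It assumes the field order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq); kernels that report fewer fields are padded with zeros. The function names are hypothetical.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu' line of /proc/stat into a dict of counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # pad missing fields
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percentage of CPU time spent in each state between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: b[name] - a[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * ticks / total for name, ticks in delta.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate cpu line."""
    with open(path) as fh:
        return fh.readline()
```

Taking `read_cpu_line()` once, sleeping for the sampling interval, taking it again and passing both lines to `cpu_percentages` gives the split for that interval. A high `system` share together with a near-zero `iowait` share is the pattern described above.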

sar -b reports that the number of transactions per second remains
constant at between 1 and 2 (by contrast, with RHAS 2.1 the tps rises
linearly with the number of LUNs until it levels out at about 120).


Version-Release number of selected component (if applicable):

We've tried this with the three default kernels from RHEL 3.0, U1 and
U2, with no appreciable differences in the results.

We also tried altering various kernel tuning parameters:

elvtune -r 4 -w 4 on all the devices
echo 100 > /proc/sys/vm/overcommit_ratio

We also tried reducing the tag depth on the qla2340 cards all the way
down to 8 in the drivers.

None of these produced any appreciable effect.


How reproducible:

All the time

Additional info:

The failing systems are IBM 330s with 1GB of RAM and two CPUs. The
storage is a FAStT 200 optical SAN using a Brocade switch and qla2340
fibre cards.

The CPUs are:

cpu family      : 6
model           : 11
model name      : Intel(R) Pentium(R) III CPU family      1133MHz
stepping        : 1
cpu MHz         : 1128.596
cache size      : 512 KB

We also noted that SMP kernels seemed to withstand much less stress
(3-4 LUNs) than UP kernels (which could get up to 7-8 LUNs before
failing).
Comment 1 Wim Coekaerts 2004-06-01 22:44:33 EDT
We have seen the same problem on RHEL 3 with OCFS.
In our case we had about 20 LUNs, even up to 50.
Comment 2 Rik van Riel 2004-06-01 23:56:12 EDT
Could be IO elevator, could be SCSI midlayer, could be the HBA driver.
Assigning to both Tom Coughlan and Doug Ledford, who've done work on
these kernel subsystems...
Comment 3 Doug Ledford 2004-06-02 09:15:23 EDT
Rik, it could be any of those three, but I don't think that would
explain the system time going through the roof.

James, can you boot up this machine with kernel profiling enabled,
load it up until it's doing this exact thing, then zero out the
profile, let it run this way for a minute or so, then get the profile
data and post that here?  I'd like to know what part of the kernel we
are spending our time in before doing any guess work as to what the
cause is.
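
The procedure requested above (zero the profile, let the load run for about a minute, then read the profile back) can be sketched as follows. This assumes the kernel was booted with profiling enabled (for example with the `profile=2` boot parameter), that `readprofile` from util-linux is installed and run as root, and that the map file lives at `/boot/System.map`; the path and both function names are assumptions for illustration.

```python
import subprocess
import time


def capture_profile(seconds=60, mapfile="/boot/System.map"):
    """Zero the kernel profile, wait, then return readprofile's text output.

    Assumes profiling was enabled at boot and that this runs as root.
    The mapfile path is an assumption; it must match the running kernel.
    """
    subprocess.run(["readprofile", "-r"], check=True)  # reset the counters
    time.sleep(seconds)                                # let the load run
    result = subprocess.run(["readprofile", "-m", mapfile],
                            check=True, capture_output=True, text=True)
    return result.stdout


def top_functions(profile_text, limit=20):
    """Parse 'ticks name load' lines and return the busiest functions first."""
    rows = []
    for line in profile_text.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[1] == "total":
            continue  # skip the summary line and anything malformed
        try:
            rows.append((int(fields[0]), fields[1]))
        except ValueError:
            continue  # skip anything that is not a counter line
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]
```

Running `top_functions(capture_profile())` while the system is in the degraded state would list the kernel functions with the most profiling ticks, which is the information needed to narrow down where the system time is going.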
Comment 4 Kevin Krafthefer 2004-06-08 16:55:44 EDT
Created attachment 100979 [details]
readprofile output
Comment 5 Kevin Krafthefer 2004-06-08 16:57:18 EDT
The readprofile output was taken from a 2.4.21-15.EL machine a few
minutes into a tiobench. readprofile was issued prior to tiobench to
clear counts.
Comment 6 Doug Ledford 2004-06-09 10:08:30 EDT
During the readprofile/tiobench run was the system exhibiting very
poor throughput and scheduling behavior?
Comment 7 Kevin Krafthefer 2004-06-11 18:38:04 EDT
Hmmm... what would you qualify as poor throughput in terms of tiobench
values?
Comment 9 Aleksander Adamowski 2004-06-13 17:21:05 EDT
This indeed seems similar to bug 121434, where a similar issue has
been observed on production systems with practical applications.

James, could you attach your script that does the synthetic 
stress-testing benchmark? I could test one of my systems overnight to
see if I get similar results.
Comment 10 Doug Ledford 2004-09-02 13:04:39 EDT
Posting this comment just in case anyone on this bugzilla is not also
watching bugzilla 121434.  There was a set of test kernel rpms posted
in bugzilla 121434 that *might* also have an impact on this issue.  No
guarantees, but tests and feedback appreciated.
Comment 13 Aleksander Adamowski 2007-09-07 13:15:32 EDT
Are there any parties still interested in this bug? Upgrading to RHEL4 solves
the problem AFAIK.
Comment 14 RHEL Product and Program Management 2007-10-19 15:25:24 EDT
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.