Red Hat Bugzilla – Bug 124450
RHEL kernel fails to maintain scheduling and throughput under load
Last modified: 2007-11-30 17:07:02 EST
Description of problem:
As part of the release tests for LifeKeeper (the SteelEye HA product),
we perform stress acceptance tests of the operating system.
These tests involve taking a heavy-duty array (MSA100/EVA/MA8000) with
a large number of LUNs and running a stress test simultaneously on
each of the LUNs.
The stress tests are simple tar, untar and checksum of a fixed pool of
data (all residing on the LUN).
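For readers who want to reproduce this kind of load, the following is a minimal sketch of one tar/untar/checksum pass run on several LUNs at once. It is not the actual LifeKeeper test script (which is not attached here). The directory layout (a "pool" directory on each LUN mount point) and the function names are assumptions made for illustration.

```python
import hashlib
import os
import tarfile
import tempfile
from concurrent.futures import ThreadPoolExecutor


def checksum_tree(root):
    """Return one SHA-256 digest covering every file under root."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the walk order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
    return digest.hexdigest()


def stress_pass(lun_dir, pool_name="pool"):
    """One tar / untar / checksum pass, keeping all data on lun_dir.

    Hypothetical layout: the fixed data pool lives in lun_dir/pool.
    Returns True when the unpacked copy matches the original pool.
    """
    pool = os.path.join(lun_dir, pool_name)
    archive = os.path.join(lun_dir, "pool.tar")
    with tarfile.open(archive, "w") as tar:
        tar.add(pool, arcname=pool_name)
    with tempfile.TemporaryDirectory(dir=lun_dir) as scratch:
        with tarfile.open(archive) as tar:
            tar.extractall(scratch)
        unpacked = os.path.join(scratch, pool_name)
        ok = checksum_tree(pool) == checksum_tree(unpacked)
    os.remove(archive)
    return ok


def stress_all(lun_dirs, passes=1):
    """Run stress_pass on every LUN mount point at once, passes times each."""
    def worker(lun_dir):
        return all(stress_pass(lun_dir) for _ in range(passes))

    with ThreadPoolExecutor(max_workers=len(lun_dirs)) as executor:
        return dict(zip(lun_dirs, executor.map(worker, lun_dirs)))
```

Calling stress_all with a list of mount points returns a dictionary mapping each mount point to True or False. Threads are enough here because the work is dominated by file I/O; a real test would loop for hours rather than a fixed number of passes.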
Previously, RHAS 2.1 (a sparse set of kernels up to 2.4.9-e.40 was
tested) was able to withstand the full 32-LUN load.
No version of RHEL 3.0 has managed to get beyond about 8 LUNs without
Our failure criteria are defined by the LifeKeeper product. Either
the communications fail, which on a running system means that the
communications process (which is installed reniced to realtime
priority and mlocked into memory) was not scheduled for a period of 15
seconds, or LifeKeeper believes it has lost contact with one of
its discs (each disc is pinged every few seconds using an INQUIRY; a
failure occurs if there's no response to the INQUIRY after 120 seconds).
With RHEL, both of these failures have been observed.
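The two failure criteria can be written down as simple checks over recorded timings. This is only an illustrative sketch, not LifeKeeper's implementation. The input shapes (a list of heartbeat timestamps and a per-disc list of INQUIRY response times) and the function names are assumptions; only the 15 and 120 second limits come from the description above.

```python
COMM_TIMEOUT = 15.0      # seconds the comm process may go unscheduled
INQUIRY_TIMEOUT = 120.0  # seconds to wait for an INQUIRY response


def comm_failures(heartbeats, limit=COMM_TIMEOUT):
    """Return (start, gap) for every gap between consecutive heartbeat
    timestamps (in seconds) that reaches the limit."""
    return [(a, b - a)
            for a, b in zip(heartbeats, heartbeats[1:])
            if b - a >= limit]


def inquiry_failures(latencies, limit=INQUIRY_TIMEOUT):
    """Return the sorted names of discs with at least one failed INQUIRY.

    latencies maps a disc name to its INQUIRY response times in seconds;
    None stands for an INQUIRY that never got a response.
    """
    return sorted(disc for disc, samples in latencies.items()
                  if any(s is None or s >= limit for s in samples))
```

For example, comm_failures([0, 5, 10, 26, 30]) flags the 16 second gap that starts at t=10, and a disc whose sample list contains None or a value of 120 or more is reported by inquiry_failures.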
We've tried starting out with one LUN of stress and gradually
adding one LUN at a time. Under this type of load increase, we see
I/O completion times rise dramatically after about 2 LUNs (measured
with both iostat and the simple INQUIRY timing that LifeKeeper
does). We also observe the system go down to less than one MB of
free memory (the remainder all residing in the cache), the system
time rise to around 70% and the iowait time fall to about
sar -b reports that the number of transactions per second remains
constant at between 1 and 2 (by contrast, with RHAS 2.1 the tps rises
linearly with the number of LUNs until it levels out at about 120).
Version-Release number of selected component (if applicable):
We've tried this with the three default kernels from RHEL 3.0, U1 and
U2, with no appreciable difference in the results.
We also tried altering various kernel tuning parameters:
elvtune -r 4 -w 4 on all the devices
echo 100 > /proc/sys/vm/overcommit_ratio
We also tried reducing the tag queue depth on the qla2340 cards all
the way down to 8 in the driver.
None of these produced any appreciable effect.
How reproducible:
All the time
The failing systems are IBM 330s with 1GB of RAM and two CPUs. The
storage is a FAStT 200 optical SAN using a Brocade switch and qla2340
The CPUs are:
cpu family : 6
model : 11
model name : Intel(R) Pentium(R) III CPU family 1133MHz
stepping : 1
cpu MHz : 1128.596
cache size : 512 KB
We also noted that SMP kernels seemed to withstand much less stress
(3-4 LUNS) than UP kernels (which could get up to 7-8 LUNS before
We have seen the same problem on RHEL 3 with OCFS.
In our case we had about 20 LUNs, even up to 50.
Could be IO elevator, could be SCSI midlayer, could be the HBA driver.
Assigning to both Tom Coughlan and Doug Ledford, who've done work on
these kernel subsystems...
Rik, it could be any of those three, but that wouldn't explain the
system time going through the roof I don't think.
James, can you boot up this machine with kernel profiling enabled,
load it up until it's doing this exact thing, then zero out the
profile, let it run this way for a minute or so, then get the profile
data and post that here? I'd like to know what part of the kernel we
are spending our time in before doing any guess work as to what the
Created attachment 100979
The readprofile output was taken from a 2.4.21-15.EL machine a few
minutes into a tiobench. readprofile was issued prior to tiobench to
During the readprofile/tiobench run, was the system exhibiting very
poor throughput and scheduling behavior?
hmmm... what would you qualify as poor throughput in terms of tiobench
This indeed seems similar to bug 121434, where a similar issue has
been observed on production systems with practical applications.
James, could you attach your script that does the synthetic
stress-testing benchmark? I could test one of my systems overnight to
see if I get similar results.
Posting this comment just in case anyone on this bugzilla is not also
watching bugzilla 121434. There was a set of test kernel rpms posted
in bugzilla 121434 that *might* also have an impact on this issue. No
guarantees, but tests and feedback appreciated.
Are there any parties interested in this bug yet? Upgrade to RHEL4 solves the
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
For more information on the RHEL errata support policy, please visit:
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.