Bug 124450
Summary: | RHEL kernel fails to maintain scheduling and throughput under load | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 3 | Reporter: | James Bottomley <james.bottomley> |
Component: | kernel | Assignee: | Tom Coughlan <coughlan> |
Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Priority: | medium |
Version: | 3.0 | Hardware: | All |
OS: | Linux | Doc Type: | Bug Fix |
Last Closed: | 2007-10-19 19:25:24 UTC | CC: | ask, barryn, buckh, bugs-redhat, dledford, k.georgiou, ksnider, petrides, richard.cunningham, riel, sct, tao, van.okamura |
Description
James Bottomley
2004-05-26 17:17:43 UTC
We have seen the same problem on RHEL 3 with OCFS. In our case we had about 20 LUNs, even up to 50.

Could be IO elevator, could be SCSI midlayer, could be the HBA driver. Assigning to both Tom Coughlan and Doug Ledford, who've done work on these kernel subsystems...

Rik, it could be any of those three, but that wouldn't explain the system time going through the roof, I don't think. James, can you boot up this machine with kernel profiling enabled, load it up until it's doing this exact thing, then zero out the profile, let it run this way for a minute or so, then get the profile data and post that here? I'd like to know what part of the kernel we are spending our time in before doing any guesswork as to what the cause is.

Created attachment 100979: readprofile output

The readprofile output was taken from a 2.4.21-15.EL machine a few minutes into a tiobench run. readprofile was issued prior to tiobench to clear counts.

During the readprofile/tiobench run, was the system exhibiting very poor throughput and scheduling behavior?

Hmmm... what would you qualify as poor throughput in terms of tiobench values?

This indeed seems similar to bug 121434, where a similar issue has been observed on production systems with practical applications. James, could you attach your script that does the synthetic stress-testing benchmark? I could test one of my systems overnight to see if I get similar results.

Posting this comment just in case anyone on this bugzilla is not also watching bugzilla 121434. There was a set of test kernel rpms posted in bugzilla 121434 that *might* also have an impact on this issue. No guarantees, but tests and feedback appreciated.

Are there any parties interested in this bug yet?

Upgrade to RHEL4 solves the problem AFAIK.

This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission critical bug fixes will be released for enterprise products. Since this bug does not meet that criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/ If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.
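
The profiling request earlier in this thread (boot with profiling enabled, zero the profile, run under load for about a minute, then collect the data) could be followed by a small script that ranks the collected output by tick count. The sketch below is illustrative only and is not part of the original report: the helper name `top_symbols`, the capture commands in the comments, the `System.map` path, and the sample text are all assumptions. It relies on the three-column layout that `readprofile` prints (clock ticks, kernel function name, normalized load) with a final `total` row.

```python
# Hypothetical helper for reading readprofile output; not from the bug report.
#
# Assumed capture procedure (kernel booted with the profile=2 parameter,
# commands run as root; the System.map path is an assumption and varies):
#   readprofile -r                                  # zero the counters
#   sleep 60                                        # let the load run
#   readprofile -m /boot/System.map > profile.txt   # dump the profile


def top_symbols(text, n=5):
    """Return the n (ticks, symbol, load) rows with the highest tick counts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        ticks, symbol, load = parts
        if symbol == "total":
            continue  # the summary row is not a kernel function
        try:
            rows.append((int(ticks), symbol, float(load)))
        except ValueError:
            continue  # skip rows whose numeric columns do not parse
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:n]


# Made-up sample that only demonstrates the format.
# These numbers are NOT taken from attachment 100979.
SAMPLE = """\
   120 schedule                   0.1500
  4000 default_idle              62.5000
    35 do_softirq                 0.2000
  4155 total                      0.0030
"""

if __name__ == "__main__":
    for ticks, symbol, load in top_symbols(SAMPLE, 3):
        print(f"{ticks:8d}  {symbol:<24s} {load:.4f}")
```

Sorting by raw ticks rather than by normalized load highlights where the kernel spends the most absolute time, which matches the question asked in the thread; sorting by the third column instead would favor short, hot functions.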