Description of problem:
A large financial customer is seeing poor performance from the RHEL4 scheduler in terms of sub-millisecond CPU response. The application is a high-frequency financial trading application. The customer's kernel developer changed the code to allocate the job to free CPUs and managed to reduce latency in their application from milliseconds to microseconds.

Red Hat has identified the following steps:
1) Research kernel patches for scheduler optimization from the customer and share them with the Red Hat kernel team for review
2) Test kernel patches against real production applications and assess potential benefits and/or drawbacks, including assessment of potential performance regressions
3) Leverage scheduler optimizations in future RHEL updates/versions as appropriate
4) Help push Red Hat-reviewed code changes upstream for Fedora 10/11 and Red Hat Enterprise Linux 6

The customer has supplied Red Hat with their patch.
Attached are the improved performance results from Tom Tracy's Wombat testing, on a kernel from Larry Woodman. Completed testing the scheduler patch using Wombat. Throughput increased from 44K to 74K. Latency is comparable to RHEL5.2. I have attached results showing the throughput and latency comparisons with the scheduler patch.
Created attachment 313402: Wombat Performance w/ 2-1Gbit nics on Intel quad-core
Looking at the patch that was posted to the list, we may want to differentiate between static (realtime) priority threads and dynamic priority threads for the default behavior of the migrate-on-clone code. For realtime threads, it's safe to assume that latency is top priority, so we should probably enable it by default there. For dynamic priority threads, there will often be a throughput benefit to avoiding the migration, due to cache effects, as well as a power saving benefit to keeping cores idle longer, so it should be disabled by default there. Since we're going to have a sysctl tunable anyway, we might as well default to the settings that will make the most people happy.
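To make the proposed defaults concrete, here is a minimal Python sketch of the decision described above. It is only an illustration: the names MigratePolicy and migrate_on_clone are hypothetical and do not come from the patch, and the three settings are an assumption about how a sysctl override might look.

```python
from enum import Enum


class MigratePolicy(Enum):
    """Hypothetical sysctl settings; not the names or values used by the patch."""
    AUTO = "auto"      # choose a default from the thread's priority class
    ALWAYS = "always"  # always look for an idle CPU on clone
    NEVER = "never"    # always keep the new thread on the parent's CPU


def migrate_on_clone(is_realtime: bool,
                     policy: MigratePolicy = MigratePolicy.AUTO) -> bool:
    """Return True if a newly cloned thread should be moved to an idle CPU."""
    if policy is MigratePolicy.ALWAYS:
        return True
    if policy is MigratePolicy.NEVER:
        return False
    # AUTO: static (realtime) priority threads favor latency, so migrate.
    # Dynamic priority threads favor cache warmth and idle cores, so stay put.
    return is_realtime
```

With the default setting, only realtime threads are migrated; an administrator could still force either behavior for all threads through the tunable.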
Updating PM score.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Committed in 78.27.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
A new allowable value was added to the /proc/sys/kernel/wake_balance tunable parameter. Setting wake_balance to 2 will cause the scheduler to run the thread being awakened on any available CPU rather than scheduling it on the optimal CPU based on a combination of cache footprint and idleness of the CPU in question. This will cause the scheduler to reduce the overall latency even at the cost of total system throughput.

Large financial applications that experience poor latency performance from the RHEL4 scheduler and would like to see sub-millisecond CPU response times should set /proc/sys/kernel/wake_balance = 2.
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,5 +1 @@
-A new allowable value was added to the /proc/sys/kernel/wake_balance tunable parameter.
+A new allowable value has been added to the /proc/sys/kernel/wake_balance tunable parameter. Setting wake_balance to a value of 2 will instruct the scheduler to run the thread on any available CPU rather than scheduling it on the optimal CPU. Setting this kernel parameter to 2 will force the scheduler to reduce the overall latency even at the cost of total system throughput.
-Setting wake_balance to 2 will cause the scheduler to run the thread being awakened on any avaialble CPU rather than scheduling it on the optimal CPU based on a combination of cache footprint and idleness of the CPU in question. This will cause the scheduler to reduce the overall latencey even at the cost of total system throughput.
-
-Large financial applications that experience poor latencey performance from the RHEL4 scheduler
-and would like to see sub millisecond CPU response times should set /proc/sys/kernel/wake_balance = 2.
Any updates here? Has this issue been resolved in the RHEL 4.8 Beta, or in a later kernel?
We have tested this for performance in 4.8 and ack this for 4.8. /proc/sys/kernel/wake_balance = 2.
Chris, this change is in RHEL4-U8. You enable it by setting /proc/sys/kernel/wake_balance = 2. Larry Woodman
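For anyone scripting this, a minimal Python sketch of setting the tunable and reading it back follows. The helper name set_wake_balance is hypothetical; the path is the one given in the release note and exists only on kernels that carry this change. Writing to it requires root, and the setting is assumed not to persist across reboots unless it is reapplied.

```python
from pathlib import Path

# Path taken from the release note; present only on kernels with this tunable.
WAKE_BALANCE = Path("/proc/sys/kernel/wake_balance")


def set_wake_balance(value: int, path: Path = WAKE_BALANCE) -> int:
    """Write value to the tunable and return the value read back (hypothetical helper)."""
    path.write_text(f"{value}\n")
    current = int(path.read_text().strip())
    if current != value:
        # The kernel may reject or clamp a value it does not support.
        raise RuntimeError(f"wake_balance is {current}, expected {value}")
    return current


# Example (run as root on a kernel that has the tunable):
#     set_wake_balance(2)
```

The read-back check is there because a kernel without the new value might not accept 2; failing loudly is safer than assuming the low-latency behavior is active.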
Sorry for the confusion. I meant to ask whether this issue had been tested by QA, customer or partner and if so, whether or not it has been VERIFIED.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1024.html