Bug 459520
| Summary: | performance degrade on oltp workload | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | IBM Bug Proxy <bugproxy> |
| Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr> |
| Status: | CLOSED NOTABUG | QA Contact: | Martin Jenner <mjenner> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 5.3 | CC: | epollard, jmoyer |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2008-11-13 20:23:22 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Attachments: | |||
Description
IBM Bug Proxy
2008-08-19 16:50:56 UTC
Created attachment 314561 [details]
sysctl -a output
Created attachment 314562 [details]
sysctl diff
Can we get the exact test being run, along with output from before and after the kernel upgrade? If I remember correctly, "TPC C" was the test named when I first heard about this, but it would be good to have exact details about the test run and the results seen rather than just a high-level problem definition. We need to be able to reproduce the problem in house in order to fix it and verify any possible fix.

The test is being run on a just-disclosed system using Intel Dunnington processors. A result was published on tpc.org today with a score of 1,200,632 tpmC: http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=108081902

The published score is from a 3 tier run. For this bug, I have been running in 2 tier mode, which allows the system to run faster than what the database was built for. This was done to facilitate testing of the fastgup patch. While testing fastgup, I took a baseline score on the RHEL 5.2 kernel and a baseline score on the RHEL 5.3 kernel before building the 5.3 kernel with the fastgup patch. The scores for the 2 baseline runs are:

| Kernel | 2 tier TPC-C Score |
|---|---|
| RHEL 5.2 (2.6.18-92.el5) | 1,209,812.48 |
| RHEL 5.3 test (2.6.18-103.el5) | 1,157,443.24 |

The degrade from 5.2 to 5.3 here is measured as 4.3%. The 4.3% number is lower than the 4.5% reported earlier because I re-ran the test after applying a small patch to enable Oprofile support for Dunnington. I will attach that patch soon.

The most noticeable difference I have found in the profiling data so far is that the 5.3 configuration has more idle time (shown in vmstat output as iowait), indicating that the system cannot be driven as hard. I will also attach vmstat output for both configs. Note that the vmstat data includes the rampup period before the measurements are computed.

Created attachment 314580 [details]
dunnington oprofile support
Created attachment 314581 [details]
RHEL 5.2 (2.6.18-92.el5) vmstat
Created attachment 314582 [details]
RHEL 5.3 test (2.6.18-103.el5) vmstat
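For anyone comparing the two vmstat attachments, the idle and iowait columns can be averaged while leaving out the rampup samples mentioned in the description. The following is only a rough sketch: the function name `mean_cpu_columns` and the `skip_samples` parameter are made up for illustration, the number of rampup samples is not stated in this bug, and the parser assumes the usual vmstat layout in which the column-name header line begins with `r`.

```python
def mean_cpu_columns(vmstat_text, skip_samples=0, columns=("id", "wa")):
    """Average selected vmstat columns, ignoring the first `skip_samples` rows.

    `skip_samples` is a hypothetical knob for dropping the rampup period;
    the right value depends on the sampling interval used for the run.
    """
    names = None
    rows = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # Column-name header; vmstat repeats it periodically.
            names = fields
            continue
        if names is None or len(fields) != len(names):
            # Banner line ("procs ---memory--- ...") or unrelated noise.
            continue
        try:
            rows.append(dict(zip(names, (int(f) for f in fields))))
        except ValueError:
            continue  # not a numeric sample row
    rows = rows[skip_samples:]
    if not rows:
        raise ValueError("no vmstat samples left after skipping rampup")
    return {c: sum(r[c] for r in rows) / len(rows) for c in columns}
```

Calling the function on the text of each attachment with the same `skip_samples` value gives two small dictionaries whose `wa` and `id` entries can be compared directly.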
Created attachment 314583 [details]
RHEL 5.2 (2.6.18-92.el5) oprofile symbols
Created attachment 314584 [details]
RHEL 5.2 (2.6.18-92.el5) oprofile binaries
Created attachment 314585 [details]
RHEL 5.3 (2.6.18-103.el5) oprofile symbols
Created attachment 314586 [details]
RHEL 5.3 (2.6.18-103.el5) oprofile binaries
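As a quick check of the 4.3% figure quoted in the description, the degrade can be recomputed from the two baseline scores. The helper name `percent_degrade` is only illustrative; the scores are the ones reported above.

```python
# Baseline 2 tier TPC-C scores quoted in the description.
score_52 = 1_209_812.48  # RHEL 5.2 (2.6.18-92.el5)
score_53 = 1_157_443.24  # RHEL 5.3 test (2.6.18-103.el5)


def percent_degrade(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, as a percentage of `before`."""
    return (before - after) / before * 100.0


degrade = percent_degrade(score_52, score_53)
print(f"{degrade:.1f}%")  # prints "4.3%"
```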
It turns out that this is not a RHEL 5.3 kernel bug. A bug in a script that tunes the scheduling priority of the DB2 log writer thread is at fault. When the thread is tuned correctly, the performance returns to where it should be. It is unclear why this happened when the kernel was changed, but the logic in the script is sufficiently broken for me to say that this is not a kernel problem.

I am changing the status to not a bug and will wait a while to close it out in case anyone has additional comments.

Closing
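The tuning script itself is not attached to this bug, so the following is not its logic. It is only a minimal sketch of how one might confirm, after such a script has run, that a given process or thread ended up with the intended scheduling settings. It assumes Linux and Python 3.3 or later; the function name `describe_scheduling` is hypothetical, and whether the script uses a real-time policy or a nice value is an assumption, so both are reported.

```python
import os

# Map the common Linux policy constants to readable names.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def describe_scheduling(pid: int) -> dict:
    """Report the scheduling policy, real-time priority and nice value.

    On Linux `pid` may also be a thread ID (as listed under
    /proc/<pid>/task); 0 means the calling thread.
    """
    policy = os.sched_getscheduler(pid)
    return {
        "policy": POLICY_NAMES.get(policy, str(policy)),
        "rt_priority": os.sched_getparam(pid).sched_priority,
        "nice": os.getpriority(os.PRIO_PROCESS, pid),
    }


if __name__ == "__main__":
    # Inspect the current process as a harmless demonstration.
    print(describe_scheduling(0))
```

Running the same check against the log writer thread on both kernels would show whether the script applied the same settings in each configuration.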