Bug 459520 - performance degrade on oltp workload
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: x86_64 All
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Red Hat Kernel Manager
Martin Jenner
Reported: 2008-08-19 12:50 EDT by IBM Bug Proxy
Modified: 2008-11-13 15:23 EST
Doc Type: Bug Fix
Last Closed: 2008-11-13 15:23:22 EST

Attachments
sysctl -a output (29.02 KB, text/plain), 2008-08-19 12:51 EDT, IBM Bug Proxy
sysctl diff (2.52 KB, text/plain), 2008-08-19 12:51 EDT, IBM Bug Proxy
dunnington oprofile support (440 bytes, text/plain), 2008-08-19 17:20 EDT, IBM Bug Proxy
RHEL 5.2 (2.6.18-92.el5) vmstat (51.75 KB, text/plain), 2008-08-19 17:20 EDT, IBM Bug Proxy
RHEL 5.3 test (2.6.18-103.el5) vmstat (51.73 KB, text/plain), 2008-08-19 17:20 EDT, IBM Bug Proxy
RHEL 5.2 (2.6.18-92.el5) oprofile symbols (167.92 KB, text/plain), 2008-08-19 17:31 EDT, IBM Bug Proxy
RHEL 5.2 (2.6.18-92.el5) oprofile binaries (1.25 KB, text/plain), 2008-08-19 17:31 EDT, IBM Bug Proxy
RHEL 5.3 (2.6.18-103.el5) oprofile symbols (194.16 KB, text/plain), 2008-08-19 17:31 EDT, IBM Bug Proxy
RHEL 5.3 (2.6.18-103.el5) oprofile binaries (1.48 KB, text/plain), 2008-08-19 17:31 EDT, IBM Bug Proxy

Description IBM Bug Proxy 2008-08-19 12:50:56 EDT
=Comment: #0=================================================
KARL RISTER <krister@us.ibm.com> - 2008-08-14 21:30 EDT
---Problem Description---
On a large OLTP benchmark, a 4.5% performance degradation was observed after
upgrading the kernel from the RHEL 5.2 kernel (2.6.18-92.el5) to a pre-release
RHEL 5.3 kernel (2.6.18-103.el5).

The pre-release kernel was built by installing the source rpm and then running
rpmbuild (source was used instead of a binary because this was being done in
anticipation of testing a patch).
 
Contact Information = Karl Rister (kmr@us.ibm.com) / Steve Pratt
(slpratt@us.ibm.com)
 
---Additional Hardware Info---
2 node x3950 M2
8 x Six Core processors (48 cores, 48 threads)
512GB RAM
Large Disk Setup
  80 block devices (each a 24 disk RAID 0)
  Fiber Switches
  8 Dual Port Fiber Channel Adapters
 
---uname output---
Linux itcopus83.austin.ibm.com 2.6.18-103.el5 #1 SMP Tue Aug 12 13:27:11 CDT
2008 x86_64 x86_64 x86_64 GNU/Linux
 
Machine Type = x3950 M2 4RZ-7141
 

---Steps to Reproduce---
N/A
This is a large specialized setup.
 
=Comment: #1=================================================
KARL RISTER <krister@us.ibm.com> - 2008-08-14 21:31 EDT

sysctl -a output

=Comment: #4=================================================
KARL RISTER <krister@us.ibm.com> - 2008-08-15 11:28 EDT
The previous kernel was the RHEL 5.2 GA distribution binary.  I am working on
getting detailed profiling information to make comparisons.
=Comment: #5=================================================
KARL RISTER <krister@us.ibm.com> - 2008-08-18 10:54 EDT

sysctl diff

Here is a diff of the sysctl output from 2.6.18-92.el5 to 2.6.18-103.el5.  The
most obvious changes that I see are some new blocks of NFS-related parameters.
We do have some NFS interaction in our test because that is where the binaries
we are loading are located, but any data should be faulted in before the
measurement period begins.
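
For readers who want to repeat this kind of comparison, the following is a minimal sketch of diffing two saved `sysctl -a` outputs by parameter name. It assumes the usual "key = value" line format. The function names and the sample parameters are hypothetical and are not taken from the attached files.

```python
def parse_sysctl(text):
    """Parse `sysctl -a` style output ("key = value" per line) into a dict."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that are not "key = value"
            params[key.strip()] = value.strip()
    return params


def diff_sysctl(old, new):
    """Return (added, removed, changed) between two parsed outputs."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {k: (old[k], new[k])
               for k in old.keys() & new.keys() if old[k] != new[k]}
    return added, removed, changed


# Made-up sample values for illustration only, NOT from the attachments.
old_text = "kernel.example_a = 1\nkernel.example_b = 10\n"
new_text = "kernel.example_a = 1\nkernel.example_b = 20\nfs.example_c = 5\n"

added, removed, changed = diff_sysctl(parse_sysctl(old_text),
                                      parse_sysctl(new_text))
for key in sorted(added):
    print(f"+ {key} = {added[key]}")
for key in sorted(removed):
    print(f"- {key} = {removed[key]}")
for key in sorted(changed):
    print(f"~ {key}: {changed[key][0]} -> {changed[key][1]}")
```

Comparing by key rather than line by line keeps newly added parameter blocks from hiding changes to existing values.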
Comment 1 IBM Bug Proxy 2008-08-19 12:51:01 EDT
Created attachment 314561
sysctl -a output
Comment 2 IBM Bug Proxy 2008-08-19 12:51:05 EDT
Created attachment 314562
sysctl diff
Comment 3 Ed Pollard 2008-08-19 13:02:41 EDT
Can we get the exact test being run and some output from before and after the kernel upgrade?

If I remember right from when I first heard about this, "TPC C" was the test, but it would be good to get exact details about the test run and the results seen rather than just a high-level problem definition. We need to be able to reproduce this in house in order to fix it and to verify any possible fix.
Comment 4 IBM Bug Proxy 2008-08-19 17:20:47 EDT
The test is being run on a just-disclosed system using Intel Dunnington
processors.  A result was published on tpc.org today for a 1,200,632 tpmC score:

http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=108081902

The published score is a 3-tier run.  For the bug, I have been running in 2-tier
mode, which allows the system to run faster than what the database was built
for.  This was done to facilitate testing of the fastgup patch.  While testing
fastgup I made a baseline score on the RHEL 5.2 kernel and a baseline score on
the RHEL 5.3 kernel before building the 5.3 kernel with the fastgup patch.  The
scores for the 2 baseline runs are:

Kernel                                2-tier TPC-C score
RHEL 5.2 (2.6.18-92.el5)              1,209,812.48
RHEL 5.3 test (2.6.18-103.el5)        1,157,443.24

The degradation from 5.2 to 5.3 here is measured as 4.3%.  The 4.3% number is
lower than the 4.5% reported earlier because I re-ran the test after applying a
small patch to enable OProfile support for Dunnington.  I will attach that patch
soon.
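
The 4.3% figure can be checked directly from the two baseline scores in this comment. The sketch below assumes the degradation is expressed relative to the RHEL 5.2 score. The function name is hypothetical.

```python
def percent_degrade(baseline, candidate):
    """Percentage drop of `candidate` relative to `baseline`."""
    return (baseline - candidate) / baseline * 100.0


rhel52_score = 1_209_812.48  # RHEL 5.2 (2.6.18-92.el5), from the comment
rhel53_score = 1_157_443.24  # RHEL 5.3 test (2.6.18-103.el5), from the comment

degrade = percent_degrade(rhel52_score, rhel53_score)
print(f"{degrade:.1f}%")  # prints 4.3%
```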

The most noticeable difference I have found in the profiling data so far is that
the 5.3 configuration has more idle time (shown in vmstat output as iowait),
indicating that the system cannot be driven as hard.  I will also attach
vmstat output for both configs.

Note that the vmstat data includes the rampup period before the measurements are
computed.
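
One way to compare the iowait between two vmstat captures is to average the `wa` column while skipping the rampup samples. The following is a minimal sketch, assuming the standard vmstat layout with a column-name header row. The function name, the `skip_samples` parameter, and the sample numbers are made up for illustration and are not taken from the attachments.

```python
def mean_column(vmstat_text, column="wa", skip_samples=0):
    """Average one vmstat column, ignoring the first `skip_samples` rows."""
    names = None
    values = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if column in fields:
            names = fields  # column-name header (vmstat repeats it)
            continue
        if names is None or len(fields) != len(names) or not fields[0].isdigit():
            continue  # banner line such as "procs ---memory--- ..."
        values.append(int(fields[names.index(column)]))
    values = values[skip_samples:]
    if not values:
        raise ValueError("no samples left after skipping rampup")
    return sum(values) / len(values)


# Made-up sample for illustration only, NOT from the attached vmstat files.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  1      0 100000  20000 300000    0    0   500   600 1000 2000 50 10 30 10  0
 4  2      0 100000  20000 300000    0    0   500   600 1000 2000 60 10 10 20  0
 5  2      0 100000  20000 300000    0    0   500   600 1000 2000 60 10  0 30  0
"""

print(mean_column(SAMPLE, "wa"))                  # all samples
print(mean_column(SAMPLE, "wa", skip_samples=1))  # first sample treated as rampup
```

Looking the column up by header name rather than by fixed position avoids depending on which columns a particular vmstat version prints.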
Comment 5 IBM Bug Proxy 2008-08-19 17:20:51 EDT
Created attachment 314580
dunnington oprofile support
Comment 6 IBM Bug Proxy 2008-08-19 17:20:55 EDT
Created attachment 314581
RHEL 5.2 (2.6.18-92.el5) vmstat
Comment 7 IBM Bug Proxy 2008-08-19 17:20:59 EDT
Created attachment 314582
RHEL 5.3 test (2.6.18-103.el5) vmstat
Comment 8 IBM Bug Proxy 2008-08-19 17:31:26 EDT
Created attachment 314583
RHEL 5.2 (2.6.18-92.el5) oprofile symbols
Comment 9 IBM Bug Proxy 2008-08-19 17:31:31 EDT
Created attachment 314584
RHEL 5.2 (2.6.18-92.el5) oprofile binaries
Comment 10 IBM Bug Proxy 2008-08-19 17:31:36 EDT
Created attachment 314585
RHEL 5.3 (2.6.18-103.el5) oprofile symbols
Comment 11 IBM Bug Proxy 2008-08-19 17:31:40 EDT
Created attachment 314586
RHEL 5.3 (2.6.18-103.el5) oprofile binaries
Comment 12 IBM Bug Proxy 2008-08-27 10:01:37 EDT
It turns out that this is not a RHEL 5.3 kernel bug.  A bug in a script that
tunes the scheduling priority of the DB2 log writer thread is at fault.  When
the thread is tuned correctly the performance returns to where it should be.  It
is unclear why this happened when the kernel was changed, but the logic in the
script is sufficiently broken for me to say that this is not a kernel problem.
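
The tuning script is not attached to this bug, so its logic cannot be shown here. As a hedged sketch only, this is how one might confirm on Linux that a thread ended up with the intended scheduling policy and priority after tuning. The function name is hypothetical, and passing 0 inspects the calling process rather than a real DB2 thread.

```python
import os

# Names for the scheduling policies exposed by Python's os module on Linux.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
}


def describe_scheduling(tid=0):
    """Report policy, real-time priority, and nice value for a thread ID.

    tid=0 means the caller. A real check would pass the thread ID of the
    thread that the tuning script was supposed to adjust.
    """
    policy = os.sched_getscheduler(tid)
    return {
        "policy": POLICY_NAMES.get(policy, f"unknown({policy})"),
        "rt_priority": os.sched_getparam(tid).sched_priority,
        "nice": os.getpriority(os.PRIO_PROCESS, tid),
    }


print(describe_scheduling(0))
```

Running such a check after every kernel change would show whether the tuning actually took effect, independent of what the script reports.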

I am changing the status to not a bug and will wait awhile to close it out in
case anyone has additional comments.
Comment 13 IBM Bug Proxy 2008-08-27 10:32:41 EDT
Closing
