Red Hat Bugzilla – Bug 459520
Performance degradation on OLTP workload
Last modified: 2008-11-13 15:23:22 EST
KARL RISTER <email@example.com> - 2008-08-14 21:30 EDT
On a large OLTP benchmark, a 4.5% performance degradation was observed when
upgrading the kernel from the RHEL5.2 kernel (2.6.18-92.el5) to a pre-release
RHEL 5.3 kernel (2.6.18-103.el5).
The pre-release kernel was built by installing the source rpm and then running
rpmbuild (source was used instead of a binary because this was being done in
anticipation of testing a patch).
Contact Information = Karl Rister (firstname.lastname@example.org) / Steve Pratt
---Additional Hardware Info---
2 node x3950 M2
8 x six-core processors (48 cores, 48 threads)
Large Disk Setup
80 block devices (each a 24 disk RAID 0)
8 Dual Port Fiber Channel Adapters
Linux itcopus83.austin.ibm.com 2.6.18-103.el5 #1 SMP Tue Aug 12 13:27:11 CDT
2008 x86_64 x86_64 x86_64 GNU/Linux
Machine Type = x3950 M2 4RZ-7141
---Steps to Reproduce---
This is a large specialized setup.
KARL RISTER <email@example.com> - 2008-08-14 21:31 EDT
sysctl -a output
KARL RISTER <firstname.lastname@example.org> - 2008-08-15 11:28 EDT
The previous kernel was the RHEL5.2 GA distribution binary. I am working on
getting detailed profiling information to make comparisons.
KARL RISTER <email@example.com> - 2008-08-18 10:54 EDT
Here is a diff of the sysctl output from 2.6.18-92.el5 to 2.6.18-103.el5. The
most obvious changes that I see are some new blocks of NFS-related parameters.
We do have some NFS interaction in our test because that is where the binaries
we are loading are located, but any data should be faulted in before the
measurement period begins.
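Comparing two `sysctl -a` dumps can be scripted rather than eyeballed; a minimal sketch (not the tooling actually used here), assuming each dump line has the usual `key = value` form:

```python
def diff_sysctl(old: str, new: str):
    """Return (added, removed, changed) keys between two `sysctl -a` dumps."""
    def parse(text):
        params = {}
        for line in text.strip().splitlines():
            if " = " in line:
                key, _, val = line.partition(" = ")
                params[key.strip()] = val.strip()
        return params

    a, b = parse(old), parse(new)
    added = sorted(set(b) - set(a))            # e.g. new NFS parameter blocks
    removed = sorted(set(a) - set(b))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return added, removed, changed
```

Added keys (such as the new NFS blocks noted above) show up in the first list; keys whose values differ between the kernels show up in the third.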
Created attachment 314561 [details]
sysctl -a output
Created attachment 314562 [details]
Can we get the exact test being run and some before and after kernel upgrade output?
If I remember right, from what I first heard about this, "TPC-C" was the test, but it would be good to get exact details about the test run and the results seen rather than just a high-level problem definition. We need to be able to reproduce this in house in order to fix it and verify any possible fix.
The test being run is on a just-disclosed system using Intel Dunnington
processors. A result was published on tpc.org today for a 1,200,632 tpmC score.
The published score is a 3 tier run. For the bug, I have been running in 2 tier
mode which will allow the system to run faster than what the database was built
for. This was done to facilitate testing of the fastgup patch. While testing
fastgup I made a baseline score on the RHEL 5.2 kernel and a baseline score on
the RHEL 5.3 kernel before building the 5.3 kernel with the fastgup patch. The
scores for the 2 baseline runs are:
Kernel                           2 tier TPC-C Score
RHEL 5.2 (2.6.18-92.el5)         1,209,812.48
RHEL 5.3 test (2.6.18-103.el5)   1,157,443.24
The degradation from 5.2 to 5.3 here is measured as 4.3%. The 4.3% number is
lower than the 4.5% reported earlier because I re-ran the test after applying a
small patch to enable Oprofile support for Dunnington. I will attach that patch soon.
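The 4.3% figure follows directly from the two baseline scores; a quick check of the arithmetic:

```python
# Relative degradation between the two baseline TPC-C runs.
base = 1_209_812.48   # RHEL 5.2 (2.6.18-92.el5)
test = 1_157_443.24   # RHEL 5.3 test (2.6.18-103.el5)

degradation_pct = (base - test) / base * 100
print(f"{degradation_pct:.1f}%")  # → 4.3%
```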
The most noticeable difference I have found in the profiling data so far is that
the 5.3 configuration has more idle time (shown in vmstat output as iowait)
indicating that the system is not able to drive as hard. I will also attach
vmstat output for both configs.
Note that the vmstat data includes the ramp-up period before the measurements are taken.
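The iowait comparison can be pulled out of the attached vmstat logs with a small helper; a sketch, assuming the standard vmstat interval-mode column layout (this is illustrative, not the analysis actually used):

```python
def mean_iowait(vmstat_text: str) -> float:
    """Average the 'wa' (iowait) column of interval-mode `vmstat` output.

    Assumes the usual layout: a banner line, then a header line naming
    the columns (r b swpd ... id wa ...), then one sample per interval.
    """
    rows = [line.split() for line in vmstat_text.strip().splitlines()]
    header = next(r for r in rows if "wa" in r)   # the column-name line
    wa = header.index("wa")
    samples = [int(r[wa]) for r in rows if r[0].isdigit() and len(r) > wa]
    return sum(samples) / len(samples)
```

Running this over the 5.2 and 5.3 logs (ideally excluding the ramp-up samples) would quantify the extra idle/iowait time seen on 2.6.18-103.el5.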
Created attachment 314580 [details]
dunnington oprofile support
Created attachment 314581 [details]
RHEL 5.2 (2.6.18-92.el5) vmstat
Created attachment 314582 [details]
RHEL 5.3 test (2.6.18-103.el5) vmstat
Created attachment 314583 [details]
RHEL 5.2 (2.6.18-92.el5) oprofile symbols
Created attachment 314584 [details]
RHEL 5.2 (2.6.18-92.el5) oprofile binaries
Created attachment 314585 [details]
RHEL 5.3 (2.6.18-103.el5) oprofile symbols
Created attachment 314586 [details]
RHEL 5.3 (2.6.18-103.el5) oprofile binaries
It turns out that this is not a RHEL 5.3 kernel bug. A bug in a script that
tunes the scheduling priority of the DB2 log writer thread is at fault. When
the thread is tuned correctly the performance returns to where it should be. It
is unclear why this happened when the kernel was changed, but the logic in the
script is sufficiently broken for me to say that this is not a kernel problem.
I am changing the status to not a bug and will wait a while to close it out in
case anyone has additional comments.
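The faulty tuning script itself is not attached; for illustration only, the kind of priority adjustment such a script performs can be sketched with os.setpriority (the lookup of the DB2 log writer's PID is omitted and would be site-specific):

```python
import os

def renice(pid: int, niceness: int) -> int:
    """Set the nice value of `pid` and return the value now in effect.

    A log-writer tuning script would pass the DB2 log writer's PID and a
    negative niceness (which requires root) so the scheduler favors it;
    if the script mis-identifies the PID, the tuning silently does nothing.
    """
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)
```

Called with pid=0 it acts on the calling process; raising the nice value (lowering priority) needs no privileges, while lowering it does.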