Bug 463652

Summary:	[LTC 6.0 FEAT] 201300:Thread scalability issues with TPC-C
Product:	Red Hat Enterprise Linux 6
Component:	kernel
Version:	6.0
Hardware:	All
OS:	All
Status:	CLOSED CURRENTRELEASE
Severity:	high
Priority:	high
Target Milestone:	alpha
Target Release:	6.0
Reporter:	IBM Bug Proxy <bugproxy>
Assignee:	James Takahashi (IBM) <nobody+PNT0273897>
QA Contact:	Red Hat Kernel QE team <kernel-qe>
CC:	dshaks, ejratl, jjarvis, mgahagan, notting, peterm, snagar
Keywords:	FutureFeature, OtherQA
Fixed In Version:	kernel-2.6.31-1
Doc Type:	Enhancement
Last Closed:	2010-11-15 14:08:19 UTC
Bug Blocks:	356741, 465489, 531073, 554559, 555224

Description IBM Bug Proxy 2008-09-24 05:10:58 UTC

=Comment: #1=================================================
Emily J. Ratliff <emilyr.com> - 2008-09-19 13:42 EDT
1. Feature Overview:
Feature Id:	[201300]
a. Name of Feature:	Thread scalability issues with TPC-C
b. Feature Description
Improve thread scalability for TPC-C benchmarking, in particular via reduction
of DIO induced mmap_sem contention and lock contention in follow_hugetlb_page().

Additional Comments:	RHEL 5.3 integration is being tracked in RHBZ
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=447649, which was in
MODIFIED state as of 9/16/2008, so this will be a validation-only request if
the change makes the 5.3 release.

2. Feature Details:
Sponsor:	LTC
Architectures:
x86
x86_64
ppc64

Arch Specificity: Both
Affects Core Kernel: Yes
Delivery Mechanism: Direct from community
Category:	Kernel
Request Type:	Kernel - Performance Enhancement from Upstream
d. Upstream Acceptance:	Accepted
Sponsor Priority	1
f. Severity: High
IBM Confidential:	no
Code Contribution:	3rd party code
g. Component Version Target:	2.6.27
Performance Assistance:	yes

3. Business Case
New threaded performance issues are being discovered today by high-end TPC-C
benchmarks.  Addressing these types of bugs as early as possible saves money and
positions the distro to be as scalable as possible during its lifetime, during
which we expect a proliferation of many-core processors and threaded
applications. This is also key for DB2 customers running the new threaded model.
The performance impact varies depending on the configuration. This feature could
boost performance for the upcoming 2 node Dunnington TPC-C publish by up to 10%.
In 5.3 early test kernels we measured just over a 2% gain.

4. Primary contact at Red Hat: 
John Jarvis
jjarvis

5. Primary contacts at Partner:
Project Management Contact:
Michael Hohnbaum, hbaum.com, 503-578-5486

Technical contact(s):
Badari Pulavarty, badari.com
Vaidyanathan Srinivasan, svaidyan.com

IBM Manager:
Pat Gaughen, gaughen.com

Comment 1 Bill Nottingham 2008-10-02 20:59:29 UTC

Validation-only request - setting as MODIFIED.

The feature requested has already been accepted into the upstream code base
planned for the next major release of Red Hat Enterprise Linux.

When the next milestone release of Red Hat Enterprise Linux 6 is available,
please verify that the feature requested is present and functioning as
desired.

Comment 2 IBM Bug Proxy 2008-10-07 00:45:59 UTC

Changing the bug owner on the IBM side to shaggy.com

Comment 3 IBM Bug Proxy 2009-03-02 21:00:28 UTC

upstream in 2.6.27
sha1 id: ce0ad7f0952581ba75ab6aee55bb1ed9bb22cf4f

Comment 4 John Jarvis 2009-11-10 20:57:34 UTC

Is IBM planning to run RHEL 6 through TPC-C testing prior to release such that you would be able to provide feedback on this feature?

Comment 5 IBM Bug Proxy 2009-11-11 21:51:21 UTC

------- Comment From slpratt.com 2009-11-11 16:46 EDT-------
I have tested an OLTP workload on the RHEL6 Alpha1 base (2.6.29.4r5) as well as on newer 2.6.31-rc kernel levels.

Base RHEL6 is 16% regressed relative to sles10/rhel5.
Disabling some of the debug options in the kernel config reduces the regression to 10%.
Changing from SLUB to SLAB reduces the regression to 7%.

Most of the remaining regression appears to be caused by higher CPU consumption in scheduler functions.

An option to revert the process scheduler to O(1) would be good.

Comment 6 IBM Bug Proxy 2009-11-13 14:52:23 UTC

------- Comment From slpratt.com 2009-11-13 09:41 EDT-------
Setting CONFIG_SCHED_DEBUG, which is required to expose the CFS tunables, results in a 2% degradation.

Comment 7 IBM Bug Proxy 2009-11-16 00:10:36 UTC

------- Comment From yeohc.com 2009-11-15 19:09 EDT-------
A couple of things to try:

- Turn off SD_BALANCE_NEWIDLE if it's on

- Try this patch that Anton posted a while back: http://osdir.com/ml/linux-kernel/2009-08/msg06325.html
Apply only the second chunk, not the first, to see if it makes any difference. If it does, we will need to find
something smaller than INT_MAX.
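
For the first suggestion, a minimal sketch of clearing the flag at runtime. It rests on assumptions that this report does not state: that the kernel was built with CONFIG_SCHED_DEBUG and exposes per-domain flags as decimal integers under /proc/sys/kernel/sched_domain/, and that SD_BALANCE_NEWIDLE is bit 0x2 in kernels of this era. Check both against include/linux/sched.h for the kernel under test before relying on it.

```python
import glob

# ASSUMPTION: bit value of SD_BALANCE_NEWIDLE; verify in include/linux/sched.h.
SD_BALANCE_NEWIDLE = 0x2
# ASSUMPTION: where a CONFIG_SCHED_DEBUG kernel exposes the domain flags.
FLAGS_GLOB = "/proc/sys/kernel/sched_domain/cpu*/domain*/flags"


def clear_flag(flags: int, bit: int = SD_BALANCE_NEWIDLE) -> int:
    """Return flags with the given bit cleared; other bits are untouched."""
    return flags & ~bit


def disable_newidle(pattern: str = FLAGS_GLOB) -> int:
    """Clear the bit in every matching flags file; return how many changed."""
    changed = 0
    for path in glob.glob(pattern):
        with open(path) as f:
            old = int(f.read().strip())
        new = clear_flag(old)
        if new != old:
            with open(path, "w") as f:
                f.write(str(new))
            changed += 1
    return changed
```

Writing to the real files requires root, and the setting does not persist across reboots.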

Comment 8 IBM Bug Proxy 2009-11-16 16:00:36 UTC

------- Comment From slpratt.com 2009-11-16 10:51 EDT-------
Some comments on tpc-c workload:

All results here are for a 2 socket Nehalem EP with 48GB

No Java.
High Thread count (DB2 process has 1300-1400 threads)
Mostly random memory access
~40GB of shared memory pool
Lots of IO (300,000 io/sec)
Moderate Network traffic (2 x 1GB links)

Comment 10 IBM Bug Proxy 2009-11-16 16:30:54 UTC

------- Comment From balbir.com 2009-11-16 11:27 EDT-------
(In reply to comment #15)

Could you check the dirty limit on SLES10 SP2 versus RHEL (it might not be relevant right now, but just checking)? I'll take a look at the URL you pointed to as well.

Comment 11 IBM Bug Proxy 2009-11-17 05:00:24 UTC

------- Comment From bharata.ibm.com 2009-11-16 23:52 EDT-------
If turning off CONFIG_CGROUPS helps, then it would be interesting to see if turning off just CONFIG_GROUP_SCHED gives the same benefit instead of turning the entire cgroups off.

Comment 12 IBM Bug Proxy 2009-11-18 04:00:44 UTC

------- Comment From bharata.ibm.com 2009-11-17 22:52 EDT-------
OLTP has been found to be sensitive to sched_shares_ratelimit. Could you try increasing it if you haven't already?

Does OLTP have any realtime threads? If so, could you try setting
/proc/sys/kernel/sched_rt_runtime_us to -1?
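
A minimal sketch of the second suggestion, assuming a root shell on the test machine. The helper name set_rt_runtime is hypothetical; it writes the new value to the proc file and returns the previous one so the setting can be restored after the run. A value of -1 removes the realtime throttling limit.

```python
from pathlib import Path

# Path taken from the comment above.
RT_RUNTIME = Path("/proc/sys/kernel/sched_rt_runtime_us")


def set_rt_runtime(value_us: int, path: Path = RT_RUNTIME) -> int:
    """Write a new runtime limit and return the value it replaced."""
    previous = int(path.read_text().strip())
    path.write_text(f"{value_us}\n")
    return previous
```

Typical use would be `old = set_rt_runtime(-1)` before the benchmark and `set_rt_runtime(old)` afterwards.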

Comment 13 IBM Bug Proxy 2009-12-07 16:51:13 UTC

------- Comment From slpratt.com 2009-12-07 11:43 EDT-------
Oprofile was only run during a small portion of the run. We see no real impact from oprofile in the overall score.

Comment 14 John Shakshober 2009-12-07 18:06:06 UTC

You can disable the cgroup memory function on stock RHEL6 alpha3 and beta1 kernels by specifying cgroup_disable=memory on the kernel line in grub.conf

i.e.
 kernel /vmlinuz-2.6.32-0.54.el6.x86_64 ro root=/dev/mapper/vg_perf4 rhgb cgroup_disable=memory quiet 3
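
After rebooting, it can be worth confirming that the option took effect. The sketch below is one way to check, under the assumption that /proc/cgroups uses the four-column layout (subsys_name, hierarchy, num_cgroups, enabled); both function names are hypothetical helpers, not part of any tool.

```python
def cmdline_disables_memory(cmdline: str) -> bool:
    """True if the boot command line carries cgroup_disable=...memory..."""
    for token in cmdline.split():
        if token.startswith("cgroup_disable="):
            controllers = token.split("=", 1)[1].split(",")
            if "memory" in controllers:
                return True
    return False


def memory_cgroup_enabled(cgroups_text: str) -> bool:
    """Parse /proc/cgroups content; ASSUMES the 4th column is 'enabled'."""
    for line in cgroups_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "memory":
            return fields[3] == "1"
    return False  # controller not listed, e.g. not compiled in
```

Feed the functions the contents of /proc/cmdline and /proc/cgroups respectively; with the option active, the first should return True and the second False.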

Also note - the beta1 kernel will enable performance optimizations which have been set to debug in the rhel6 alpha kernels to date. We assume you are already disabling up to 70 different debug parameters if you are already evaluating RHEL6 performance?

Shak

Comment 16 IBM Bug Proxy 2010-02-02 04:20:37 UTC

------- Comment From balbir.com 2010-02-01 23:10 EDT-------
On 2.6.32, the disable is not required.

Commit 0c3e73e84fe3f64cf1c2e8bb4e91e8901cbcdc38 in 2.6.32 fixes the memory cgroup regression. The changelog is below.

Author: Balbir Singh <balbir.ibm.com>
Date:   Wed Sep 23 15:56:42 2009 -0700

memcg: improve resource counter scalability

Reduce the resource counter overhead (mostly spinlock) associated with the
root cgroup.  This is a part of the several patches to reduce mem cgroup
overhead.  I had posted other approaches earlier (including using percpu
counters).  Those patches will be a natural addition and will be added
iteratively on top of these.

The patch stops resource counter accounting for the root cgroup.  The data
for display is derived from the statistics we maintain via
mem_cgroup_charge_statistics (which is more scalable).  What happens today
is that we do double accounting, once using res_counter_charge() and once
using mem_cgroup_charge_statistics().  For the root, since we don't
implement limits any more, we don't need to track every charge via
res_counter_charge() and check for limit being exceeded and reclaim.

The main mem->res usage_in_bytes can be derived by summing the cache and
rss usage data from memory statistics (MEM_CGROUP_STAT_RSS and
MEM_CGROUP_STAT_CACHE).  However, for memsw->res usage_in_bytes, we need
additional data about swapped out memory.  This patch adds a
MEM_CGROUP_STAT_SWAPOUT and uses that along with MEM_CGROUP_STAT_RSS and
MEM_CGROUP_STAT_CACHE to derive the memsw data.  This data is computed
recursively when hierarchy is enabled.
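
The derivation described in the two paragraphs above can be illustrated with a small model. This is not kernel code: the CgroupStats class and usage function are hypothetical, and the counters are assumed to already be in bytes. It only shows how the two usage_in_bytes figures fall out of the per-group statistics, with and without hierarchy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CgroupStats:
    """Hypothetical stand-in for one group's memory statistics."""
    cache: int = 0      # models MEM_CGROUP_STAT_CACHE
    rss: int = 0        # models MEM_CGROUP_STAT_RSS
    swapout: int = 0    # models MEM_CGROUP_STAT_SWAPOUT
    children: List["CgroupStats"] = field(default_factory=list)


def usage(group: CgroupStats, include_swap: bool = False,
          hierarchical: bool = False) -> int:
    """Derive usage from statistics instead of a separate charge counter."""
    total = group.cache + group.rss
    if include_swap:
        total += group.swapout  # the memsw figure also counts swapped-out memory
    if hierarchical:
        total += sum(usage(child, include_swap, True)
                     for child in group.children)
    return total
```

Here `usage(root)` models the plain memory figure and `usage(root, include_swap=True)` the memsw figure; `hierarchical=True` adds the descendants recursively.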

The test results I see on a 24-way machine show that

1. The lock contention disappears from /proc/lock_stat
2. The results of the test are comparable to running with
cgroup_disable=memory.
...
Data from Prarit (kernel compile with make -j64 on a 64
CPU/32G machine)

For a single run

Without patch

real 27m8.988s
user 87m24.916s
sys 382m6.037s

With patch

real    4m18.607s
user    84m58.943s
sys     50m52.682s

With config turned off

real    4m54.972s
user    90m13.456s
sys     50m19.711s
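
To put the three runs side by side, the timings above can be converted to seconds and compared. The sketch below uses only the figures quoted in this comment; the helper names are illustrative. It works out to roughly a 6.3x reduction in wall-clock time and a 7.5x reduction in system time with the patch, relative to the unpatched run.

```python
import re

# Timings copied from the comment above (output of `time`).
RUNS = {
    "without patch": {"real": "27m8.988s", "user": "87m24.916s", "sys": "382m6.037s"},
    "with patch": {"real": "4m18.607s", "user": "84m58.943s", "sys": "50m52.682s"},
    "config off": {"real": "4m54.972s", "user": "90m13.456s", "sys": "50m19.711s"},
}


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '27m8.988s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def speedup(baseline: str, other: str, column: str) -> float:
    """Ratio of baseline time to the other run's time for one column."""
    return to_seconds(RUNS[baseline][column]) / to_seconds(RUNS[other][column])


for column in ("real", "sys"):
    ratio = speedup("without patch", "with patch", column)
    print(f"{column}: {ratio:.1f}x faster with patch")
```

User time is nearly unchanged across the three runs, which is consistent with the saving coming from kernel-side lock contention rather than from the compile work itself.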

Please look at http://www.mail-archive.com/fedora-kernel-list@redhat.com/msg02057.html as well.

Comment 17 IBM Bug Proxy 2010-05-05 21:51:25 UTC

------- Comment From shaggy.ibm.com 2010-05-05 17:45 EDT-------
I don't have the resources to run the benchmarks, but I can verify that the RHEL6 kernel does contain the patches.  No surprise, since the code has been in the upstream kernel.

Comment 18 IBM Bug Proxy 2010-07-08 18:31:02 UTC

------- Comment From shaggy.ibm.com 2010-07-08 14:29 EDT-------
Closing.  The mmap_sem contention has been fixed.  Any additional performance issues are outside the scope of this feature.

Comment 19 releng-rhel@redhat.com 2010-11-15 14:08:19 UTC

Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.