Bug 463652
| Summary: | [LTC 6.0 FEAT] 201300:Thread scalability issues with TPC-C | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | IBM Bug Proxy <bugproxy> |
| Component: | kernel | Assignee: | James Takahashi (IBM) <nobody+PNT0273897> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 6.0 | CC: | dshaks, ejratl, jjarvis, mgahagan, notting, peterm, snagar |
| Target Milestone: | alpha | Keywords: | FutureFeature, OtherQA |
| Target Release: | 6.0 | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | kernel-2.6.31-1 | Doc Type: | Enhancement |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2010-11-15 14:08:19 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 356741, 465489, 531073, 554559, 555224 | ||
|
Description
IBM Bug Proxy
2008-09-24 05:10:58 UTC
Validation-only request - setting as MODIFIED. The feature requested has already been accepted into the upstream code base planned for the next major release of Red Hat Enterprise Linux. When the next milestone release of Red Hat Enterprise Linux 6 is available, please verify that the feature requested is present and functioning as desired. Changing the bug owner on the IBM side to shaggy.com upstream in 2.6.27 sha1 id: ce0ad7f0952581ba75ab6aee55bb1ed9bb22cf4f Is IBM planning to run RHEL 6 through TPC-C testing prior to release such that you would be able to provide feedback on this feature? ------- Comment From slpratt.com 2009-11-11 16:46 EDT------- Have done testing of OLTP workload on RHEL6 Alpha1 base 2.6.29.4r5 as well as moving up to 2.6.31-rc kernel levels. Base RHEL6 is 16% regressed from sles10/rhel5 Disabling some of the debug option in kernel config reduce regression to 10% Changing from SLUB to SLAB reduces regression to 7% Most of remaining regression appears to be caused by higher CPU consumption in scheduler functions. An option to revert the process scheduler to O1 would be good. ------- Comment From slpratt.com 2009-11-13 09:41 EDT------- Setting CONFIG_SCHED_DEBUG which is required to expose the CFS tunables, results in a 2% degradation. ------- Comment From yeohc.com 2009-11-15 19:09 EDT------- A couple of things to try: - Turn off SD_BALANCE_NEWIDLE if its on - Try this patch that Anton posted a while back http://osdir.com/ml/linux-kernel/2009-08/msg06325.html but only the second chunk, not the first, to see if it makes any difference. If it does will need to find something smaller than the INT_MAX ------- Comment From slpratt.com 2009-11-16 10:51 EDT------- Some comments on tpc-c workload: All results here are for a 2 socket Nehelam EP with 48GB No Java. High Thread count (DB2 process has 1300-1400 threads) Mostly random memory access ~40GB of shared memory pool Lots of IO (300,000 io/sec) Moderate Network traffic (2 x 1GB links) ------- Comment From balbir.com 2009-11-16 11:27 EDT------- (In reply to comment #15) Could you check the dirty limit on SLES10 SP2 versus RHEL(might not be relevant right now, but just checking)? I'll take a look at the URL you pointed to as well. ------- Comment From bharata.ibm.com 2009-11-16 23:52 EDT------- If turning off CONFIG_CGROUPS helps, then it would be interesting to see if turning off just CONFIG_GROUP_SCHED gives the same benefit instead of turing the entire cgroups off. ------- Comment From bharata.ibm.com 2009-11-17 22:52 EDT------- OLTP has been found to be sensitive to sched_shares_ratelimit. Could you try increasing it if you haven't already ? Does OLTP has any realtime threads ? If so, could you try setting /proc/sys/kernel/sched_rt_runtime_us to -1 ? ------- Comment From slpratt.com 2009-12-07 11:43 EDT------- Oprofile was only run during a small portion of the run. We see no real impact from oprofile in the overall score. You can disable cgroup memory function on stock RHEL6 alpha3 and beta1 kernels by specifing cgroup_disable=memory on the kernel grub.conf line ie kernel /vmlinuz-2.6.32-0.54.el6.x86_64 ro root=/dev/mapper/vg_perf4 rhgb cgroup_disable=memory quiet 3 Also note - the beta1 kernel will enable performance optimizations which have been set to debug in the rhel6 alpha kernels to date. We assume you are already disabling upto 70 different debug parameters if you are already evaluating RHEL6 performance? Shak ------- Comment From balbir.com 2010-02-01 23:10 EDT------- On 2.6.32, the disable is not required Commit id in 2.6.32 0c3e73e84fe3f64cf1c2e8bb4e91e8901cbcdc38 fixes the memory cgroup regression. The changelog is below. Author: Balbir Singh <balbir.ibm.com> Date: Wed Sep 23 15:56:42 2009 -0700 memcg: improve resource counter scalability Reduce the resource counter overhead (mostly spinlock) associated with the root cgroup. This is a part of the several patches to reduce mem cgroup overhead. I had posted other approaches earlier (including using percpu counters). Those patches will be a natural addition and will be added iteratively on top of these. The patch stops resource counter accounting for the root cgroup. The data for display is derived from the statisitcs we maintain via mem_cgroup_charge_statistics (which is more scalable). What happens today is that, we do double accounting, once using res_counter_charge() and once using memory_cgroup_charge_statistics(). For the root, since we don't implement limits any more, we don't need to track every charge via res_counter_charge() and check for limit being exceeded and reclaim. The main mem->res usage_in_bytes can be derived by summing the cache and rss usage data from memory statistics (MEM_CGROUP_STAT_RSS and MEM_CGROUP_STAT_CACHE). However, for memsw->res usage_in_bytes, we need additional data about swapped out memory. This patch adds a MEM_CGROUP_STAT_SWAPOUT and uses that along with MEM_CGROUP_STAT_RSS and MEM_CGROUP_STAT_CACHE to derive the memsw data. This data is computed recursively when hierarchy is enabled. The tests results I see on a 24 way show that 1. The lock contention disappears from /proc/lock_stats 2. The results of the test are comparable to running with cgroup_disable=memory. ... Data from Prarit (kernel compile with make -j64 on a 64 CPU/32G machine) For a single run Without patch real 27m8.988s user 87m24.916s sys 382m6.037s With patch real 4m18.607s user 84m58.943s sys 50m52.682s With config turned off real 4m54.972s user 90m13.456s sys 50m19.711s Please look at http://www.mail-archive.com/fedora-kernel-list@redhat.com/msg02057.html as well. ------- Comment From shaggy.ibm.com 2010-05-05 17:45 EDT------- I don't have the resource to run the benchmarks, but I can verify that the RHEL6 kernel does contain the patches. No surprise since the code has been in the upstream kernel. ------- Comment From shaggy.ibm.com 2010-07-08 14:29 EDT------- Closing. The mmap_sem contention has been fixed. Any addition performance issues are outside the scope of this feature. Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you. |