Bug 474913

Summary: [LTC 5.4 FEAT] Thread scalability issues with TPC-C [201300]
Product: Red Hat Enterprise Linux 5
Reporter: IBM Bug Proxy <bugproxy>
Component: kernel
Assignee: Jesse Larrew <jlarrew>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: high
Version: 5.4
CC: cward, dzickus, jjarvis, jlarrew, peterm, sglass, syeghiay
Target Milestone: alpha
Keywords: FutureFeature, OtherQA
Target Release: 5.4   
Hardware: All   
OS: All   
Doc Type: Enhancement
Last Closed: 2009-09-02 08:26:00 UTC
Bug Blocks: 445204, 483701, 483784    
Attachments:
Fast GUP backport to RHEL kernel (flags: none)
Updated Fast GUP backport to RHEL kernel (flags: none)
Incremental patch to check for VM_RESERVED rather than cause a BUG() (flags: none)
Updated patches to enable fast gup only for non-xen x86_64 (flags: none)
dmidecode from 64bit rpm (flags: none)
xmdmesg from 64bit rpm (flags: none)
xminfo from 64bit rpm (flags: none)
dmesg from 64bit rpm (flags: none)
dmidecode from 32bit rpm (flags: none)
xmdmesg from 32bit rpm (flags: none)
xminfo from 32bit rpm (flags: none)
dmesg from 32bit rpm (flags: none)
combined patch (flags: none)
get_user_pages_fast() for x86_64 for the 2.6.18-128-el5 kernel (flags: none)

Description IBM Bug Proxy 2008-12-05 21:30:51 UTC
=Comment: #0=================================================
Emily J. Ratliff <ratliff.com> - 
1. Feature Overview:
Feature Id:	[201300]
a. Name of Feature:	Thread scalability issues with TPC-C
b. Feature Description
Improve thread scalability for TPC-C benchmarking, in particular via reduction of DIO induced
mmap_sem contention and lock contention in follow_hugetlb_page().

The work for 5.3 was tracked in https://bugzilla.redhat.com/show_bug.cgi?id=447649

2. Feature Details:
Sponsor:	LTC
Architectures:
x86
x86_64
ppc64

Arch Specificity: Both
Affects Core Kernel: Yes
Delivery Mechanism: Backport
Category:	Kernel
Request Type:	Kernel - Performance Enhancement from Upstream
d. Upstream Acceptance:	Accepted
Sponsor Priority	1
f. Severity: High
IBM Confidential:	no
Code Contribution:	3rd party code
g. Component Version Target:	2.6.27
Performance Assistance:	yes

3. Business Case
New threaded performance issues are being discovered today by high-end TPC-C benchmarks.  Addressing
these types of bugs as early as possible saves money and positions the distro to be as scalable as
possible over its lifetime, during which we expect a proliferation of highly multicore processors
and threaded applications. This is also key for DB2 customers running the new threaded model. The
performance impact varies depending on the configuration.

4. Primary contact at Red Hat: 
John Jarvis
jjarvis

5. Primary contacts at Partner:
Project Management Contact:
Michael Hohnbaum, hbaum.com, 503-578-5486

Technical contact(s):
Badari Pulavarty, badari.com
David Kleikamp, shaggy.com

IBM Manager:
Pat Gaughen, gaughen.com

Comment 1 John Jarvis 2008-12-18 21:23:47 UTC
*** Bug 447649 has been marked as a duplicate of this bug. ***

Comment 2 IBM Bug Proxy 2008-12-18 21:34:52 UTC
Emily J. Ratliff <emilyr.com> - 2008-05-20 18:29 EDT
1. Feature Id:	[201663]
Feature Name:	fast_gup patchset for TPCC performance improvements
Sponsor:	Performance
Category:	LTC
Request Type:	Kernel

2. Short Description
Improve thread scalability for TPC-C benchmarking, in particular via reduction
of DIO induced mmap_sem contention and lock contention in follow_hugetlb_page().

Some issues with thread performance (lock contention) were recently identified
by high end TPC-C benchmark. Nick Piggin developed a solution to address these
lock contention issues known as fast-gup patches (see
http://lwn.net/Articles/275185/). Linus acknowledged the patches and is willing to
consider them for the next major release (2.6.27). As these patches provide significant
improvements (9-10%), we would like to request Red Hat to consider the patchset
for inclusion into RHEL5.3 if the patches make it into the 2.6.27 merge window
before the Red Hat internal kernel code freeze date. This is an unusual request
to consider a feature which is not yet in mainline for inclusion. We ask this
because we anticipate this patch being a significant factor in the possibility
for making the first 1 M tpmC publish on x86_64. Thread performance is also
critical for DB2. This patchset provides a new interface, which gets used in
very selective places in the kernel. IBM is willing to port, test, verify the
patchset and (if needed) even restrict the patchset only to x86-64 to minimize
the risk.
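
For readers unfamiliar with the DIO path mentioned above: a direct I/O read hands a user buffer straight to the block layer, so the kernel must first pin the buffer's pages. In kernels of this era that pinning goes through get_user_pages() under mmap_sem, which is where threads of one process contend. The following is only a user-space sketch in Python of that kind of read, not anything taken from the patchset. The function name read_start, the 4096-byte block size, and the fallback to buffered I/O are assumptions made for illustration.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment and transfer size; real devices may differ


def read_start(path, length=BLOCK):
    """Return (data, used_direct) for the first `length` bytes of `path`."""
    o_direct = getattr(os, "O_DIRECT", 0)  # absent on some platforms
    attempts = [o_direct, 0] if o_direct else [0]
    buf = mmap.mmap(-1, length)  # anonymous mapping, hence page-aligned
    try:
        for extra in attempts:
            try:
                fd = os.open(path, os.O_RDONLY | extra)
            except OSError:
                continue  # e.g. the filesystem rejects O_DIRECT
            try:
                n = os.readv(fd, [buf])  # kernel fills the user buffer
                return buf[:n], bool(extra)
            except OSError:
                continue  # fall back to a buffered read
            finally:
                os.close(fd)
        raise OSError("could not read " + path)
    finally:
        buf.close()
```

Many threads issuing reads like this against buffers in one address space is the pattern a TPC-C style database produces. Whether O_DIRECT is actually honoured depends on the filesystem, which is why the sketch reports it in the second return value.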

4. Sponsor Priority     2
IBM Confidential:	yes
Upstream Acceptance:	Pending
Component Version Target:	2.6.27

5. PM Contact:	Mike Wortman, wortman.com, 512-838-8582

6. Technical contact(s):
I'm changing the owner of this one to John Jarvis, because when I asked, he was
going to talk to Shak about the advisability of making this request.

Adding Ed Pollard to the CC list since he has looked into this and had some
initial discussions with Peter.

John, I have not added it to the 5.3 tracker, pending a response from you.

Update: The fast_gup patches have made it into the -mm tree, so they are still on
track for 2.6.27.

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

It looks like the code is actually detailed here: http://lwn.net/Articles/275724/

The latest GUP patches are in -mm and scheduled for inclusion into 2.6.27.

Here is the patchset for consideration:

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.26-rc8/2.6.26-rc8-mm1/broken-out/

x86-implement-pte_special.patch
mm-introduce-get_user_pages_fast.patch
mm-introduce-get_user_pages_fast-fix.patch
mm-introduce-get_user_pages_fast-checkpatch-fixes.patch
x86-lockless-get_user_pages_fast.patch
x86-lockless-get_user_pages_fast-checkpatch-fixes.patch
x86-lockless-get_user_pages_fast-fix.patch
x86-lockless-get_user_pages_fast-fix-2.patch
x86-lockless-get_user_pages_fast-fix-2-fix-fix.patch
x86-lockless-get_user_pages_fast-fix-warning.patch
dio-use-get_user_pages_fast.patch
splice-use-get_user_pages_fast.patch
x86-support-1gb-hugepages-with-get_user_pages_lockless.patch
IBM is signed up to test and provide feedback on the implementation of this feature.
This enhancement request was evaluated by the full Red Hat Enterprise Linux team
for inclusion in a Red Hat Enterprise Linux minor release.   As a result of this
evaluation, Red Hat has tentatively approved inclusion of this feature in the
next Red Hat Enterprise Linux Update minor release.   While it is a goal to
include this enhancement in the next minor release of Red Hat Enterprise Linux,
the enhancement is not yet committed for inclusion in the next minor release
pending the next phase of actual code integration and successful Red Hat and
partner testing.

How much time are we talking about here? The runway to submit patches is pretty short, and RH needs time to review this as well.
(From update of attachment 314072 [details])
new patch attached making this one obsolete
I have tested the patch on a RHEL5.3 test kernel (2.6.18-103.el5) using our
large OLTP workload and observed the following performance increase:

2.6.18-103.el5
100%

2.6.18-103.el5+fastgup
102.78%

I also did not observe any stability issues during the testing of the patch.
I'm guessing that the HAVE_PTE_SPECIAL code isn't compatible with the version of
xen code that is in the rhel5.3 kernel.  I think it would be best to revert the
changes to include/asm-x86_64/mach-xen/asm/pgtable.h in
x86-implement-pte_special.patch, so that __HAVE_ARCH_PTE_SPECIAL is only defined
for the non-xen x86_64 kernel.  This way, both xen kernels will resort to the
old behavior.  (This assumes the previous patch to revert the other part is good.)
I will build some kernels tonight/tomorrow based on these and post them so they can be tested but we will need to line up xen testers at IBM.
Installed RHEL5.2 64bit Xen and applied
kernel-xen-2.6.18-105.el5.fast_gup7.x86_64.rpm, rebooted and ran dmidecode
without seeing any issues.

Installed RHEL5.2 32bit Xen and applied
kernel-xen-2.6.18-105.el5.fast_gup7.i686.rpm, rebooted and ran dmidecode without
seeing any issues.

Attaching some outputs in case they are meaningful.
in kernel-2.6.18-107.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
The patch has been reverted, but why?

From the changelog:

- Revert: [mm] add support for fast get user pages (Ed Pollard) [447649]

It was reverted because it broke the Xen kernels. We submitted a new set of patches to
resolve the problem, but a new kernel had to be built before they were ready, so the
patch was reverted; the new patch is being run through testing at RH now.
in kernel-2.6.18-111.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
this patch is causing a regression on x86-64 on Intel 965 hardware with X.org.

X no longer starts, and we get an oops.

Bug 463853 + 438400 contain the saga so far.
(In reply to comment #46)
> this patch is causing a regression on x86-64 on Intel 965 hardware with X.org.
>
> X no longer starts, and we get an oops.
>
> Bug 463853 + 438400 contain the saga so far.

I don't have access to view these bugs.  Could you add me as a cc or send me the details?

Thanks,
shaggy.ibm.com
Cutting and pasting the developer's review comment on this patch from the
mailing list:

This patch has broken the Intel X.org driver on x86_64 with Intel 965GM
GPU.

We get an oops nearly like the dmidecode one.

vma normal page ffff8100740293f8, addr 2b603caae000 pte_pfn d2000
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at /shared/airlied/kernel/mm/memory.c:425
invalid opcode: 0000 [1] SMP
last sysfs file: /class/drm/card0/dev
CPU 3
Modules linked in: i915(U) drm(U) netconsole(U) autofs4(U) sunrpc(U)
ipv6(U) xfrm_nalgo(U) crypto_api(U) cpufreq_ondemand(U) video(U)
backlight(U) sbs(U) i2c_ec(U) button(U) battery(U) asus_acpi(U)
acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U)
snd_hda_intel(U) snd_seq_dummy(U) snd_seq_oss(U) snd_seq_midi_event(U)
snd_seq(U) snd_seq_device(U) snd_pcm_oss(U) sr_mod(U) cdrom(U)
snd_mixer_oss(U) snd_pcm(U) e1000(U) snd_timer(U) snd_page_alloc(U)
snd_hwdep(U) i2c_i801(U) i2c_core(U) snd(U) soundcore(U) e1000e(U) sg(U)
pcspkr(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_log(U) dm_mod(U)
ahci(U) libata(U) usb_storage(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U)
ehci_hcd(U) ohci_hcd(U) uhci_hcd(U) i965.
Pid: 3699, comm: Xorg Tainted: G      2.6.18.4 #5
RIP: 0010:[<ffffffff8000c4f5>]  [<ffffffff8000c4f5>] vm_normal_page+0x99/0x10a
RSP: 0018:ffff81006853fd28  EFLAGS: 00010246
RAX: ffff8100010000d0 RBX: 000000000000001a RCX: 0000000000000286
RDX: ffff810001000000 RSI: 0000000000000000 RDI: ffffffff802fdb5c
RBP: 0000000000000000 R08: 0000000002000000 R09: 0000000000000000
R10: 0000000010000042 R11: 0000000000000000 R12: 00000000000d2000
R13: 00002b603caae000 R14: ffff810063dfc570 R15: 0000000000084408
FS:  00002b6037b64ad0(0000) GS:ffff810037d26640(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000053e270 CR3: 000000006847c000 CR4: 00000000000006e0
Process Xorg (pid: 3699, threadinfo ffff81006853e000, task ffff810077fe4100)
Stack:  80000000d2000007 80000000d2000007 ffff81007b5f9100 ffffffff800084d4
00002b603aaac000 ffff8100740293f8 ffff8100747c61c0 ffff810076b2f440
ffff81006847c2b0 ffff81006381b2b0 00002b603d8ee000 00002b603d8ee000
Call Trace:
[<ffffffff800084d4>] copy_page_range+0x5b3/0x73e
[<ffffffff80063bbc>] mutex_lock+0xd/0x1d
[<ffffffff8001f90c>] copy_process+0xce6/0x1550
[<ffffffff80030c4d>] do_fork+0x69/0x1be
[<ffffffff8005d28d>] tracesys+0xd5/0xe0
[<ffffffff8005d427>] ptregscall_common+0x67/0xac

I've spent a day digging into it, and it's due to the Intel driver doing
some mprotect calls. It mmaps the framebuffer at 0xd0000000 for 256MB,
then mprotects a 128MB chunk of it from 0xd2000000. When the
process forks or exits after that mprotect, we get an oops like the one above.

It's the mapping that is created from d2000000 that is breaking with the
pte valid check.

These calls, while not entirely necessary, shouldn't cause the kernel to
oops.

I think upstream some fixes for mprotect may have gone in to stop this
from happening.

Dave.
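
A quick arithmetic check ties the addresses in the analysis above to the oops output. This is a minimal sketch, assuming 4 KiB pages (PAGE_SHIFT = 12) and assuming the quoted 0xd0000000 and 0xd2000000 are physical addresses within the framebuffer aperture. The helper names are illustrative only.

```python
PAGE_SHIFT = 12          # assumption: 4 KiB pages, as on x86_64
MIB = 1 << 20

map_start = 0xD0000000   # framebuffer mapping start, from the analysis
map_len = 256 * MIB
prot_start = 0xD2000000  # start of the mprotect()ed chunk
prot_len = 128 * MIB


def pfn(addr):
    """Page frame number of an address."""
    return addr >> PAGE_SHIFT


def inside(start, length, outer_start, outer_len):
    """True if [start, start+length) lies within the outer range."""
    return outer_start <= start and start + length <= outer_start + outer_len


offset_mib = (prot_start - map_start) // MIB
print("mprotect chunk starts", offset_mib, "MiB into the mapping")
print("chunk inside mapping:", inside(prot_start, prot_len, map_start, map_len))
print("first pfn of chunk:", hex(pfn(prot_start)))  # 0xd2000
```

Under those assumptions, the first page frame of the mprotected chunk is 0xd2000, which matches the "pte_pfn d2000" value printed just before the BUG and the value in R12.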
Moving back into POST to let me revert this patch.  The next time it goes to MODIFIED, the patch will have been reverted.
in kernel-2.6.18-118.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Just to be clear, what is in kernel-2.6.18-118.el5 is the revert of this patch.
(In reply to comment #48)
> Cut and Paste the developer's review comment on this patch on the
> mailing list:

> I think upstream some fixes for mprotect may have gone in to stop this
> from happening.

I'm not able to find an upstream fix.  I'd appreciate any more information that can help me find it.

>
> Dave.
>

Thanks,
Shaggy
Is there any more information available on this regression?  I don't have access to the Red Hat bugs or mailing list.  If someone knows of an upstream patch that fixes the regression, or at least has an idea of what to look for, it would be very helpful.

I'm adding myself to the cc list since the mirroring between IBM and Red Hat seems to be having issues.

I've lost a lot of time on this, but it seems that there is more information about the regression and an upstream fix that I don't have access to.  Is a fix still possible?

Red Hat,

The comments posted indicate that you have some idea what the problem is, and that you know of a fix in the upstream kernel.  Please provide more details.

Thanks,
Shaggy

Not really; it's just never been seen in the upstream kernel, as far as I know.

However, whether that is just luck, because nobody upstream runs the Xorg driver that trips the problem, is unknown.

The basic issue has to do with the driver calling mprotect on some memory
that the X server mmapped, and it occurs only on 64-bit systems.

This patch missed the beta deadline, so it will not make RHEL 5.3.

I understand that this will not make RHEL 5.3, but I would like someone to look at it for RHEL 5.4, so I want to defer it to 5.4.

This request is deferred to 5.4 to allow IBM time to rework it.  Removing from 5.3 tracker, adding to 5.4 tracker and requesting for 5.4.

Taking this over from Ed Pollard.

*** This bug has been marked as a duplicate of 474913 ***

Comment 3 IBM Bug Proxy 2008-12-18 21:34:58 UTC
Created attachment 327369 [details]
Fast GUP backport to RHEL kernel



Attached is a backport of the fast GUP patches to the 100.el5 kernel for RHEL
5.3  developed by Dave Kleikamp. They have been functionally tested against
101.el5 and known to apply to 103.el5. Performance testing is starting and so
will take a little while longer to confirm everything is behaving as expected.
In the meantime, the patches are here for review and the series file includes
what git commit each patch is based on.

Comment 4 IBM Bug Proxy 2008-12-18 21:35:03 UTC
Created attachment 327370 [details]
Updated Fast GUP backport to RHEL kernel



The later kernel was using mach headers which caused problems. The changelog
since V3 is

o Fix a typo in s390 that prevented compilation (patch pte_special.patch)
o Added special bit information for mach-xen on i386 (patch pte_special.patch)
o Added special bit information for mach-xen on x86_64 (patch
pte_special.patch)

Comment 5 IBM Bug Proxy 2008-12-18 21:35:08 UTC
Created attachment 327371 [details]
Incremental patch to check for VM_RESERVED rather than cause a BUG()



This patch reverts a change in the non-HAVE_PTE_SPECIAL code that replaced some
sanity checks with a BUG_ON(). This change was probably valid in the latest
mainline kernel, but it breaks xen in the rhel5 kernel.

This patch has not been tested yet.

Comment 6 IBM Bug Proxy 2008-12-18 21:35:13 UTC
Created attachment 327372 [details]
Updated patches to enable fast gup only for non-xen x86_64



I've updated the patchset to leave the x86_64 xen build alone, only enabling the
PTE_SPECIAL code on the non-xen x86_64 kernel. It also incorporates my
previous patch to leave the VM_RESERVED check in vm_normal_page().

I don't know how to test the xen builds, so I hope someone else can do the
testing.

Comment 7 IBM Bug Proxy 2008-12-18 21:35:18 UTC
Created attachment 327373 [details]
dmidecode from 64bit rpm

Comment 8 IBM Bug Proxy 2008-12-18 21:35:26 UTC
Created attachment 327374 [details]
xmdmesg from 64bit rpm

Comment 9 IBM Bug Proxy 2008-12-18 21:35:31 UTC
Created attachment 327375 [details]
xminfo from 64bit rpm

Comment 10 IBM Bug Proxy 2008-12-18 21:35:37 UTC
Created attachment 327376 [details]
dmesg from 64bit rpm

Comment 11 IBM Bug Proxy 2008-12-18 21:35:43 UTC
Created attachment 327377 [details]
dmidecode from 32bit rpm

Comment 12 IBM Bug Proxy 2008-12-18 21:35:48 UTC
Created attachment 327378 [details]
xmdmesg from 32bit rpm

Comment 13 IBM Bug Proxy 2008-12-18 21:35:53 UTC
Created attachment 327379 [details]
xminfo from 32bit rpm

Comment 14 IBM Bug Proxy 2008-12-18 21:35:58 UTC
Created attachment 327380 [details]
dmesg from 32bit rpm

Comment 15 IBM Bug Proxy 2008-12-18 21:36:03 UTC
Created attachment 327381 [details]
combined patch



This is the patch combined into one file (applied individually to source tree
and then diff'd into a single patch) with all of the most recent fixes,
including one that was emailed to me directly to resolve an x86_64 compile
error.

This has been posted to the rkml list.

Comment 16 John Jarvis 2008-12-23 20:15:06 UTC
IBM is signed up to test and provide feedback.

Comment 17 IBM Bug Proxy 2009-01-12 21:02:23 UTC
I've found the patch to mainline that fixes the Xorg problem found in rhel5.3 beta.

http://lkml.org/lkml/2008/7/4/278

I'll re-roll the patch for rhel5.4 with this fix.

Shaggy

Comment 18 IBM Bug Proxy 2009-01-13 12:42:15 UTC
Created attachment 328861 [details]
get_user_pages_fast() for x86_64 for the 2.6.18-128-el5 kernel



I have updated the combined patch for the 2.6.18-128-el5 kernel and included the fix for the Xorg problem reported against the beta rhel5.3 kernel.

Comment 19 IBM Bug Proxy 2009-01-14 18:05:45 UTC
*** Bug 44954 has been marked as a duplicate of this bug. ***

Comment 20 IBM Bug Proxy 2009-01-14 18:12:34 UTC
*** This bug has been marked as a duplicate of bug 50374 ***

Comment 21 IBM Bug Proxy 2009-01-21 14:03:58 UTC
(In reply to comment #23)
> *** This bug has been marked as a duplicate of bug 50374 ***
>

Redhat, pay no attention to this comment.  This bug is NOT a duplicate.  Just a stray comment from bugproxy.

Comment 22 RHEL Program Management 2009-02-16 15:14:44 UTC
Updating PM score.

Comment 23 John Jarvis 2009-03-17 21:09:45 UTC
IBM, any progress on this BZ?

Comment 24 IBM Bug Proxy 2009-04-01 02:52:12 UTC
------- Comment From shaggy.ibm.com 2009-03-31 22:40 EDT-------
Redhat,
The patch has been successfully tested.  X runs fine on a patched kernel now.

Shaggy

Comment 27 John Jarvis 2009-04-14 14:57:01 UTC
This enhancement request was evaluated by the full Red Hat Enterprise Linux 
team for inclusion in a Red Hat Enterprise Linux minor release.   As a 
result of this evaluation, Red Hat has tentatively approved inclusion of 
this feature in the next Red Hat Enterprise Linux Update minor release.   
While it is a goal to include this enhancement in the next minor release 
of Red Hat Enterprise Linux, the enhancement is not yet committed for 
inclusion in the next minor release pending the next phase of actual 
code integration and successful Red Hat and partner testing.

Comment 28 Don Zickus 2009-05-14 19:34:39 UTC
in kernel-2.6.18-148.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 30 Chris Ward 2009-06-14 23:17:19 UTC
~~ Attention Partners RHEL 5.4 Partner Alpha Released! ~~

RHEL 5.4 Partner Alpha has been released on partners.redhat.com. There should
be a fix present that addresses this particular request. Please test and report back your results here, at your earliest convenience. Our Public Beta release is just around the corner!

If you encounter any issues, please set the bug back to the ASSIGNED state and
describe the issues you encountered. If you have verified the request functions as expected, please set your Partner ID in the Partner field above to indicate successful test results. Do not flip the bug status to VERIFIED. Further questions can be directed to your Red Hat Partner Manager. Thanks!

Comment 31 Chris Ward 2009-07-03 18:15:45 UTC
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 32 IBM Bug Proxy 2009-07-06 19:52:31 UTC
------- Comment From shaggy.ibm.com 2009-07-06 15:49 EDT-------
Verified as working on RHEL 5.4 Beta

Comment 34 errata-xmlrpc 2009-09-02 08:26:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html