Bug 474913
Summary: | [LTC 5.4 FEAT] Thread scalability issues with TPC-C [201300] | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | IBM Bug Proxy <bugproxy> |
Component: | kernel | Assignee: | Jesse Larrew <jlarrew> |
Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 5.4 | CC: | cward, dzickus, jjarvis, jlarrew, peterm, sglass, syeghiay |
Target Milestone: | alpha | Keywords: | FutureFeature, OtherQA |
Target Release: | 5.4 | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Enhancement | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2009-09-02 08:26:00 UTC | Type: | --- |
Bug Blocks: | 445204, 483701, 483784 | ||
Attachments: |
Description
IBM Bug Proxy
2008-12-05 21:30:51 UTC
*** Bug 447649 has been marked as a duplicate of this bug. ***

Emily J. Ratliff <emilyr.com> - 2008-05-20 18:29 EDT

1. Feature Id: [201663]
Feature Name: fast_gup patchset for TPCC performance improvements
Sponsor: Performance
Category: LTC
Request Type: Kernel

2. Short Description
Improve thread scalability for TPC-C benchmarking, in particular via reduction of DIO induced mmap_sem contention and lock contention in follow_hugetlb_page().

Some issues with thread performance (lock contention) were recently identified by a high-end TPC-C benchmark. Nick Piggin developed a solution to address these lock contention issues, known as the fast-gup patches (see http://lwn.net/Articles/275185/). Linus acknowledged the patches and is willing to consider them for the next major release (2.6.27).

As these patches provide significant improvements (9-10%), we would like to request that Red Hat consider the patchset for inclusion in RHEL5.3 if the patches make it into the 2.6.27 merge window before the Red Hat internal kernel code freeze date. This is an unusual request, since it asks for inclusion of a feature which is not yet in mainline. We ask this because we anticipate this patch being a significant factor in the possibility of making the first 1 M tpmC publish on x86_64. Thread performance is also critical for DB2.

This patchset provides a new interface, which gets used in very selective places in the kernel. IBM is willing to port, test, and verify the patchset and (if needed) even restrict the patchset to x86-64 only to minimize the risk.

4. Sponsor Priority 2
IBM Confidential: yes
Upstream Acceptance: Pending
Component Version Target: 2.6.27

5. PM Contact: Mike Wortman, wortman.com, 512-838-8582

6. Technical contact(s):

I'm changing the owner of this one to John Jarvis, because when I asked, he was going to talk to Shak about the advisability of making this request. Adding Ed Pollard to the CC list since he has looked into this and had some initial discussions with Peter.
John, I have not added it to the 5.3 tracker, pending a response from you.

Update: The fast_gup patches have made it into the -mm tree so they are still on track for 2.6.27.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Looks like the code is actually detailed here: http://lwn.net/Articles/275724/

Latest GUP patches are in -mm and scheduled for inclusion into 2.6.27. Here is the patchset for consideration:
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.26-rc8/2.6.26-rc8-mm1/broken-out/
x86-implement-pte_special.patch
mm-introduce-get_user_pages_fast.patch
mm-introduce-get_user_pages_fast-fix.patch
mm-introduce-get_user_pages_fast-checkpatch-fixes.patch
x86-lockless-get_user_pages_fast.patch
x86-lockless-get_user_pages_fast-checkpatch-fixes.patch
x86-lockless-get_user_pages_fast-fix.patch
x86-lockless-get_user_pages_fast-fix-2.patch
x86-lockless-get_user_pages_fast-fix-2-fix-fix.patch
x86-lockless-get_user_pages_fast-fix-warning.patch
dio-use-get_user_pages_fast.patch
splice-use-get_user_pages_fast.patch
x86-support-1gb-hugepages-with-get_user_pages_lockless.patch

IBM is signed up to test and provide feedback on the implementation of this feature.

This enhancement request was evaluated by the full Red Hat Enterprise Linux team for inclusion in a Red Hat Enterprise Linux minor release. As a result of this evaluation, Red Hat has tentatively approved inclusion of this feature in the next Red Hat Enterprise Linux Update minor release.
While it is a goal to include this enhancement in the next minor release of Red Hat Enterprise Linux, the enhancement is not yet committed for inclusion in the next minor release pending the next phase of actual code integration and successful Red Hat and partner testing.

How much time are we talking about here? The runway to submit patches is pretty short, and RH needs time to review this as well.

(From update of attachment 314072 [details])
New patch attached, making this one obsolete.

I have tested the patch on a RHEL5.3 test kernel (2.6.18-103.el5) using our large OLTP workload and observed the following performance increase:
2.6.18-103.el5          100%
2.6.18-103.el5+fastgup  102.78%
I also did not observe any stability issues during the testing of the patch.

I'm guessing that the HAVE_PTE_SPECIAL code isn't compatible with the version of xen code that is in the rhel5.3 kernel. I think it would be best to revert the changes to include/asm-x86_64/mach-xen/asm/pgtable.h in x86-implement-pte_special.patch, so that __HAVE_ARCH_PTE_SPECIAL is only defined for the non-xen x86_64 kernel. This way, both xen kernels will resort to the old behavior.

(This assumes the previous patch to revert the other part is good.)

I will build some kernels tonight/tomorrow based on these and post them so they can be tested, but we will need to line up xen testers at IBM.

Installed RHEL5.2 64bit Xen and applied kernel-xen-2.6.18-105.el5.fast_gup7.x86_64.rpm, rebooted and ran dmidecode without seeing any issues.
Installed RHEL5.2 32bit Xen and applied kernel-xen-2.6.18-105.el5.fast_gup7.i686.rpm, rebooted and ran dmidecode without seeing any issues.
Attaching some outputs in case they are meaningful.

in kernel-2.6.18-107.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

The patch has been reverted, but why?
From the changelog:
- Revert: [mm] add support for fast get user pages (Ed Pollard ) [447649]

Because it broke the xen kernels. We then submitted a new set of patches to resolve the problem, but not before a new kernel needed to be built, so this was reverted, and the new patch is being run through testing at RH now.

in kernel-2.6.18-111.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

This patch is causing a regression on x86-64 on Intel 965 hardware with X.org. X no longer starts, and we get an oops. Bug 463853 + 438400 contain the saga so far.

(In reply to comment #46)
> this patch is causing a regression on x86-64 on Intel 965 hardware with X.org.
>
> X no longer starts, and we get an oops.
>
> Bug 463853 + 438400 contain the saga so far.

I don't have access to view these bugs. Could you add me as a cc or send me the details?

Thanks,
shaggy.ibm.com

Cut and paste of the developer's review comment on this patch from the mailing list:

This patch has broken the Intel X.org driver on x86_64 with Intel 965GM GPU. We get an oops nearly like the dmidecode one.
vma normal page ffff8100740293f8, addr 2b603caae000 pte_pfn d2000
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at /shared/airlied/kernel/mm/memory.c:425
invalid opcode: 0000 [1] SMP
last sysfs file: /class/drm/card0/dev
CPU 3
Modules linked in: i915(U) drm(U) netconsole(U) autofs4(U) sunrpc(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) cpufreq_ondemand(U) video(U) backlight(U) sbs(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U) snd_hda_intel(U) snd_seq_dummy(U) snd_seq_oss(U) snd_seq_midi_event(U) snd_seq(U) snd_seq_device(U) snd_pcm_oss(U) sr_mod(U) cdrom(U) snd_mixer_oss(U) snd_pcm(U) e1000(U) snd_timer(U) snd_page_alloc(U) snd_hwdep(U) i2c_i801(U) i2c_core(U) snd(U) soundcore(U) e1000e(U) sg(U) pcspkr(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_log(U) dm_mod(U) ahci(U) libata(U) usb_storage(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U) i965.
Pid: 3699, comm: Xorg Tainted: G 2.6.18.4 #5
RIP: 0010:[<ffffffff8000c4f5>] [<ffffffff8000c4f5>] vm_normal_page+0x99/0x10a
RSP: 0018:ffff81006853fd28 EFLAGS: 00010246
RAX: ffff8100010000d0 RBX: 000000000000001a RCX: 0000000000000286
RDX: ffff810001000000 RSI: 0000000000000000 RDI: ffffffff802fdb5c
RBP: 0000000000000000 R08: 0000000002000000 R09: 0000000000000000
R10: 0000000010000042 R11: 0000000000000000 R12: 00000000000d2000
R13: 00002b603caae000 R14: ffff810063dfc570 R15: 0000000000084408
FS: 00002b6037b64ad0(0000) GS:ffff810037d26640(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000053e270 CR3: 000000006847c000 CR4: 00000000000006e0
Process Xorg (pid: 3699, threadinfo ffff81006853e000, task ffff810077fe4100)
Stack: 80000000d2000007 80000000d2000007 ffff81007b5f9100 ffffffff800084d4
 00002b603aaac000 ffff8100740293f8 ffff8100747c61c0 ffff810076b2f440
 ffff81006847c2b0 ffff81006381b2b0 00002b603d8ee000 00002b603d8ee000
Call Trace:
 [<ffffffff800084d4>] copy_page_range+0x5b3/0x73e
 [<ffffffff80063bbc>] mutex_lock+0xd/0x1d
 [<ffffffff8001f90c>] copy_process+0xce6/0x1550
 [<ffffffff80030c4d>] do_fork+0x69/0x1be
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
 [<ffffffff8005d427>] ptregscall_common+0x67/0xac

I've spent a day digging into it, and it's due to the intel driver doing some mprotect calls. It mmaps the framebuffer at 0xd0000000 for 256MB, then mprotects a 128MB chunk of it starting at 0xd2000000. When the process forks or exits after that mprotect, we get an oops like the one above. It's the mapping that is created from d2000000 that is breaking with the pte valid check.

These calls, while not entirely necessary, shouldn't cause the kernel to oops. I think upstream some fixes for mprotect may have gone in to stop this from happening.

Dave.

Moving back into POST to let me revert this patch. The next time it goes to MODIFIED, the patch will have been reverted.

in kernel-2.6.18-118.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Just to be clear, what is in kernel-2.6.18-118.el5 is the revert of this patch.

(In reply to comment #48)
> Cut and Paste the developer's review comment on this patch on the
> mailing list:
> I think upstream some fixes for mprotect may have gone in to stop this
> from happening.

I'm not able to find an upstream fix. I'd appreciate any more information that can help me find it.

>
> Dave.
>

Thanks,
Shaggy

Is there any more information available on this regression? I don't have access to the Redhat bugs or mailing list. If someone knows of an upstream patch that fixes the regression, or at least has an idea of what to look for, it would be very helpful.

I'm adding myself to the cc list since the mirroring between IBM and Redhat seems to be having issues. I've lost a lot of time on this, but it seems that there is more information about the regression and an upstream fix that I don't have access to.

Is a fix still possible?
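For anyone trying to follow the failing call pattern, the mmap, mprotect, fork sequence described in Dave's review comment can be sketched in user space roughly as below. This is only an illustration under stated assumptions: it uses an anonymous private mapping instead of the real framebuffer mapping, so it is not expected to reproduce the oops. The 256MB and 128MB sizes and the 32MB offset (0xd2000000 - 0xd0000000) are taken from the comment; the PROT_READ protection and the function name mprotect_then_fork() are placeholders, since the report does not say which flags the driver used.

```c
/* Sketch only: replays the mmap -> mprotect -> fork sequence from the
 * report on anonymous memory. It does not touch a real framebuffer. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAP_LEN   (256UL << 20)    /* 256MB mapping, as in the report */
#define CHUNK_OFF (32UL << 20)     /* 0xd2000000 - 0xd0000000 */
#define CHUNK_LEN (128UL << 20)    /* 128MB chunk that gets mprotect()ed */

/* Hypothetical helper: returns 0 if the whole sequence completed. */
int mprotect_then_fork(void)
{
    int status = 0;
    pid_t pid;
    char *base;

    assert(CHUNK_OFF + CHUNK_LEN <= MAP_LEN);   /* chunk lies inside mapping */

    base = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    base[CHUNK_OFF] = 1;           /* populate one page table entry in the chunk */

    /* Assumption: PROT_READ stands in for whatever the driver requested. */
    if (mprotect(base + CHUNK_OFF, CHUNK_LEN, PROT_READ) != 0) {
        perror("mprotect");
        munmap(base, MAP_LEN);
        return -1;
    }

    pid = fork();                  /* fork copies the page tables of the parent */
    if (pid < 0) {
        perror("fork");
        munmap(base, MAP_LEN);
        return -1;
    }
    if (pid == 0)
        _exit(0);                  /* child exits immediately */

    if (waitpid(pid, &status, 0) != pid) {
        perror("waitpid");
        munmap(base, MAP_LEN);
        return -1;
    }

    munmap(base, MAP_LEN);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling mprotect_then_fork() from a small main() and checking for a zero return is enough to exercise the sequence. On a healthy kernel it should simply succeed; the point is to show where fork walks the mprotected range (copy_page_range in the trace), not to serve as a test case for the regression.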
Redhat,
The comments posted indicate that you have some idea what the problem is, and that you know of a fix in the upstream kernel. Please provide more details.

Thanks,
Shaggy

Not really; it's just never been seen in the upstream kernel, from what I know. However, whether that is just luck, due to nobody upstream running the xorg driver that trips the problem, is unknown. The basic issue has to do with the driver calling some mprotects on some memory the X server mmapped, and only on 64-bit systems.

This patch missed the beta deadline so will not make RHEL 5.3.

I understand that this will not make RHEL 5.3, but I would like someone to look at it for RHEL 5.4, so I want to defer it to 5.4.

This request is deferred to 5.4 to allow IBM time to rework it. Removing from 5.3 tracker, adding to 5.4 tracker and requesting for 5.4.

Taking this over from Ed Pollard.

*** This bug has been marked as a duplicate of 474913 ***

Created attachment 327369 [details]
Fast GUP backport to RHEL kernel
Attached is a backport of the fast GUP patches to the 100.el5 kernel for RHEL
5.3 developed by Dave Kleikamp. They have been functionally tested against
101.el5 and known to apply to 103.el5. Performance testing is starting and so
will take a little while longer to confirm everything is behaving as expected.
In the meantime, the patches are here for review and the series file includes
what git commit each patch is based on.
Created attachment 327370 [details]
Updated Fast GUP backport to RHEL kernel
The later kernel was using mach headers which caused problems. The changelog
since V3 is
o Fix a typo in s390 that prevented compilation (patch pte_special.patch)
o Added special bit information for mach-xen on i386 (patch pte_special.patch)
o Added special bit information for mach-xen on x86_64 (patch
pte_special.patch)
Created attachment 327371 [details]
Incremental patch to check for VM_RESERVED rather than cause a BUG()
This patch reverts a change in the non-HAVE_PTE_SPECIAL code that replaced some
sanity checks with a BUG_ON(). This change was probably valid in the latest
mainline kernel, but breaks xen in the rhel5 kernel.
This patch has not been tested yet.
Created attachment 327372 [details]
Updated patches to enable fast gup only for non-xen x86_64
I've updated the patchset to leave the x86_64 xen build alone, only enabling the
PTE_SPECIAL code on the non-xen x86_64 kernel. It also incorporates my
previous patch to leave the VM_RESERVED check in vm_normal_page().
I don't know how to test the xen builds, so I hope someone else can do the
testing.
Created attachment 327373 [details]
dmidecode from 64bit rpm
Created attachment 327374 [details]
xmdmesg from 64bit rpm
Created attachment 327375 [details]
xminfo from 64bit rpm
Created attachment 327376 [details]
dmesg from 64bit rpm
Created attachment 327377 [details]
dmidecode from 32bit rpm
Created attachment 327378 [details]
xmdmesg from 32bit rpm
Created attachment 327379 [details]
xminfo from 32bit rpm
Created attachment 327380 [details]
dmesg from 32bit rpm
Created attachment 327381 [details]
combined patch
This is the patch combined into one file (applied individually to source tree
and then diff'd into a single patch) with all of the most recent fixes,
including one that was emailed to me directly to resolve an x86_64 compile
error.
This has been posted to the rkml list.
IBM is signed up to test and provide feedback.

I've found the patch to mainline that fixes the Xorg problem found in rhel5.3 beta.
http://lkml.org/lkml/2008/7/4/278
I'll re-roll the patch for rhel5.4 with this fix.

Shaggy

Created attachment 328861 [details]
get_user_pages_fast() for x86_64 for the 2.6.18-128-el5 kernel
I have updated the combined patch for the 2.6.18-128-el5 kernel and included the fix for the Xorg problem reported against the beta rhel5.3 kernel.
*** Bug 44954 has been marked as a duplicate of this bug. ***

*** This bug has been marked as a duplicate of bug 50374 ***

(In reply to comment #23)
> *** This bug has been marked as a duplicate of bug 50374 ***
>

Redhat, pay no attention to this comment. This bug is NOT a duplicate. Just a stray comment from bugproxy.

Updating PM score.

IBM, any progress on this BZ?

------- Comment From shaggy.ibm.com 2009-03-31 22:40 EDT-------
Redhat,
The patch has been successfully tested. X runs fine on a patched kernel now.

Shaggy

This enhancement request was evaluated by the full Red Hat Enterprise Linux team for inclusion in a Red Hat Enterprise Linux minor release. As a result of this evaluation, Red Hat has tentatively approved inclusion of this feature in the next Red Hat Enterprise Linux Update minor release. While it is a goal to include this enhancement in the next minor release of Red Hat Enterprise Linux, the enhancement is not yet committed for inclusion in the next minor release pending the next phase of actual code integration and successful Red Hat and partner testing.

in kernel-2.6.18-148.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.

~~ Attention Partners RHEL 5.4 Partner Alpha Released! ~~

RHEL 5.4 Partner Alpha has been released on partners.redhat.com. There should be a fix present that addresses this particular request. Please test and report back your results here, at your earliest convenience. Our Public Beta release is just around the corner!

If you encounter any issues, please set the bug back to the ASSIGNED state and describe the issues you encountered. If you have verified the request functions as expected, please set your Partner ID in the Partner field above to indicate successful test results. Do not flip the bug status to VERIFIED. Further questions can be directed to your Red Hat Partner Manager.

Thanks!

~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

------- Comment From shaggy.ibm.com 2009-07-06 15:49 EDT-------
Verified as working on RHEL 5.4 Beta

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html