Bug 1772133 - Transparent Huge Pages set to [always] is sub-optimal for many applications
Summary: Transparent Huge Pages set to [always] is sub-optimal for many applications
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel
Version: ---
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: Andrea Arcangeli
QA Contact: Ping Fang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-11-13 17:47 UTC by Mark Nelson
Modified: 2023-08-08 02:49 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-03 01:40:30 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Mark Nelson 2019-11-13 17:47:17 UTC
Description of problem:

Transparent Huge Pages (THP) provide real benefit to certain applications by potentially reducing TLB misses and improving performance. For other applications, they can bloat memory usage and cause performance regressions. By default, the kernel enables THP only for memory regions that applications explicitly request via madvise(MADV_HUGEPAGE):

> "madvise" will enter direct reclaim like "always" but only for regions
> that are have used madvise(MADV_HUGEPAGE). This is the default behaviour.

https://www.kernel.org/doc/Documentation/vm/transhuge.txt
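To check which mode a given host is actually running, the active setting can be read from sysfs, where the kernel lists every supported mode and brackets the selected one (e.g. "always [madvise] never"). A minimal Python sketch, assuming the standard /sys/kernel/mm/transparent_hugepage/enabled path; the helper names are illustrative only:

```python
from pathlib import Path

# Standard sysfs location of the system-wide THP mode on Linux.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active mode, i.e. the token wrapped in [brackets]."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active THP mode found in {text!r}")


def read_thp_mode(path: Path = THP_ENABLED) -> str:
    """Read and parse the current system-wide THP mode."""
    return parse_thp_mode(path.read_text())


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print("THP mode:", read_thp_mode())
    else:
        print("THP sysfs interface not present on this kernel")
```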

RHEL, CentOS, and CoreOS (but not Fedora) all appear to override this behavior and set THP to [always]. Unfortunately, this causes problems for a wide variety of software, including but not limited to:

splunk: https://docs.splunk.com/Documentation/Splunk/7.3.2/ReleaseNotes/SplunkandTHP
mongodb: https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/
couchbase: https://docs.couchbase.com/server/current/install/thp-disable.html
oracle: https://blogs.oracle.com/linux/performance-issues-with-transparent-huge-pages-thp
nuodb: http://doc.nuodb.com/4.0/Content/OpenShift-disable-THP.htm
Go runtime: https://github.com/golang/go/issues/8832
jemalloc: https://blog.digitalocean.com/transparent-huge-pages-and-alternative-memory-allocators/
node.js: https://github.com/nodejs/node/issues/11077
tcmalloc: https://github.com/gperftools/gperftools/issues/1073

More recently, we've also seen memory usage bloat in Ceph (with tcmalloc) when THP is set to [always], potentially resulting in OOM kills when running inside containers. There are various ways to work around this at the application level, including madvise(MADV_NOHUGEPAGE) or the prctl flag PR_SET_THP_DISABLE. Requiring these workarounds to disable THP for a given application is counter-intuitive for several reasons:
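The two application-level workarounds look roughly like this. A minimal Python sketch, assuming Linux with Python 3.8+ (for mmap.madvise); the option numbers are taken from <linux/prctl.h>, and the helper names are hypothetical, not part of any Ceph or tcmalloc API:

```python
import ctypes
import mmap
import os

# Option numbers from <linux/prctl.h> (available since Linux 3.15).
PR_SET_THP_DISABLE = 41
PR_GET_THP_DISABLE = 42

# dlopen(NULL): resolve prctl() from the C library already loaded into Python.
_libc = ctypes.CDLL(None, use_errno=True)


def _prctl(option: int, arg2: int = 0) -> int:
    # prctl() is variadic, so pass every argument with an explicit C type;
    # the kernel rejects these options unless the unused arguments are zero.
    ret = _libc.prctl(ctypes.c_int(option), ctypes.c_ulong(arg2),
                      ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def disable_thp_for_process() -> None:
    """Process-wide opt-out; inherited across fork() and kept across execve()."""
    _prctl(PR_SET_THP_DISABLE, 1)


def thp_disabled_for_process() -> bool:
    """Report whether the calling process has the THP-disable flag set."""
    return _prctl(PR_GET_THP_DISABLE) == 1


def map_without_thp(size: int) -> mmap.mmap:
    """Per-region opt-out: an anonymous mapping marked MADV_NOHUGEPAGE."""
    region = mmap.mmap(-1, size)
    region.madvise(mmap.MADV_NOHUGEPAGE)  # Python 3.8+ on Linux
    return region
```

The prctl flag covers the whole process (including allocations made by libraries it does not control), while MADV_NOHUGEPAGE applies only to the mapping it is called on. Both have to be added to every affected application, which is the burden point 2 below describes.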

1) It deviates from the default kernel behavior without a strong justification as to why.

2) It puts the onus on developers to explicitly stop the kernel from engaging in sub-optimal behavior.

3) It's incredibly confusing to have a system-wide default that claims to "always" enable a setting that many applications may or may not silently disable through workarounds.

Finally, when another prominent distribution was faced with a similar choice, they ran STREAM and malloc benchmarks showing improvements at various allocation sizes when THP was disabled. Ultimately, that led them to switch back to the kernel default (i.e., madvise) with no apparent performance regressions:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1703742

Version-Release number of selected component (if applicable):


How reproducible:

This is a well-known issue that can be reproduced with a variety of software. Steps to reproduce in Ceph are listed below.

Steps to Reproduce:
1. Install a single-OSD Ceph cluster.
2. Run a background write workload using hsbench or fio that is large enough to fill the ceph-osd caches.
3. Compare memory usage of the OSD process when THP is set to [always] vs. [madvise].
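For step 3, one way to get comparable numbers is to sample the OSD's resident set size and its THP-backed anonymous memory from procfs. A rough Python sketch, assuming a kernel that provides /proc/<pid>/smaps_rollup (4.14 or later, which includes the RHEL 8 kernel); the function names are hypothetical and the OSD PID would be supplied by the tester:

```python
from __future__ import annotations

from pathlib import Path


def parse_kb_fields(text: str) -> dict[str, int]:
    """Parse 'Name:   1234 kB' lines (procfs style) into a {name: KiB} dict."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "<value> kB" lines; the smaps_rollup header line is skipped.
        if sep and len(parts) == 2 and parts[1] == "kB":
            fields[name.strip()] = int(parts[0])
    return fields


def memory_snapshot(pid: int | str = "self") -> dict[str, int]:
    """Return RSS and THP-backed anonymous memory (both in KiB) for one process."""
    text = Path(f"/proc/{pid}/smaps_rollup").read_text()
    fields = parse_kb_fields(text)
    return {"Rss": fields["Rss"], "AnonHugePages": fields.get("AnonHugePages", 0)}
```

Taking a snapshot of the ceph-osd PID once under [always] and once under [madvise] (the mode can be switched at runtime, as root, by writing the mode name to /sys/kernel/mm/transparent_hugepage/enabled) shows how much of the difference in RSS is attributable to huge pages.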

Actual results:

https://docs.google.com/spreadsheets/d/1Xl3nWapi7ZKEmpnsSHHWO96iopEG0hK6GeDWhWKSfDo/edit?usp=sharing

Expected results:

These results are expected when THP is set to [always] instead of [madvise] and the application does not explicitly override the kernel setting. Ideally, THP would only be used in situations where it provides a benefit rather than a regression.

Additional Information:

https://unix.stackexchange.com/questions/495816/which-distributions-enable-transparent-huge-pages-for-all-applications
https://www.percona.com/blog/2019/03/06/settling-the-myth-of-transparent-hugepages-for-databases/
https://blog.nelhage.com/post/transparent-hugepages/
https://alexandrnikitin.github.io/blog/transparent-hugepages-measuring-the-performance-impact/
https://dl.acm.org/citation.cfm?id=3359640

Comment 1 Mark Nelson 2019-11-13 19:04:33 UTC
Update:

While the kernel documentation claims that madvise is the default, the actual code in mm/Kconfig shows that "always" is the default choice, so I retract the statement about differing from the kernel.  See:

https://github.com/torvalds/linux/blob/master/mm/Kconfig#L385-L407
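The compiled-in default is selected by a Kconfig choice between CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS and CONFIG_TRANSPARENT_HUGEPAGE_MADVISE; it can still be overridden at boot with the transparent_hugepage= kernel parameter or at runtime through sysfs. A small Python sketch for checking which default a given kernel build chose, assuming the distribution ships the build configuration as /boot/config-<release> (as RHEL does); the helper names are illustrative:

```python
from __future__ import annotations

import platform
from pathlib import Path

# Kconfig choice symbols that select the THP sysfs default at build time.
_DEFAULT_SYMBOLS = {
    "CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS": "always",
    "CONFIG_TRANSPARENT_HUGEPAGE_MADVISE": "madvise",
}


def build_default_thp_mode(config_text: str) -> str | None:
    """Return the compiled-in default THP mode from kernel .config text."""
    for line in config_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and value == "y" and key in _DEFAULT_SYMBOLS:
            return _DEFAULT_SYMBOLS[key]
    return None  # THP not built in, or a choice this sketch does not know about


def running_kernel_default() -> str | None:
    """Look up the default for the running kernel, if its config is installed."""
    config = Path("/boot") / f"config-{platform.release()}"
    return build_default_thp_mode(config.read_text()) if config.exists() else None
```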

Still, I think the rest stands.

Comment 2 Rafael Aquini 2019-11-19 03:02:25 UTC
Patch posted upstream suggesting the config change:

  https://lkml.org/lkml/2019/11/18/1031

