Description Luiz Capitulino
2015-06-16 14:27:09 UTC
The HugeTLB subsystem in the Linux kernel provides bigger page sizes for user-space processes. On x86_64, for example, it provides 2MB and 1GB pages. Bigger pages are an important feature for high-performance computing and virtualization.
Before processes can allocate bigger pages, the pages have to be manually reserved by the system administrator. One way of doing this, and for most use cases the recommended way, is to write the number of pages to be reserved to sysfs. For example, this reserves 10 1GB pages in node0:
# echo 10 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
And this reserves 256 2MB pages in node4:
# echo 256 > /sys/devices/system/node/node4/hugepages/hugepages-2048kB/nr_hugepages
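Since the kernel may end up reserving fewer pages than requested (for example when memory is already fragmented), it can help to read the value back after writing it. The following is only a sketch: the function name reserve_and_verify and the SYSFS_NODE_ROOT override are made up for illustration and are not part of any existing tool.

```shell
#!/bin/bash
# Hypothetical helper, not an existing tool. SYSFS_NODE_ROOT can be
# overridden so the sketch can be tried against a scratch directory.
SYSFS_NODE_ROOT="${SYSFS_NODE_ROOT:-/sys/devices/system/node}"

# reserve_and_verify COUNT NODE SIZE_KB
# Writes COUNT to the node's nr_hugepages file and reads it back.
# Returns 1 if the file is not writable, 2 if fewer pages were reserved.
reserve_and_verify() {
    local count=$1 node=$2 size_kb=$3
    local file="$SYSFS_NODE_ROOT/$node/hugepages/hugepages-${size_kb}kB/nr_hugepages"
    local actual

    if [ ! -w "$file" ]; then
        echo "ERROR: $file is not writable" >&2
        return 1
    fi

    echo "$count" > "$file"

    # Read back the number of pages that were actually reserved.
    actual=$(cat "$file")
    if [ "$actual" -lt "$count" ]; then
        echo "WARNING: requested $count pages on $node, got $actual" >&2
        return 2
    fi
}
```

With this in place, the two commands above would become "reserve_and_verify 10 node0 1048576" and "reserve_and_verify 256 node4 2048".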
The problem here is that we need a way to make this configuration persistent. Not only that, but we also need the reservation to happen as early as possible during boot, so that we avoid memory fragmentation to some extent.
PS: I'm not sure what the right component for this is; it could be libhugetlbfs, libhugetlbfs-utils or maybe systemd itself.
This is an _example_ of what I've provided to a customer in the past:
1. Create a file named /usr/lib/systemd/system/hugetlb-gigantic-pages.service
with the following contents:
[Unit]
Description=HugeTLB Gigantic Pages Reservation
DefaultDependencies=no
Before=dev-hugepages.mount
ConditionPathExists=/sys/devices/system/node
ConditionKernelCommandLine=hugepagesz=1G
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/lib/systemd/hugetlb-reserve-pages
[Install]
WantedBy=sysinit.target
2. Create a file named /usr/lib/systemd/hugetlb-reserve-pages with the
following contents:
#!/bin/bash

nodes_path=/sys/devices/system/node
if [ ! -d "$nodes_path" ]; then
    echo "ERROR: $nodes_path does not exist" >&2
    exit 1
fi

# Reserve $1 1GB pages on NUMA node $2 (e.g. node0).
reserve_pages()
{
    echo "$1" > "$nodes_path/$2/hugepages/hugepages-1048576kB/nr_hugepages"
}

# This example reserves 2 1G pages on node0 and 1 1G page on node1. You
# can modify it to your needs or add more lines to reserve memory in
# other nodes. Don't forget to uncomment the lines, otherwise they won't
# be executed.
# reserve_pages 2 node0
# reserve_pages 1 node1
3. Run the following commands to enable early boot reservation:
# chmod +x /usr/lib/systemd/hugetlb-reserve-pages
# systemctl enable hugetlb-gigantic-pages
4. Modify /usr/lib/systemd/hugetlb-reserve-pages according to the
comments in the file
5. Reboot the machine
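One way to avoid editing the script by hand in step 4 would be to read the reservations from a separate file. This is only a sketch of that idea, not something that was shipped to anyone: the function name apply_reservations, the NODES_PATH override and the three-column file format ("<node> <page size in kB> <count>") are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical variant of hugetlb-reserve-pages that takes its
# reservations from a file instead of hard-coded reserve_pages calls.
# Assumed file format, one reservation per line:
#   <node> <page size in kB> <count>      e.g.  node0 1048576 2
# Blank lines and lines starting with '#' are ignored.
nodes_path="${NODES_PATH:-/sys/devices/system/node}"

apply_reservations() {
    local conf=$1 node size_kb count target rc=0

    # The "|| [ -n ... ]" part handles a last line without a newline.
    while read -r node size_kb count || [ -n "$node" ]; do
        case "$node" in
            ''|'#'*) continue ;;
        esac

        target="$nodes_path/$node/hugepages/hugepages-${size_kb}kB/nr_hugepages"
        if [ -e "$target" ]; then
            echo "$count" > "$target"
        else
            echo "ERROR: $target does not exist" >&2
            rc=1
        fi
    done < "$conf"

    return "$rc"
}
```

The service's script would then end with a single call such as "apply_reservations /etc/hugetlb-reserve-pages.conf", where that path is again just a placeholder.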
(In reply to Luiz Capitulino from comment #1)
>
> 4. Modify /usr/lib/systemd/hugetlb-reserve-pages according to the
> comments in the file
Did you let the customer edit the file manually, or was there a wrapper for doing that? Anyway, using systemd for persistent but configurable hugepage allocation is smart, and I don't see any obstacle to doing it like that.
This is not a libhugetlbfs library or hugetlbfs kernel problem. It's an issue with sysctl or even tuned, where the tunable parameters are stored and held over reboots. Let me look further into where this should be enhanced.
Larry Woodman
Comment 15 Stephen Finucane
2017-10-18 10:52:34 UTC
Is this looking like something that might ship in RHEL 7.5? Persistent, per-NUMA node hugepage configuration is something that we'd like to be able to rely on from an OpenStack perspective.
As noted in comment 14, this is not libhugetlbfs ground, and it seems that systemd is not the right place to carry these requirements either -- see comment 9. This should probably be dealt with by the QEMU KVM-RT team in their own tooling and/or documentation.
I'm switching the component over to qemu-kvm, but if the QEMU team does not think this request fits there either, then please just close this ticket.
Regards,
-- Rafael
This is certainly not a qemu or kvm issue, since kvm guests are not the only users of hugepages in Linux (I'd even guess HugeTLB existed before KVM).
We (as most users) solved this problem by using init scripts such as /etc/rc.d/rc.local. Since this workaround has always worked and since we seem unable to give a better solution, I'll just close the BZ.
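For reference, the rc.local workaround could look roughly like the sketch below. The names reserve_from_plan, HUGEPAGE_ROOT and RESERVATIONS, as well as the node0/node1 counts, are placeholders chosen for illustration. Note that rc.local runs late in boot, so the fragmentation concern from the description still applies, and on RHEL 7 /etc/rc.d/rc.local has to be made executable before it is run at all.

```shell
#!/bin/bash
# Sketch of lines that could be appended to /etc/rc.d/rc.local.
# HUGEPAGE_ROOT is a hypothetical override used only for dry runs.
HUGEPAGE_ROOT="${HUGEPAGE_ROOT:-/sys/devices/system/node}"

# Hypothetical per-node plan: "<node>:<number of 1GB pages>"
RESERVATIONS="${RESERVATIONS:-node0:2 node1:1}"

reserve_from_plan() {
    local entry node count file

    for entry in $RESERVATIONS; do
        node=${entry%%:*}     # text before the colon
        count=${entry##*:}    # text after the colon
        file="$HUGEPAGE_ROOT/$node/hugepages/hugepages-1048576kB/nr_hugepages"

        # Skip nodes that do not exist on this machine.
        if [ ! -e "$file" ]; then
            echo "rc.local: $file missing, skipping" >&2
            continue
        fi

        echo "$count" > "$file"
    done
}
```

A final "reserve_from_plan" line at the end of rc.local would then apply the plan on every boot.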