Bug 1411971
Summary: | HPE [RFE] [Director] Allow configuring numa affinity of hugepages | |
---|---|---|---
Product: | Red Hat OpenStack | Reporter: | hrushi <hrushikesh.gangur>
Component: | rhosp-director | Assignee: | Angus Thomas <athomas>
Status: | CLOSED WONTFIX | QA Contact: | Amit Ugol <augol>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 12.0 (Pike) | CC: | chegu_vinod, dbecker, fbaudin, hrushikesh.gangur, jcoufal, lyarwood, mburns, morazi, owalsh, rhel-osp-director-maint, sgordon, skramaja, stephenfin
Target Milestone: | --- | Keywords: | FutureFeature, Triaged
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-10-01 16:36:59 UTC | Type: | Feature Request
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1341176, 1476900, 1521118 | |
Description
hrushi
2017-01-10 21:38:09 UTC
Franck, did the NFV QE team look at this type of scenario at all? Is this currently feasible from a kernel POV? If it is, I would have thought the direct nature of the kernel argument passthrough at the moment would allow this. Looking at the /sys structure, it seems like it should be:

$ tree /sys/devices/system/node/node0/hugepages/
/sys/devices/system/node/node0/hugepages/
├── hugepages-1048576kB
│   ├── free_hugepages
│   ├── nr_hugepages
│   └── surplus_hugepages
└── hugepages-2048kB
    ├── free_hugepages
    ├── nr_hugepages
    └── surplus_hugepages

2 directories, 6 files

I'm just not sure there is a way to do this from the kernel arguments for *both* sizes you want (versus runtime allocation, which you can do by direct manipulation of nr_hugepages, albeit this is not likely to pan out for 1G pages).

--

Yes, it can't be done just through kernel arguments. We typically follow these steps; some of them need to be configured through Ansible roles and require a reboot of the node.

A) Prepare the Compute host first and then reboot it:

i) Set up mount points:

# mkdir -p /mnt/hugepages_2M
# mkdir -p /mnt/hugepages_1G

ii) Add the following to /etc/fstab:

hugetlbfs /mnt/hugepages_2M hugetlbfs defaults 0 0
hugetlbfs /mnt/hugepages_1G hugetlbfs pagesize=1GB 0 0

iii) Add the following to the kernel command line (grub config):

hugepagesz=1G hugepages=4

Note: This could also be the place to set up other kernel cmdline arguments such as isolcpus, based on CRM inputs.

iv) Reboot the Compute host.

B) After the host OS reboots, one can set up huge pages on each NUMA node. (Note: this is the step which needs to be repeated each time the Compute host reboots, based on what was previously set up by HOS-HLM via CRM inputs.)
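Before the per-node allocation, the one-time host preparation described in (A) above can be sketched as a small script. This is only an illustration: the ROOT variable is invented here so the sketch can be exercised against a scratch directory instead of the live filesystem, and wiring the kernel arguments into GRUB_CMDLINE_LINUX plus regenerating the grub config is distribution-specific and left as a comment.

```shell
#!/bin/sh
# Sketch of the one-time compute-host preparation (step A above).
# ROOT is an illustrative knob: it defaults to a scratch directory so
# the sketch is safe to run; set ROOT= (empty) on a real host, as root.
ROOT="${ROOT:-$(mktemp -d)}"

# i) Mount points, one per hugepage size.
mkdir -p "$ROOT/mnt/hugepages_2M" "$ROOT/mnt/hugepages_1G"

# ii) hugetlbfs entries in /etc/fstab; the 1G mount selects the page
#     size explicitly, the other mount uses the default size.
FSTAB="$ROOT/etc/fstab"
mkdir -p "$(dirname "$FSTAB")"
grep -q 'hugepages_2M' "$FSTAB" 2>/dev/null || \
  echo 'hugetlbfs /mnt/hugepages_2M hugetlbfs defaults 0 0' >> "$FSTAB"
grep -q 'hugepages_1G' "$FSTAB" 2>/dev/null || \
  echo 'hugetlbfs /mnt/hugepages_1G hugetlbfs pagesize=1GB 0 0' >> "$FSTAB"

# iii) Kernel command-line additions (to be appended to
#      GRUB_CMDLINE_LINUX, followed by grub2-mkconfig on RHEL):
#      hugepagesz=1G hugepages=4

# iv) A reboot is then required for the boot-time 1G allocation.
```

After the reboot, the mounts come up from fstab and the per-NUMA-node allocation in step B can proceed.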
(In this example: setting up 100 2MB pages on NUMA node 0, 125 2MB pages on NUMA node 1, 16 1G huge pages on NUMA node 0, and 8 1G huge pages on NUMA node 1.)

Using virsh:

# virsh allocpages 2048KiB 100 --cellno 0
# virsh allocpages 2048KiB 125 --cellno 1
# virsh allocpages 1048576KiB 16 --cellno 0
# virsh allocpages 1048576KiB 8 --cellno 1

If you don't want to use virsh, you can do the following:

echo 100 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
echo 125 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
echo 16 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
echo 8 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

Just for verification:

# virsh freepages --all
Node 0:
4KiB: 3896664
2048KiB: 100
1048576KiB: 16

Node 1:
4KiB: 6024538
2048KiB: 125
1048576KiB: 8

--

Right, unfortunately I do not believe we will be able to achieve this in the Pike timeframe; I would like to re-evaluate for the Queens release. Thanks, Steve

--

Moving target to Rocky/RHOSP 14.

--

Multiple hugepage sizes can be allocated via kernel cmdline, e.g. default_hugepagesz=1G hugepagesz=1G hugepages=1 hugepagesz=2M hugepages=10:

# hugeadm --pool-list
      Size  Minimum  Current  Maximum  Default
   2097152       10       10       10
1073741824        1        1        1        *

However, AFAIK it's not possible to control the NUMA placement of these hugepages.

--

Updated bugzilla summary, as both 2M and 1G hugepages can already be allocated at boot time. NUMA affinity cannot be controlled via the kernel boot cmdline, however.

--

I think it's about time this was closed. While this is definitely a useful feature, I think it's something that we really ought to have support for in the kernel and sysctl, and we should just consume that. Given this isn't the case and probably won't be for some time, I think it's time to close this out.
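The sysfs writes in step B follow one pattern per (node, size, count) triple, so they lend themselves to a small loop. The sketch below is illustrative: the "node:size_kB:count" spec format and the DRY_RUN switch are invented for this example, and by default it only prints the commands (writing to /sys requires root); a real deployment would take the values from the director/Ansible inputs.

```shell
#!/bin/sh
# Sketch: generate the per-NUMA-node nr_hugepages writes from a list of
# "node:size_kB:count" specs (spec format invented for illustration).
# DRY_RUN=1 (the default) prints the commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

alloc_hugepages() {
    for spec in "$@"; do
        node="${spec%%:*}"          # NUMA node number
        rest="${spec#*:}"
        size_kb="${rest%%:*}"       # page size in kB (2048 or 1048576)
        count="${rest#*:}"          # number of pages to allocate
        target="/sys/devices/system/node/node${node}/hugepages/hugepages-${size_kb}kB/nr_hugepages"
        if [ "$DRY_RUN" = 1 ]; then
            echo "echo $count > $target"
        else
            echo "$count" > "$target"
        fi
    done
}

# Same layout as the example above: 100/125 2M pages on nodes 0/1,
# and 16/8 1G pages on nodes 0/1.
alloc_hugepages 0:2048:100 1:2048:125 0:1048576:16 1:1048576:8
```

Since this step has to be repeated after every reboot of the Compute host, a hook of this shape (run from a boot-time unit or the configuration-management tool) is the natural place to keep it.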