Bug 1076957 (libvirt-api-hugepages) - Expose huge pages information through libvirt API
Summary: Expose huge pages information through libvirt API
Keywords:
Status: CLOSED ERRATA
Alias: libvirt-api-hugepages
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Michal Privoznik
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 1057941 1078542
 
Reported: 2014-03-16 18:45 UTC by Stephen Gordon
Modified: 2016-04-26 14:36 UTC
CC List: 10 users

Fixed In Version: libvirt-1.2.7-1.el7
Doc Type: Enhancement
Doc Text:
Feature: Expose huge page information through a libvirt API.
Reason: When allocating bigger chunks of memory, huge pages come in handy: they have less overhead than memory allocated the standard way. However, huge pages require hardware (CPU) cooperation, so not every page size is available everywhere; it depends on the host CPU, host OS configuration, and so on. Therefore, when users want to back their guests' memory with huge pages, they need to know what huge page sizes are available, so libvirt must be taught to gather and expose this kind of information.
Result: The huge pages info is exposed in the capabilities XML ('virsh capabilities'), and the number of free pages is exposed via a new API ('virsh freepages'). Note that each host NUMA node has its own huge pages pool.
Clone Of:
Environment:
Last Closed: 2015-03-05 07:31:39 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:0323 0 normal SHIPPED_LIVE Low: libvirt security, bug fix, and enhancement update 2015-03-05 12:10:54 UTC

Description Stephen Gordon 2014-03-16 18:45:30 UTC
Description of problem:

Customer requests the ability to determine the following via the Libvirt API:

- The host's large page size.
- The total number of large pages available per host, and ideally per NUMA node.
- The total number of large pages free (versus in use) per host, and ideally per NUMA node.

It is anticipated that additional changes to lower level components including qemu may be required to facilitate the above.

Additional information:

Gap 1. It is not possible to obtain through libvirt the large page size configured on the host. This is necessary to know whether a host will accept a VM using large pages of 1GiB size.
Command for solution in Linux:
$ sudo hugeadm --page-sizes
1073741824
The large page size is expressed in bytes.
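Since hugeadm reports the size in bytes, a tiny helper (hypothetical, not part of libvirt or hugeadm) shows the conversion to the human-readable units used elsewhere in this report:

```python
# Hypothetical helper: convert the byte value printed by `hugeadm --page-sizes`
# into a human-readable unit (the 1073741824 above is a 1 GiB page).
def human_page_size(size_bytes: int) -> str:
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size_bytes < 1024 or unit == "GiB":
            return f"{size_bytes} {unit}"
        size_bytes //= 1024

print(human_page_size(1073741824))  # 1 GiB
print(human_page_size(2097152))     # 2 MiB
```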

Gap 2. It is not possible to obtain through libvirt the number of large pages configured per NUMA node. This feature and the next one are necessary to know whether a host will accept a VM using a specific number of 1GiB large pages on a specific NUMA node.
Command for solution in Linux:
$ hugepage_sz=1048576; nodes_nr=2; for node_id in `seq 0 $((nodes_nr-1))`; do echo -n "Node $node_id: "; cat /sys/devices/system/node/node$node_id/hugepages/hugepages-$((hugepage_sz))kB/nr_hugepages; done
Node 0: 28
Node 1: 28
Where nodes_nr is the number of NUMA nodes and hugepage_sz is the size of the large page in kB.

Gap 3. It is not possible to obtain through libvirt the number of free large pages per NUMA node. This feature and the previous one are necessary to know whether a host will accept a VM using a specific number of 1GiB large pages on a specific NUMA node.
Command for solution in Linux:
$ hugepage_sz=1048576; nodes_nr=2; for node_id in `seq 0 $((nodes_nr-1))`; do echo -n "Node $node_id: "; cat /sys/devices/system/node/node$node_id/hugepages/hugepages-$((hugepage_sz))kB/free_hugepages; done
Node 0: 24
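The Gap 2 and Gap 3 shell loops can be folded into one sketch. This is only an illustration of the sysfs layout shown above (hugepages-&lt;size&gt;kB directories per node); the sysfs root is a parameter solely so the logic can be exercised against a test tree:

```python
from pathlib import Path

# Sketch of the Gap 2/3 queries: total and free huge pages per NUMA node.
# `root` defaults to the real sysfs path on a Linux host.
def node_hugepages(page_kb, root="/sys/devices/system/node"):
    result = {}
    for node_dir in sorted(Path(root).glob("node[0-9]*")):
        hp = node_dir / "hugepages" / f"hugepages-{page_kb}kB"
        if hp.is_dir():
            result[node_dir.name] = {
                "total": int((hp / "nr_hugepages").read_text()),
                "free": int((hp / "free_hugepages").read_text()),
            }
    return result
```

On the host from the example above, `node_hugepages(1048576)` would report 28 total pages on each node, with 24 free on node 0.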

Comment 6 Michal Privoznik 2014-05-29 09:13:22 UTC
I've just proposed patches upstream:

https://www.redhat.com/archives/libvir-list/2014-May/msg00991.html

It's available in C as the virNodeHugeTLB() API, in virsh (`virsh hugepages`), and in libvirt-python too.
To get overall info pass -1 as NODE#; to get info on a specific node, pass a valid NODE#:

# virsh hugepages
Supported hugepage sizes:
        hugepage_size   1048576
        hugepage_available      4
        hugepage_free   4
        hugepage_size   2048
        hugepage_available      12
        hugepage_free   12

As we can see the host supports 1GiB and 2MiB hugepages (and none of them is being used right now).

Comment 7 Michal Privoznik 2014-06-10 17:22:13 UTC
Another attempt:

https://www.redhat.com/archives/libvir-list/2014-June/msg00435.html

Comment 8 Michal Privoznik 2014-06-16 15:09:51 UTC
Yet another one:

https://www.redhat.com/archives/libvir-list/2014-June/msg00710.html

Comment 9 Michal Privoznik 2014-06-19 13:35:09 UTC
So I've just pushed patches upstream:

commit 38fa03f4b0f5f84642cd99b6b8704f5028984770
Author:     Michal Privoznik <mprivozn>
AuthorDate: Tue Jun 10 16:16:44 2014 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jun 19 15:10:50 2014 +0200

    nodeinfo: Implement nodeGetFreePages
    
    And add stubs to other drivers like: lxc, qemu, uml and vbox.
    
    Signed-off-by: Michal Privoznik <mprivozn>

commit 9e3efe53ded95e6b3284f7f55f625da87018e484
Author:     Michal Privoznik <mprivozn>
AuthorDate: Mon Jun 9 17:56:43 2014 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jun 19 15:10:50 2014 +0200

    virsh: Expose virNodeGetFreePages
    
    The new API is exposed under 'freepages' command.
    
    Signed-off-by: Michal Privoznik <mprivozn>

commit 34f2d0319d2098c77c8cc27d8350616029125a2b
Author:     Michal Privoznik <mprivozn>
AuthorDate: Mon Jun 9 17:14:47 2014 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jun 19 15:10:49 2014 +0200

    Introduce virNodeGetFreePages
    
    The aim of the API is to get information on number of free pages
    on the system. The API behaves similar to the
    virNodeGetCellsFreeMemory(). User passes starting NUMA cell, the
    count of nodes that he's interested in, pages sizes (yes,
    multiple sizes can be queried at once) and the counts are
    returned in an array.
    
    Signed-off-by: Michal Privoznik <mprivozn>

commit 02129b7c0e581898f03468e0bfb5472dc9903339
Author:     Michal Privoznik <mprivozn>
AuthorDate: Fri Jun 6 18:12:51 2014 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jun 19 15:10:49 2014 +0200

    virCaps: expose pages info
    
    There are two places where you'll find info on page sizes. The first
    one is under <cpu/> element, where all supported pages sizes are
    listed. Then the second one is under each <cell/> element which refers
    to concrete NUMA node. At this place, the size of page's pool is
    reported. So the capabilities XML looks something like this:
    
    <capabilities>
    
      <host>
        <uuid>01281cda-f352-cb11-a9db-e905fe22010c</uuid>
        <cpu>
          <arch>x86_64</arch>
          <model>Westmere</model>
          <vendor>Intel</vendor>
          <topology sockets='1' cores='1' threads='1'/>
          ...
          <pages unit='KiB' size='4'/>
          <pages unit='KiB' size='2048'/>
          <pages unit='KiB' size='1048576'/>
        </cpu>
        ...
        <topology>
          <cells num='4'>
            <cell id='0'>
              <memory unit='KiB'>4054408</memory>
              <pages unit='KiB' size='4'>1013602</pages>
              <pages unit='KiB' size='2048'>3</pages>
              <pages unit='KiB' size='1048576'>1</pages>
              <distances/>
              <cpus num='1'>
                <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
              </cpus>
            </cell>
            <cell id='1'>
              <memory unit='KiB'>4071072</memory>
              <pages unit='KiB' size='4'>1017768</pages>
              <pages unit='KiB' size='2048'>3</pages>
              <pages unit='KiB' size='1048576'>1</pages>
              <distances/>
              <cpus num='1'>
                <cpu id='1' socket_id='0' core_id='0' siblings='1'/>
              </cpus>
            </cell>
            ...
          </cells>
        </topology>
        ...
      </host>
    
      <guest/>
    
    </capabilities>
    
    Signed-off-by: Michal Privoznik <mprivozn>

commit 35f1095e12abf333903915f96f029612648346d4
Author:     Michal Privoznik <mprivozn>
AuthorDate: Fri Jun 6 18:09:01 2014 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jun 19 15:10:49 2014 +0200

    virnuma: Introduce pages helpers
    
    For future work we need two functions that fetches total number of
    pages and number of free pages for given NUMA node and page size
    (virNumaGetPageInfo()).
    
    Then we need to learn pages of what sizes are supported on given node
    (virNumaGetPages()).
    
    Note that system page size is disabled at the moment as there's one
    issue connected. If you have a NUMA node with huge pages allocated the
    kernel would return the normal size of memory for that node. It
    basically ignores the fact that huge pages steal size from the system
    memory. Until we resolve this, it's safer to not confuse users and
    hence not report any system pages yet.
    
    Signed-off-by: Michal Privoznik <mprivozn>

commit 99a63aed2d3a660b61a21f30da677d9e625510a6
Author:     Michal Privoznik <mprivozn>
AuthorDate: Mon Jun 16 14:02:34 2014 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jun 19 15:10:49 2014 +0200

    nodeinfo: Rename nodeGetFreeMemory to nodeGetMemory
    
    For future work we want to get info for not only the free memory
    but overall memory size too. That's why the function must have
    new signature too.
    
    Signed-off-by: Michal Privoznik <mprivozn>

commit 356c6f389fcff5ca74b393a0d94f7542c1be9d81
Author:     Michal Privoznik <mprivozn>
AuthorDate: Mon Jun 16 14:29:15 2014 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jun 19 15:10:49 2014 +0200

    virnuma: Introduce virNumaNodeIsAvailable
    
    Not on all hosts the set of NUMA nodes IDs is continuous. This is
    critical, because our code currently assumes the set doesn't contain
    holes. For instance in nodeGetFreeMemory() we can see the following
    pattern:
    
        if ((max_node = virNumaGetMaxNode()) < 0)
            return 0;
    
        for (n = 0; n <= max_node; n++) {
            ...
        }
    
    while it should be something like this:
    
        if ((max_node = virNumaGetMaxNode()) < 0)
            return 0;
    
        for (n = 0; n <= max_node; n++) {
            if (!virNumaNodeIsAvailable(n))
                continue;
            ...
        }
    
    Signed-off-by: Michal Privoznik <mprivozn>

v1.2.5-166-g38fa03f
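As a consumer-side illustration of the capabilities XML shown in the commit message above, the per-cell page pools can be extracted with nothing but the standard library (this is a sketch, not a libvirt API call; element names are taken from the XML above):

```python
import xml.etree.ElementTree as ET

# Parse the <pages> pools out of a capabilities document like the one in
# the "virCaps: expose pages info" commit message above.
# Returns {cell_id: {page_size_KiB: pool_size}}.
def cell_page_pools(capabilities_xml):
    root = ET.fromstring(capabilities_xml)
    pools = {}
    for cell in root.findall(".//topology/cells/cell"):
        pools[int(cell.get("id"))] = {
            int(p.get("size")): int(p.text) for p in cell.findall("pages")
        }
    return pools
```

For the XML above, `cell_page_pools(xml)[0][1048576]` would yield 1, i.e. one 1 GiB page in cell 0's pool.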

Comment 11 Jincheng Miao 2014-11-24 11:06:17 UTC
Hi Michal,

I found freepages reports wrong number of hugepages.

1. setup hugepages
# echo 512 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages

# echo 513 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages

# echo 1 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages

# echo 2 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

2. check from capabilities, and it's correct
# virsh capabilities | grep page
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='2048'/>
      <pages unit='KiB' size='1048576'/>
          <pages unit='KiB' size='4'>15985225</pages>
          <pages unit='KiB' size='2048'>512</pages>
          <pages unit='KiB' size='1048576'>2</pages>
          <pages unit='KiB' size='4'>15728128</pages>
          <pages unit='KiB' size='2048'>513</pages>
          <pages unit='KiB' size='1048576'>3</pages>

3. check from freepages, it's wrong
# virsh freepages --all
Node 0:
4KiB: 15461962
2048KiB: 512
1048576KiB: 0

Node 1:
4KiB: 15197930
2048KiB: 1
1048576KiB: 3

Comment 12 Michal Privoznik 2014-11-24 13:01:39 UTC
(In reply to Jincheng Miao from comment #11)
> Hi Michal,
> 
> I found freepages reports wrong number of hugepages.
> 
> 1. setup hugepages
> # echo 512 >
> /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
> 
> # echo 513 >
> /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
> 
> # echo 1 >
> /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
> 
> # echo 2 >
> /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
> 

While this tells the kernel to allocate hugepages, the operation may not fully succeed due to memory fragmentation. Since it is easier to find smaller chunks of contiguous free memory, allocating 2M pages is more likely to succeed than allocating 1G pages.

> 2. check from capabilities, and it's correct
> # virsh capabilities | grep page
>       <pages unit='KiB' size='4'/>
>       <pages unit='KiB' size='2048'/>
>       <pages unit='KiB' size='1048576'/>
>           <pages unit='KiB' size='4'>15985225</pages>
>           <pages unit='KiB' size='2048'>512</pages>
>           <pages unit='KiB' size='1048576'>2</pages>
>           <pages unit='KiB' size='4'>15728128</pages>
>           <pages unit='KiB' size='2048'>513</pages>
>           <pages unit='KiB' size='1048576'>3</pages>

If you check it against what the kernel reports, is there any difference? I mean, what number is shown in the nr_hugepages file on nodes 0 and 1 for 1G hugepages? Does it correspond to what libvirt reports?

cat /sys/devices/system/node/node{0,1}/hugepages/hugepages-1048576kB/nr_hugepages ; virsh capabilities | grep pages; virsh freepages --all

Moreover, it takes some time for the kernel to allocate the pages, so it's better to run the commands above at once.

Comment 13 Jincheng Miao 2014-11-25 02:42:27 UTC
(In reply to Michal Privoznik from comment #12)
> If you check it against what the kernel reports, is there any difference? I mean,
> what number is shown in the nr_hugepages file on nodes 0 and 1 for 1G
> hugepages? Does it correspond to what libvirt reports?
> 
> cat
> /sys/devices/system/node/node{0,1}/hugepages/hugepages-1048576kB/
> nr_hugepages ; virsh capabilities | grep pages; virsh freepages --all
> 
> Moreover, it takes some time for the kernel to allocate the pages, so it's
> better to run the commands above at once.

Yes, you are right. After waiting for a while, the 1G hugepage counts from
nr_hugepages and freepages are consistent. Thanks for your advice.

Comment 14 Jincheng Miao 2014-12-08 08:14:12 UTC
This feature is implemented:

1. add hugepage allocation to the kernel command line:
'default_hugepagesz=1G hugepagesz=1G hugepages=2 hugepagesz=2M hugepages=300'

2. configure hugepage mount point for libvirt
# vim /etc/libvirt/qemu.conf
...
hugetlbfs_mount = ["/dev/hugepages2M", "/dev/hugepages1G"]
...

# mkdir /dev/hugepages2M

# mount -t hugetlbfs -o pagesize=2M none /dev/hugepages2M

# mkdir /dev/hugepages1G

# mount -t hugetlbfs -o pagesize=1G none /dev/hugepages1G



3. check it via virsh capabilities
# virsh capabilities
...
    <cpu>
    ...
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='2048'/>
      <pages unit='KiB' size='1048576'/>
    </cpu>
...
    <topology>
      <cells num='1'>
        <cell id='0'>
          <memory unit='KiB'>7863696</memory>
          <pages unit='KiB' size='4'>1288036</pages>
          <pages unit='KiB' size='2048'>300</pages>
          <pages unit='KiB' size='1048576'>2</pages>
...


4. use freepages to query free pages
# virsh freepages --all
Node 0:
4KiB: 482318
2048KiB: 300
1048576KiB: 2
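For scripting the same verification, the `virsh freepages --all` output shown above has a simple shape; a hypothetical parser (not part of libvirt) could look like this:

```python
# Hypothetical parser for `virsh freepages --all` output, returning
# {node_id: {page_size_KiB: free_pages}}. Assumes the "Node N:" /
# "<size>KiB: <count>" layout shown above.
def parse_freepages(output):
    nodes, current = {}, None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Node") and line.endswith(":"):
            current = int(line[4:-1])
            nodes[current] = {}
        elif ":" in line and current is not None:
            size, count = line.split(":")
            nodes[current][int(size.strip().rstrip("KiB"))] = int(count)
    return nodes
```

Applied to the output above, `parse_freepages(out)[0][2048]` would yield 300, matching the pool size in the capabilities XML.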

Comment 16 errata-xmlrpc 2015-03-05 07:31:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html

