Bug 1217537 - Unable to create a NUMA node with CPUs and 0 MB of RAM
Summary: Unable to create a NUMA node with CPUs and 0 MB of RAM
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Eduardo Habkost
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 1217144 1662586
 
Reported: 2015-04-30 15:13 UTC by Daniel Berrangé
Modified: 2018-12-30 13:33 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-07-29 18:27:32 UTC
Target Upstream Version:
Embargoed:


Attachments
Screenshot of what seems to be a guest kernel bug or limitation (32.86 KB, image/png)
2015-07-28 20:42 UTC, Eduardo Habkost
no flags

Description Daniel Berrangé 2015-04-30 15:13:35 UTC
Description of problem:
In order to reproduce some customer problems running OpenStack on a host with an unusual NUMA topology, QE have been attempting to create a guest with a NUMA node which has CPUs but no RAM.

To this end, we created a guest with 8 GB of RAM and 8 CPUs:

  <memory unit='KiB'>8192000</memory>
  <currentMemory unit='KiB'>8192000</currentMemory>
  <vcpu placement='static'>8</vcpu>

And configured it to have 3 NUMA nodes: 4 CPUs and 4 GB of RAM in the first node, 2 CPUs and 4 GB of RAM in the second node, and 2 CPUs and 0 MB of RAM in the third node:

  <cpu mode='host-passthrough'>
    <numa>
      <cell id='0' cpus='0-3' memory='4096000' unit='KiB'/>
      <cell id='1' cpus='4-5' memory='4096000' unit='KiB'/>
      <cell id='2' cpus='6-7' memory='0' unit='KiB'/>
    </numa>
  </cpu>

Libvirt is turning that config into the following QEMU command-line parameters (each 4096000 KiB cell becomes mem=4000, i.e. 4000 MiB):

 -smp 8,sockets=8,cores=1,threads=1 -numa node,nodeid=0,cpus=0-3,mem=4000 -numa node,nodeid=1,cpus=4-5,mem=4000 -numa node,nodeid=2,cpus=6-7,mem=0 
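
One way to double-check what libvirt actually generated is to inspect the running QEMU process. A generic sketch (the domain name "numa-test" and the XML file name are hypothetical, not from this report):

  virsh define numa-test.xml
  virsh start numa-test
  # The three -numa arguments should appear in the QEMU command line:
  ps -ef | grep [q]emu-kvm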


When the guest boots up, though, and we look at the NUMA topology inside it, KVM has created only 2 nodes. The CPUs from the 3rd NUMA node we requested have been placed into the 1st NUMA node.

$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 6 7
node 0 size: 3856 MB
node 0 free: 3392 MB
node 1 cpus: 4 5
node 1 size: 3937 MB
node 1 free: 3736 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

So it looks as if KVM is incorrectly configuring the NUMA tables when memory=0.


Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.1.2-23.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Launch a guest with multiple NUMA nodes, where one of the nodes has 0 MB of RAM (see the command-line sketch below)
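
For reference, the same thing can be reproduced without libvirt by passing the -numa arguments directly (a sketch mirroring the libvirt-generated line above; the disk image path is a placeholder, not a path from this report):

  /usr/libexec/qemu-kvm -m 8000 \
      -smp 8,sockets=8,cores=1,threads=1 \
      -numa node,nodeid=0,cpus=0-3,mem=4000 \
      -numa node,nodeid=1,cpus=4-5,mem=4000 \
      -numa node,nodeid=2,cpus=6-7,mem=0 \
      -drive file=/path/to/guest.qcow2,if=virtio
  # Then run "numactl --hardware" inside the guest and count the nodes.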

Actual results:
Node with 0 MB RAM is not created, and its CPUs are silently merged into another node.

Expected results:
All requested NUMA nodes are created (or an error is raised at startup explaining why it is forbidden).

Additional info:

Comment 3 Eduardo Habkost 2015-07-28 20:42:24 UTC
Created attachment 1057120 [details]
Screenshot of what seems to be a guest kernel bug or limitation

Comment 4 Eduardo Habkost 2015-07-28 20:44:56 UTC
As this was never supported before, marking as FutureFeature. Is this kind of NUMA topology really supported by Linux and numactl?

The guest kernel is ignoring the CPU affinity entries in the SRAT table for the no-RAM nodes. I need to compare this with real hardware to find out if this is really a guest kernel bug, a QEMU bug, or something that was never supported by Linux. Do we have any physical hosts with no-RAM NUMA nodes in our labs?
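
A quick way to check whether the guest kernel saw the SRAT affinity entries before dropping the node is the boot log. A generic sketch (not commands from this report; the exact message format varies by kernel version):

  # Inside the guest:
  dmesg | grep -i srat
  # x86 kernels of this era print lines like:
  #   SRAT: PXM 2 -> APIC 0x06 -> Node 2
  # If such entries exist for CPUs 6-7 but "numactl --hardware" still shows
  # only 2 nodes, the kernel parsed the SRAT and then ignored the node.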

Comment 5 Daniel Berrangé 2015-07-29 09:48:58 UTC
Hmm, so perhaps this might be architecture-dependent.

This RFE came about as a result of a bug filed against OpenStack for not considering the possibility that NUMA nodes can have CPUs without any RAM, so this was on real physical hardware:

 https://bugs.launchpad.net/nova/+bug/1418187

From the libvirt capabilities attached to that bug report you can see the real hardware config:

  https://launchpadlibrarian.net/196616667/capabilities.txt

In particular, though, notice this is powerpc64 host hardware, not x86.

So I guess it is conceivable that this might not be supported on x86 kernels. I'm not really sure if x86 takes a different codepath than ppc64 when doing NUMA setup.

Comment 6 Eduardo Habkost 2015-07-29 17:25:05 UTC
(In reply to Daniel Berrange from comment #5)
> So I guess it is conceivable that this might not be supported on x86
> kernels. I'm not really sure if x86 takes different codepath than ppc64 when
> doing NUMA setup.

The mapping of CPUs to NUMA nodes is arch-specific code inside arch/{x86,powerpc}, so this behavior is likely to be arch-specific.

Comment 7 Eduardo Habkost 2015-07-29 18:27:32 UTC
Just confirmed that there's x86-specific code in Linux that ignores nodes without enough RAM:

At arch/x86/mm/numa.c:numa_register_memblks():
        for_each_node_mask(nid, node_possible_map) {
                /* [...] */
                /*
                 * Don't confuse VM with a node that doesn't have the
                 * minimum amount of memory:
                 */
                if (end && (end - start) < NODE_MIN_SIZE)
                        continue;

                /* alloc_node_data() will call node_set_online(nid) */
                alloc_node_data(nid);
        }

At arch/x86/mm/numa.c:init_cpu_to_node():
        for_each_possible_cpu(cpu) {
                int node = numa_cpu_node(cpu);

                if (node == NUMA_NO_NODE)
                        continue;
                if (!node_online(node))
                        node = find_near_online_node(node);
                numa_set_node(cpu, node);
        }

So a node with no usable memory (or with less than NODE_MIN_SIZE, which is 4 MB on x86) never gets alloc_node_data() and is never set online, and init_cpu_to_node() then reassigns its CPUs to the nearest online node via find_near_online_node(). That matches the reported behavior: the 0 MB node disappears and its CPUs 6-7 end up in node 0.

