Bug 1310122

Summary: Incorrect NUMA settings in libvirt/KVM
Product: Red Hat Enterprise Linux 6 Reporter: jblume <jblume>
Component: libvirt Assignee: Peter Krempa <pkrempa>
Status: CLOSED ERRATA QA Contact: Luyao Huang <lhuang>
Severity: high Docs Contact:
Priority: high    
Version: 6.7 CC: dyuan, fdanapfe, jberan, jsuchane, lhuang, mzhan, rbalakri, tbowling, xuzhang
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-0.10.2-61.el6 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-03-21 10:38:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 896092    
Bug Blocks: 1359965    

Description jblume@redhat.com 2016-02-19 14:13:28 UTC
Description of problem:
A partner needs a VM to expose the same NUMA structure as the bare-metal host. On bare metal, numactl -H shows the following:

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 24559 MB
node 0 free: 21922 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 24576 MB
node 1 free: 23832 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

The virtual CPUs are pinned to physical ones inside the domain definition:
<vcpu placement='static' cpuset='0-21'>22</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    ...
    <vcpupin vcpu='21' cpuset='21'/>
  </cputune>

Below the <feature policy.../> lines in the <cpu> section I have appended the following:
    <numa>
      <cell cpus='0,1,2,3,4,5,12,13,14,15,16,17' memory='4194304'/>
      <cell cpus='6,7,8,9,10,11,18,19,20,21' memory='4194304'/>
    </numa>

The VM has a total of 8 GiB of RAM.
After the VM is started, the command numactl -H shows the following:

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17 18 19 20 21
node 0 size: 4095 MB
node 0 free: 3480 MB
node 1 cpus: 6 7 8 9 10 11
node 1 size: 4096 MB
node 1 free: 3960 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
So the issue here is that node 1 only has the first six of its vCPUs (6-11), while the rest (18-21) were assigned to node 0.
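
The mismatch can be checked mechanically. The following is a minimal sketch, not part of libvirt or numactl: it parses the "node N cpus:" lines of the guest output above and compares them with the <cell> definitions. The helper names (parse_node_cpus, parse_cpuset) are made up for illustration.

```python
import re

def parse_node_cpus(numactl_output):
    """Map NUMA node id -> set of CPU ids from `numactl -H` text."""
    nodes = {}
    for line in numactl_output.splitlines():
        m = re.match(r"node (\d+) cpus:(.*)", line.strip())
        if m:
            nodes[int(m.group(1))] = {int(c) for c in m.group(2).split()}
    return nodes

def parse_cpuset(spec):
    """Expand a cpu list such as '0-5,12-17' or '6,7,8' into a set."""
    cpus = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

# Guest output as reported above (size/free lines shortened).
GUEST_OUTPUT = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17 18 19 20 21
node 0 size: 4095 MB
node 1 cpus: 6 7 8 9 10 11
node 1 size: 4096 MB
"""

# cpus attributes of the two <cell> elements from the domain XML.
CELLS = ["0,1,2,3,4,5,12,13,14,15,16,17", "6,7,8,9,10,11,18,19,20,21"]

actual = parse_node_cpus(GUEST_OUTPUT)
misplaced = {}
for node, spec in enumerate(CELLS):
    expected = parse_cpuset(spec)
    misplaced[node] = {
        "missing": sorted(expected - actual[node]),
        "unexpected": sorted(actual[node] - expected),
    }
    print(node, misplaced[node])
```

For the output above, this reports vCPUs 18-21 as missing from node 1 and unexpected in node 0.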

Version-Release number of selected component (if applicable):
libvirt-0.10.2-54.el6_7.3.x86_64

Steps to Reproduce:
1. Set up a physical machine with more than one socket using RHEL6.7
2. Install a RHEL6.7 VM and assign all logical cores minus 2 to it. Use pinning and numa.
3. Start VM and compare the output from numactl -H with the one on bare-metal.

Actual results:
Wrong, or at least implausible, assignment of vCPUs to NUMA nodes inside the guest

Expected results:
The same numactl -H output on bare metal and in the guest, except for the two CPUs that are not assigned to the guest

Additional info:

Comment 2 Luyao Huang 2016-02-24 07:14:53 UTC
I can reproduce this issue with libvirt-0.10.2-54.el6_7.3.x86_64 and libvirt-0.10.2-54.el6_7.3.x86_64:

Guest xml:
...
  <vcpu placement='static' cpuset='0-21'>22</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
...
    <vcpupin vcpu='21' cpuset='21'/>
  </cputune>
...
  <cpu>
    <numa>
      <cell cpus='0,1,2,3,4,5,12,13,14,15,16,17' memory='4194304'/>
      <cell cpus='6,7,8,9,10,11,18,19,20,21' memory='4194304'/>
    </numa>
  </cpu>


And qemu cmd line:

# ps aux|grep qemu
qemu     26296 15.1  2.0 14470864 664412 ?     Sl   15:04   0:25 /usr/libexec/qemu-kvm -name test4 -S -M rhel6.6.0 -enable-kvm -m 8192 -realtime mlock=off -smp 22,sockets=22,cores=1,threads=1 -numa node,nodeid=0,cpus=0-5,12-17,mem=4096 -numa node,nodeid=1,cpus=6-11,18-21,mem=4096 -uuid b1e7936b-104b-430e-9211-d6c61b8df313 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/test4.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/var/lib/libvirt/images/test3.img,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,ioeventfd=on,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/guestfwd1,server,nowait -netdev user,guestfwd=tcp:10.0.2.1:4600-chardev:charchannel0,id=user-channel0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xf -msg timestamp=on

In guest:

# numactl --ha
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17 18 19 20 21
node 0 size: 4095 MB
node 0 free: 3669 MB
node 1 cpus: 6 7 8 9 10 11
node 1 size: 4096 MB
node 1 free: 3988 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 


But I saw that libvirt generated the right qemu command line:

-numa node,nodeid=0,cpus=0-5,12-17,mem=4096 -numa node,nodeid=1,cpus=6-11,18-21,mem=4096

Comment 3 Peter Krempa 2016-02-24 08:34:02 UTC
(In reply to Luyao Huang from comment #2)
> I can reproduce this issue with libvirt-0.10.2-54.el6_7.3.x86_64 and
> libvirt-0.10.2-54.el6_7.3.x86_64:

[...]
 
> But i saw libvirt generated right qemu command line:
> 
> -numa node,nodeid=0,cpus=0-5,12-17,mem=4096 -numa
> node,nodeid=1,cpus=6-11,18-21,mem=4096

No. The command line is invalid. The correct one is
-numa node,nodeid=0,cpus=0-5,cpus=12-17,mem=4096
-numa node,nodeid=1,cpus=6-11,cpus=18-21,mem=4096
(extra 'cpus=' for the second range)
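
To illustrate the syntax difference, here is a minimal sketch that turns a list of vCPU ids into the per-range "cpus=" form quoted above. It is not libvirt's formatter; the function names are made up, and only the output format is taken from this comment.

```python
def to_ranges(cpus):
    """Collapse CPU ids into sorted, inclusive (start, end) runs."""
    ranges = []
    for cpu in sorted(set(cpus)):
        if ranges and cpu == ranges[-1][1] + 1:
            # Extend the current run by one CPU.
            ranges[-1] = (ranges[-1][0], cpu)
        else:
            # Start a new run.
            ranges.append((cpu, cpu))
    return ranges

def format_numa_cpus(cpus):
    """Emit one 'cpus=' key per contiguous run, joined by commas."""
    parts = []
    for start, end in to_ranges(cpus):
        parts.append(f"cpus={start}" if start == end else f"cpus={start}-{end}")
    return ",".join(parts)

node0 = format_numa_cpus([0, 1, 2, 3, 4, 5, 12, 13, 14, 15, 16, 17])
node1 = format_numa_cpus([6, 7, 8, 9, 10, 11, 18, 19, 20, 21])
print(f"-numa node,nodeid=0,{node0},mem=4096")
print(f"-numa node,nodeid=1,{node1},mem=4096")
```

A cell with a single contiguous range yields exactly one "cpus=" key, which is the only form the older qemu accepts.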

This syntax, however, isn't supported by rhel-6 qemu. The correct formatter was added by commit:

commit 001b9dc1dcd568159cbe253e3736b873534bd254
Author: Martin Kletzander <mkletzan>
Date:   Tue Jun 17 14:16:59 2014 +0200

    qemu: enable disjoint numa cpu ranges

Prior to that the configuration was rejected by commit:

commit 25dc8ba08b32c7430d81228718c90d277f902f18
Author: Eric Blake <eblake>
Date:   Tue Feb 26 17:43:12 2013 -0700

    qemu: -numa doesn't (yet) support disjoint range
    
    https://bugzilla.redhat.com/show_bug.cgi?id=896092 mentions that
    qemu 1.4 and earlier only accept a simple start-stop range for
    the cpu=... argument of -numa.  Libvirt would attempt to use
    -numa cpu=1,3 for a disjoint range, which did not work as intended.
    
    Upstream qemu will be adding a new syntax for disjoint cpu ranges
    in 1.5; but the design for that syntax is still under discussion
    at the time of this patch.  So for libvirt 1.0.3, it is safest to
    just reject attempts to build an invalid qemu command line; in the
    future, we can add a capability bit and translate to the final
    accepted design for selecting a disjoint cpu range in numa.

Neither of those is backported to rhel-6. For this BZ we could possibly backport the second commit, which rejects such a configuration right away.

The suggested workaround for this case is to use a contiguous cpu range for each cell and then pin the vcpus to disjoint host cpus, so that the NUMA nodes of the guest are equivalent to the host NUMA nodes. Unfortunately there's no other way to make the requested config work in rhel-6.
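
As a rough sketch of that workaround, the script below generates contiguous <cell> ranges plus matching <vcpupin> lines for the host layout from the description. The mapping is an assumption (the report does not show the XML the reporter ended up using), and build_workaround is a made-up helper, not a libvirt API.

```python
def expand(spec):
    """Expand a cpu list such as '0-5,12-17' into a sorted list of ids."""
    cpus = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return sorted(cpus)

# Host CPUs per host NUMA node that the guest may use (assumption:
# host CPUs 22 and 23 are left out, as in the original 22-vCPU guest).
HOST_NODE_CPUS = ["0-5,12-17", "6-11,18-21"]

def build_workaround(host_node_cpus, mem_kib):
    """Number vCPUs contiguously per cell and pin them to that node's host CPUs."""
    cells, pins = [], []
    next_vcpu = 0
    for spec in host_node_cpus:
        first = next_vcpu
        for host_cpu in expand(spec):
            pins.append(f"<vcpupin vcpu='{next_vcpu}' cpuset='{host_cpu}'/>")
            next_vcpu += 1
        cells.append(f"<cell cpus='{first}-{next_vcpu - 1}' memory='{mem_kib}'/>")
    return cells, pins

cells, pins = build_workaround(HOST_NODE_CPUS, 4194304)
print("\n".join(cells))
print("\n".join(pins))
```

With this layout the guest CPU numbers no longer match the bare-metal numbering (guest node 1 holds vCPUs 12-21), but each guest node is backed only by CPUs of one host node.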

Comment 5 jblume@redhat.com 2016-02-24 18:13:31 UTC
I tried out the workaround as stated in Comment 3. It went fine without hassle.

If the enhanced functionality cannot be backported to RHEL6, I agree that it would be better to reject this kind of "misconfiguration", i.e. to backport the second commit.

Comment 8 Mike McCune 2016-03-28 23:19:36 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation. Please see mmccune with any questions.

Comment 14 Luyao Huang 2016-11-11 07:19:54 UTC
Verified this bug with libvirt-0.10.2-62.el6.x86_64 and qemu-kvm-rhev-0.12.1.2-2.496.el6.x86_64:

1. prepare a guest with numa settings like this:

# virsh dumpxml r6
...
  <cpu>
    <numa>
      <cell cpus='0,2' memory='1048576'/>
      <cell cpus='1,3' memory='1048576'/>
    </numa>
  </cpu>
...

2. try to start guest:

# virsh start r6
error: Failed to start domain r6
error: unsupported configuration: disjoint NUMA cpu ranges are not supported with this QEMU

3. change numa settings to:

# virsh dumpxml r6
  <cpu>
    <numa>
      <cell cpus='0-1' memory='1048576'/>
      <cell cpus='2-3' memory='1048576'/>
    </numa>
  </cpu>

4. start guest and check numa:

# virsh start r6
Domain r6 started

5. in guest:

# numactl --har
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 1023 MB
node 0 free: 675 MB
node 1 cpus: 2 3
node 1 size: 1023 MB
node 1 free: 888 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10
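
The two cell layouts used in the steps above can be told apart with a simple contiguity test. This is only a sketch of the idea behind the new error, assuming a cell is acceptable when its cpus form a single unbroken range; it is not the check libvirt actually implements, and is_contiguous is a made-up name.

```python
def expand_cpuset(spec):
    """Expand a cpu list such as '0,2' or '0-1' into a set of ids."""
    cpus = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

def is_contiguous(spec):
    """True if the cpus in spec form one range without gaps."""
    cpus = expand_cpuset(spec)
    return max(cpus) - min(cpus) + 1 == len(cpus)

for spec in ("0,2", "1,3", "0-1", "2-3"):
    verdict = "ok" if is_contiguous(spec) else "disjoint"
    print(f"cell cpus='{spec}': {verdict}")
```

Under this assumption the cells from step 1 ('0,2' and '1,3') are flagged as disjoint, while those from step 3 ('0-1' and '2-3') pass.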

Comment 16 errata-xmlrpc 2017-03-21 10:38:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0682.html