Bug 1474327 - libvirt support for "numa_node" QEMU -device spapr-pci-host-bridge option
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
7.4-Alt
ppc64le Linux
Priority: urgent, Severity: urgent
Target Milestone: rc
Target Release: 7.4-Alt
Assigned To: Andrea Bolognani
Virtualization Bugs
Keywords: FutureFeature
Depends On:
Blocks: 1399177 1438583 1440030
Reported: 2017-07-24 08:20 EDT by IBM Bug Proxy
Modified: 2017-11-09 06:26 EST (History)
11 users

See Also:
Fixed In Version: libvirt-3.2.0-18.el7a
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-09 06:26:03 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID Priority Status Summary Last Updated
IBM Linux Technology Center 146186 None None None 2017-07-24 08:22 EDT

Description IBM Bug Proxy 2017-07-24 08:20:31 EDT

    
Comment 1 IBM Bug Proxy 2017-07-24 08:21:18 EDT
For performance reasons, customers using Mellanox adapters would like to pin their applications to the NUMA node the adapter is attached to; the driver can also use the numa_node information to pin some of its resources to that NUMA node.
If I do PCI passthrough of a Mellanox card to a PKVM guest and run
cat /sys/bus/pci/devices/0000:00:06.0/numa_node
-1
(0000:00:06.0 is the device I passed through)

I was talking to Mike Roth, and it looks like the property that carries this NUMA attribute is not propagated to the device tree in the guest.

I can see this with any mlx5 device (e.g. Hyde Park or CX4) or an mlx4 device with upstream code.
With MOFED on an mlx4 device I was seeing 0, but that was because the driver itself was writing that value to numa_node.

Opening this bug to see if we can find a way to implement this, since people looking for performance in the guest will probably ask for it.

I can set up a guest for this bugzilla to work with. Let me know.

I am using PKVM build 20 4.4.11-1.el7_1.pkvm3_1_1.2000.0.ppc64le
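The sysfs check described above can be scripted. A minimal sketch (the SYSFS_PCI variable is an assumption added here so the loop can be pointed at a test tree; on a real guest it would be /sys/bus/pci/devices):

```shell
#!/bin/sh
# Print the NUMA node the guest kernel reports for each PCI device.
# A value of -1 means the kernel has no NUMA affinity information for
# the device, which is the symptom described in this bug.
SYSFS_PCI=${SYSFS_PCI:-/sys/bus/pci/devices}

list_pci_numa_nodes() {
    for dev in "$SYSFS_PCI"/*; do
        # Skip entries without a readable numa_node attribute.
        [ -r "$dev/numa_node" ] || continue
        printf '%s numa_node=%s\n' "${dev##*/}" "$(cat "$dev/numa_node")"
    done
}
```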


== Comment: #27 - Shivaprasad G. Bhat <shivapbh@in.ibm.com> - 2017-07-24 07:06:08 ==
The Patches that fix this are 

Minor formatting fix
313274a qemu_capabilities: Honour caps values formatting

Actual Fixes.
e5a0579 qemu: Enable NUMA node tag in pci-root for PPC64
11b2ebf qemu: capabilities: Introduce QEMU_CAPS_SPAPR_PCI_HOST_BRIDGE_NUMA_NODE
eb56cc6 Add capabilities for qemu-2.9.0 ppc64

Thanks,
Shivaprasad
Comment 5 Dan Zheng 2017-09-04 04:22:13 EDT
Test packages:
kernel-4.11.0-23.el7a.ppc64le
libvirt-3.2.0-21.el7a.ppc64le
qemu-kvm-2.9.0-22.el7a.ppc64le


Cases:
1. Configure the numa node element on the default pci-root

    <controller type='pci' index='0' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='0'>
        <node>0</node>
      </target>
    </controller>

error: XML error: The PCI controller with index=0 can't be associated with a NUMA node

2. Configure a numa node on a non-default pci-root without a <cpu><numa> setting
    <controller type='pci' index='1' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='3'>
        <node>263</node>
      </target>
    </controller>

error: XML error: pci-root with index 1 is configured for a NUMA node (263) not present in the domain's <cpu><numa> array (0)
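As the error says, any NUMA node referenced from a pci-root must exist in the domain's <cpu><numa> array. An illustrative fragment (values chosen here for the example) that passes this check:

```xml
<cpu>
  <numa>
    <cell id='0' cpus='0-7' memory='1048576' unit='KiB'/>
  </numa>
</cpu>
...
<controller type='pci' index='1' model='pci-root'>
  <model name='spapr-pci-host-bridge'/>
  <target index='1'>
    <node>0</node>
  </target>
</controller>
```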

3. Configure an invalid node on pci-root 1 in a guest with NUMA setting
Configure 3 NUMA nodes in the guest, add node 3 to pci-root index 1,
and save the guest.
  
<vcpu placement='static'>24</vcpu>
<cpu>    
    <numa>
      <cell id='0' cpus='0-7' memory='1048576' unit='KiB'/>
      <cell id='1' cpus='8-15' memory='1048576' unit='KiB'/>
      <cell id='2' cpus='16-23' memory='1048576' unit='KiB'/>
    </numa>
</cpu>
<devices>
    <controller type='pci' index='1' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='1'>
        <node>3</node>
      </target>
    </controller>
...
</devices>

Saving the guest fails with:
error: XML error: pci-root with index 1 is configured for a NUMA node (3) not present in the domain's <cpu><numa> array (3)

4. Configure the numa node element on a created pci-root (with an attached device) in a guest with NUMA setting
a. Configure 3 NUMA nodes in the guest, add the node element to pci-root index 1, and attach any device to pci-root 1 (an interface, memballoon, and so on).
Save the guest.

Take interface for example.

  <vcpu placement='static'>24</vcpu>
  <cpu>
    <numa>
      <cell id='0' cpus='0-7' memory='1048576' unit='KiB'/>
      <cell id='1' cpus='8-15' memory='1048576' unit='KiB'/>
      <cell id='2' cpus='16-23' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>
<devices>
    <controller type='pci' index='1' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='1'>
        <node>2</node>
      </target>
    </controller>

    <interface type='bridge'>
      <mac address='52:54:00:56:1b:82'/>
      <source bridge='virbr0'/>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
    </interface>

b. Start the guest
c. Check the qemu command line for pci-root 1:
   check that index (1) equals the pci-root <target index>,
   that id equals pci.<bus>,
   and that numa_node equals <node>:
 
   -device spapr-pci-host-bridge,index=1,id=pci.1,numa_node=2 
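The check in step c can be automated. A small sketch (the extract_phb_numa helper is hypothetical, added here for illustration) that pulls the numa_node value out of a qemu command line such as the one above:

```shell
#!/bin/sh
# Print the numa_node value of the spapr-pci-host-bridge device found
# on a qemu command line; prints nothing if the property is absent.
extract_phb_numa() {
    printf '%s\n' "$1" \
        | grep -o 'spapr-pci-host-bridge[^ ]*' \
        | grep -o 'numa_node=[0-9][0-9]*' \
        | cut -d= -f2
}
```

On a running host the command line itself could be read from /proc/&lt;qemu-pid&gt;/cmdline or from the domain log under /var/log/libvirt/qemu/.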


d. Check in the guest, using lspci, that the interface's NUMA node is 2
   # lspci
0001:00:01.0 Ethernet controller: Red Hat, Inc Virtio network device
# lspci -vv -s 0001:00:01.0|grep 'NUMA node:'
        NUMA node: 2

e. Check that the interface can get an IP
# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.232  netmask 255.255.255.0  broadcast 192.168.122.255
        inet6 fe80::5054:ff:fe56:1b82  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:56:1b:82  txqueuelen 1000  (Ethernet)



All cases pass.
Comment 7 errata-xmlrpc 2017-11-09 06:26:03 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3174
