Bug 1474327

Summary: libvirt support for "numa_node" QEMU -device spapr-pci-host-bridge option
Product: Red Hat Enterprise Linux 7
Reporter: IBM Bug Proxy <bugproxy>
Component: libvirt
Assignee: Andrea Bolognani <abologna>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent
Priority: urgent
Version: 7.4-Alt
CC: abologna, bugproxy, dzheng, gsun, haizhao, hannsj_uhl, jkachuck, jsuchane, knoel, lmiksik, rbalakri
Target Milestone: rc
Keywords: FutureFeature
Target Release: 7.4-Alt
Hardware: ppc64le
OS: Linux
Fixed In Version: libvirt-3.2.0-18.el7a
Last Closed: 2017-11-09 11:26:03 UTC
Bug Blocks: 1299988, 1438583, 1440030

Description IBM Bug Proxy 2017-07-24 12:20:31 UTC

Comment 1 IBM Bug Proxy 2017-07-24 12:21:18 UTC
For performance reasons, customers using Mellanox adapters would like to pin their applications to the NUMA node the adapter is attached to, and the driver can also use the numa_node information to pin some of its resources to that node.
If I do PCI passthrough of a Mellanox card to a PKVM guest and run
cat /sys/bus/pci/devices/0000:00:06.0/numa_node
the result is -1
(0000:00:06.0 is the device I passed through)
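(For illustration: once numa_node is correctly populated in the guest, an application could be pinned along these lines. This numactl invocation is a sketch and not from the original report; my_app is a hypothetical workload.)

    # read the adapter's NUMA node from sysfs, then bind CPU and memory to it
    node=$(cat /sys/bus/pci/devices/0000:00:06.0/numa_node)
    numactl --cpunodebind=$node --membind=$node ./my_app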

I was talking to Mike Roth, and it looks like the property that carries this NUMA attribute is not propagated into the guest's device tree.

I can see this with any mlx5 device (e.g. Hyde Park or CX4) or an mlx4 device with upstream code.
With MOFED on an mlx4 device I was seeing 0, but that was because the driver itself was writing that value to numa_node.

Opening this bug to see if we can find a way to implement this, since people looking for performance in the guest are likely to ask for it.

I can set up a guest for this bugzilla to work with. Let me know.

I am using PKVM build 20 4.4.11-1.el7_1.pkvm3_1_1.2000.0.ppc64le


== Comment: #27 - Shivaprasad G. Bhat <shivapbh.com> - 2017-07-24 07:06:08 ==
The patches that fix this are:

Minor formatting fix
313274a qemu_capabilities: Honour caps values formatting

Actual Fixes.
e5a0579 qemu: Enable NUMA node tag in pci-root for PPC64
11b2ebf qemu: capabilities: Introduce QEMU_CAPS_SPAPR_PCI_HOST_BRIDGE_NUMA_NODE
eb56cc6 Add capabilities for qemu-2.9.0 ppc64
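
With these patches, a non-default pci-root controller can carry a NUMA node tag that libvirt translates into the numa_node property of spapr-pci-host-bridge. A minimal sketch of the mapping (the exact XML and resulting command line are exercised in the test cases below):

    <controller type='pci' index='1' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='1'>
        <node>1</node>
      </target>
    </controller>

which QEMU receives as:

    -device spapr-pci-host-bridge,index=1,id=pci.1,numa_node=1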

Thanks,
Shivaprasad

Comment 5 Dan Zheng 2017-09-04 08:22:13 UTC
Test packages:
kernel-4.11.0-23.el7a.ppc64le
libvirt-3.2.0-21.el7a.ppc64le
qemu-kvm-2.9.0-22.el7a.ppc64le


Cases:
1. Configure a NUMA node element on the default pci-root

    <controller type='pci' index='0' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='0'>
        <node>0</node>
      </target>
    </controller>

error: XML error: The PCI controller with index=0 can't be associated with a NUMA node
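
(This validation error is reported as soon as the domain XML is defined; a sketch, assuming the XML above is saved as guest.xml:)

    # virsh define guest.xml
    error: XML error: The PCI controller with index=0 can't be associated with a NUMA node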

2. Configure a NUMA node on a non-default pci-root without a <cpu><numa> setting
    <controller type='pci' index='1' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='3'>
        <node>263</node>
      </target>
    </controller>

error: XML error: pci-root with index 1 is configured for a NUMA node (263) not present in the domain's <cpu><numa> array (0)

3. Configure an invalid node on pci-root 1 in a guest with a NUMA setting
Configure 3 NUMA nodes in the guest, add node 3 to pci-root index 1,
and save the guest.
  
<vcpu placement='static'>24</vcpu>
<cpu>    
    <numa>
      <cell id='0' cpus='0-7' memory='1048576' unit='KiB'/>
      <cell id='1' cpus='8-15' memory='1048576' unit='KiB'/>
      <cell id='2' cpus='16-23' memory='1048576' unit='KiB'/>
    </numa>
</cpu>
<devices>
    <controller type='pci' index='1' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='1'>
        <node>3</node>
      </target>
    </controller>
...
</devices>

Saving the guest fails with the error:
error: XML error: pci-root with index 1 is configured for a NUMA node (3) not present in the domain's <cpu><numa> array (3)

4. Configure a NUMA node element on a created pci-root (with an attached device) in a guest with a NUMA setting
a. Configure 3 NUMA nodes in the guest, add a node element to pci-root index 1, and attach any device (interface, memballoon, and so on) to pci-root 1.
Save the guest.

Take an interface for example.

  <vcpu placement='static'>24</vcpu>
  <cpu>
    <numa>
      <cell id='0' cpus='0-7' memory='1048576' unit='KiB'/>
      <cell id='1' cpus='8-15' memory='1048576' unit='KiB'/>
      <cell id='2' cpus='16-23' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>
<devices>
    <controller type='pci' index='1' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='1'>
        <node>2</node>
      </target>
    </controller>

    <interface type='bridge'>
      <mac address='52:54:00:56:1b:82'/>
      <source bridge='virbr0'/>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
    </interface>

b. Start the guest
c. Check the qemu command line for pci-root 1:
   Check that index (1) equals the pci-root <target index>,
   that id equals pci.<bus>,
   and that numa_node equals <node>
 
   -device spapr-pci-host-bridge,index=1,id=pci.1,numa_node=2 
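
(One way to perform this check on the host is to grep the arguments of the running QEMU process; a sketch:)

    # ps -ef | grep qemu-kvm | grep -o 'spapr-pci-host-bridge[^ ]*'
    spapr-pci-host-bridge,index=1,id=pci.1,numa_node=2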


d. Check in the guest, using lspci, that the interface's NUMA node is 2
   # lspci
   0001:00:01.0 Ethernet controller: Red Hat, Inc Virtio network device
   # lspci -vv -s 0001:00:01.0 | grep 'NUMA node:'
           NUMA node: 2
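
(Equivalently, the sysfs attribute from the original report should now be populated inside the guest:)

    # cat /sys/bus/pci/devices/0001:00:01.0/numa_node
    2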

e. Check that the interface can get an IP address
# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.232  netmask 255.255.255.0  broadcast 192.168.122.255
        inet6 fe80::5054:ff:fe56:1b82  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:56:1b:82  txqueuelen 1000  (Ethernet)



All tests pass.

Comment 7 errata-xmlrpc 2017-11-09 11:26:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3174

Comment 8 IBM Bug Proxy 2018-01-04 12:21:34 UTC
------- Comment From bssrikanth.com 2018-01-04 07:15 EDT-------
Versions against which tests were executed:

kernel-4.14.0-1.rel.git68b4afb.el7.centos.ppc64le
libvirt-3.9.0-1.rel.git99ed075.el7.centos.ppc64le
qemu-2.11.0-1.rel.gite7153e0.el7.centos.ppc64le

Testcase 1:
Created two NUMA nodes in the guest, made the non-default pci-root part of NUMA node 1, attached a virtio-net device to it, then booted the guest and verified the same.
Result:
Guest xml:

<cell id='0' cpus='0-15' memory='4194304' unit='KiB'/>
<cell id='1' cpus='16-31' memory='4194304' unit='KiB'/> <---

..
..
<target index='0'>
  <node>1</node>  <--
</target>

..

Inside guest:

[root@localhost ~]# lspci -vvs 0001:00:01.0
Subsystem: Red Hat, Inc Device 0001
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 21
NUMA node: 1
Region 0: I/O ports at 0020 [size=32]
Region 1: Memory at 200100000000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 220000000000 (64-bit, prefetchable) [size=16K]
Expansion ROM at 200100040000 [disabled] [size=256K]
Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
Vector table: BAR=1 offset=00000000
PBA: BAR=1 offset=00000800
Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
BAR=0 offset=00000000 size=00000000
Capabilities: [70] Vendor Specific Information: VirtIO: Notify
BAR=4 offset=00003000 size=00001000 multiplier=00000004
Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
BAR=4 offset=00002000 size=00001000
Capabilities: [50] Vendor Specific Information: VirtIO: ISR
BAR=4 offset=00001000 size=00001000
Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
BAR=4 offset=00000000 size=00001000
Kernel driver in use: virtio-pci

[root@localhost ~]# lspci
0000:00:01.0 Ethernet controller: Red Hat, Inc Virtio network device
0000:00:02.0 SCSI storage controller: Red Hat, Inc Virtio SCSI
0000:00:03.0 USB controller: Red Hat, Inc. Device 000d (rev 01)
0000:00:04.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
0001:00:01.0 Ethernet controller: Red Hat, Inc Virtio network device

Testcase 2:
Tried associating the default pci-root with a NUMA node.
Result: error: XML error: The PCI controller with index=0 can't be associated with a NUMA node

Testcase 3:
Tried associating a pci-root with a non-existent NUMA node [both with and without NUMA nodes in the XML].
Result: error: XML error: pci-root with index 1 is configured for a NUMA node (20) not present in the domain's <cpu><numa> array (2)

Tests passed. Setting to verified.