Bug 1103314 - RFE: configure guest NUMA node locality for guest PCI devices
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Laine Stump
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 1230131
Depends On: guestNUMALocalityPCIdev 1235381
Blocks: 1078542 1227278
Reported: 2014-05-30 17:39 UTC by Daniel Berrangé
Modified: 2020-08-09 13:29 UTC (History)

Fixed In Version: libvirt-1.3.4-1.el7
Doc Type: Enhancement
Doc Text:
Clone Of: guestNUMALocalityPCIdev
Environment:
Last Closed: 2016-11-03 18:09:03 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2016:2577 | 0 | normal | SHIPPED_LIVE | Moderate: libvirt security, bug fix, and enhancement update | 2016-11-03 12:07:06 UTC

Description Daniel Berrangé 2014-05-30 17:39:32 UTC
+++ This bug was initially created as a clone of Bug #1103313 +++

Description of problem:
In a guest which has multiple NUMA nodes which are bound to specific host nodes, and is also given host PCI devices, it is desirable to specify which guest NUMA nodes are associated with the PCI device in the guest. This is important to ensure that the guest does not use the PCI device in a way which causes it to do I/O across host NUMA nodes.

This appears to involve ACPI SLIT setup in some manner, either against the PCI bus or the device.

cf  http://www.osronline.com/showthread.cfm?link=241954

This feature is important for OpenStack's NFV use cases.

Comment 3 Daniel Berrangé 2014-06-11 16:25:16 UTC
(In reply to Marcelo Tosatti from comment #2)
> (In reply to Daniel Berrange from comment #0)
> > +++ This bug was initially created as a clone of Bug #1103313 +++
> > 
> > Description of problem:
> > In a guest which has multiple NUMA nodes which are bound to specific host
> > nodes, and is also given host PCI devices, it is desirable to specify which
> > guest NUMA nodes are associated with the PCI device in the guest. This is
> > important to ensure that the guest does not use the PCI device in a way
> > which causes it to do I/O across host NUMA nodes.
> > 
> > This appears to involve ACPI SLIT setup in some manner, either against the
> > PCI bus or the device.
> > 
> > cf  http://www.osronline.com/showthread.cfm?link=241954
> > 
> > This feature is important for OpenStack's NFV use cases.
> 
> Why is it important for NFV's use cases?
> 
> AFAIK NFV's use cases perform manual pinning, not relying on automatic
> placement of memory via PCI driver NUMA locality information.
> 
> Please clarify.
> 
> Note: it's a nice-to-have feature, but it's not an NFV requirement AFAICS.

Manual pinning of guests by the users is a data center virt use case. In the OpenStack cloud world, the people launching the images do *not* have any ability to manually pin guests to specific nodes; the cloud decides.

They will merely instruct OpenStack that the guest needs to have N NUMA nodes visible, that the guest should be strictly pinned to the host, and that the PCI device should be allocated from the same host nodes that the guest is pinned to. The user does not get to say exactly which host NUMA node is used for pinning.

So if the guest has 2 NUMA nodes, and these are mapped to 2 host NUMA nodes, and the guest is given a PCI device, the end user in the guest has no way of knowing which host NUMA node the assigned PCI device came from. They thus have no way of knowing which guest NUMA node is local to the PCI device.

Comment 8 Marcel Apfelbaum 2015-06-10 10:18:16 UTC
*** Bug 1230131 has been marked as a duplicate of this bug. ***

Comment 9 Marcel Apfelbaum 2015-06-10 10:20:12 UTC
The PXB device was accepted into QEMU.
Command line arguments can be found in the QEMU git repository under docs/pci_expander_bridge.txt.

Comment 16 Marcel Apfelbaum 2016-03-01 08:40:40 UTC
Hi,

Please be aware that there is a pxb device for PC machines and a pxb-pcie device for Q35 machines.

They have the same purpose: to expose a new PCI root bus. The differences are:
 - pxb-pcie has a PCIe bus instead of a PCI one
 - pxb-pcie has no internal PCI bridge; to attach a device to it, a root port should be used

Thanks,
Marcel

Comment 17 Laine Stump 2016-03-24 19:47:40 UTC
I just posted patches to libvir-list that support both pxb (pci-expander-bus) and pxb-pcie (pcie-expander-bus):

 https://www.redhat.com/archives/libvir-list/2016-March/msg01213.html

Comment 18 Laine Stump 2016-04-14 18:47:42 UTC
The following patches have been pushed upstream to implement support
for the pci-expander-bus and pcie-expander-bus PCI controllers.

The first 9 are listed here only because they are prerequisites to the
final 6 patches which actually implement the feature.

commit 51156bcff371b2622465a0de269872f7779295cf
Author: Laine Stump <laine@laine.org>
Date:   Tue Mar 22 11:38:53 2016 -0400

    schema: make pci slot and function optional

commit f97a03e70c451b669acf773140d2b59866170d70
Author: Laine Stump <laine@laine.org>
Date:   Tue Mar 22 12:00:40 2016 -0400

    schema: rename uint8range/uint24range to uint8/uint24
    
commit 8995ad1179fc8ae4eba4e0e5424464eec7fd5455
Author: Laine Stump <laine@laine.org>
Date:   Tue Mar 22 12:10:16 2016 -0400

    schema: new basic type - uint16
    
commit 5863b6e0c1efb93416ff83274619e7da8a399f5e
Author: Laine Stump <laine@laine.org>
Date:   Tue Mar 22 12:24:08 2016 -0400

    schema: allow pci address attributes to be in decimal
    
commit 6d0902a5ca64d1cd5ddc0e7a89053e477da5d7c1
Author: Laine Stump <laine@laine.org>
Date:   Wed Mar 2 15:31:02 2016 -0500

    conf: use #define instead of literal for highest slot in upstream port
    
commit 0d668434f46165c337a4f6bab6ffb398a0594eec
Author: Laine Stump <laine@laine.org>
Date:   Wed Mar 2 15:29:33 2016 -0500

    conf: allow use of slot 0 in a dmi-to-pci-bridge
    
commit d1cc4605d72b1df7075e615760a41936d8c6fb96
Author: Laine Stump <laine@laine.org>
Date:   Tue Mar 15 15:49:22 2016 -0400

    conf/qemu: change the way VIR_PCI_CONNECT_TYPE_* flags work
    
commit a0616ee8a8436e3784c20313b93ddd8909834541
Author: Laine Stump <laine@laine.org>
Date:   Wed Mar 16 14:20:52 2016 -0400

    conf: utility function to convert PCI controller model into connect type
    
commit 1da284736ed2456f12bae5b8dfff4a6b8aee9355
Author: Laine Stump <laine@laine.org>
Date:   Wed Mar 16 15:14:03 2016 -0400

    qemu: set PCI controller default modelName in a separate function
    
commit 5d4e2b1721d98c6f9fc24b2be760267b96d97092
Author: Laine Stump <laine@laine.org>
Date:   Wed Feb 24 16:40:49 2016 -0500

    qemu: add capabilities bit for device "pxb"
    
commit 52f3d0a4d2de2f3a633bd49b4ebc46a31329a04c
Author: Laine Stump <laine@laine.org>
Date:   Fri Mar 4 10:26:23 2016 -0500

    conf: new pci controller model pci-expander-bus
    
commit 400b297692c76746046e672e59d186776171b1c1
Author: Laine Stump <laine@laine.org>
Date:   Fri Mar 4 14:35:20 2016 -0500

    qemu: support new pci controller model "pci-expander-bus"
    
commit 0ec0bc85d0f7ba1847b1bcf066c09957aff1855e
Author: Laine Stump <laine@laine.org>
Date:   Tue Mar 8 16:25:22 2016 -0500

    qemu: add capabilities bit for device "pxb-pcie"
    
commit bc07251f59ecf828c3f9a82c9ef485937fd6c9bb
Author: Laine Stump <laine@laine.org>
Date:   Wed Mar 16 13:37:14 2016 -0400

    conf: new pci controller model pcie-expander-bus
    
commit 8b62c65d24bdb20121d3147b4f4dc98bac4f024b
Author: Laine Stump <laine@laine.org>
Date:   Wed Mar 23 15:49:29 2016 -0400

    qemu: support new pci controller model "pcie-expander-bus"

Comment 19 Laine Stump 2016-04-14 19:26:30 UTC
To test these new controllers *without* specifying a NUMA node, you can add the new controllers to the domain config and connect any kind of PCI device to them.

For example, on a 440fx-based machinetype, you could do this:

    ...
    <controller type='pci' index='1' model='pci-expander-bus'/>
    <interface type='network'>
      <source network='default'/>
      <address type='pci' bus='1'/>
    </interface>

In this case, bus 1 would have slots 0 - 0x1f available to plug in devices (the above device would go into slot 0).

For a Q35 machinetype, it's more complicated, since the pcie-expander-bus only provides a single slot, and that slot only accepts a pcie-root-port or pcie-switch-upstream-port. For a single device, you could do this (assuming controllers with index=1 and index=2 already exist, but slot 4 of pcie-root is currently empty):


    <controller type='pci' index='3' model='pcie-expander-bus'>
      <address type='pci' bus='0' slot='4'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <address type='pci' bus='3'/>
    </controller>
    <interface type='network'>
      <source network='default'/>
      <address type='pci' bus='4'/>
    </interface>

(The interface would be plugged into slot 0 (the only slot) of bus 4 (the pcie-root-port controller).)

For multiple devices on Q35, you would need to construct a pcie-switch out of upstream and downstream ports, like this:


    
    <controller type='pci' index='3' model='pcie-expander-bus'>
      <address type='pci' bus='0' slot='4'/>
    </controller>
    <controller type='pci' index='4' model='pcie-switch-upstream-port'>
      <address type='pci' bus='3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-switch-downstream-port'/>
    <controller type='pci' index='6' model='pcie-switch-downstream-port'/>
    <controller type='pci' index='7' model='pcie-switch-downstream-port'/>
    ...
    <interface type='network'>
      <source network='default'/>
      <address type='pci' bus='5'/>
    </interface>
    <interface type='network'>
      <source network='default'/>
      <address type='pci' bus='6'/>
    </interface>
    <interface type='network'>
      <source network='default'/>
      <address type='pci' bus='7'/>
    </interface>
    ...

(keep in mind that any type of device can be used in place of the <interface> devices in the examples).

To test the guest-visible numa placement of devices using these controllers (which is the real reason for their existence), you just need to add a <node> subelement to the <target> subelement of the pci[e]-expander-bus controller. It must be a node that is also listed in the domain's <cpu><numa> array. For example, in a 440fx domain:

  ...
  <cpu>
    <topology sockets='2' cores='4' threads='2'/>
    <numa>
      <cell cpus='0-7' memory='109550' unit='KiB'/>
      <cell cpus='8-15' memory='109550' unit='KiB'/>
    </numa>
  </cpu>
  ...
  <devices>
  ...
    <controller type='pci' index='1' model='pci-expander-bus'>
      <target>
        <node>1</node>
      </target>
    </controller>
    <interface type='hostdev'>
      <source>
        <address type='pci' bus='0x03' slot='0x10' function='0x1'/>
      </source>
      <address type='pci' bus='1'/>
    </interface>

The guest will be able to determine that the interface defined above is on numa node 1 (which is CPU cores 8-15). Again, the <target><node> subelement is placed *in the pci[e]-expander-bus controller*, not in the device(s) attached to it.
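
The rule stated above, that the <node> value must name a cell from the domain's <cpu><numa> list, can be checked mechanically before defining the domain. The following is a minimal Python sketch, not libvirt's own validation: the helper names are hypothetical, and it assumes that a <cell> without an explicit id attribute takes its position in the list as its id.

```python
import xml.etree.ElementTree as ET

# Controller models that accept a <target><node> subelement, per this bug.
EXPANDER_MODELS = {"pci-expander-bus", "pcie-expander-bus"}

# Trimmed-down version of the 440fx example above (illustration only).
EXAMPLE = """
<domain type='kvm'>
  <cpu>
    <numa>
      <cell cpus='0-7' memory='109550' unit='KiB'/>
      <cell cpus='8-15' memory='109550' unit='KiB'/>
    </numa>
  </cpu>
  <devices>
    <controller type='pci' index='1' model='pci-expander-bus'>
      <target><node>1</node></target>
    </controller>
  </devices>
</domain>
"""


def guest_numa_cell_ids(domain):
    """Return the set of guest NUMA cell ids declared under <cpu><numa>."""
    ids = set()
    for position, cell in enumerate(domain.findall("./cpu/numa/cell")):
        # Assumption: a cell without an explicit id uses its position as id.
        ids.add(int(cell.get("id", position)))
    return ids


def check_expander_nodes(domain_xml):
    """Return (controller index, node) pairs whose node is not a guest cell."""
    domain = ET.fromstring(domain_xml)
    cells = guest_numa_cell_ids(domain)
    bad = []
    for ctrl in domain.findall("./devices/controller[@type='pci']"):
        if ctrl.get("model") not in EXPANDER_MODELS:
            continue
        node = ctrl.findtext("./target/node")
        if node is not None and int(node) not in cells:
            bad.append((int(ctrl.get("index")), int(node)))
    return bad


if __name__ == "__main__":
    # An empty list means every expander bus points at an existing guest cell.
    print(check_expander_nodes(EXAMPLE))
```

With the example above the result is an empty list; changing the node to 2 would report controller index 1 as referring to a missing cell.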

(NB: while the previous examples, connecting emulated network devices to the expander bus, are useful for verifying proper setup of the bus plumbing in a test, this final example is more useful in the real world, since it uses vfio device assignment to assign a host device (SRIOV VF) to the guest that presumably is really on a particular NUMA node on the host - this could have a significant effect on the guest's network performance.)

Comment 21 Luyao Huang 2016-07-26 10:11:22 UTC
Test with libvirt-2.0.0-3.el7.x86_64:

A) test interface with i440fx machine type

1. check host vf numa node info:
# virsh nodedev-dumpxml pci_0000_86_10_1
<device>
  <name>pci_0000_86_10_1</name>
  <path>/sys/devices/pci0000:80/0000:80:01.0/0000:86:10.1</path>
  <parent>pci_0000_80_01_0</parent>
  <driver>
    <name>ixgbevf</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>134</bus>
    <slot>16</slot>
    <function>1</function>
    <product id='0x10ed'>82599 Ethernet Controller Virtual Function</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='phys_function'>
      <address domain='0x0000' bus='0x86' slot='0x00' function='0x1'/>
    </capability>
    <iommuGroup number='83'>
      <address domain='0x0000' bus='0x86' slot='0x10' function='0x1'/>
    </iommuGroup>
    <numa node='1'/>
    <pci-express>
      <link validity='cap' port='0' width='0'/>
      <link validity='sta' width='0'/>
    </pci-express>
  </capability>
</device>


2. prepare a guest xml with a network interface which uses a host vf:

<domain type='kvm'>
  <name>r7-lhuang</name>
  <uuid>09143880-044e-4a38-82cc-234c274c2c19</uuid>
  <maxMemory slots='16' unit='KiB'>25600000</maxMemory>
  <memory unit='KiB'>2621440</memory>
  <currentMemory unit='KiB'>2621440</currentMemory>
  <memtune>
    <hard_limit unit='KiB'>111111111111111</hard_limit>
  </memtune>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0-1'/>
    </hugepages>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>
 ...
  <cpu mode='host-model'>
    <model fallback='forbid'/>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576' unit='KiB' memAccess='shared'/>
      <cell id='1' cpus='2-3' memory='1048576' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
...
    <controller type='pci' index='1' model='pci-expander-bus'>
      <model name='pxb'/>
      <target busNr='254'>
        <node>1</node>
      </target>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>

...
    <interface type='network'>
      <mac address='52:54:00:fd:48:b3'/>
      <source network='default'/>
      <model type='rtl8139'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:20:51:94'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x09' function='0x0'/>
    </interface>

3. start guest and check qemu cmdline:

# virsh start r7-lhuang
Domain r7-lhuang started

# ps aux|grep  r7-lhuang
...
-device pxb,bus_nr=254,id=pci.1,numa_node=1,bus=pci.0,addr=0x3
...
-device vfio-pci,host=86:10.1,id=hostdev0,bus=pci.1,addr=0x9
...
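
To confirm from the host side which guest NUMA node a device ends up on, the -device options above can be followed from the device up through its bus= chain until a pxb or pxb-pcie entry with numa_node is reached. A rough Python sketch of that walk follows. The function names are made up for illustration, it assumes a bus name equals the id of the controller that provides it (as in the output above), and it does not handle escaped commas inside property values.

```python
import shlex


def parse_device_args(cmdline):
    """Return one dict per -device option: {'driver': ..., key: value, ...}."""
    tokens = shlex.split(cmdline)
    devices = []
    for flag, value in zip(tokens, tokens[1:]):
        if flag != "-device":
            continue
        driver, _, props = value.partition(",")
        entry = {"driver": driver}
        for prop in filter(None, props.split(",")):
            key, _, val = prop.partition("=")
            entry[key] = val
        devices.append(entry)
    return devices


def numa_node_of(cmdline, device_id):
    """Follow bus= links upward until an expander bus with numa_node is found."""
    devices = {d["id"]: d for d in parse_device_args(cmdline) if "id" in d}
    current = devices.get(device_id)
    seen = set()
    while current is not None and current["id"] not in seen:
        seen.add(current["id"])  # guards against a malformed, looping chain
        if current["driver"] in ("pxb", "pxb-pcie") and "numa_node" in current:
            return int(current["numa_node"])
        current = devices.get(current.get("bus"))
    return None  # device is not behind an expander bus with a node set


if __name__ == "__main__":
    cmdline = (
        "-device pxb,bus_nr=254,id=pci.1,numa_node=1,bus=pci.0,addr=0x3 "
        "-device vfio-pci,host=86:10.1,id=hostdev0,bus=pci.1,addr=0x9"
    )
    print(numa_node_of(cmdline, "hostdev0"))  # prints 1
```

The same walk covers longer chains, for example a device behind a root port that is itself plugged into a pxb-pcie.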

4. log in to the guest and check guest interface numa node info:

IN GUEST:

# lspci
...
fe:00.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
ff:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8100/8101L/8139 PCI Fast Ethernet Adapter (rev 20)
ff:09.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)

# cat /sys/devices/pci0000\:fe/0000\:fe\:00.0/0000\:ff\:09.0/numa_node 
1

# cat /sys/devices/pci0000\:fe/0000\:fe\:00.0/0000\:ff\:00.0/numa_node 
1
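
The same check can be done for every PCI device in the guest at once by reading the numa_node attribute under /sys/bus/pci/devices. This is a small Python sketch with a hypothetical helper name; a value of -1 means the kernel has no node information for that device.

```python
import os


def pci_numa_nodes(sysfs_root="/sys/bus/pci/devices"):
    """Return {PCI address: NUMA node} for every device that reports one."""
    result = {}
    if not os.path.isdir(sysfs_root):
        return result  # not a Linux system, or sysfs is not mounted
    for address in sorted(os.listdir(sysfs_root)):
        path = os.path.join(sysfs_root, address, "numa_node")
        try:
            with open(path) as handle:
                result[address] = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # no numa_node attribute, or unreadable contents
    return result


if __name__ == "__main__":
    for address, node in pci_numa_nodes().items():
        print(address, node)
```

Run inside the guest, devices plugged into the expander bus should report the node given in <target><node>.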


B) test interface with q35 machine type:

1. prepare a q35 machine
# virsh dumpxml r7-lhuang-q35
<domain type='kvm' id='34'>
  <name>r7-lhuang-q35</name>
  <uuid>336e084d-7dc1-4433-8887-ee1ca547dd3d</uuid>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <memtune>
    <hard_limit unit='KiB'>111111111111111</hard_limit>
  </memtune>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0-1'/>
    </hugepages>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-rhel7.3.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu mode='host-model'>
    <model fallback='forbid'/>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576' unit='KiB' memAccess='shared'/>
      <cell id='1' cpus='2-3' memory='1048576' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
...
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <model name='i82801b11-bridge'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='2'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='3' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='254'>
        <node>1</node>
      </target>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='ioh3420'/>
      <target chassis='4' port='0x0'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
 ...
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:20:51:94'/>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x1'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </interface>
 ...

2. check qemu cmdline:

# ps aux|grep qemu
...
-device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x0 -device pxb-pcie,bus_nr=254,id=pci.3,numa_node=1,bus=pcie.0,addr=0x4 
...
-device vfio-pci,host=86:10.1,id=hostdev0,bus=pci.4,addr=0x0
...

3. log in to the guest and check interface numa node info:

IN GUEST:

# lspci
...
fe:00.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 0 (rev 02)
ff:00.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)

# cat /sys/devices/pci0000\:fe/0000\:fe\:00.0/0000\:ff\:00.0/numa_node
1

Comment 22 Luyao Huang 2016-07-26 10:25:06 UTC
1st issue:

1. prepare a guest with xml like this:

    <controller type='pci' index='1' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='1' busNr='254'>
        <node>1</node>
      </target>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='ide' index='0'>

2. start guest:

# virsh start r7-lhuang
Domain r7-lhuang started

3. check qemu cmdline:

# ps aux|grep qemu

-device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x3 

4. check active xml:

# virsh dumpxml r7-lhuang
...
    <controller type='pci' index='1' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='1' busNr='254'>
        <node>1</node>
      </target>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
...

Libvirt still shows the numa node and busNr in the active guest xml, but does not use them to build the qemu command line for pci-bridge (this is a negative test case, since pci-bridge doesn't use these elements).
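
A configuration like this can be spotted ahead of time by scanning the XML for busNr or <node> settings on PCI controllers that are not expander buses. The sketch below is only an illustration in Python with invented helper names; it is not how libvirt validates the XML.

```python
import xml.etree.ElementTree as ET

# Only these controller models use busNr and <node>, per this bug.
EXPANDER_MODELS = {"pci-expander-bus", "pcie-expander-bus"}

# The controller from step 1, wrapped in <devices> so it parses on its own.
EXAMPLE = """
<devices>
  <controller type='pci' index='1' model='pci-bridge'>
    <model name='pci-bridge'/>
    <target chassisNr='1' busNr='254'>
      <node>1</node>
    </target>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
  </controller>
</devices>
"""


def ignored_expander_settings(xml_text):
    """Return (index, model, [setting names]) for settings that would be ignored."""
    root = ET.fromstring(xml_text)
    findings = []
    for ctrl in root.iter("controller"):
        if ctrl.get("type") != "pci" or ctrl.get("model") in EXPANDER_MODELS:
            continue
        target = ctrl.find("target")
        if target is None:
            continue
        extras = []
        if target.get("busNr") is not None:
            extras.append("busNr")
        if target.find("node") is not None:
            extras.append("node")
        if extras:
            findings.append((ctrl.get("index"), ctrl.get("model"), extras))
    return findings


if __name__ == "__main__":
    print(ignored_expander_settings(EXAMPLE))
```

For the pci-bridge above this reports both busNr and node as set but unused.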

Comment 23 Luyao Huang 2016-07-26 10:47:43 UTC
2nd issue:

1. prepare a running guest with a pci-expander-bus which is bound to a guest numa node:

# virsh dumpxml r7-lhuang
...
    <controller type='pci' index='1' model='pci-expander-bus'>
      <model name='pxb'/>
      <target chassisNr='1' busNr='254'>
        <node>1</node>
      </target>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
...

2. attach an interface on that controller:

# cat net.xml 
    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x03' function='0x0'/>
    </interface>

# virsh attach-device r7-lhuang net.xml 
Device attached successfully

3. log in to the guest; the new interface cannot be found, and this is the guest kernel log:

# dmesg
...
[  593.170819] ACPI Error: 
[  593.170825] [^S18_.PCNT] Namespace lookup failure, AE_NOT_FOUND (20130517/psargs-359)
[  593.170906] ACPI Error: Method parse/execution failed [\_SB_.PCI0.PCNT] (Node ffff88003e1e78e8), AE_NOT_FOUND (20130517/psparse-536)
[  593.170915] ACPI Error: Method parse/execution failed [\_GPE._E01] (Node ffff88003e1e1528), AE_NOT_FOUND (20130517/psparse-536)
[  593.170922] ACPI Exception: AE_NOT_FOUND, while evaluating GPE method [_E01] (20130517/evgpe-579)

Comment 24 Luyao Huang 2016-07-26 11:01:18 UTC
Hi Laine,

I have hit 2 issues after doing some basic tests; you can check comment 22 and comment 23. Would you please help check whether the issue described in comment 22 is worth fixing?

Thanks a lot for your help

Comment 25 Luyao Huang 2016-08-05 02:07:53 UTC
Test multiple devices on Q35:

1. prepare a guest which has multiple devices (I used a root port under the pxb because of bug 1361172):

# virsh dumpxml rhel7-lhuang
<domain type='kvm' id='28'>
  <name>rhel7-lhuang</name>
  <uuid>cc4c9cfb-2504-4a08-adec-1aeabc581010</uuid>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <memtune>
    <hard_limit unit='KiB'>111111111111111</hard_limit>
  </memtune>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0-1'/>
    </hugepages>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-rhel7.3.0'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu mode='host-model'>
    <model fallback='forbid'/>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576' unit='KiB' memAccess='shared'/>
      <cell id='1' cpus='2-3' memory='1048576' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  ...
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <model name='i82801b11-bridge'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='2'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='3' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='100'>
        <node>0</node>
      </target>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='ioh3420'/>
      <target chassis='4' port='0x0'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='5' model='pcie-switch-upstream-port'>
      <model name='x3130-upstream'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='6' model='pcie-switch-downstream-port'>
      <model name='xio3130-downstream'/>
      <target chassis='6' port='0x0'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='7' model='pcie-switch-downstream-port'>
      <model name='xio3130-downstream'/>
      <target chassis='7' port='0x1'/>
      <alias name='pci.7'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='8' model='pcie-switch-downstream-port'>
      <model name='xio3130-downstream'/>
      <target chassis='8' port='0x2'/>
      <alias name='pci.8'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x02' function='0x0'/>
    </controller>
 ...
    <interface type='network'>
      <mac address='52:54:00:fd:48:b3'/>
      <source network='default' bridge='virbr0' macTableManager='libvirt'/>
      <target dev='vnet0'/>
      <model type='rtl8139'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </interface>
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:20:51:94'/>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x1'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </interface>
 ...

2. start guest and login guest:

# virsh start rhel7-lhuang
Domain rhel7-lhuang started

3. check guest interface numa node (however, the rtl8139 nic cannot get an ip because of bug 1362340):

# lspci 
...
64:00.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 0 (rev 02)
65:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Upstream) (rev 02)
66:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
66:01.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
66:02.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
67:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8100/8101L/8139 PCI Fast Ethernet Adapter (rev 20)
68:00.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
69:00.0 SCSI storage controller: Red Hat, Inc Virtio block device

# cat /sys/devices/pci0000\:64/0000\:64\:00.0/0000\:65\:00.0/0000\:66\:01.0/0000\:68\:00.0/numa_node
0

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp103s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:fd:48:b3 brd ff:ff:ff:ff:ff:ff
3: enp104s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:20:51:94 brd ff:ff:ff:ff:ff:ff
    inet 10.66.4.175/22 brd 10.66.7.255 scope global dynamic enp104s0
       valid_lft 86274sec preferred_lft 86274sec
    inet6 fe80::5054:ff:fe20:5194/64 scope link 
       valid_lft forever preferred_lft forever

4. check qemu cmdline:

# ps aux|grep qemu
...-device pxb-pcie,bus_nr=100,id=pci.3,numa_node=0,bus=pcie.0,addr=0x4...
-device ioh3420,port=0x0,chassis=4,id=pci.4,bus=pci.3,addr=0x0 -device x3130-upstream,id=pci.5,bus=pci.4,addr=0x0 -device xio3130-downstream,port=0x0,chassis=6,id=pci.6,bus=pci.5,addr=0x0 -device xio3130-downstream,port=0x1,chassis=7,id=pci.7,bus=pci.5,addr=0x1 -device xio3130-downstream,port=0x2,chassis=8,id=pci.8,bus=pci.5,addr=0x2
...-device vfio-pci,host=03:10.1,id=hostdev0,bus=pci.7,addr=0x0
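
The busNr value in the XML is decimal, while lspci and sysfs in the guest show the bus in hexadecimal (busNr='100' appears as 64:00.0 here, and busNr='254' appeared as fe:00.0 in comment 21). A tiny Python helper, with hypothetical names, for converting between the two when looking for the right sysfs directory:

```python
def guest_root_bus(bus_nr, domain=0):
    """Format a decimal busNr as the 'dddd:bb' prefix shown by lspci -D."""
    if not 0 <= bus_nr <= 0xFF:
        raise ValueError("PCI bus numbers are 8 bits wide")
    return f"{domain:04x}:{bus_nr:02x}"


def sysfs_root_dir(bus_nr, domain=0):
    """Return the sysfs directory of the root bus created for this busNr."""
    return f"/sys/devices/pci{guest_root_bus(bus_nr, domain)}"


if __name__ == "__main__":
    print(guest_root_bus(100))   # 0000:64
    print(sysfs_root_dir(254))   # /sys/devices/pci0000:fe
```

Only the root bus number follows directly from busNr; the numbers of the buses behind it are assigned in the guest.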

Comment 26 Laine Stump 2016-08-15 14:28:42 UTC
(In reply to Luyao Huang from comment #24)

> I have hit 2 issues after doing some basic tests; you can check comment 22
> and comment 23. Would you please help check whether the issue described in
> comment 22 is worth fixing?


Comment 23 is something that should be brought up with Marcel, who wrote the pxb support in qemu. Comment 22 is just validation. You can file a bug for it, but it isn't anything urgent.

Comment 27 Luyao Huang 2016-08-16 02:28:07 UTC
(In reply to Laine Stump from comment #26)
> (In reply to Luyao Huang from comment #24)
> 
> > I have hit 2 issues after doing some basic tests; you can check comment 22
> > and comment 23. Would you please help check whether the issue described in
> > comment 22 is worth fixing?
> 
> 
> Comment 23 is something that should be brought up with Marcel, who wrote the
> pxb support in qemu. Comment 22 is just validation. You can file a bug for
> it, but it isn't anything urgent.

Thanks for your reply. I have confirmed that the problem in comment 23 occurs because qemu does not support hot-plugging a device to a pxb controller. For comment 22, I have created a new bug 1367238.

Verified this bug with the steps in comment 21 and comment 25.

Comment 29 errata-xmlrpc 2016-11-03 18:09:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html

