Bug 1405036 - A vhost port is being added to numa node inconsistently
Summary: A vhost port is being added to numa node inconsistently
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openvswitch
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Kevin Traynor
QA Contact: ovs-qe
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-12-15 12:36 UTC by Jean-Tsung Hsiao
Modified: 2017-03-07 20:44 UTC (History)
CC List: 16 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-07 20:44:06 UTC
Target Upstream Version:
Embargoed:


Attachments

Description Jean-Tsung Hsiao 2016-12-15 12:36:22 UTC
Description of problem: A vhost port is added to a numa node inconsistently, alternating between numa node 1 and numa node 0.

Please see the following messages from the daemon log:

2016-12-15T06:36:04.717Z|00025|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhost0' has been added on numa node 1
2016-12-15T06:36:04.717Z|00027|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhost1' has been added on numa node 1

2016-12-15T08:14:06.637Z|00011|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhost0' has been added on numa node 0
2016-12-15T08:14:06.637Z|00012|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhost1' has been added on numa node 0
2016-12-15T08:14:06.870Z|00115|dpif_netdev|WARN|Cannot create pmd threads due to out of unpinned cores on numa node 0
2016-12-15T08:14:06.870Z|00116|dpif_netdev|WARN|Cannot create pmd threads due to out of unpinned cores on numa node 0
2016-12-15T08:14:06.872Z|00117|dpif_netdev|INFO|Created 4 pmd threads on numa node 1
2016-12-15T08:14:06.872Z|00118|dpif_netdev|WARN|There's no available pmd thread on numa node 0
2016-12-15T08:14:06.872Z|00119|dpif_netdev|WARN|There's no available pmd thread on numa node 0

Version-Release number of selected component (if applicable):
[root@netqe5 dpdk-multique-scripts]# rpm -qa | grep openvswitch
openvswitch-2.6.1-2.git20161206.el7fdb.x86_64
[root@netqe5 dpdk-multique-scripts]# rpm -qa | grep dpdk
kernel-kernel-networking-ovs-dpdk-vhostuser-1.0-6.noarch
dpdk-tools-16.11-2.el7fdb.x86_64
dpdk-16.11-2.el7fdb.x86_64
[root@netqe5 dpdk-multique-scripts]# uname -a
Linux netqe5.knqe.lab.eng.bos.redhat.com 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

How reproducible: Reproducible


Steps to Reproduce:
1. On netqe5, configure the OVS-dpdk bridge using /home/jhsiao/dpdk-multique-scripts/config.sh
2. virsh start mq-vhu-4
3. Monitor the daemon log to see whether both vhost0 and vhost1 are added to numa node 1 (a minimal watch command is sketched below).
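
A minimal sketch of one way to watch for the placement messages in step 3, assuming the default ovs-vswitchd log location on RHEL 7:

# Follow the OVS daemon log and show only the vhost NUMA placement messages
tail -f /var/log/openvswitch/ovs-vswitchd.log | grep --line-buffered 'has been added on numa node'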

Actual results:
Sometimes, both ports are being added to numa node 0.

Expected results:
Both ports should be added to numa node 1 all the time.

Additional info:

Comment 6 Jean-Tsung Hsiao 2016-12-15 23:43:34 UTC
### Please note that without this issue the Mpps rate from Xena to vhostuser 4Q testpmd is a perfect 14.88 Mpps.

[root@netqe5 XenaScripts]# python multiple_streams 1000000 64 32 60
rate_fration =  1000000
packet_length 64
num_of_streams =  32
test_duration =  60
INFO:root:XenaSocket: Connected
INFO:root:XenaManager: Logged succefully
INFO:root:XenaPort: 1/0 starting traffic
INFO:root:XenaPort: 1/0 stopping traffic
Average: 14880322.00 pps
[root@netqe5 XenaScripts]# 

### And, the queue/core alignment is perfect.

[root@netqe5 dpdk-multique-scripts]# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 1 core_id 21:
	isolated : false
	port: vhost0	queue-id: 0
	port: vhost1	queue-id: 0
	port: dpdk0	queue-id: 0
	port: dpdk1	queue-id: 0
pmd thread numa_id 1 core_id 17:
	isolated : false
	port: vhost0	queue-id: 1
	port: vhost1	queue-id: 1
	port: dpdk0	queue-id: 1
	port: dpdk1	queue-id: 1
pmd thread numa_id 1 core_id 19:
	isolated : false
	port: vhost0	queue-id: 2
	port: vhost1	queue-id: 2
	port: dpdk0	queue-id: 2
	port: dpdk1	queue-id: 2
pmd thread numa_id 1 core_id 23:
	isolated : false
	port: vhost0	queue-id: 3
	port: vhost1	queue-id: 3
	port: dpdk0	queue-id: 3
	port: dpdk1	queue-id: 3

Comment 13 Kevin Traynor 2016-12-16 13:34:43 UTC
Your libvirt config seems to pin the VM to cores across different NUMA nodes. 

  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='13'/>
    <vcpupin vcpu='4' cpuset='15'/>
  </cputune>

I suspect the reason for the inconsistent NUMA placement of the vhost ports is that, depending on which cores testpmd uses in the VM, the vhost ports end up on different NUMA nodes.

Please change your libvirt config to pin the VM to cores on NUMA 1 only and let me know if the vhost ports are always on NUMA 1 then. thanks.
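
A quick way to check which NUMA node each core belongs to before picking the cpuset values (a sketch; lscpu and numactl are assumed to be available on the host):

# Print the CPU-to-NUMA-node mapping, one line per logical CPU
lscpu -e=CPU,NODE

# Or show the full NUMA layout, including per-node CPU lists and memory
numactl --hardware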

Comment 14 Jean-Tsung Hsiao 2016-12-16 14:21:12 UTC
(In reply to Kevin Traynor from comment #13)
> Your libvirt config seems to pin the VM to cores across different NUMA
> nodes. 
> 
>   <cputune>
>     <vcpupin vcpu='0' cpuset='0'/>
>     <vcpupin vcpu='1' cpuset='1'/>
>     <vcpupin vcpu='2' cpuset='3'/>
>     <vcpupin vcpu='3' cpuset='13'/>
>     <vcpupin vcpu='4' cpuset='15'/>
>   </cputune>
> 
> I suspect the reason for inconsistent NUMA placement of vhost ports is that
> depending on which cores testpmd uses in the VM, it means the vhost ports
> will be on different NUMA nodes.

I don't think that's the case. Once I started the guest, after a few seconds, the issue happened as reported by the daemon log.

> 
> Please change your libvirt config to pin the VM to cores on NUMA 1 only and
> let me know if the vhost ports are always on NUMA 1 then. thanks.

Comment 15 Jean-Tsung Hsiao 2016-12-16 14:58:41 UTC
Hi Kevin,

I tried your suggestion, but the same issue still exists.

As you can see from below, both vhost0 and vhost1 were being added on numa node 0.

Thanks!

Jean
=========================================
  <cputune>
    <vcpupin vcpu='0' cpuset='5'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='13'/>
    <vcpupin vcpu='4' cpuset='15'/>
  </cputune>

2016-12-16T14:47:02.044Z|00011|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhost0' has been added on numa node 0
2016-12-16T14:47:02.045Z|00012|dpdk(vhost_thread1)|INFO|vHost Device '/var/run/openvswitch/vhost1' has been added on numa node 0

Comment 16 Kevin Traynor 2016-12-16 15:19:01 UTC
(In reply to Jean-Tsung Hsiao from comment #15)
> Hi Kevin,
> 
> I tried your suggestion, but the same issue still exists.

Thanks for trying; comment about the config below. FYI, the kernel binds to the ports on boot-up, which would explain why you see messages before testpmd is run.

> 
> As you can see from below, both vhost0 and vhost1 were being added on numa
> node 0.
> 
> Thanks!
> 
> Jean
> =========================================
>   <cputune>
>     <vcpupin vcpu='0' cpuset='5'/>
>     <vcpupin vcpu='1' cpuset='1'/>
>     <vcpupin vcpu='2' cpuset='3'/>
>     <vcpupin vcpu='3' cpuset='13'/>
>     <vcpupin vcpu='4' cpuset='15'/>
>   </cputune>

Typically, cores on a 12-core, 2-socket system with HT are laid out like this: 
0-11: NUMA 0
12-23: NUMA 1
24-35: NUMA 0
36-47: NUMA 1

So I think the config is still using NUMA 0 cores. If you change it to only use the 12-23 or 36-47 range, they should all be on NUMA 1.

Kevin.

> 
> 2016-12-16T14:47:02.044Z|00011|dpdk(vhost_thread1)|INFO|vHost Device
> '/var/run/openvswitch/vhost0' has been added on numa node 0
> 2016-12-16T14:47:02.045Z|00012|dpdk(vhost_thread1)|INFO|vHost Device
> '/var/run/openvswitch/vhost1' has been added on numa node 0

Comment 17 Jean-Tsung Hsiao 2016-12-16 15:38:13 UTC
(In reply to Kevin Traynor from comment #16)
> (In reply to Jean-Tsung Hsiao from comment #15)
> > Hi Kevin,
> > 
> > I tried your suggestion, but the same issue still exists.
> 
> Thanks for trying, comment about config below. fyi, the kernel binds to the
> ports on boot up, which is what would explain you see any messages before
> testpmd is run.
> 
> > 
> > As you can see from below, both vhost0 and vhost1 were being added on numa
> > node 0.
> > 
> > Thanks!
> > 
> > Jean
> > =========================================
> >   <cputune>
> >     <vcpupin vcpu='0' cpuset='5'/>
> >     <vcpupin vcpu='1' cpuset='1'/>
> >     <vcpupin vcpu='2' cpuset='3'/>
> >     <vcpupin vcpu='3' cpuset='13'/>
> >     <vcpupin vcpu='4' cpuset='15'/>
> >   </cputune>
> 
> Typically cores on a 12 core, 2 socket system with HT cores are laid out
> like this: 
> 0-11: NUMA 0
> 12-23: NUMA 1
> 24-35: NUMA 0
> 36-47: NUMA 1
> 
No, this is not the layout of my test-bed. Each socket has only 6 cores/12 HTs.

Socket 0
0-12
2-14
4-16
6-18
8-20
10-22

Socket 1
1-13
3-15
5-17
7-19
9-21
11-23


> So I think the config is still using NUMA 0 cores. If you change to only use
> the 12-23 or 36-47 range they should be all be NUMA 1.
> 
> Kevin.
> 
> > 
> > 2016-12-16T14:47:02.044Z|00011|dpdk(vhost_thread1)|INFO|vHost Device
> > '/var/run/openvswitch/vhost0' has been added on numa node 0
> > 2016-12-16T14:47:02.045Z|00012|dpdk(vhost_thread1)|INFO|vHost Device
> > '/var/run/openvswitch/vhost1' has been added on numa node 0

Comment 18 Jean-Tsung Hsiao 2016-12-16 15:44:13 UTC
To prevent confusion, change "-" to "," from comment #17.
Socket 0
0,12
2,14
4,16
6,18
8,20
10,22

Socket 1
1,13
3,15
5,17
7,19
9,21
11,23
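
One way to confirm the sibling pairing listed above directly from sysfs on the host (a sketch):

# Each unique line shows a core and its hyperthread sibling, e.g. "0,12"
cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | sort -u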

Comment 19 Jean-Tsung Hsiao 2016-12-21 15:19:15 UTC
While running the OVS-dpdk bonding test I saw the same behavior: I need to allocate some even-numbered (numa node 0) cores for the vhost ports even though the NICs sit on numa node 1.

More interestingly, the same config file produced different core/queue alignments on the two hosts.

*** Host netqe9 ***

[root@netqe9 dpdk-bond-ovs.2.6.1-dpdk-16.11]# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 1 core_id 19:
	isolated : false
	port: vhost0	queue-id: 0
	port: dpdk0	queue-id: 0
	port: dpdk1	queue-id: 0
pmd thread numa_id 1 core_id 23:
	isolated : false
	port: vhost0	queue-id: 1
	port: dpdk0	queue-id: 1
	port: dpdk1	queue-id: 1
pmd thread numa_id 1 core_id 17:
	isolated : false
	port: vhost0	queue-id: 2
	port: dpdk0	queue-id: 2
	port: dpdk1	queue-id: 2
pmd thread numa_id 1 core_id 21:
	isolated : false
	port: vhost0	queue-id: 3
	port: dpdk0	queue-id: 3
	port: dpdk1	queue-id: 3

[root@netqe9 dpdk-bond-ovs.2.6.1-dpdk-16.11]# cat ovs_config_add_bond_dpdk0_dpdk1_vhost0_balance_tcp.sh
ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xaa0000
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="4096,1"
sleep 5
systemctl restart openvswitch
sleep 5

ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xaa0154
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-bond ovsbr0 dpdkbond dpdk0 dpdk1 "lacp=active" "bond-mode=balance-tcp" -- set Interface dpdk0 type=dpdk ofport_request=10 -- set Interface dpdk1 type=dpdk ofport_request=11
ovs-vsctl add-port ovsbr0 vhost0 \
    -- set interface vhost0 type=dpdkvhostuser ofport_request=20

ovs-vsctl --timeout 10 set Interface dpdk0 options:n_rxq=4
ovs-vsctl --timeout 10 set Interface dpdk1 options:n_rxq=4

chown qemu /var/run/openvswitch/vhost0
ll /var/run/openvswitch/vhost*

#ovs-ofctl del-flows ovsbr0
#ovs-ofctl add-flow ovsbr0 in_port=10,actions=output:20
#ovs-ofctl add-flow ovsbr0 in_port=20,actions=output:10
ovs-ofctl dump-flows ovsbr0

*** Host netqe10 ***

[root@netqe10 dpdk-bond-ovs.2.6.1-dpdk-16.11]# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 6:
	isolated : false
	port: vhost0	queue-id: 0
pmd thread numa_id 1 core_id 17:
	isolated : false
	port: dpdk1	queue-id: 0
	port: dpdk0	queue-id: 0
pmd thread numa_id 1 core_id 21:
	isolated : false
	port: dpdk1	queue-id: 1
	port: dpdk0	queue-id: 1
pmd thread numa_id 0 core_id 4:
	isolated : false
	port: vhost0	queue-id: 1
pmd thread numa_id 0 core_id 8:
	isolated : false
	port: vhost0	queue-id: 2
pmd thread numa_id 0 core_id 2:
	isolated : false
	port: vhost0	queue-id: 3
pmd thread numa_id 1 core_id 19:
	isolated : false
	port: dpdk1	queue-id: 2
	port: dpdk0	queue-id: 2
pmd thread numa_id 1 core_id 23:
	isolated : false
	port: dpdk1	queue-id: 3
	port: dpdk0	queue-id: 3
[root@netqe10 dpdk-bond-ovs.2.6.1-dpdk-16.11]# 


[root@netqe10 dpdk-bond-ovs.2.6.1-dpdk-16.11]# cat ovs_config_add_bond_dpdk0_dpdk1_vhost0_balance_tcp.sh
ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xaa0000
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="4096,1"
sleep 5
systemctl restart openvswitch
sleep 5

ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xaa0154
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-bond ovsbr0 dpdkbond dpdk0 dpdk1 "lacp=active" "bond-mode=balance-tcp" -- set Interface dpdk0 type=dpdk ofport_request=10 -- set Interface dpdk1 type=dpdk ofport_request=11
ovs-vsctl add-port ovsbr0 vhost0 \
    -- set interface vhost0 type=dpdkvhostuser ofport_request=20

ovs-vsctl --timeout 10 set Interface dpdk0 options:n_rxq=4
ovs-vsctl --timeout 10 set Interface dpdk1 options:n_rxq=4

chown qemu /var/run/openvswitch/vhost0
ll /var/run/openvswitch/vhost*

#ovs-ofctl del-flows ovsbr0
#ovs-ofctl add-flow ovsbr0 in_port=10,actions=output:20
#ovs-ofctl add-flow ovsbr0 in_port=20,actions=output:10
ovs-ofctl dump-flows ovsbr0
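
For reference, a small sketch that decodes which cores a hex cpu mask selects (using the pmd-cpu-mask value 0xaa0154 from the scripts above; it selects cores 2, 4, 6, 8 on numa node 0 and 17, 19, 21, 23 on numa node 1, which matches the netqe10 rxq layout):

# Decode a cpu mask into the list of core ids it enables
mask=0xaa0154
for cpu in $(seq 0 23); do
    (( (mask >> cpu) & 1 )) && echo "core $cpu"
done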

Comment 20 Kevin Traynor 2017-01-05 15:55:36 UTC
I have run a similar test using the qemu cmd line. I find that if I do not pin the qemu threads to numa 1, vhost0 and vhost1 may indeed appear on numa 0 or numa 1, as reported. 

When I taskset qemu (*at start time*) to numa 1, vhost0 and vhost1 always appear on numa 1. I tested this 10 times.
i.e. taskset -c 5,7,9,11 ./qemu-system-x86_64 <qemu_cmd_line_args>

Running taskset after qemu has started is not sufficient, as the vhost devices are registered in OVS during VM boot-up.


I have 2 suggestions to continue progress:

- Although it looks ok to me, maybe the libvirt/qemu config in the test is not sufficiently pinning all the threads to numa 1, or at least not early enough. The libvirt commands around this are hard to follow; it would be good to get them checked by a libvirt expert.

- One check that won't rule out the libvirt config, but may confirm it, would be to run the test until the vhost ports land on the wrong numa node, then check the last scheduled cpu for the qemu threads, i.e. top -H -p <qemu_pid> and look at the last used cpu field for the active threads (a non-interactive variant is sketched at the end of this comment). Jean, is this something you can run?

One small item I noticed is that the socket-mem in some of the configs is 4096,1. This should be something like 4096,4096, as in some of the other configs. I tried with 4096,1 and it didn't seem to have any impact on the numa location of the vhost devices, but it is best to keep it consistently 4096,4096 to rule out any side effects.
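
A non-interactive variant of the last-scheduled-cpu check suggested above (a sketch; <qemu_pid> is the pid of the guest's qemu process):

# Show each qemu thread with the processor (PSR) it last ran on
ps -T -p <qemu_pid> -o spid,psr,pcpu,comm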

Comment 21 Christian Trautman 2017-01-05 19:28:53 UTC
I can confirm Kevin's findings. My testbed does not use libvirt but uses the QEMU command line. The test scripts used taskset after the guest was booted to bind the qemu cpus to the correct cpus. I could reproduce this issue every time: my NIC was on numa 0, but my vhostuser ports were ending up on numa 1 CPUs even with a PMD mask set to use only numa 0 cpus. 

I modified the script to apply taskset as part of the qemu command-line startup and I no longer see the issue. The vhostuser ports now correctly bind to numa 0 cpus.

Comment 22 Jean-Tsung Hsiao 2017-01-06 12:13:55 UTC
(In reply to Kevin Traynor from comment #20)
> I have run a similar test using qemu cmd line. I find if I do not pin qemu
> threads to numa 1, indeed vhost0 and vhost1 may appear on numa0 or numa1 as
> reported. 
> 
> When I taskset qemu (*at start time*) to numa 1, vhost0 and vhost1 always
> appear on numa 1. I tested this 10x times.
> i.e. taskset -c 5,7,9,11 ./qemu-system-x86_64 <qemu_cmd_lines>

So this is a workaround if you run qemu manually. But my case uses a guest xml, so this workaround does NOT apply to it.

> 
> taskset after qemu is run is not sufficient as the vhost devices are
> registered in OVS during vm boot up.
> 
> 
> I have 2 suggestions to continue progress:
> 
> - Although it looks ok to me, maybe the libvirt/qemu config in the test is
> not sufficiently pinning all the threads to numa 1. Or at least not early
> enough. The libvirt commands around this are hard to follow, it would be
> good to get it checked from a libvirt expert. 
> 
> - One check that won't rule out libvirt config, but may confirm it would be
> to run the test when vhost ports land on wrong numa node, then check last
> scheduled cpu for qemu threads.
> i.e. top -H -p<qemu_pid> and show the last used cpu field for active
> threads. Jean, is this something you can run?
> 

Ok, I'll try today.

> one small item I noticed was that the socket-mem in some of the configs is
> 4096,1. This should be something like 4096,4096 as in some of other the
> configs. I tried with 4096,1 and it didn't seem to have any impact on numa
> location of vhost devices, but best to keep it consistently 4096,4096, to
> rule out any side effects.

NOTE: I have been using "4096,1" for all OVS-dpdk testing since my ixgbe NIC is on numa #1. So, this is NOT an issue.

Comment 23 Kevin Traynor 2017-01-06 17:19:01 UTC
> > - One check that won't rule out libvirt config, but may confirm it would be
> > to run the test when vhost ports land on wrong numa node, then check last
> > scheduled cpu for qemu threads.
> > i.e. top -H -p<qemu_pid> and show the last used cpu field for active
> > threads. Jean, is this something you can run?
> > 
> 
> Ok, I'll try today.
> 

Jean and I tested this today and we saw that the vcpus were being pinned to the correct cores but the emulator threads were not being pinned.

Later on, I edited the libvirt xml to pin the emulator as well. I've tested this and the numa node info for vhost ports is consistent with the emulator pinning.

  <cputune>
    <vcpupin vcpu='0' cpuset='5'/>
    <vcpupin vcpu='1' cpuset='7'/>
    <vcpupin vcpu='2' cpuset='9'/>
    <vcpupin vcpu='3' cpuset='11'/>
    <emulatorpin cpuset='13'/>
  </cputune>
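
For completeness, the resulting pinning can be read back from libvirt (a sketch; mq-vhu-4 is the domain name used in this bz):

# Show vcpu-to-host-core pinning for the guest
virsh vcpupin mq-vhu-4
# Show where the emulator (non-vcpu) threads are pinned
virsh emulatorpin mq-vhu-4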

Comment 24 Kevin Traynor 2017-01-10 10:03:13 UTC
OVS-DPDK will only poll for rx pkts from a DPDK port with a PMD thread
that is on the same numa node. This is true for both physical and virtual NICs and is done to avoid cross-numa performance issues. If the user has not permitted any PMDs to run on the matching numa node, then OVS will report an error and not poll that DPDK port.

For OVS 2.5, the dpdk vhost ports are associated with the numa node that the dpdk master lcore runs on (from the -c vswitchd cmd line args). 

For OVS 2.6, the dpdk vhost ports are associated with the numa node the virtqueue memory has been allocated on by the emulator.  

In this bz, what was observed was that OVS 2.6 vhost ports were being associated with different numa nodes on different trials. This was due to Linux scheduling the emulator across 2 numa nodes. What was then observed was that if there are no PMDs associated with the selected numa node, the vhost ports would not be polled (which is expected).

To solve this, taskset can be used for qemu, or emulatorpin for libvirt, e.g.
taskset -c 3,5,7,9,11,13 qemu-kvm <qemu_args>
or
  <cputune>
    <vcpupin vcpu='0' cpuset='5'/>
    <vcpupin vcpu='1' cpuset='7'/>
    <vcpupin vcpu='2' cpuset='9'/>
    <vcpupin vcpu='3' cpuset='11'/>
    <emulatorpin cpuset='13'/>
  </cputune>
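
If the guest is managed by libvirt, the emulator pinning can also be persisted without editing the xml by hand (a sketch; note it has to be in place before the guest starts, since the vhost ports are registered in OVS during VM boot):

# Persist emulator pinning to core 13 so it takes effect from the next guest start
virsh emulatorpin mq-vhu-4 13 --config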

Comment 25 Jean-Tsung Hsiao 2017-01-12 11:00:28 UTC
 
(In reply to Kevin Traynor from comment #24)


> or
>   <cputune>
>     <vcpupin vcpu='0' cpuset='5'/>
>     <vcpupin vcpu='1' cpuset='7'/>
>     <vcpupin vcpu='2' cpuset='9'/>
>     <vcpupin vcpu='3' cpuset='11'/>
>     <emulatorpin cpuset='13'/>
>   </cputune>

With this change I got up to 14.88 Mpps one way from Xena to testpmd at vhostuser.

*** Guest xml ***
[root@netqe5 dpdk-multique-scripts]# virsh dumpxml mq-vhu-4
<domain type='kvm' id='14'>
  <name>mq-vhu-4</name>
  <uuid>e6ddf28c-3af9-43ee-a9ac-13ee5c2cf39d</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>5</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='13'/>
    <vcpupin vcpu='4' cpuset='15'/>
    <emulatorpin cpuset='13'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Haswell-noTSX</model>
    <numa>
      <cell id='0' cpus='0' memory='4194304' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/mnt/test/vhostuser/mq-vhu.img'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='vhostuser'>
      <mac address='52:54:00:7e:c4:1c'/>
      <source type='unix' path='/var/run/openvswitch/vhost0' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost' queues='4'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='52:54:00:83:fd:6b'/>
      <source type='unix' path='/var/run/openvswitch/vhost1' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost' queues='4'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:3b:d1:3a'/>
      <source bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/2'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-14-mq-vhu-4/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <video>
      <model type='cirrus' vram='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c479,c694</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c479,c694</imagelabel>
  </seclabel>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+107:+107</label>
    <imagelabel>+107:+107</imagelabel>
  </seclabel>
</domain>

*** OVS-dpdk config file ***
 
ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xaa0000
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="4096,1"
sleep 5
systemctl restart openvswitch
sleep 5

# config ovs-dpdk bridge with dpdk0, dpdk1, vhost0 and vhost1
ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xaa0000
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 dpdk0 \
    -- set interface dpdk0 type=dpdk ofport_request=10
ovs-vsctl add-port ovsbr0 dpdk1 \
    -- set interface dpdk1 type=dpdk ofport_request=11

ovs-vsctl add-port ovsbr0 vhost0 \
    -- set interface vhost0 type=dpdkvhostuser ofport_request=20
ovs-vsctl add-port ovsbr0 vhost1 \
    -- set interface vhost1 type=dpdkvhostuser ofport_request=21

ovs-vsctl --timeout 10 set Interface dpdk0 options:n_rxq=4
ovs-vsctl --timeout 10 set Interface dpdk1 options:n_rxq=4

chown qemu /var/run/openvswitch/vhost0
chown qemu /var/run/openvswitch/vhost1
ls -l /var/run/openvswitch/vhost*

ovs-ofctl del-flows ovsbr0
ovs-ofctl add-flow ovsbr0 in_port=10,actions=output:20
ovs-ofctl add-flow ovsbr0 in_port=21,actions=output:11
ovs-ofctl dump-flows ovsbr0
[root@netqe5 dpdk-multique-scripts]#

*** queue-core alignment ***

[root@netqe5 dpdk-multique-scripts]# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 1 core_id 17:
    isolated : false
    port: vhost0    queue-id: 0
    port: vhost1    queue-id: 0
    port: dpdk0    queue-id: 0
    port: dpdk1    queue-id: 0
pmd thread numa_id 1 core_id 19:
    isolated : false
    port: vhost0    queue-id: 1
    port: vhost1    queue-id: 1
    port: dpdk0    queue-id: 1
    port: dpdk1    queue-id: 1
pmd thread numa_id 1 core_id 21:
    isolated : false
    port: vhost0    queue-id: 2
    port: vhost1    queue-id: 2
    port: dpdk0    queue-id: 2
    port: dpdk1    queue-id: 2
pmd thread numa_id 1 core_id 23:
    isolated : false
    port: vhost0    queue-id: 3
    port: vhost1    queue-id: 3
    port: dpdk0    queue-id: 3
    port: dpdk1    queue-id: 3
[root@netqe5 dpdk-multique-scripts]#


*** One way Xena to testpmd/vhostuser Mpps throughput ***

Test 1
rate_fration =  1000000
packet_length 64
num_of_streams =  32
test_duration =  60
INFO:root:XenaSocket: Connected
INFO:root:XenaManager: Logged succefully
INFO:root:XenaPort: 1/0 starting traffic
INFO:root:XenaPort: 1/0 stopping traffic
Average: 14773626.00 pps
Test 2
rate_fration =  1000000
packet_length 64
num_of_streams =  32
test_duration =  60
INFO:root:XenaSocket: Connected
INFO:root:XenaManager: Logged succefully
INFO:root:XenaPort: 1/0 starting traffic
INFO:root:XenaPort: 1/0 stopping traffic
Average: 14738007.00 pps
Test 3
rate_fration =  1000000
packet_length 64
num_of_streams =  32
test_duration =  60
INFO:root:XenaSocket: Connected
INFO:root:XenaManager: Logged succefully
INFO:root:XenaPort: 1/0 starting traffic
INFO:root:XenaPort: 1/0 stopping traffic
Average: 14880375.00 pps
Test 4
rate_fration =  1000000
packet_length 64
num_of_streams =  32
test_duration =  60
INFO:root:XenaSocket: Connected
INFO:root:XenaManager: Logged succefully
INFO:root:XenaPort: 1/0 starting traffic
INFO:root:XenaPort: 1/0 stopping traffic
Average: 14880254.00 pps
Test 5
rate_fration =  1000000
packet_length 64
num_of_streams =  32
test_duration =  60
INFO:root:XenaSocket: Connected
INFO:root:XenaManager: Logged succefully
INFO:root:XenaPort: 1/0 starting traffic
INFO:root:XenaPort: 1/0 stopping traffic
Average: 14843679.00 pps

Comment 26 Amnon Ilan 2017-01-26 23:59:35 UTC
(In reply to Kevin Traynor from comment #24)

>   <cputune>
>     <vcpupin vcpu='0' cpuset='5'/>
>     <vcpupin vcpu='1' cpuset='7'/>
>     <vcpupin vcpu='2' cpuset='9'/>
>     <vcpupin vcpu='3' cpuset='11'/>
>     <emulatorpin cpuset='13'/>
>   </cputune>

Should an upper layer bz (libvirt/Nova) be opened for that?

Comment 27 Kevin Traynor 2017-02-10 17:25:16 UTC
Hi Amnon, 

I'm not sure what they currently do with respect to NUMA. I assume they would have some general purpose cores on each NUMA node, so it would be a case of using emulatorpin/taskset/numactl to ensure all the qemu threads are on the same NUMA node, if they don't already do that. 

Feel free to share comment 24, or contact me if someone needs a summary of the findings. 

Kevin.

