Bug 1295952 - Some cpumasks generate error with ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask
Summary: Some cpumasks generate error with ovs-vsctl set Open_vSwitch . other_config:p...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openvswitch-dpdk
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Eelco Chaudron
QA Contact: Jean-Tsung Hsiao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-01-05 21:35 UTC by Andrew Theurer
Modified: 2016-12-19 10:41 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-19 10:41:41 UTC
Target Upstream Version:
Embargoed:



Description Andrew Theurer 2016-01-05 21:35:45 UTC
Description of problem:

DPDK poll-mode threads can be created with the following command:

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<cpumask in hex>

Some valid cpumasks work, but some do not.
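
For reference, the mask is read bit-per-CPU; a quick sketch of the arithmetic behind the masks discussed in this report:

0xAA0 = binary 1010 1010 0000 -> bits 5, 7, 9, 11 set -> PMD threads on CPUs 5, 7, 9, 11
0xF   = binary 1111           -> bits 0-3 set         -> PMD threads on CPUs 0-3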
Version-Release number of selected component (if applicable):

How reproducible:
Easily, though some systems don't experience this problem.  I can reproduce it on a 2-node system where the DPDK port used by OVS is on node1.  When the port is on node0, the problem does not appear to happen.

Steps to Reproduce:
1. create an OVS-DPDK bridge with a port of type "dpdk", which corresponds to a 10Gb NIC (bound to the vfio-pci module)
   typically we would create two bridges, each with a dpdk port (10Gb) and a dpdkvhostuser port.  This would allow us to create 4 PMD threads, one per port.

2. run the command: ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=AA0 (this example requests 4 threads on CPUs 5, 7, 9, and 11, which happen to be on node1)
   The CPU mask should contain CPUs from only node0 or only node1, but not both

Actual results:
From the OVS log: 00088|dpif_netdev|ERR|Cannot create pmd threads due to out of unpinned cores on numa node
A current work-around is to use a cpumask that does not cause an error, such as "F", and then use taskset to re-pin the new threads.
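
A sketch of that work-around, assuming the PMD threads show up with "pmd" in their thread name (the TID below is hypothetical):

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=F
# list the pmd thread IDs and the CPU each currently runs on
ps -eLo tid,psr,comm | grep pmd
# re-pin one of them, e.g. hypothetical TID 4321, to CPU 5
taskset -cp 5 4321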

Expected results:
The requested number of threads should be created with the specified placement.  There is no lack of "unpinned cores".


Additional info:

using openvswitch-dpdk-debuginfo-2.4.0-0.10346.git97bab959.1.el7.x86_64

Comment 2 Flavio Leitner 2016-01-28 13:43:20 UTC
Hi Andrew,

The current code forces the PMD threads to be on the same NUMA node as the port. So, if you have a port on node 0, you need to allow at least one CPU on that node; otherwise it will complain about the lack of unpinned cores.

As a consequence, if you have one port on each node, you will need to enable at least one core on each node.

The idea is to avoid running the PMD threads on a remote node which would affect the performance significantly.
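
For instance, a minimal mask satisfying that rule on a host where even-numbered CPUs sit on node 0 and odd-numbered CPUs on node 1 (as on the machine described later in this bug) might be:

# CPU 2 (node 0) + CPU 3 (node 1) -> binary 1100 -> mask 0xC
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=C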

Please let me know your thoughts.

Thanks,
fbl

Comment 3 Andrew Theurer 2016-04-15 15:58:38 UTC
Flavio, in my case all of the physical ports were on node1, and all of the CPUs in the mask were on node1, but OVS still rejected the cpumask.  I'll try again on the latest code to confirm this.

Comment 4 Andrew Theurer 2016-08-25 16:38:58 UTC
Bill M, can you share your recent experience with this?

Comment 5 Flavio Leitner 2016-09-29 14:47:49 UTC
Bill M, can you share your recent experience with this?

Comment 6 bmichalo 2016-09-29 15:11:50 UTC
(In reply to Flavio Leitner from comment #5)
> Bill M, can you share your recent experience with this?

With:

Kernel version:
Linux perf84.perf.lab.eng.bos.redhat.com 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

OVS version:
ovs-vsctl (Open vSwitch) 2.5.90
Compiled Mar 15 2016 12:20:02
DB Schema 7.12.1

DPDK version:
16.07

I am not seeing this issue with a simple two-port, bare-metal OVS test.  I was able to use cpumask=A, create PMD threads on CPUs 1 and 3 (associated with NUMA node 1), and successfully forward packets.

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=A

There is no error in the OVS log.  Historically, I would have needed to use cpumask=3 and then migrate one of the threads via 'taskset' from CPU 0 to CPU 3.
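
For what it's worth, placement can be double-checked from the host; a sketch (assuming this OVS build exposes the dpif-netdev/pmd-stats-show appctl command, as 2.5 should):

# show which CPU each pmd thread is currently running on
ps -eLo tid,psr,comm | grep pmd
# show per-pmd statistics, including the numa_id/core_id each thread is bound to
ovs-appctl dpif-netdev/pmd-stats-show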

Comment 8 Flavio Leitner 2016-10-05 15:00:08 UTC
Thanks Bill for verifying it.
Since we are no longer updating 2.4 and 2.5 is fine, I am closing this based on comment #6.

Comment 9 bmichalo 2016-10-13 15:15:53 UTC


* Bare metal testing seems immune to the problem.  The PMD threads are successfully created, all on a single NUMA node, and behave accordingly

* However, as soon as you introduce a VM such as for a {(P,V),(V,P)} test, the 'unpinned cores' issue manifests itself again

Both cases seem to be 100% repeatable.

Comment 10 bmichalo 2016-10-13 17:28:10 UTC
(In reply to bmichalo from comment #9)
> 
> 
> * Bare metal testing seems immune to the problem.  The PMD threads are
> successfully created, all on a single NUMA node, and behave accordingly
> 
> * However, as soon as you introduce a VM such as for a {(P,V),(V,P)} test,
> the 'unpinned cores' issue manifests itself again
> 
> Both cases seem to be 100% repeatable.

With a CPU mask of AA during a {(P1,V1),(V2,P2)} test, four PMD threads should be created on NUMA node 1.  In the end this occurs; however, even though the threads are there (in htop one can see CPUs 1, 3, 5, and 7 at 100%), they fail to move data.  For a unidirectional test, packets will succeed:

P1-->V1--->V2

but will then be dropped at V2.  If you use testpmd as the guest forwarding engine, you will see the dropped packets.

Also, even with a CPU mask of AA, there is an attempt to place PMD threads on NUMA node 0.

Here's a summary of the ovs-vswitchd.log:

2016-10-13T17:16:09.453Z|00024|dpif_netdev|INFO|Created 1 pmd threads on numa node 1
2016-10-13T17:16:09.590Z|00069|dpif_netdev|INFO|Created 1 pmd threads on numa node 0
2016-10-13T17:16:10.720Z|00134|dpif_netdev|INFO|Created 4 pmd threads on numa node 1                                                                     
2016-10-13T17:16:10.720Z|00135|dpif_netdev|ERR|Cannot create pmd threads due to out of unpinned cores on numa node
2016-10-13T17:16:10.720Z|00136|dpif_netdev|ERR|Cannot create pmd threads due to out of unpinned cores on numa node


Here is the entire ovs-vswitchd.log:

cat /usr/local/var/log/openvswitch/ovs-vswitchd.log 
2016-10-13T17:16:08.175Z|00002|vlog|INFO|opened log file /usr/local/var/log/openvswitch/ovs-vswitchd.log
2016-10-13T17:16:08.175Z|00003|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 0
2016-10-13T17:16:08.175Z|00004|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 1
2016-10-13T17:16:08.176Z|00005|ovs_numa|INFO|Discovered 2 NUMA nodes and 24 CPU cores
2016-10-13T17:16:08.176Z|00006|memory|INFO|3860 kB peak resident set size after 12.2 seconds
2016-10-13T17:16:08.176Z|00007|reconnect|INFO|unix:/usr/local/var/run/openvswitch/db.sock: connecting...
2016-10-13T17:16:08.176Z|00008|reconnect|INFO|unix:/usr/local/var/run/openvswitch/db.sock: connected
2016-10-13T17:16:08.194Z|00009|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
2016-10-13T17:16:08.194Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: MPLS label stack length probed as 3
2016-10-13T17:16:08.194Z|00011|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports unique flow ids
2016-10-13T17:16:08.194Z|00012|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_state
2016-10-13T17:16:08.194Z|00013|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_zone
2016-10-13T17:16:08.194Z|00014|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_mark
2016-10-13T17:16:08.194Z|00015|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_label
2016-10-13T17:16:08.194Z|00016|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_state_nat
2016-10-13T17:16:08.194Z|00017|bridge|INFO|bridge ovsbr0: added interface ovsbr0 on port 65534
2016-10-13T17:16:08.198Z|00018|dpif_netlink|ERR|Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch kernel module is probably not loaded.
2016-10-13T17:16:08.198Z|00019|bridge|INFO|bridge ovsbr0: using datapath ID 0000c6631ac3fd48
2016-10-13T17:16:08.199Z|00020|connmgr|INFO|ovsbr0: added service controller "punix:/usr/local/var/run/openvswitch/ovsbr0.mgmt"
2016-10-13T17:16:08.200Z|00021|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.5.90
2016-10-13T17:16:08.826Z|00022|dpdk|INFO|Port 0: 90:e2:ba:8a:20:28
2016-10-13T17:16:09.453Z|00023|dpdk|INFO|Port 0: 90:e2:ba:8a:20:28
2016-10-13T17:16:09.453Z|00024|dpif_netdev|INFO|Created 1 pmd threads on numa node 1
2016-10-13T17:16:09.453Z|00025|bridge|INFO|bridge ovsbr0: added interface dpdk0 on port 1
2016-10-13T17:16:09.453Z|00026|bridge|INFO|bridge ovsbr0: using datapath ID 000090e2ba8a2028
2016-10-13T17:16:09.453Z|00027|timeval|WARN|Unreasonably long 1250ms poll interval (1245ms user, 1ms system)
2016-10-13T17:16:09.453Z|00028|timeval|WARN|faults: 26 minor, 0 major
2016-10-13T17:16:09.454Z|00029|timeval|WARN|disk: 0 reads, 16 writes
2016-10-13T17:16:09.454Z|00030|timeval|WARN|context switches: 64 voluntary, 81 involuntary
2016-10-13T17:16:09.454Z|00031|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=4ce6b97e:
2016-10-13T17:16:09.454Z|00032|coverage|INFO|bridge_reconfigure         0.0/sec     0.000/sec        0.0000/sec   total: 3
2016-10-13T17:16:09.454Z|00033|coverage|INFO|ofproto_flush              0.0/sec     0.000/sec        0.0000/sec   total: 1
2016-10-13T17:16:09.454Z|00034|coverage|INFO|ofproto_update_port        0.0/sec     0.000/sec        0.0000/sec   total: 3
2016-10-13T17:16:09.454Z|00035|coverage|INFO|rev_reconfigure            0.0/sec     0.000/sec        0.0000/sec   total: 3
2016-10-13T17:16:09.454Z|00036|coverage|INFO|rev_flow_table             0.0/sec     0.000/sec        0.0000/sec   total: 1
2016-10-13T17:16:09.454Z|00037|coverage|INFO|cmap_expand                0.0/sec     0.000/sec        0.0000/sec   total: 6
2016-10-13T17:16:09.454Z|00038|coverage|INFO|cmap_shrink                0.0/sec     0.000/sec        0.0000/sec   total: 3
2016-10-13T17:16:09.454Z|00039|coverage|INFO|dpif_port_add              0.0/sec     0.000/sec        0.0000/sec   total: 2
2016-10-13T17:16:09.455Z|00040|coverage|INFO|dpif_flow_flush            0.0/sec     0.000/sec        0.0000/sec   total: 2
2016-10-13T17:16:09.455Z|00041|coverage|INFO|dpif_flow_get              0.0/sec     0.000/sec        0.0000/sec   total: 5
2016-10-13T17:16:09.455Z|00042|coverage|INFO|dpif_flow_put              0.0/sec     0.000/sec        0.0000/sec   total: 10
2016-10-13T17:16:09.455Z|00043|coverage|INFO|dpif_flow_del              0.0/sec     0.000/sec        0.0000/sec   total: 5
2016-10-13T17:16:09.455Z|00044|coverage|INFO|dpif_execute               0.0/sec     0.000/sec        0.0000/sec   total: 2
2016-10-13T17:16:09.456Z|00045|coverage|INFO|flow_extract               0.0/sec     0.000/sec        0.0000/sec   total: 1
2016-10-13T17:16:09.456Z|00046|coverage|INFO|miniflow_malloc            0.0/sec     0.000/sec        0.0000/sec   total: 57
2016-10-13T17:16:09.456Z|00047|coverage|INFO|hmap_pathological          0.0/sec     0.000/sec        0.0000/sec   total: 1
2016-10-13T17:16:09.456Z|00048|coverage|INFO|hmap_expand                0.0/sec     0.000/sec        0.0000/sec   total: 816
2016-10-13T17:16:09.456Z|00049|coverage|INFO|netdev_get_stats           0.0/sec     0.000/sec        0.0000/sec   total: 3
2016-10-13T17:16:09.457Z|00050|coverage|INFO|txn_incomplete             0.0/sec     0.000/sec        0.0000/sec   total: 13
2016-10-13T17:16:09.457Z|00051|coverage|INFO|txn_success                0.0/sec     0.000/sec        0.0000/sec   total: 4
2016-10-13T17:16:09.457Z|00052|coverage|INFO|poll_create_node           0.0/sec     0.000/sec        0.0000/sec   total: 438
2016-10-13T17:16:09.457Z|00053|coverage|INFO|poll_zero_timeout          0.0/sec     0.000/sec        0.0000/sec   total: 4
2016-10-13T17:16:09.457Z|00054|coverage|INFO|seq_change                 0.0/sec     0.000/sec        0.0000/sec   total: 1810
2016-10-13T17:16:09.457Z|00055|coverage|INFO|pstream_open               0.0/sec     0.000/sec        0.0000/sec   total: 3
2016-10-13T17:16:09.457Z|00056|coverage|INFO|stream_open                0.0/sec     0.000/sec        0.0000/sec   total: 1
2016-10-13T17:16:09.458Z|00057|coverage|INFO|util_xalloc                0.0/sec     0.000/sec        0.0000/sec   total: 16240
2016-10-13T17:16:09.458Z|00058|coverage|INFO|netdev_set_policing        0.0/sec     0.000/sec        0.0000/sec   total: 2
2016-10-13T17:16:09.458Z|00059|coverage|INFO|netdev_get_ifindex         0.0/sec     0.000/sec        0.0000/sec   total: 1
2016-10-13T17:16:09.458Z|00060|coverage|INFO|netdev_get_hwaddr          0.0/sec     0.000/sec        0.0000/sec   total: 7
2016-10-13T17:16:09.458Z|00061|coverage|INFO|netdev_set_hwaddr          0.0/sec     0.000/sec        0.0000/sec   total: 2
2016-10-13T17:16:09.458Z|00062|coverage|INFO|netdev_get_ethtool         0.0/sec     0.000/sec        0.0000/sec   total: 3
2016-10-13T17:16:09.458Z|00063|coverage|INFO|netlink_received           0.0/sec     0.000/sec        0.0000/sec   total: 22
2016-10-13T17:16:09.458Z|00064|coverage|INFO|netlink_recv_jumbo         0.0/sec     0.000/sec        0.0000/sec   total: 2
2016-10-13T17:16:09.458Z|00065|coverage|INFO|netlink_sent               0.0/sec     0.000/sec        0.0000/sec   total: 8
2016-10-13T17:16:09.458Z|00066|coverage|INFO|nln_changed                0.0/sec     0.000/sec        0.0000/sec   total: 5
2016-10-13T17:16:09.458Z|00067|coverage|INFO|64 events never hit
2016-10-13T17:16:09.460Z|00068|dpdk|INFO|Socket /usr/local/var/run/openvswitch/vhost-user1 created for vhost-user port vhost-user1
2016-10-13T17:16:09.590Z|00069|dpif_netdev|INFO|Created 1 pmd threads on numa node 0
2016-10-13T17:16:09.590Z|00070|bridge|INFO|bridge ovsbr0: added interface vhost-user1 on port 2
2016-10-13T17:16:09.593Z|00071|connmgr|INFO|ovsbr0<->unix: 1 flow_mods in the last 0 s (1 deletes)
2016-10-13T17:16:09.595Z|00072|connmgr|INFO|ovsbr0<->unix: 1 flow_mods in the last 0 s (1 adds)
2016-10-13T17:16:09.597Z|00073|connmgr|INFO|ovsbr0<->unix: 1 flow_mods in the last 0 s (1 adds)
2016-10-13T17:16:09.604Z|00074|bridge|INFO|bridge ovsbr1: added interface ovsbr1 on port 65534
2016-10-13T17:16:09.604Z|00075|bridge|INFO|bridge ovsbr1: using datapath ID 000022bce6557d4d
2016-10-13T17:16:09.604Z|00076|connmgr|INFO|ovsbr1: added service controller "punix:/usr/local/var/run/openvswitch/ovsbr1.mgmt"
2016-10-13T17:16:09.609Z|00077|dpdk|INFO|Socket /usr/local/var/run/openvswitch/vhost-user2 created for vhost-user port vhost-user2
2016-10-13T17:16:09.609Z|00078|bridge|INFO|bridge ovsbr1: added interface vhost-user2 on port 1
2016-10-13T17:16:10.094Z|00079|dpdk|INFO|Port 1: 90:e2:ba:8a:20:29
2016-10-13T17:16:10.705Z|00080|dpdk|INFO|Port 1: 90:e2:ba:8a:20:29
2016-10-13T17:16:10.705Z|00081|bridge|INFO|bridge ovsbr1: added interface dpdk1 on port 2
2016-10-13T17:16:10.705Z|00082|bridge|INFO|bridge ovsbr1: using datapath ID 000090e2ba8a2029
2016-10-13T17:16:10.705Z|00083|timeval|WARN|Unreasonably long 1092ms poll interval (552ms user, 0ms system)
2016-10-13T17:16:10.705Z|00084|timeval|WARN|faults: 1 minor, 0 major
2016-10-13T17:16:10.705Z|00085|timeval|WARN|disk: 0 reads, 8 writes
2016-10-13T17:16:10.705Z|00086|timeval|WARN|context switches: 53 voluntary, 107 involuntary
2016-10-13T17:16:10.705Z|00087|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=5e4541bb:
2016-10-13T17:16:10.705Z|00088|coverage|INFO|bridge_reconfigure         0.0/sec     0.000/sec        0.0000/sec   total: 9
2016-10-13T17:16:10.705Z|00089|coverage|INFO|ofproto_flush              0.0/sec     0.000/sec        0.0000/sec   total: 2
2016-10-13T17:16:10.705Z|00090|coverage|INFO|ofproto_recv_openflow      0.0/sec     0.000/sec        0.0000/sec   total: 6
2016-10-13T17:16:10.705Z|00091|coverage|INFO|ofproto_update_port        0.0/sec     0.000/sec        0.0000/sec   total: 17
2016-10-13T17:16:10.705Z|00092|coverage|INFO|rev_reconfigure            0.0/sec     0.000/sec        0.0000/sec   total: 10
2016-10-13T17:16:10.705Z|00093|coverage|INFO|rev_port_toggled           0.0/sec     0.000/sec        0.0000/sec   total: 1
2016-10-13T17:16:10.705Z|00094|coverage|INFO|rev_flow_table             0.0/sec     0.000/sec        0.0000/sec   total: 5
2016-10-13T17:16:10.705Z|00095|coverage|INFO|cmap_expand                0.0/sec     0.000/sec        0.0000/sec   total: 13
2016-10-13T17:16:10.705Z|00096|coverage|INFO|cmap_shrink                0.0/sec     0.000/sec        0.0000/sec   total: 9
2016-10-13T17:16:10.706Z|00097|coverage|INFO|dpif_port_add              0.0/sec     0.000/sec        0.0000/sec   total: 6
2016-10-13T17:16:10.706Z|00098|coverage|INFO|dpif_flow_flush            0.0/sec     0.000/sec        0.0000/sec   total: 3
2016-10-13T17:16:10.706Z|00099|coverage|INFO|dpif_flow_get              0.0/sec     0.000/sec        0.0000/sec   total: 5
2016-10-13T17:16:10.706Z|00100|coverage|INFO|dpif_flow_put              0.0/sec     0.000/sec        0.0000/sec   total: 10
2016-10-13T17:16:10.706Z|00101|coverage|INFO|dpif_flow_del              0.0/sec     0.000/sec        0.0000/sec   total: 5
2016-10-13T17:16:10.706Z|00102|coverage|INFO|dpif_execute               0.0/sec     0.000/sec        0.0000/sec   total: 2
2016-10-13T17:16:10.706Z|00103|coverage|INFO|flow_extract               0.0/sec     0.000/sec        0.0000/sec   total: 1
2016-10-13T17:16:10.706Z|00104|coverage|INFO|miniflow_malloc            0.0/sec     0.000/sec        0.0000/sec   total: 117
2016-10-13T17:16:10.706Z|00105|coverage|INFO|hmap_pathological          0.0/sec     0.000/sec        0.0000/sec   total: 17
2016-10-13T17:16:10.706Z|00106|coverage|INFO|hmap_expand                0.0/sec     0.000/sec        0.0000/sec   total: 1326
2016-10-13T17:16:10.706Z|00107|coverage|INFO|netdev_get_stats           0.0/sec     0.000/sec        0.0000/sec   total: 7
2016-10-13T17:16:10.706Z|00108|coverage|INFO|txn_unchanged              0.0/sec     0.000/sec        0.0000/sec   total: 3
2016-10-13T17:16:10.707Z|00109|coverage|INFO|txn_incomplete             0.0/sec     0.000/sec        0.0000/sec   total: 24
2016-10-13T17:16:10.707Z|00110|coverage|INFO|txn_success                0.0/sec     0.000/sec        0.0000/sec   total: 8
2016-10-13T17:16:10.707Z|00111|coverage|INFO|poll_create_node           0.0/sec     0.000/sec        0.0000/sec   total: 2901
2016-10-13T17:16:10.707Z|00112|coverage|INFO|poll_zero_timeout          0.0/sec     0.000/sec        0.0000/sec   total: 14
2016-10-13T17:16:10.707Z|00113|coverage|INFO|rconn_queued               0.0/sec     0.000/sec        0.0000/sec   total: 3
2016-10-13T17:16:10.707Z|00114|coverage|INFO|rconn_sent                 0.0/sec     0.000/sec        0.0000/sec   total: 3
2016-10-13T17:16:10.707Z|00115|coverage|INFO|seq_change                 0.0/sec     0.000/sec        0.0000/sec   total: 66168
2016-10-13T17:16:10.707Z|00116|coverage|INFO|pstream_open               0.0/sec     0.000/sec        0.0000/sec   total: 5
2016-10-13T17:16:10.707Z|00117|coverage|INFO|stream_open                0.0/sec     0.000/sec        0.0000/sec   total: 1
2016-10-13T17:16:10.707Z|00118|coverage|INFO|util_xalloc                0.0/sec     0.000/sec        0.0000/sec   total: 27692
2016-10-13T17:16:10.707Z|00119|coverage|INFO|vconn_received             0.0/sec     0.000/sec        0.0000/sec   total: 9
2016-10-13T17:16:10.707Z|00120|coverage|INFO|vconn_sent                 0.0/sec     0.000/sec        0.0000/sec   total: 6
2016-10-13T17:16:10.707Z|00121|coverage|INFO|netdev_set_policing        0.0/sec     0.000/sec        0.0000/sec   total: 5
2016-10-13T17:16:10.708Z|00122|coverage|INFO|netdev_get_ifindex         0.0/sec     0.000/sec        0.0000/sec   total: 2
2016-10-13T17:16:10.708Z|00123|coverage|INFO|netdev_get_hwaddr          0.0/sec     0.000/sec        0.0000/sec   total: 12
2016-10-13T17:16:10.708Z|00124|coverage|INFO|netdev_set_hwaddr          0.0/sec     0.000/sec        0.0000/sec   total: 4
2016-10-13T17:16:10.708Z|00125|coverage|INFO|netdev_get_ethtool         0.0/sec     0.000/sec        0.0000/sec   total: 7
2016-10-13T17:16:10.708Z|00126|coverage|INFO|netlink_received           0.0/sec     0.000/sec        0.0000/sec   total: 39
2016-10-13T17:16:10.708Z|00127|coverage|INFO|netlink_recv_jumbo         0.0/sec     0.000/sec        0.0000/sec   total: 3
2016-10-13T17:16:10.708Z|00128|coverage|INFO|netlink_sent               0.0/sec     0.000/sec        0.0000/sec   total: 15
2016-10-13T17:16:10.708Z|00129|coverage|INFO|nln_changed                0.0/sec     0.000/sec        0.0000/sec   total: 9
2016-10-13T17:16:10.708Z|00130|coverage|INFO|57 events never hit
2016-10-13T17:16:10.711Z|00131|connmgr|INFO|ovsbr1<->unix: 1 flow_mods in the last 0 s (1 deletes)
2016-10-13T17:16:10.712Z|00132|connmgr|INFO|ovsbr1<->unix: 1 flow_mods in the last 0 s (1 adds)
2016-10-13T17:16:10.714Z|00133|connmgr|INFO|ovsbr1<->unix: 1 flow_mods in the last 0 s (1 adds)
2016-10-13T17:16:10.720Z|00134|dpif_netdev|INFO|Created 4 pmd threads on numa node 1
2016-10-13T17:16:10.720Z|00135|dpif_netdev|ERR|Cannot create pmd threads due to out of unpinned cores on numa node
2016-10-13T17:16:10.720Z|00136|dpif_netdev|ERR|Cannot create pmd threads due to out of unpinned cores on numa node
2016-10-13T17:16:11.225Z|00137|poll_loop|INFO|wakeup due to [POLLIN] on fd 42 (FIFO pipe:[1054703]) at lib/dpif-netdev.c:2639 (64% CPU usage)
2016-10-13T17:16:11.725Z|00138|poll_loop|INFO|wakeup due to [POLLIN] on fd 42 (FIFO pipe:[1054703]) at lib/dpif-netdev.c:2639 (64% CPU usage)
2016-10-13T17:16:12.225Z|00139|poll_loop|INFO|wakeup due to [POLLIN] on fd 42 (FIFO pipe:[1054703]) at lib/dpif-netdev.c:2639 (64% CPU usage)
2016-10-13T17:16:12.725Z|00140|poll_loop|INFO|wakeup due to [POLLIN] on fd 42 (FIFO pipe:[1054703]) at lib/dpif-netdev.c:2639 (64% CPU usage)
2016-10-13T17:16:13.176Z|00141|poll_loop|INFO|wakeup due to [POLLIN] on fd 42 (FIFO pipe:[1054703]) at lib/dpif-netdev.c:2639 (64% CPU usage)
2016-10-13T17:16:13.176Z|00142|poll_loop|INFO|wakeup due to [POLLIN] on fd 24 (<->/usr/local/var/run/openvswitch/db.sock) at lib/stream-fd.c:155 (64% CPU usage)
2016-10-13T17:16:13.201Z|00143|poll_loop|INFO|wakeup due to 24-ms timeout at vswitchd/bridge.c:2772 (64% CPU usage)
2016-10-13T17:16:13.226Z|00144|poll_loop|INFO|wakeup due to [POLLIN] on fd 42 (FIFO pipe:[1054703]) at lib/dpif-netdev.c:2639 (64% CPU usage)
2016-10-13T17:16:13.727Z|00145|poll_loop|INFO|wakeup due to [POLLIN] on fd 42 (FIFO pipe:[1054703]) at lib/dpif-netdev.c:2639 (64% CPU usage)
2016-10-13T17:16:18.202Z|00146|memory|INFO|peak resident set size grew 580% in last 10.0 seconds, from 3860 kB to 26240 kB
2016-10-13T17:16:18.202Z|00147|memory|INFO|handlers:17 ports:6 revalidators:7 rules:12

Comment 11 Flavio Leitner 2016-10-14 12:19:32 UTC
Please provide the output of lstopo-no-graphics from the host running ovs-dpdk.
The command line used to start OVS:  ps auwwwwwx
The VMs' XML/command lines and affinity commands (if manually set).

Thanks,
fbl

Comment 12 bmichalo 2016-10-24 14:12:01 UTC

lstopo-no-graphics
==================

Machine (256GB total)
  NUMANode L#0 (P#0 128GB)
    Package L#0 + L3 L#0 (30MB)
      L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#2)
      L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#4)
      L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#6)
      L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#8)
      L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#10)
      L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#12)
      L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#14)
      L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#16)
      L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#18)
      L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#20)
      L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#22)
    HostBridge L#0
      PCIBridge
        PCI 1000:005d
          Block(Disk) L#0 "sda"
          Block(Disk) L#1 "sdb"
          Block(Disk) L#2 "sdc"
      PCIBridge
        PCI 8086:1583
          Net L#3 "p3p1"
        PCI 8086:1583
          Net L#4 "p3p2"
      PCIBridge
        PCI 8086:10fb
          Net L#5 "em1"
        PCI 8086:10fb
          Net L#6 "em2"
      PCI 8086:8d62
      PCIBridge
        PCI 8086:1521
          Net L#7 "em3"
        PCI 8086:1521
          Net L#8 "em4"
      PCIBridge
        PCIBridge
          PCIBridge
            PCIBridge
              PCI 102b:0534
                GPU L#9 "card0"
                GPU L#10 "controlD64"
      PCI 8086:8d02
        Block(Removable Media Device) L#11 "sr0"
  NUMANode L#1 (P#1 128GB)
    Package L#1 + L3 L#1 (30MB)
      L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#1)
      L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#3)
      L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#5)
      L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#7)
      L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#9)
      L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#11)
      L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#13)
      L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#15)
      L2 L#20 (256KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20 + PU L#20 (P#17)
      L2 L#21 (256KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21 + PU L#21 (P#19)
      L2 L#22 (256KB) + L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22 + PU L#22 (P#21)
      L2 L#23 (256KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23 + PU L#23 (P#23)
    HostBridge L#9
      PCIBridge
        PCI 8086:10fb
          Net L#12 "p2p1"
        PCI 8086:10fb
          Net L#13 "p2p2"
      PCIBridge
        PCI 8086:10fb
          Net L#14 "p1p1"
        PCI 8086:10fb
          Net L#15 "p1p2"
  Misc(MemoryModule)
  Misc(MemoryModule)
  Misc(MemoryModule)
  Misc(MemoryModule)
  Misc(MemoryModule)
  Misc(MemoryModule)
  Misc(MemoryModule)
  Misc(MemoryModule)
  Misc(MemoryModule)
  Misc(MemoryModule)
  Misc(MemoryModule)
  Misc(MemoryModule)
  Misc(MemoryModule)
  Misc(MemoryModule)
  Misc(MemoryModule)
  Misc(MemoryModule)


Starting OVS:
=============
root       4188  0.0  0.0  49960  2660 ?        Ss   10:01   0:00 /sbin/ovsdb-server -v --remote=punix:/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach
root       4190  0.0  0.0 125628  1268 ?        Ss   10:01   0:00 SCREEN -dmS ovs sudo su -g qemu -c umask 002; /sbin/ovs-vswitchd                     --dpdk  -c 0x1 -n 3                     --socket-mem 1024,1024                     -- unix:/var/run/openvswitch/db.sock                     --pidfile                     --log-file=/var/log/openvswitch/ovs-vswitchd.log 2>&1 >/var/log/openvswitch/ovs-launch.txt
root       4192  0.0  0.0 215508  3556 pts/1    Ss+  10:01   0:00 sudo su -g qemu -c umask 002; /sbin/ovs-vswitchd                     --dpdk  -c 0x1 -n 3                     --socket-mem 1024,1024                     -- unix:/var/run/openvswitch/db.sock                     --pidfile                     --log-file=/var/log/openvswitch/ovs-vswitchd.log 2>&1 >/var/log/openvswitch/ovs-launch.txt
root       4195  0.0  0.0 207872  3080 pts/1    S+   10:01   0:00 su -g qemu -c umask 002; /sbin/ovs-vswitchd                     --dpdk  -c 0x1 -n 3                     --socket-mem 1024,1024                     -- unix:/var/run/openvswitch/db.sock                     --pidfile                     --log-file=/var/log/openvswitch/ovs-vswitchd.log 2>&1 >/var/log/openvswitch/ovs-launch.txt
root       4198  0.0  0.0 113120  1192 pts/1    S+   10:01   0:00 bash -c umask 002; /sbin/ovs-vswitchd                     --dpdk  -c 0x1 -n 3                     --socket-mem 1024,1024                     -- unix:/var/run/openvswitch/db.sock                     --pidfile                     --log-file=/var/log/openvswitch/ovs-vswitchd.log 2>&1 >/var/log/openvswitch/ovs-launch.txt
root       4199  367  0.0 4478024 30720 pts/1   SLl+ 10:01   8:26 /sbin/ovs-vswitchd --dpdk -c 0x1 -n 3 --socket-mem 1024 1024 -- unix:/var/run/openvswitch/db.sock --pidfile --log-file=/var/log/openvswitch/ovs-vswitchd.log


VM XML file:
------------
<domain type='kvm'>
  <name>vm1-PVP-dpdkVhostUser-1Q-RT</name>
  <uuid>34d85fe9-037b-4468-8191-0a461b819663</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB'/>
    </hugepages>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>3</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='18'/>
    <vcpupin vcpu='1' cpuset='17'/>
    <vcpupin vcpu='2' cpuset='19'/>
    <vcpusched vcpus='1-2' scheduler='fifo' priority='1'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.1.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pmu state='off'/>
  </features>
  <cpu mode='host-passthrough'>
    <feature policy='require' name='tsc-deadline'/>
    <numa>
      <cell id='0' cpus='0-2' memory='4194304' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/home/vm1-PVP-dpdkVhostUser-1Q.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='none'/>
    <controller type='pci' index='0' model='pci-root'/>
    <interface type='bridge'>
      <mac address='52:54:00:d3:14:23'/>
      <source bridge='br-em3'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='52:54:00:c9:5d:fa'/>
      <source type='unix' path='/var/run/openvswitch/vhost-user1' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='52:54:00:9b:8e:01'/>
      <source type='unix' path='/var/run/openvswitch/vhost-user2' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <serial type='file'>
      <source path='/tmp/vm1-PVP-dpdkVhostUser-1Q.console'/>
      <target port='1'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
</domain>


Bridge creation:
================
        # create the bridges/ports with 1 phys dev and 1 virt dev per bridge, to be used for 1 VM to forward packets
        $prefix/bin/ovs-vsctl --if-exists del-br ovsbr0
        echo "creating ovsbr0 bridge"
        $prefix/bin/ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
        $prefix/bin/ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk
        $prefix/bin/ovs-vsctl add-port ovsbr0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser
        $prefix/bin/ovs-ofctl del-flows ovsbr0
        $prefix/bin/ovs-ofctl add-flow ovsbr0 "in_port=1,idle_timeout=0 actions=output:2"
        $prefix/bin/ovs-ofctl add-flow ovsbr0 "in_port=2,idle_timeout=0 actions=output:1"
        
        $prefix/bin/ovs-vsctl --if-exists del-br ovsbr1
        echo "creating ovsbr1 bridge"
        $prefix/bin/ovs-vsctl add-br ovsbr1 -- set bridge ovsbr1 datapath_type=netdev
        $prefix/bin/ovs-vsctl add-port ovsbr1 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuser
        $prefix/bin/ovs-vsctl add-port ovsbr1 dpdk1 -- set Interface dpdk1 type=dpdk
        $prefix/bin/ovs-ofctl del-flows ovsbr1
        $prefix/bin/ovs-ofctl add-flow ovsbr1 "in_port=1,idle_timeout=0 actions=output:2"
        $prefix/bin/ovs-ofctl add-flow ovsbr1 "in_port=2,idle_timeout=0 actions=output:1"

        ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=AA
        ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=1

Comment 13 Eelco Chaudron 2016-11-25 15:30:45 UTC
Dug through the code, and the "dpdkvhostuser" PMD interface always gets initialized on the master lcore from DPDK. This is always the lowest DPDK lcore in use, and as OVS takes all lcores, that is the NUMA node of the first lcore in the system.

In the configuration mentioned, only lcores on NUMA node 1 are assigned, and this is causing the traffic loss (I did a quick test in my setup to confirm).

So please try again in your setup with at least one lcore on NUMA node 0.
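
For example (illustrative for this host, where CPUs 1, 3, 5, and 7 are on node 1 and CPU 2 is on node 0; CPU 0 is left out because it runs the main vswitchd thread):

# 0xAE = binary 1010 1110 -> CPUs 1, 2, 3, 5, 7: the node-1 set plus one lcore on node 0
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=AE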

If you still see the issue, can you let me take a look at your setup? Otherwise, please close this BZ.

Thanks,

Eelco

Comment 14 Andrew Theurer 2016-11-28 16:01:29 UTC
Does this problem exist because vhostuser ports in OVS 2.5 are always assumed to be on node0?  If so, should we test OVS 2.6 to see if this problem goes away?

Comment 15 Eelco Chaudron 2016-11-28 16:50:19 UTC
Yes, this should be supported in OVS 2.6 with DPDK 16.04 compiled with CONFIG_RTE_LIBRTE_VHOST_NUMA=y.

Comment 16 bmichalo 2016-11-28 18:02:14 UTC
Current 2.6 testing was done with DPDK 16.07 built with CONFIG_RTE_LIBRTE_VHOST_NUMA=n, so I will rebuild with CONFIG_RTE_LIBRTE_VHOST_NUMA=y and retest.
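
For the record, a minimal rebuild sketch for DPDK 16.07 (assuming a plain source tree and the usual linuxapp target; the vhost NUMA option also needs libnuma/numactl-devel installed):

sed -i 's/CONFIG_RTE_LIBRTE_VHOST_NUMA=n/CONFIG_RTE_LIBRTE_VHOST_NUMA=y/' config/common_base
make config T=x86_64-native-linuxapp-gcc
make -j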

Comment 17 bmichalo 2016-11-28 19:39:46 UTC
With OVS 2.6, if I compile DPDK 16.07 with CONFIG_RTE_LIBRTE_VHOST_NUMA=y (instead of the default CONFIG_RTE_LIBRTE_VHOST_NUMA=n), I see proper core/PMD thread/network interface pairings with the corresponding NUMA affinities across NUMA nodes 0 and 1.  I no longer see the problem, and throughput is as expected.

Comment 18 bmichalo 2016-11-28 20:46:44 UTC
Given this data, can we change the default value of CONFIG_RTE_LIBRTE_VHOST_NUMA from:

CONFIG_RTE_LIBRTE_VHOST_NUMA=n

to

CONFIG_RTE_LIBRTE_VHOST_NUMA=y?

Comment 19 Eelco Chaudron 2016-11-29 08:30:18 UTC
William, are you using our DPDK package? Asking because, looking at the spec file at http://download-node-02.eng.bos.redhat.com/brewroot/packages/dpdk/16.07/1.el7fdb/, it has it enabled by default:

# Enable pcap and vhost-numa build, the added deps are ok for us
setconf CONFIG_RTE_LIBRTE_PMD_PCAP y
setconf CONFIG_RTE_LIBRTE_VHOST_NUMA y

If you are using this package and it's not enabled, let me know and I'll investigate.

Comment 20 Eelco Chaudron 2016-12-14 19:22:38 UTC
Can we close this BZ? I think this is not an issue. If you think not, please respond to the previous log entry.

Comment 21 bmichalo 2016-12-14 19:46:35 UTC
Okay, I see it:

dpdk.spec:setconf CONFIG_RTE_LIBRTE_VHOST_NUMA y

So the issue doesn't exist anymore with OVS 2.6 and DPDK 16.07.

I think I'm okay with closing it... is there any reason to perhaps leave it open (or similar) until RHEL actually ships with these versions of the code?

Comment 22 Eelco Chaudron 2016-12-19 10:41:41 UTC
Closing this BZ, as for the OVS 2.6 package this setting is true; no need to keep it open.

