| Summary: | Some cpumasks generate error with ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Andrew Theurer <atheurer> |
| Component: | openvswitch-dpdk | Assignee: | Eelco Chaudron <echaudro> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | Jean-Tsung Hsiao <jhsiao> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 7.2 | CC: | atheurer, bmichalo, fleitner, kzhang, mleitner, osabart, pagupta, rkhan |
| Target Milestone: | rc | Keywords: | Reopened |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-12-19 10:41:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Andrew Theurer
2016-01-05 21:35:45 UTC
Hi Andrew,

The current code forces the PMD thread to be on the same NUMA node as the port. So, if you have a port on node 0, you need to allow at least one CPU there, otherwise it will complain about the lack of unpinned cores. If you have one port on each node, you will consequently need to enable at least one core on each node. The idea is to avoid running the PMD threads on a remote node, which would affect performance significantly. Please let me know your thoughts.

Thanks, fbl

Flavio, in my case all of the physical ports were on node 1, and all of the CPUs in the mask were on node 1, but OVS still rejected the cpumask. I'll try again on the latest code to confirm this.

Bill M, can you share your recent experience with this?

(In reply to Flavio Leitner from comment #5)
> Bill M, can you share your recent experience with this?

With:

Kernel version: Linux perf84.perf.lab.eng.bos.redhat.com 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

OVS version:
ovs-vsctl (Open vSwitch) 2.5.90
Compiled Mar 15 2016 12:20:02
DB Schema 7.12.1

DPDK version: 16.07

I am not seeing this issue with a simple two-port bare-metal OVS test. I was able to use cpumask=A, create PMD threads on CPUs 1 and 3 (associated with NUMA node 1), and successfully forward packets:

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=A

There is no error in the log. Historically, I would have needed to use cpumask=3 and then migrate the PMD thread via 'taskset' from CPU 0 to CPU 3.

Thanks Bill for verifying it. Since we are no longer updating 2.4 and 2.5 is fine, I am closing this based on comment #6.
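The mask values traded in this discussion are just bitmaps of logical CPU ids. A small shell sketch (illustrative only, not part of the bug report) of how to build one:

```shell
# Build an other_config:pmd-cpu-mask value from the CPU ids you want PMD
# threads on. CPUs 1 and 3 (both on NUMA node 1 on the host described
# below) yield the mask "a" used in comment #6.
mask=0
for cpu in 1 3; do
    mask=$(( mask | (1 << cpu) ))
done
printf '%x\n' "$mask"    # prints: a
# Then, for example:
# ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=a
```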
* Bare metal testing seems immune to the problem. The PMD threads are successfully created all on a single NUMA node and behave accordingly
* However, as soon as you introduce a VM such as for a {(P,V),(V,P)} test, the 'unpinned cores' issue manifests itself again
Both cases seem to be 100% repeatable.
(In reply to bmichalo from comment #9)
> * Bare metal testing seems immune to the problem. The PMD threads are
> successfully created all on a single NUMA node and behave accordingly
> * However, as soon as you introduce a VM such as for a {(P,V),(V,P)} test,
> the 'unpinned cores' issue manifests itself again
> Both cases seem to be 100% repeatable.

With a CPU mask of AA during a {(P1,V1),(V2,P2)} test, four PMD threads should be created on NUMA node 1. In the end this occurs; however, even though the threads are there (in htop one can see CPUs 1, 3, 5, and 7 at 100%), they fail to move data. For a unidirectional test, packets will succeed P1-->V1-->V2 but will then be dropped at V2. If you use testpmd as the guest forwarding engine, you will see the dropped packets. Also, even with a CPU mask of AA, there is an attempt to put PMD threads on NUMA node 0.

Here's a summary of the ovs-vswitchd.log:

2016-10-13T17:16:09.453Z|00024|dpif_netdev|INFO|Created 1 pmd threads on numa node 1
2016-10-13T17:16:09.590Z|00069|dpif_netdev|INFO|Created 1 pmd threads on numa node 0
2016-10-13T17:16:10.720Z|00134|dpif_netdev|INFO|Created 4 pmd threads on numa node 1
2016-10-13T17:16:10.720Z|00135|dpif_netdev|ERR|Cannot create pmd threads due to out of unpinned cores on numa node
2016-10-13T17:16:10.720Z|00136|dpif_netdev|ERR|Cannot create pmd threads due to out of unpinned cores on numa node

Here is the entire ovs-vswitchd.log:

cat /usr/local/var/log/openvswitch/ovs-vswitchd.log
2016-10-13T17:16:08.175Z|00002|vlog|INFO|opened log file /usr/local/var/log/openvswitch/ovs-vswitchd.log
2016-10-13T17:16:08.175Z|00003|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 0
2016-10-13T17:16:08.175Z|00004|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 1
2016-10-13T17:16:08.176Z|00005|ovs_numa|INFO|Discovered 2 NUMA nodes and 24 CPU cores
2016-10-13T17:16:08.176Z|00006|memory|INFO|3860 kB peak resident set size after 12.2 seconds
2016-10-13T17:16:08.176Z|00007|reconnect|INFO|unix:/usr/local/var/run/openvswitch/db.sock: connecting... 2016-10-13T17:16:08.176Z|00008|reconnect|INFO|unix:/usr/local/var/run/openvswitch/db.sock: connected 2016-10-13T17:16:08.194Z|00009|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation 2016-10-13T17:16:08.194Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: MPLS label stack length probed as 3 2016-10-13T17:16:08.194Z|00011|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports unique flow ids 2016-10-13T17:16:08.194Z|00012|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_state 2016-10-13T17:16:08.194Z|00013|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_zone 2016-10-13T17:16:08.194Z|00014|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_mark 2016-10-13T17:16:08.194Z|00015|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_label 2016-10-13T17:16:08.194Z|00016|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_state_nat 2016-10-13T17:16:08.194Z|00017|bridge|INFO|bridge ovsbr0: added interface ovsbr0 on port 65534 2016-10-13T17:16:08.198Z|00018|dpif_netlink|ERR|Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch kernel module is probably not loaded. 
2016-10-13T17:16:08.198Z|00019|bridge|INFO|bridge ovsbr0: using datapath ID 0000c6631ac3fd48 2016-10-13T17:16:08.199Z|00020|connmgr|INFO|ovsbr0: added service controller "punix:/usr/local/var/run/openvswitch/ovsbr0.mgmt" 2016-10-13T17:16:08.200Z|00021|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.5.90 2016-10-13T17:16:08.826Z|00022|dpdk|INFO|Port 0: 90:e2:ba:8a:20:28 2016-10-13T17:16:09.453Z|00023|dpdk|INFO|Port 0: 90:e2:ba:8a:20:28 2016-10-13T17:16:09.453Z|00024|dpif_netdev|INFO|Created 1 pmd threads on numa node 1 2016-10-13T17:16:09.453Z|00025|bridge|INFO|bridge ovsbr0: added interface dpdk0 on port 1 2016-10-13T17:16:09.453Z|00026|bridge|INFO|bridge ovsbr0: using datapath ID 000090e2ba8a2028 2016-10-13T17:16:09.453Z|00027|timeval|WARN|Unreasonably long 1250ms poll interval (1245ms user, 1ms system) 2016-10-13T17:16:09.453Z|00028|timeval|WARN|faults: 26 minor, 0 major 2016-10-13T17:16:09.454Z|00029|timeval|WARN|disk: 0 reads, 16 writes 2016-10-13T17:16:09.454Z|00030|timeval|WARN|context switches: 64 voluntary, 81 involuntary 2016-10-13T17:16:09.454Z|00031|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=4ce6b97e: 2016-10-13T17:16:09.454Z|00032|coverage|INFO|bridge_reconfigure 0.0/sec 0.000/sec 0.0000/sec total: 3 2016-10-13T17:16:09.454Z|00033|coverage|INFO|ofproto_flush 0.0/sec 0.000/sec 0.0000/sec total: 1 2016-10-13T17:16:09.454Z|00034|coverage|INFO|ofproto_update_port 0.0/sec 0.000/sec 0.0000/sec total: 3 2016-10-13T17:16:09.454Z|00035|coverage|INFO|rev_reconfigure 0.0/sec 0.000/sec 0.0000/sec total: 3 2016-10-13T17:16:09.454Z|00036|coverage|INFO|rev_flow_table 0.0/sec 0.000/sec 0.0000/sec total: 1 2016-10-13T17:16:09.454Z|00037|coverage|INFO|cmap_expand 0.0/sec 0.000/sec 0.0000/sec total: 6 2016-10-13T17:16:09.454Z|00038|coverage|INFO|cmap_shrink 0.0/sec 0.000/sec 0.0000/sec total: 3 2016-10-13T17:16:09.454Z|00039|coverage|INFO|dpif_port_add 0.0/sec 0.000/sec 0.0000/sec total: 2 
2016-10-13T17:16:09.455Z|00040|coverage|INFO|dpif_flow_flush 0.0/sec 0.000/sec 0.0000/sec total: 2 2016-10-13T17:16:09.455Z|00041|coverage|INFO|dpif_flow_get 0.0/sec 0.000/sec 0.0000/sec total: 5 2016-10-13T17:16:09.455Z|00042|coverage|INFO|dpif_flow_put 0.0/sec 0.000/sec 0.0000/sec total: 10 2016-10-13T17:16:09.455Z|00043|coverage|INFO|dpif_flow_del 0.0/sec 0.000/sec 0.0000/sec total: 5 2016-10-13T17:16:09.455Z|00044|coverage|INFO|dpif_execute 0.0/sec 0.000/sec 0.0000/sec total: 2 2016-10-13T17:16:09.456Z|00045|coverage|INFO|flow_extract 0.0/sec 0.000/sec 0.0000/sec total: 1 2016-10-13T17:16:09.456Z|00046|coverage|INFO|miniflow_malloc 0.0/sec 0.000/sec 0.0000/sec total: 57 2016-10-13T17:16:09.456Z|00047|coverage|INFO|hmap_pathological 0.0/sec 0.000/sec 0.0000/sec total: 1 2016-10-13T17:16:09.456Z|00048|coverage|INFO|hmap_expand 0.0/sec 0.000/sec 0.0000/sec total: 816 2016-10-13T17:16:09.456Z|00049|coverage|INFO|netdev_get_stats 0.0/sec 0.000/sec 0.0000/sec total: 3 2016-10-13T17:16:09.457Z|00050|coverage|INFO|txn_incomplete 0.0/sec 0.000/sec 0.0000/sec total: 13 2016-10-13T17:16:09.457Z|00051|coverage|INFO|txn_success 0.0/sec 0.000/sec 0.0000/sec total: 4 2016-10-13T17:16:09.457Z|00052|coverage|INFO|poll_create_node 0.0/sec 0.000/sec 0.0000/sec total: 438 2016-10-13T17:16:09.457Z|00053|coverage|INFO|poll_zero_timeout 0.0/sec 0.000/sec 0.0000/sec total: 4 2016-10-13T17:16:09.457Z|00054|coverage|INFO|seq_change 0.0/sec 0.000/sec 0.0000/sec total: 1810 2016-10-13T17:16:09.457Z|00055|coverage|INFO|pstream_open 0.0/sec 0.000/sec 0.0000/sec total: 3 2016-10-13T17:16:09.457Z|00056|coverage|INFO|stream_open 0.0/sec 0.000/sec 0.0000/sec total: 1 2016-10-13T17:16:09.458Z|00057|coverage|INFO|util_xalloc 0.0/sec 0.000/sec 0.0000/sec total: 16240 2016-10-13T17:16:09.458Z|00058|coverage|INFO|netdev_set_policing 0.0/sec 0.000/sec 0.0000/sec total: 2 2016-10-13T17:16:09.458Z|00059|coverage|INFO|netdev_get_ifindex 0.0/sec 0.000/sec 0.0000/sec total: 1 
2016-10-13T17:16:09.458Z|00060|coverage|INFO|netdev_get_hwaddr 0.0/sec 0.000/sec 0.0000/sec total: 7 2016-10-13T17:16:09.458Z|00061|coverage|INFO|netdev_set_hwaddr 0.0/sec 0.000/sec 0.0000/sec total: 2 2016-10-13T17:16:09.458Z|00062|coverage|INFO|netdev_get_ethtool 0.0/sec 0.000/sec 0.0000/sec total: 3 2016-10-13T17:16:09.458Z|00063|coverage|INFO|netlink_received 0.0/sec 0.000/sec 0.0000/sec total: 22 2016-10-13T17:16:09.458Z|00064|coverage|INFO|netlink_recv_jumbo 0.0/sec 0.000/sec 0.0000/sec total: 2 2016-10-13T17:16:09.458Z|00065|coverage|INFO|netlink_sent 0.0/sec 0.000/sec 0.0000/sec total: 8 2016-10-13T17:16:09.458Z|00066|coverage|INFO|nln_changed 0.0/sec 0.000/sec 0.0000/sec total: 5 2016-10-13T17:16:09.458Z|00067|coverage|INFO|64 events never hit 2016-10-13T17:16:09.460Z|00068|dpdk|INFO|Socket /usr/local/var/run/openvswitch/vhost-user1 created for vhost-user port vhost-user1 2016-10-13T17:16:09.590Z|00069|dpif_netdev|INFO|Created 1 pmd threads on numa node 0 2016-10-13T17:16:09.590Z|00070|bridge|INFO|bridge ovsbr0: added interface vhost-user1 on port 2 2016-10-13T17:16:09.593Z|00071|connmgr|INFO|ovsbr0<->unix: 1 flow_mods in the last 0 s (1 deletes) 2016-10-13T17:16:09.595Z|00072|connmgr|INFO|ovsbr0<->unix: 1 flow_mods in the last 0 s (1 adds) 2016-10-13T17:16:09.597Z|00073|connmgr|INFO|ovsbr0<->unix: 1 flow_mods in the last 0 s (1 adds) 2016-10-13T17:16:09.604Z|00074|bridge|INFO|bridge ovsbr1: added interface ovsbr1 on port 65534 2016-10-13T17:16:09.604Z|00075|bridge|INFO|bridge ovsbr1: using datapath ID 000022bce6557d4d 2016-10-13T17:16:09.604Z|00076|connmgr|INFO|ovsbr1: added service controller "punix:/usr/local/var/run/openvswitch/ovsbr1.mgmt" 2016-10-13T17:16:09.609Z|00077|dpdk|INFO|Socket /usr/local/var/run/openvswitch/vhost-user2 created for vhost-user port vhost-user2 2016-10-13T17:16:09.609Z|00078|bridge|INFO|bridge ovsbr1: added interface vhost-user2 on port 1 2016-10-13T17:16:10.094Z|00079|dpdk|INFO|Port 1: 90:e2:ba:8a:20:29 
2016-10-13T17:16:10.705Z|00080|dpdk|INFO|Port 1: 90:e2:ba:8a:20:29 2016-10-13T17:16:10.705Z|00081|bridge|INFO|bridge ovsbr1: added interface dpdk1 on port 2 2016-10-13T17:16:10.705Z|00082|bridge|INFO|bridge ovsbr1: using datapath ID 000090e2ba8a2029 2016-10-13T17:16:10.705Z|00083|timeval|WARN|Unreasonably long 1092ms poll interval (552ms user, 0ms system) 2016-10-13T17:16:10.705Z|00084|timeval|WARN|faults: 1 minor, 0 major 2016-10-13T17:16:10.705Z|00085|timeval|WARN|disk: 0 reads, 8 writes 2016-10-13T17:16:10.705Z|00086|timeval|WARN|context switches: 53 voluntary, 107 involuntary 2016-10-13T17:16:10.705Z|00087|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=5e4541bb: 2016-10-13T17:16:10.705Z|00088|coverage|INFO|bridge_reconfigure 0.0/sec 0.000/sec 0.0000/sec total: 9 2016-10-13T17:16:10.705Z|00089|coverage|INFO|ofproto_flush 0.0/sec 0.000/sec 0.0000/sec total: 2 2016-10-13T17:16:10.705Z|00090|coverage|INFO|ofproto_recv_openflow 0.0/sec 0.000/sec 0.0000/sec total: 6 2016-10-13T17:16:10.705Z|00091|coverage|INFO|ofproto_update_port 0.0/sec 0.000/sec 0.0000/sec total: 17 2016-10-13T17:16:10.705Z|00092|coverage|INFO|rev_reconfigure 0.0/sec 0.000/sec 0.0000/sec total: 10 2016-10-13T17:16:10.705Z|00093|coverage|INFO|rev_port_toggled 0.0/sec 0.000/sec 0.0000/sec total: 1 2016-10-13T17:16:10.705Z|00094|coverage|INFO|rev_flow_table 0.0/sec 0.000/sec 0.0000/sec total: 5 2016-10-13T17:16:10.705Z|00095|coverage|INFO|cmap_expand 0.0/sec 0.000/sec 0.0000/sec total: 13 2016-10-13T17:16:10.705Z|00096|coverage|INFO|cmap_shrink 0.0/sec 0.000/sec 0.0000/sec total: 9 2016-10-13T17:16:10.706Z|00097|coverage|INFO|dpif_port_add 0.0/sec 0.000/sec 0.0000/sec total: 6 2016-10-13T17:16:10.706Z|00098|coverage|INFO|dpif_flow_flush 0.0/sec 0.000/sec 0.0000/sec total: 3 2016-10-13T17:16:10.706Z|00099|coverage|INFO|dpif_flow_get 0.0/sec 0.000/sec 0.0000/sec total: 5 2016-10-13T17:16:10.706Z|00100|coverage|INFO|dpif_flow_put 0.0/sec 0.000/sec 0.0000/sec 
total: 10 2016-10-13T17:16:10.706Z|00101|coverage|INFO|dpif_flow_del 0.0/sec 0.000/sec 0.0000/sec total: 5 2016-10-13T17:16:10.706Z|00102|coverage|INFO|dpif_execute 0.0/sec 0.000/sec 0.0000/sec total: 2 2016-10-13T17:16:10.706Z|00103|coverage|INFO|flow_extract 0.0/sec 0.000/sec 0.0000/sec total: 1 2016-10-13T17:16:10.706Z|00104|coverage|INFO|miniflow_malloc 0.0/sec 0.000/sec 0.0000/sec total: 117 2016-10-13T17:16:10.706Z|00105|coverage|INFO|hmap_pathological 0.0/sec 0.000/sec 0.0000/sec total: 17 2016-10-13T17:16:10.706Z|00106|coverage|INFO|hmap_expand 0.0/sec 0.000/sec 0.0000/sec total: 1326 2016-10-13T17:16:10.706Z|00107|coverage|INFO|netdev_get_stats 0.0/sec 0.000/sec 0.0000/sec total: 7 2016-10-13T17:16:10.706Z|00108|coverage|INFO|txn_unchanged 0.0/sec 0.000/sec 0.0000/sec total: 3 2016-10-13T17:16:10.707Z|00109|coverage|INFO|txn_incomplete 0.0/sec 0.000/sec 0.0000/sec total: 24 2016-10-13T17:16:10.707Z|00110|coverage|INFO|txn_success 0.0/sec 0.000/sec 0.0000/sec total: 8 2016-10-13T17:16:10.707Z|00111|coverage|INFO|poll_create_node 0.0/sec 0.000/sec 0.0000/sec total: 2901 2016-10-13T17:16:10.707Z|00112|coverage|INFO|poll_zero_timeout 0.0/sec 0.000/sec 0.0000/sec total: 14 2016-10-13T17:16:10.707Z|00113|coverage|INFO|rconn_queued 0.0/sec 0.000/sec 0.0000/sec total: 3 2016-10-13T17:16:10.707Z|00114|coverage|INFO|rconn_sent 0.0/sec 0.000/sec 0.0000/sec total: 3 2016-10-13T17:16:10.707Z|00115|coverage|INFO|seq_change 0.0/sec 0.000/sec 0.0000/sec total: 66168 2016-10-13T17:16:10.707Z|00116|coverage|INFO|pstream_open 0.0/sec 0.000/sec 0.0000/sec total: 5 2016-10-13T17:16:10.707Z|00117|coverage|INFO|stream_open 0.0/sec 0.000/sec 0.0000/sec total: 1 2016-10-13T17:16:10.707Z|00118|coverage|INFO|util_xalloc 0.0/sec 0.000/sec 0.0000/sec total: 27692 2016-10-13T17:16:10.707Z|00119|coverage|INFO|vconn_received 0.0/sec 0.000/sec 0.0000/sec total: 9 2016-10-13T17:16:10.707Z|00120|coverage|INFO|vconn_sent 0.0/sec 0.000/sec 0.0000/sec total: 6 
2016-10-13T17:16:10.707Z|00121|coverage|INFO|netdev_set_policing 0.0/sec 0.000/sec 0.0000/sec total: 5 2016-10-13T17:16:10.708Z|00122|coverage|INFO|netdev_get_ifindex 0.0/sec 0.000/sec 0.0000/sec total: 2 2016-10-13T17:16:10.708Z|00123|coverage|INFO|netdev_get_hwaddr 0.0/sec 0.000/sec 0.0000/sec total: 12 2016-10-13T17:16:10.708Z|00124|coverage|INFO|netdev_set_hwaddr 0.0/sec 0.000/sec 0.0000/sec total: 4 2016-10-13T17:16:10.708Z|00125|coverage|INFO|netdev_get_ethtool 0.0/sec 0.000/sec 0.0000/sec total: 7 2016-10-13T17:16:10.708Z|00126|coverage|INFO|netlink_received 0.0/sec 0.000/sec 0.0000/sec total: 39 2016-10-13T17:16:10.708Z|00127|coverage|INFO|netlink_recv_jumbo 0.0/sec 0.000/sec 0.0000/sec total: 3 2016-10-13T17:16:10.708Z|00128|coverage|INFO|netlink_sent 0.0/sec 0.000/sec 0.0000/sec total: 15 2016-10-13T17:16:10.708Z|00129|coverage|INFO|nln_changed 0.0/sec 0.000/sec 0.0000/sec total: 9 2016-10-13T17:16:10.708Z|00130|coverage|INFO|57 events never hit 2016-10-13T17:16:10.711Z|00131|connmgr|INFO|ovsbr1<->unix: 1 flow_mods in the last 0 s (1 deletes) 2016-10-13T17:16:10.712Z|00132|connmgr|INFO|ovsbr1<->unix: 1 flow_mods in the last 0 s (1 adds) 2016-10-13T17:16:10.714Z|00133|connmgr|INFO|ovsbr1<->unix: 1 flow_mods in the last 0 s (1 adds) 2016-10-13T17:16:10.720Z|00134|dpif_netdev|INFO|Created 4 pmd threads on numa node 1 2016-10-13T17:16:10.720Z|00135|dpif_netdev|ERR|Cannot create pmd threads due to out of unpinned cores on numa node 2016-10-13T17:16:10.720Z|00136|dpif_netdev|ERR|Cannot create pmd threads due to out of unpinned cores on numa node 2016-10-13T17:16:11.225Z|00137|poll_loop|INFO|wakeup due to [POLLIN] on fd 42 (FIFO pipe:[1054703]) at lib/dpif-netdev.c:2639 (64% CPU usage) 2016-10-13T17:16:11.725Z|00138|poll_loop|INFO|wakeup due to [POLLIN] on fd 42 (FIFO pipe:[1054703]) at lib/dpif-netdev.c:2639 (64% CPU usage) 2016-10-13T17:16:12.225Z|00139|poll_loop|INFO|wakeup due to [POLLIN] on fd 42 (FIFO pipe:[1054703]) at lib/dpif-netdev.c:2639 (64% CPU 
usage) 2016-10-13T17:16:12.725Z|00140|poll_loop|INFO|wakeup due to [POLLIN] on fd 42 (FIFO pipe:[1054703]) at lib/dpif-netdev.c:2639 (64% CPU usage) 2016-10-13T17:16:13.176Z|00141|poll_loop|INFO|wakeup due to [POLLIN] on fd 42 (FIFO pipe:[1054703]) at lib/dpif-netdev.c:2639 (64% CPU usage) 2016-10-13T17:16:13.176Z|00142|poll_loop|INFO|wakeup due to [POLLIN] on fd 24 (<->/usr/local/var/run/openvswitch/db.sock) at lib/stream-fd.c:155 (64% CPU usage) 2016-10-13T17:16:13.201Z|00143|poll_loop|INFO|wakeup due to 24-ms timeout at vswitchd/bridge.c:2772 (64% CPU usage) 2016-10-13T17:16:13.226Z|00144|poll_loop|INFO|wakeup due to [POLLIN] on fd 42 (FIFO pipe:[1054703]) at lib/dpif-netdev.c:2639 (64% CPU usage) 2016-10-13T17:16:13.727Z|00145|poll_loop|INFO|wakeup due to [POLLIN] on fd 42 (FIFO pipe:[1054703]) at lib/dpif-netdev.c:2639 (64% CPU usage) 2016-10-13T17:16:18.202Z|00146|memory|INFO|peak resident set size grew 580% in last 10.0 seconds, from 3860 kB to 26240 kB 2016-10-13T17:16:18.202Z|00147|memory|INFO|handlers:17 ports:6 revalidators:7 rules:12

Please provide:
* the output of lstopo-no-graphics from the host running ovs-dpdk
* the command line used to start OVS (ps auwwwwwx)
* the VMs' XML/command lines and affinity commands (if manually set)

Thanks, fbl
lstopo-no-graphics
==================
Machine (256GB total)
NUMANode L#0 (P#0 128GB)
Package L#0 + L3 L#0 (30MB)
L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#2)
L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#4)
L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#6)
L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#8)
L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#10)
L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#12)
L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#14)
L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#16)
L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#18)
L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#20)
L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#22)
HostBridge L#0
PCIBridge
PCI 1000:005d
Block(Disk) L#0 "sda"
Block(Disk) L#1 "sdb"
Block(Disk) L#2 "sdc"
PCIBridge
PCI 8086:1583
Net L#3 "p3p1"
PCI 8086:1583
Net L#4 "p3p2"
PCIBridge
PCI 8086:10fb
Net L#5 "em1"
PCI 8086:10fb
Net L#6 "em2"
PCI 8086:8d62
PCIBridge
PCI 8086:1521
Net L#7 "em3"
PCI 8086:1521
Net L#8 "em4"
PCIBridge
PCIBridge
PCIBridge
PCIBridge
PCI 102b:0534
GPU L#9 "card0"
GPU L#10 "controlD64"
PCI 8086:8d02
Block(Removable Media Device) L#11 "sr0"
NUMANode L#1 (P#1 128GB)
Package L#1 + L3 L#1 (30MB)
L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#1)
L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#3)
L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#5)
L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#7)
L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#9)
L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#11)
L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#13)
L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#15)
L2 L#20 (256KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20 + PU L#20 (P#17)
L2 L#21 (256KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21 + PU L#21 (P#19)
L2 L#22 (256KB) + L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22 + PU L#22 (P#21)
L2 L#23 (256KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23 + PU L#23 (P#23)
HostBridge L#9
PCIBridge
PCI 8086:10fb
Net L#12 "p2p1"
PCI 8086:10fb
Net L#13 "p2p2"
PCIBridge
PCI 8086:10fb
Net L#14 "p1p1"
PCI 8086:10fb
Net L#15 "p1p2"
Misc(MemoryModule)
Misc(MemoryModule)
Misc(MemoryModule)
Misc(MemoryModule)
Misc(MemoryModule)
Misc(MemoryModule)
Misc(MemoryModule)
Misc(MemoryModule)
Misc(MemoryModule)
Misc(MemoryModule)
Misc(MemoryModule)
Misc(MemoryModule)
Misc(MemoryModule)
Misc(MemoryModule)
Misc(MemoryModule)
Misc(MemoryModule)
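As the lstopo output above shows, the P# numbering interleaves the two sockets: even P#s sit under NUMANode L#0 and odd P#s under NUMANode L#1. So on this particular host the node of a logical CPU is simply its parity. A throwaway sketch of that mapping (a host-specific assumption, not a general rule):

```shell
# Host-specific: even logical CPUs are on NUMA node 0, odd ones on node 1.
node_of_cpu() {
    echo $(( $1 % 2 ))
}
node_of_cpu 4    # prints: 0
node_of_cpu 7    # prints: 1
```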
Starting OVS:
=============
root 4188 0.0 0.0 49960 2660 ? Ss 10:01 0:00 /sbin/ovsdb-server -v --remote=punix:/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach
root 4190 0.0 0.0 125628 1268 ? Ss 10:01 0:00 SCREEN -dmS ovs sudo su -g qemu -c umask 002; /sbin/ovs-vswitchd --dpdk -c 0x1 -n 3 --socket-mem 1024,1024 -- unix:/var/run/openvswitch/db.sock --pidfile --log-file=/var/log/openvswitch/ovs-vswitchd.log 2>&1 >/var/log/openvswitch/ovs-launch.txt
root 4192 0.0 0.0 215508 3556 pts/1 Ss+ 10:01 0:00 sudo su -g qemu -c umask 002; /sbin/ovs-vswitchd --dpdk -c 0x1 -n 3 --socket-mem 1024,1024 -- unix:/var/run/openvswitch/db.sock --pidfile --log-file=/var/log/openvswitch/ovs-vswitchd.log 2>&1 >/var/log/openvswitch/ovs-launch.txt
root 4195 0.0 0.0 207872 3080 pts/1 S+ 10:01 0:00 su -g qemu -c umask 002; /sbin/ovs-vswitchd --dpdk -c 0x1 -n 3 --socket-mem 1024,1024 -- unix:/var/run/openvswitch/db.sock --pidfile --log-file=/var/log/openvswitch/ovs-vswitchd.log 2>&1 >/var/log/openvswitch/ovs-launch.txt
root 4198 0.0 0.0 113120 1192 pts/1 S+ 10:01 0:00 bash -c umask 002; /sbin/ovs-vswitchd --dpdk -c 0x1 -n 3 --socket-mem 1024,1024 -- unix:/var/run/openvswitch/db.sock --pidfile --log-file=/var/log/openvswitch/ovs-vswitchd.log 2>&1 >/var/log/openvswitch/ovs-launch.txt
root 4199 367 0.0 4478024 30720 pts/1 SLl+ 10:01 8:26 /sbin/ovs-vswitchd --dpdk -c 0x1 -n 3 --socket-mem 1024 1024 -- unix:/var/run/openvswitch/db.sock --pidfile --log-file=/var/log/openvswitch/ovs-vswitchd.log
VM XML file:
------------
<domain type='kvm'>
<name>vm1-PVP-dpdkVhostUser-1Q-RT</name>
<uuid>34d85fe9-037b-4468-8191-0a461b819663</uuid>
<memory unit='KiB'>4194304</memory>
<currentMemory unit='KiB'>4194304</currentMemory>
<memoryBacking>
<hugepages>
<page size='1048576' unit='KiB'/>
</hugepages>
<nosharepages/>
<locked/>
</memoryBacking>
<vcpu placement='static'>3</vcpu>
<cputune>
<vcpupin vcpu='0' cpuset='18'/>
<vcpupin vcpu='1' cpuset='17'/>
<vcpupin vcpu='2' cpuset='19'/>
<vcpusched vcpus='1-2' scheduler='fifo' priority='1'/>
</cputune>
<numatune>
<memory mode='strict' nodeset='1'/>
</numatune>
<resource>
<partition>/machine</partition>
</resource>
<os>
<type arch='x86_64' machine='pc-i440fx-rhel7.1.0'>hvm</type>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
<pmu state='off'/>
</features>
<cpu mode='host-passthrough'>
<feature policy='require' name='tsc-deadline'/>
<numa>
<cell id='0' cpus='0-2' memory='4194304' unit='KiB' memAccess='shared'/>
</numa>
</cpu>
<clock offset='utc'>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='pit' tickpolicy='delay'/>
<timer name='hpet' present='no'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<devices>
<emulator>/usr/libexec/qemu-kvm</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/home/vm1-PVP-dpdkVhostUser-1Q.qcow2'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
<controller type='usb' index='0' model='none'/>
<controller type='pci' index='0' model='pci-root'/>
<interface type='bridge'>
<mac address='52:54:00:d3:14:23'/>
<source bridge='br-em3'/>
<model type='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</interface>
<interface type='vhostuser'>
<mac address='52:54:00:c9:5d:fa'/>
<source type='unix' path='/var/run/openvswitch/vhost-user1' mode='client'/>
<model type='virtio'/>
<driver name='vhost'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</interface>
<interface type='vhostuser'>
<mac address='52:54:00:9b:8e:01'/>
<source type='unix' path='/var/run/openvswitch/vhost-user2' mode='client'/>
<model type='virtio'/>
<driver name='vhost'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</interface>
<serial type='pty'>
<target port='0'/>
</serial>
<serial type='file'>
<source path='/tmp/vm1-PVP-dpdkVhostUser-1Q.console'/>
<target port='1'/>
</serial>
<console type='pty'>
<target type='serial' port='0'/>
</console>
<memballoon model='virtio'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</memballoon>
</devices>
</domain>
Bridge creation:
================
# create the bridges/ports with 1 phys dev and 1 virt dev per bridge, to be used for 1 VM to forward packets
$prefix/bin/ovs-vsctl --if-exists del-br ovsbr0
echo "creating ovsbr0 bridge"
$prefix/bin/ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
$prefix/bin/ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk
$prefix/bin/ovs-vsctl add-port ovsbr0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser
$prefix/bin/ovs-ofctl del-flows ovsbr0
$prefix/bin/ovs-ofctl add-flow ovsbr0 "in_port=1,idle_timeout=0 actions=output:2"
$prefix/bin/ovs-ofctl add-flow ovsbr0 "in_port=2,idle_timeout=0 actions=output:1"
$prefix/bin/ovs-vsctl --if-exists del-br ovsbr1
echo "creating ovsbr1 bridge"
$prefix/bin/ovs-vsctl add-br ovsbr1 -- set bridge ovsbr1 datapath_type=netdev
$prefix/bin/ovs-vsctl add-port ovsbr1 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuser
$prefix/bin/ovs-vsctl add-port ovsbr1 dpdk1 -- set Interface dpdk1 type=dpdk
$prefix/bin/ovs-ofctl del-flows ovsbr1
$prefix/bin/ovs-ofctl add-flow ovsbr1 "in_port=1,idle_timeout=0 actions=output:2"
$prefix/bin/ovs-ofctl add-flow ovsbr1 "in_port=2,idle_timeout=0 actions=output:1"
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=AA
ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=1
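Decoding the AA mask set above shows which CPUs the PMD threads should land on (a sketch; the node-parity mapping holds only for this host's interleaved topology):

```shell
# 0xAA selects CPUs 1, 3, 5 and 7 -- all on NUMA node 1 on this machine.
mask=$(( 0xAA ))
cpu=0
while [ "$cpu" -lt 24 ]; do
    if [ $(( (mask >> cpu) & 1 )) -eq 1 ]; then
        printf 'CPU %d (NUMA node %d)\n' "$cpu" $(( cpu % 2 ))
    fi
    cpu=$(( cpu + 1 ))
done
```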
Dug through the code, and the "dpdkvhostuser" PMD interface always gets initialized on the master lcore from DPDK. This is always the lowest DPDK lcore used, and as OVS takes all lcores, it's the NUMA node of the first lcore in the system. In the configuration mentioned, only lcores on NUMA node 1 are assigned, and this is causing the traffic loss (I did a quick test in my setup to confirm). So please try again in your setup with at least one lcore on NUMA node 0. If you still see the issue, can you let me take a look at your setup? Otherwise, please close this BZ. Thanks, Eelco

Does this problem exist because vhostuser ports in OVS 2.5 are always assumed to be on node 0? If so, should we test OVS 2.6 to see if this problem goes away?

Yes, this should be supported in OVS 2.6 with DPDK 16.04 compiled with CONFIG_RTE_LIBRTE_VHOST_NUMA=y.

Current 2.6 testing was done with DPDK 16.07 built with CONFIG_RTE_LIBRTE_VHOST_NUMA=n, so I will rebuild with CONFIG_RTE_LIBRTE_VHOST_NUMA=y and retest.

With OVS 2.6, if I compile DPDK 16.07 with the flag CONFIG_RTE_LIBRTE_VHOST_NUMA=y (instead of the default value CONFIG_RTE_LIBRTE_VHOST_NUMA=n), I see proper core/PMD thread/network interface pairings with the corresponding NUMA affinities across NUMA nodes 0 and 1. I no longer see the problem, and throughput is as expected. Given this data, can we change the default value of CONFIG_RTE_LIBRTE_VHOST_NUMA from n to y?

William, are you using our DPDK package? Asking because, looking at the spec file of http://download-node-02.eng.bos.redhat.com/brewroot/packages/dpdk/16.07/1.el7fdb/, it has it enabled by default:

# Enable pcap and vhost-numa build, the added deps are ok for us
setconf CONFIG_RTE_LIBRTE_PMD_PCAP y
setconf CONFIG_RTE_LIBRTE_VHOST_NUMA y

If you are using this package and it's not enabled, let me know and I'll investigate.

Can we close this BZ? I think this is not an issue. If you think not, please respond to the previous log entry.
Okay, I see it:

dpdk.spec:setconf CONFIG_RTE_LIBRTE_VHOST_NUMA y

So the issue doesn't exist anymore with OVS 2.6 and DPDK 16.07. I think I'm okay with closing it... is there any reason to perhaps leave it open (or other?) until RHEL actually ships with these versions of the code?

Closing, as for the OVS 2.6 package this setting is enabled; no need to keep the BZ open.
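For reference, the `setconf` helper used in the dpdk.spec quoted above boils down to rewriting one option in the DPDK build-time config. A throwaway demonstration on a temporary file (the real config lives in the DPDK build tree; the sed approach here is an assumption about what setconf does, not its actual implementation):

```shell
# Flip CONFIG_RTE_LIBRTE_VHOST_NUMA from n to y, the way the packaging does.
cfg=$(mktemp)
printf 'CONFIG_RTE_LIBRTE_VHOST_NUMA=n\n' > "$cfg"
sed -i 's/^CONFIG_RTE_LIBRTE_VHOST_NUMA=.*/CONFIG_RTE_LIBRTE_VHOST_NUMA=y/' "$cfg"
grep CONFIG_RTE_LIBRTE_VHOST_NUMA "$cfg"   # prints: CONFIG_RTE_LIBRTE_VHOST_NUMA=y
rm -f "$cfg"
```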