Bug 1474398 - Significant degrade in packet throughput when network traffic has high number of flows/streams
Status: ASSIGNED
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openvswitch
Version: 7.3
Hardware/OS: Unspecified / Unspecified
Priority: high   Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Kevin Traynor
QA Contact: Christian Trautman
Docs Contact:
Depends On:
Blocks:
 
Reported: 2017-07-24 10:09 EDT by Andrew Theurer
Modified: 2017-09-18 21:57 EDT
CC List: 12 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments:
1024 flow mac learning test (211.73 KB, text/plain), 2017-08-18 09:43 EDT, Kevin Traynor
dpctl dump flows (40.51 KB, text/plain), 2017-08-24 17:59 EDT, Andrew Theurer
fdb show (9.18 KB, text/plain), 2017-08-24 18:00 EDT, Andrew Theurer
dpctl show (first) (1.40 KB, text/plain), 2017-08-24 18:01 EDT, Andrew Theurer
dpctl show (second) (1.40 KB, text/plain), 2017-08-24 18:02 EDT, Andrew Theurer

Description Andrew Theurer 2017-07-24 10:09:35 EDT
Description of problem:

When going from relatively few (256) to very many (1024+) packet flows while using OVS with actions=NORMAL, the packet rate degrades significantly.  This is with DPDK.


Version-Release number of selected component (if applicable):

2.6.x and 2.7.x


How reproducible:

Easily


Steps to Reproduce:
1. Set up an OVS bridge with 2 DPDK devices and 2 vhostuser devices, and run testpmd in the VM, using MAC learning mode (actions=NORMAL); see the sketch after these steps
2. Use a packet generator to run 256 network flows and achieve 3-6Mpps
3. Change the number of flows to 1024 and observe packet rate drop to under 1Mpps
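
A minimal sketch of such a setup (not the exact configuration used in this report); the CPU mask, port names and PCI addresses are assumptions and need to be adjusted for the actual host:

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6   # assumption: 2 PMD cores
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 dpdk-0 -- set Interface dpdk-0 type=dpdk options:dpdk-devargs=0000:86:00.0
ovs-vsctl add-port ovsbr0 dpdk-1 -- set Interface dpdk-1 type=dpdk options:dpdk-devargs=0000:86:00.1
ovs-vsctl add-port ovsbr0 vhost-user-0 -- set Interface vhost-user-0 type=dpdkvhostuser
ovs-vsctl add-port ovsbr0 vhost-user-1 -- set Interface vhost-user-1 type=dpdkvhostuser
# MAC learning mode: a single NORMAL rule instead of directly programmed flows
ovs-ofctl del-flows ovsbr0
ovs-ofctl add-flow ovsbr0 actions=NORMAL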

Actual results:


Expected results:

Packet rate should not degrade significantly when the flow count increases from 256 to 1024 or more.


Additional info:
Comment 2 Andrew Theurer 2017-07-25 08:54:40 EDT
I have tested this without vhostuser, with just two physical DPDK devices on one bridge, and still see the problem.  There is usually one CPU, not used for PMD threads, that is near 100% utilization.  The perf report for that CPU shows:

     0.96%           718  urcu3            [kernel.kallsyms]    [.] irq_return
     0.56%           487  revalidator52    ovs-vswitchd         [.] parse_flow_nlattrs
     0.56%           450  revalidator94    ovs-vswitchd         [.] parse_flow_nlattrs
     0.54%           468  revalidator72    ovs-vswitchd         [.] parse_flow_nlattrs
     0.52%           430  revalidator93    ovs-vswitchd         [.] parse_flow_nlattrs
     0.51%           424  revalidator53    ovs-vswitchd         [.] parse_flow_nlattrs
     0.51%           410  revalidator57    ovs-vswitchd         [.] parse_flow_nlattrs
     0.49%           396  revalidator95    ovs-vswitchd         [.] parse_flow_nlattrs
     0.49%           417  revalidator56    ovs-vswitchd         [.] parse_flow_nlattrs
     0.48%           396  revalidator70    ovs-vswitchd         [.] parse_flow_nlattrs
     0.48%           401  revalidator69    ovs-vswitchd         [.] parse_flow_nlattrs
     0.47%           424  revalidator75    ovs-vswitchd         [.] parse_flow_nlattrs
     0.44%           368  revalidator58    ovs-vswitchd         [.] parse_flow_nlattrs
     0.44%           385  revalidator98    ovs-vswitchd         [.] parse_flow_nlattrs
     0.44%           343  ovs-vswitchd     [kernel.kallsyms]    [.] irq_return
     0.36%           276  urcu3            [vdso]               [.] __vdso_clock_gettime
     0.36%           294  revalidator69    ovs-vswitchd         [.] xlate_actions
     0.36%           290  revalidator70    ovs-vswitchd         [.] xlate_actions
     0.35%           291  revalidator72    ovs-vswitchd         [.] mac_entry_lookup
     0.35%           299  revalidator52    ovs-vswitchd         [.] xlate_actions
     0.34%           274  revalidator75    [kernel.kallsyms]    [.] irq_return
     0.34%           297  revalidator72    ovs-vswitchd         [.] xlate_actions
     0.34%           276  revalidator58    ovs-vswitchd         [.] xlate_actions
     0.34%           275  revalidator58    ovs-vswitchd         [.] mac_entry_lookup
     0.33%           278  revalidator98    ovs-vswitchd         [.] mac_entry_lookup
     0.33%           275  revalidator94    ovs-vswitchd         [.] xlate_actions
     0.33%           273  revalidator98    ovs-vswitchd         [.] xlate_actions
     0.33%           282  revalidator52    ovs-vswitchd         [.] mac_entry_lookup
     0.32%           265  revalidator53    ovs-vswitchd         [.] mac_entry_lookup
     0.32%           264  revalidator95    ovs-vswitchd         [.] xlate_actions
     0.32%           270  revalidator56    ovs-vswitchd         [.] xlate_actions
     0.32%           264  revalidator56    ovs-vswitchd         [.] mac_entry_lookup
     0.32%           271  revalidator53    ovs-vswitchd         [.] xlate_actions
     0.31%           256  revalidator57    ovs-vswitchd         [.] xlate_actions
     0.30%           260  revalidator93    ovs-vswitchd         [.] xlate_actions
     0.30%           264  revalidator93    ovs-vswitchd         [.] mac_entry_lookup
     0.30%           251  revalidator69    ovs-vswitchd         [.] mac_entry_lookup
Comment 3 Andrew Theurer 2017-07-25 09:24:16 EDT
I will test an older version of OVS 2.6 to see if this is a new problem or one that has always been there.
Comment 4 Andrew Theurer 2017-07-25 11:22:18 EDT
Using openvswitch 2.6.1-5 showed similar results, so I don't think this is a new problem, just newly discovered.

Please advise if there is any revalidator thread tuning we can try.
Comment 5 Rashid Khan 2017-07-26 11:24:30 EDT
(In reply to Andrew Theurer from comment #4)
> Using openvswitch 2.6.1-5 showed similar results, so I don't think this is a
> new problem, just newly discovered.
> 
> Please advise if there are any revalidator thread tuning we can try.

Hi Andrew. Please confirm, then, that you do not think this is a regression, but rather a bug that we need to root-cause and squash.
Comment 6 Andrew Theurer 2017-07-26 11:42:08 EDT
This is a bug we need to fix
Comment 7 Rashid Khan 2017-07-26 12:07:57 EDT
(In reply to Andrew Theurer from comment #6)
> This is a bug we need to fix

No doubt.
Comment 8 Kevin Traynor 2017-07-27 11:39:50 EDT
(In reply to Andrew Theurer from comment #4)
> Using openvswitch 2.6.1-5 showed similar results, so I don't think this is a
> new problem, just newly discovered.
> 
> Please advise if there are any revalidator thread tuning we can try.

I'm not sure about tuning within individual threads, but the number of revalidator threads can be set with:
ovs-vsctl set Open_vSwitch . other_config:n-revalidator-threads=n

Alternatively, by not setting a dpdk-lcore-mask, the revalidator threads are scheduled by Linux across the initial CPU affinity mask of the OVS main thread, so if a user has multiple cores reserved for the OS, the threads will be spread across them.
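
For example, a sketch of setting and checking the thread count (the value of 4 is an arbitrary assumption):

# set the number of revalidator (and, optionally, handler) threads
ovs-vsctl set Open_vSwitch . other_config:n-revalidator-threads=4
ovs-vsctl set Open_vSwitch . other_config:n-handler-threads=4
# read back the other_config column to confirm
ovs-vsctl get Open_vSwitch . other_config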
Comment 9 Kevin Traynor 2017-08-01 13:44:52 EDT
For the 2x dpdk port case - can you share the traffic profile that is being sent from each port?

I see only a ~10% drop in my tests going from 256 to 1024 flows, which is probably due to the increase in EMC entries or fewer opportunities for batching.

If I force EMC collisions and send almost everything to the megaflow classifier, I see only a ~2% drop from 256 to 1024 flows.

I don't see any real load on revalidator threads.

It could be that in your test the traffic profile is causing increased EMC collisions when moving to 1024 flows. 'ovs-appctl dpif-netdev/pmd-stats-show' will show whether the megaflow classifier is being hit in the steady state.

It could also be that the forwarding DB does not have the right entries and packets are being flooded to the bridge port, which would exaggerate the effect of the increased flow count. You can check this with 'ovs-appctl fdb/show br0' and by watching the br0 Tx packets in 'ovs-appctl dpctl/show -s'.
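
A sketch of those checks, assuming the bridge is named br0 as in the commands just mentioned:

# steady-state classifier stats per PMD: compare emc hits vs megaflow (masked) hits
ovs-appctl dpif-netdev/pmd-stats-show
# current MAC learning (forwarding DB) entries
ovs-appctl fdb/show br0
# per-port statistics; a growing Tx count on the br0 (LOCAL) port indicates flooding
ovs-appctl dpctl/show -s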
Comment 10 Karl Rister 2017-08-01 14:01:33 EDT
atheurer is out on PTO...

I am seeing problems similar to what he reported when running tests with source and destination flow mods for both MAC and IP.  We are running a binary search, trying to converge at 0.002% and/or 0.0001% packet loss over a 5-minute validation period.

I'll collect some data and post it ASAP.
Comment 11 Karl Rister 2017-08-01 16:03:30 EDT
I ran a bidirectional test with 256 and 10,000 flows with the previously mentioned parameters (src and dst MAC and IP flow mods, 0.002% and 0.0001% max packet loss) and got the following results:

Max Loss     256 Flows     10,000 Flows
0.002%       8.9678 mpps   1.0984 mpps
0.0001%      8.6710 mpps   N/A

The 10,000-flow/0.0001% case was unable to converge, so no datapoint is available.

Focusing on the 0.002% data, I can see that during the 256-flow test case both PMD threads are getting about 4.6M EMC hits/second with 0 megaflow hits/second while driving about 4.6M packets/second.  For the 10,000-flow test case, both PMD threads are getting roughly 275K EMC hits/second with roughly 360K megaflow hits/second.

The high level results are available here:

http://pbench.perf.lab.eng.bos.redhat.com/results/perf122/trafficgen_XL710_OVS-bridged_OVS-2.6.1-20_baremetal_cpu-adj-2_granularity-0.5_ovs-tool_tg%5btrex%5dr%5b100%5dfs%5b64%5dnf%5b256,10000%5dfm%5bsi,di,sm,dm%5dtd%5bbi%5dml%5b0.002,0.001%5dtt%5bbs%5d_2017-08-01_18:06:53/result.html

The detailed results are available at the following links:

256 flow search log:
http://pbench.perf.lab.eng.bos.redhat.com/results/perf122/trafficgen_XL710_OVS-bridged_OVS-2.6.1-20_baremetal_cpu-adj-2_granularity-0.5_ovs-tool_tg%5btrex%5dr%5b100%5dfs%5b64%5dnf%5b256,10000%5dfm%5bsi,di,sm,dm%5dtd%5bbi%5dml%5b0.002,0.001%5dtt%5bbs%5d_2017-08-01_18:06:53/1-bidirec-64B-256flows-0.002pct_drop/sample1/result.txt

256 flow OVS tool data:
http://pbench.perf.lab.eng.bos.redhat.com/results/perf122/trafficgen_XL710_OVS-bridged_OVS-2.6.1-20_baremetal_cpu-adj-2_granularity-0.5_ovs-tool_tg%5btrex%5dr%5b100%5dfs%5b64%5dnf%5b256,10000%5dfm%5bsi,di,sm,dm%5dtd%5bbi%5dml%5b0.002,0.001%5dtt%5bbs%5d_2017-08-01_18:06:53/1-bidirec-64B-256flows-0.002pct_drop/sample1/tools-default/perf124/openvswitch/openvswitch.html

10,000 flow search log:
http://pbench.perf.lab.eng.bos.redhat.com/results/perf122/trafficgen_XL710_OVS-bridged_OVS-2.6.1-20_baremetal_cpu-adj-2_granularity-0.5_ovs-tool_tg%5btrex%5dr%5b100%5dfs%5b64%5dnf%5b256,10000%5dfm%5bsi,di,sm,dm%5dtd%5bbi%5dml%5b0.002,0.001%5dtt%5bbs%5d_2017-08-01_18:06:53/2-bidirec-64B-10000flows-0.002pct_drop/sample1/result.txt

10,000 flow OVS tool data:
http://pbench.perf.lab.eng.bos.redhat.com/results/perf122/trafficgen_XL710_OVS-bridged_OVS-2.6.1-20_baremetal_cpu-adj-2_granularity-0.5_ovs-tool_tg%5btrex%5dr%5b100%5dfs%5b64%5dnf%5b256,10000%5dfm%5bsi,di,sm,dm%5dtd%5bbi%5dml%5b0.002,0.001%5dtt%5bbs%5d_2017-08-01_18:06:53/2-bidirec-64B-10000flows-0.002pct_drop/sample1/tools-default/perf124/openvswitch/openvswitch.html
Comment 12 Andrew Theurer 2017-08-01 21:27:41 EDT
Kevin, are you using mac learning (actions=NORMAL)?
Comment 13 Kevin Traynor 2017-08-02 08:43:12 EDT
(In reply to Andrew Theurer from comment #12)
> Kevin, are you using mac learning (actions=NORMAL)?

yes, I am using it.
Comment 14 Kevin Traynor 2017-08-02 08:58:44 EDT
(In reply to Karl Rister from comment #11)
> I ran a bidirectional test with 256 and 10,000 flows with the previously
> mentioned parameters (src and dst MAC and IP flow mods, 0.002% and 0.0001%
> max packet loss) and got the following results:
> 
> Max Loss     256 Flows     10,000 Flows
> 0.002%       8.9678 mpps   1.0984 mpps
> 0.0001%      8.6710 mpps   N/A
> 
> The 10,000 flow/0.0001% was unable to converge so there is no datapoint
> available.
> 
> Focusing on the 0.002% data I can see that during the 256 flow testcase both
> PMD threads are getting about 4.6M EMC hits/second with 0 megaflow
> hits/second while driving about 4.6M packets/second.  For the 10,000 flow
> testcase both PMD threads are getting about ~275K EMC hits/second with ~360K
> megaflow hits/second.
> 

Thanks for the info. When you start to reach multiple thousands of flows it's normal for the megaflow classifier to get hit due to EMC collisions. There is probably also some thrashing of the EMC, which further degrades performance. Still, I wouldn't have expected the drop you are seeing.

I suspect it's related to the traffic profile, in that the learning table cannot get into a good state. For example, in a particular test case I just ran, the rate drops from 3 Mpps to 1.2 Mpps when the table does not know the port for the destination MAC and the packets are sent to the bridge port as well.

It would be useful if you could check the Tx packets on the bridge during the steady state.
Comment 15 Karl Rister 2017-08-02 16:29:08 EDT
If I'm interpreting your request (by bridge port you mean LOCAL, right?) and the data correctly, then in the good case (256 flows) we see no activity.  In the bad case (10,000 flows), however, we see high levels of TX packets and a corresponding number of drops, oscillating quite badly between ~600K and ~1.15M per second for both counters.

If you look at the OVS tool data links I provided previously the tool is mapping the LOCAL counters to the name of the bridge, which in this case is ovsbr0.  So in the graphs all of the ovsbr0 entries are for LOCAL.
Comment 16 Kevin Traynor 2017-08-08 13:26:04 EDT
(In reply to Karl Rister from comment #15)
> If I'm interpreting your request (by bridge port you mean LOCAL, right?) and

yep, that's right.

> the data correctly then in the good case (256 flows) we are seeing no
> activity.  However, in the bad case (10,000 flows) we are seeing high levels
> of TX packets and a corresponding number of drops.  It is oscillating quite
> badly between ~600K and ~1.15M for both counters (per second).
> 
> If you look at the OVS tool data links I provided previously the tool is
> mapping the LOCAL counters to the name of the bridge, which in this case is
> ovsbr0.  So in the graphs all of the ovsbr0 entries are for LOCAL.

Thanks Karl. This indicates that the MAC learning table does not have all the information about where to send the packets. Can you (or Andrew) re-test and confirm that the range of destination MAC addresses is also sent as source MAC addresses from the relevant port, so that the MAC learning table can be populated correctly?

It is expected behaviour to see decreased performance if the MAC learning table cannot get into a good state, because packets then have to be broadcast to the LOCAL port.
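
If the traffic deliberately rotates through a large range of MAC addresses, it may also be worth checking the learning table limits; a sketch, assuming the bridge is named ovsbr0 and that the defaults (typically 2048 entries and 300 s aging) are in effect:

# enlarge the MAC learning table and slow aging so a large address range stays learned
ovs-vsctl set bridge ovsbr0 other_config:mac-table-size=16384
ovs-vsctl set bridge ovsbr0 other_config:mac-aging-time=600
# rough count of entries currently learned
ovs-appctl fdb/show ovsbr0 | wc -l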
Comment 17 Kevin Traynor 2017-08-18 09:36:12 EDT
I've tested the mac learning table with 1024 flows and 2 phy ports.
I see
- 1024 flows correct in the mac learning table
- 1024 flow rules correct on the bridge (i.e. sending to other phy port only, no broadcast)
- no increasing Tx packets on LOCAL port (confirming no broadcast)
- all revalidator threads showing ~0% cpu
- same rates when using direct programmed rules or the mac learning table
- the actual rates I am getting are affected by megaflow hits, because it is difficult to create traffic with a set number of flows that rotates the 5-tuple and mac addresses in sync, so that the mac addresses are unique (for the mac learning table) and the emc entries are also unique. I see this with both mac learning and directly programmed flow rules.

Will attach some logs with details.

My tests show that the mac learning table is working correctly for 1024 flows. Unless there is something that indicates otherwise, I will close this Bz.
Comment 18 Kevin Traynor 2017-08-18 09:43 EDT
Created attachment 1315216 [details]
1024 flow mac learning test

512 flows in each direction
src and dst mac addresses match between ports
Comment 19 Andrew Theurer 2017-08-21 22:00:09 EDT
We'll get a capture of the traffic we generate to make sure it is valid.  We should have it in a couple of days.
Comment 20 Kevin Traynor 2017-08-22 07:05:10 EDT
(In reply to Andrew Theurer from comment #19)
> We'll get a capture of the traffic we generate to make sure it is valid.  We
> should have it in a couple of days.

The flows and MAC learning entries will show which MAC addresses are not being learned, so they should help with the traffic analysis.

ovs-appctl dpctl/dump-flows
ovs-appctl fdb/show br0
Comment 21 Andrew Theurer 2017-08-24 17:56:04 EDT
Here's our bridge:
# ovs-vsctl show
49263d2d-15a9-433c-893a-d01cc44c4e78
    Bridge "ovsbr0"
        Port "dpdk-0"
            Interface "dpdk-0"
                type: dpdk
                options: {dpdk-devargs="0000:86:00.0", n_rxq="1", n_rxq_desc="2048", n_txq_desc="2048"}
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "dpdk-1"
            Interface "dpdk-1"
                type: dpdk
                options: {dpdk-devargs="0000:86:00.1", n_rxq="1", n_rxq_desc="2048", n_txq_desc="2048"}


And the flow rules:
# ovs-ofctl dump-flows ovsbr0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=2688.143s, table=0, n_packets=20629720929, n_bytes=1237783316860, idle_age=0, actions=NORMAL


I ran a test scaling the number of MAC flows from 1 to 128; here are the results:

                                                                                                                                                                  throughput|
                                                                                                   Gb_sec                                                        Mframes_sec|
                                                      tx_port:1-rx_port:0              tx_port:0-rx_port:1              tx_port:1-rx_port:0              tx_port:0-rx_port:1|
                                         -------------------------------- -------------------------------- -------------------------------- --------------------------------+
                                            mean stddevpct closest sample    mean stddevpct closest sample    mean stddevpct closest sample    mean stddevpct closest sample|
 1   bidirec-64B-1flows-0.002pct_drop     6.2353    0.0000              1  6.2353    0.0000              1  9.2787    0.0000              1  9.2787    0.0000              1|
 2   bidirec-64B-2flows-0.002pct_drop     5.7734    0.0000              1  5.7734    0.0000              1  8.5913    0.0000              1  8.5913    0.0000              1|
 3   bidirec-64B-4flows-0.002pct_drop     5.3147    0.0000              1  5.3147    0.0000              1  7.9087    0.0000              1  7.9087    0.0000              1|
 4   bidirec-64B-8flows-0.002pct_drop     4.6071    0.0000              1  4.6071    0.0000              1  6.8558    0.0000              1  6.8557    0.0000              1|
 5  bidirec-64B-16flows-0.002pct_drop     3.6170    0.0000              1  3.6170    0.0000              1  5.3825    0.0000              1  5.3825    0.0000              1|
 6  bidirec-64B-32flows-0.002pct_drop     2.8772    0.0000              1  2.8772    0.0000              1  4.2816    0.0000              1  4.2816    0.0000              1|
 7  bidirec-64B-64flows-0.002pct_drop     2.7029    0.0000              1  2.7029    0.0000              1  4.0221    0.0000              1  4.0221    0.0000              1|
 8 bidirec-64B-128flows-0.002pct_drop     2.5708    0.0000              1  2.5708    0.0000              1  3.8256    0.0000              1  3.8256    0.0000              1|



So in this test, throughput per direction drops from 9.2 Mpps at 1 flow to 3.8 Mpps at 128 flows.

I'll attach the other info requested.  Here's perf for 1 of the PMD threads at 128 flows:

    26.10%         11079  pmd104   ovs-vswitchd        [.] odp_execute_actions
    16.59%          7041  pmd104   ovs-vswitchd        [.] dp_netdev_input__
    11.84%          5027  pmd104   ovs-vswitchd        [.] miniflow_extract
     8.39%          3561  pmd104   ovs-vswitchd        [.] ixgbe_xmit_pkts_vec
     6.63%          2814  pmd104   ovs-vswitchd        [.] netdev_dpdk_eth_send
     5.96%          2530  pmd104   ovs-vswitchd        [.] dp_execute_cb
     5.74%          2435  pmd104   ovs-vswitchd        [.] ixgbe_recv_pkts_vec
     3.83%          1625  pmd104   libc-2.17.so        [.] __memcmp_sse4_1
     2.74%          1162  pmd104   ovs-vswitchd        [.] netdev_dpdk_filter_packet_len
     2.63%          1115  pmd104   ovs-vswitchd        [.] tx_port_lookup
     1.91%           809  pmd104   ovs-vswitchd        [.] netdev_send
     1.36%           578  pmd104   ovs-vswitchd        [.] netdev_dpdk_rxq_recv
     1.33%           563  pmd104   ovs-vswitchd        [.] nl_attr_type
     1.18%           500  pmd104   ovs-vswitchd        [.] dp_netdev_process_rxq_port
     0.84%           357  pmd104   ovs-vswitchd        [.] non_atomic_ullong_add
     0.57%           243  pmd104   [vdso]              [.] __vdso_clock_gettime
     0.49%           208  pmd104   ovs-vswitchd        [.] __popcountdi2
     0.42%           178  pmd104   ovs-vswitchd        [.] pmd_thread_main
     0.32%           137  pmd104   ovs-vswitchd        [.] nl_attr_get_odp_port
     0.26%           106  pmd104   ovs-vswitchd        [.] netdev_rxq_recv
     0.21%            91  pmd104   ovs-vswitchd        [.] memcmp@plt
     0.10%            41  pmd104   ovs-vswitchd        [.] time_timespec__
     0.07%            31  pmd104   libc-2.17.so        [.] clock_gettime
     0.06%            26  pmd104   ovs-vswitchd        [.] time_msec


and for 1 flow:
    29.10%         18530  pmd104     ovs-vswitchd        [.] miniflow_extract
    20.46%         13028  pmd104     ovs-vswitchd        [.] dp_netdev_input__
    13.81%          8792  pmd104     ovs-vswitchd        [.] ixgbe_xmit_pkts_vec
    13.27%          8450  pmd104     ovs-vswitchd        [.] ixgbe_recv_pkts_vec
     6.69%          4257  pmd104     libc-2.17.so        [.] __memcmp_sse4_1
     2.92%          1853  pmd104     ovs-vswitchd        [.] netdev_dpdk_rxq_recv
     2.03%          1294  pmd104     ovs-vswitchd        [.] odp_execute_actions
     1.93%          1226  pmd104     ovs-vswitchd        [.] netdev_dpdk_filter_packet_len
     1.45%           921  pmd104     ovs-vswitchd        [.] dp_netdev_process_rxq_port
     1.25%           798  pmd104     [vdso]              [.] __vdso_clock_gettime
     1.20%           764  pmd104     ovs-vswitchd        [.] __popcountdi2
     1.18%           751  pmd104     ovs-vswitchd        [.] netdev_dpdk_eth_send
     0.82%           521  pmd104     ovs-vswitchd        [.] dp_execute_cb
     0.55%           352  pmd104     ovs-vswitchd        [.] tx_port_lookup
     0.51%           322  pmd104     ovs-vswitchd        [.] pmd_thread_main
     0.50%           317  pmd104     ovs-vswitchd        [.] netdev_send
     0.35%           222  pmd104     ovs-vswitchd        [.] time_timespec__
     0.35%           221  pmd104     ovs-vswitchd        [.] memcmp@plt
     0.32%           206  pmd104     ovs-vswitchd        [.] netdev_rxq_recv
     0.25%           158  pmd104     ovs-vswitchd        [.] non_atomic_ullong_add
     0.22%           138  pmd104     ovs-vswitchd        [.] time_msec
     0.19%           122  pmd104     libc-2.17.so        [.] clock_gettime
     0.11%            71  pmd104     ovs-vswitchd        [.] nl_attr_type
     0.06%            41  pmd104     ovs-vswitchd        [.] xclock_gettime
     0.04%            23  pmd104     ovs-vswitchd        [.] nl_attr_get_odp_port

Note that odp_execute_actions consumes a large share of CPU in the 128-flow perf report.
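
For reference, a sketch of how a per-thread profile like the one above can be captured, assuming the PMD thread id is looked up first (thread names such as pmd104 appear in the thread list):

# list ovs-vswitchd threads and note the SPID of the PMD thread of interest
ps -T -p $(pidof ovs-vswitchd) | grep pmd
# record that thread for ~10 seconds, then summarize by symbol
perf record -t <pmd-tid> -g -- sleep 10
perf report --stdio --sort symbol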
Comment 22 Andrew Theurer 2017-08-24 17:59 EDT
Created attachment 1317913 [details]
dpctl dump flows
Comment 23 Andrew Theurer 2017-08-24 18:00 EDT
Created attachment 1317914 [details]
fdb show
Comment 24 Andrew Theurer 2017-08-24 18:01 EDT
Created attachment 1317915 [details]
dpctl show (first)
Comment 25 Andrew Theurer 2017-08-24 18:02 EDT
Created attachment 1317916 [details]
dpctl show (second)
Comment 26 Andrew Theurer 2017-09-18 14:58:54 EDT
This was also tested with OSP10 using the standard OVS ML2 plug-in, with two provider networks, and the degradation is present there as well.
