Bug 1395752

Summary: excessive overhead in PMD thread from ipv4_frag_reassemble()
Product: Red Hat Enterprise Linux 7
Reporter: Andrew Theurer <atheurer>
Component: openvswitch
Assignee: Aaron Conole <aconole>
Status: CLOSED WORKSFORME
QA Contact: ovs-qe
Severity: high
Priority: high
Version: 7.3
CC: aconole, aloughla, atelang, atheurer, atragler, bmichalo, fbaudin, fleitner, jamie_blatt, ktraynor, kzhang, mleitner, rkhan, sukulkar
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-04-18 15:18:01 UTC
Type: Bug

Description Andrew Theurer 2016-11-16 15:21:28 UTC
Description of problem:
Network throughput for openvswitch 2.5.0-14 fdp is lower than source-built 2.5

Version-Release number of selected component (if applicable):
openvswitch-2.5.0-14.git20160727.el7fdp

How reproducible:
requires GTP tunnel and vEPC VNF

Steps to Reproduce:
1. Test vEPC with Ixia BreakingPoint


Actual results:
About 30% of PMD CPU in ipv4_frag_reassembly()

Expected results:
no CPU in ipv4_frag_reassembly()

Additional info:
When the openvswitch 2.5-14 fdp RPM is built without the included patches, performance is good.

Comment 1 Aaron Conole 2016-11-16 16:52:34 UTC
OVS with GTP support is not even accepted upstream.  Are you manually applying the patches?  What is the setup?  I am not sure which patches you are using, and which vEPC; is the Ixia vEPC available for download somewhere, or do we need a license?  Do we need an eNodeB as well, or are there emulators for that?

Comment 2 Andrew Theurer 2016-11-16 17:43:31 UTC
The GTP tunnel is done in the VNF, not in Open vSwitch.  From the OVS point of view, it should have no knowledge of the GTP tunnel.  However, this is the only situation where we have seen this problem.

We don't yet have the capability to reproduce this workload; so far we are relying on the partner to run tests.  They took the src.rpm and built it two ways to compare with our package: first built as normal (rpmbuild -ba), which showed the same degradation as the RPM; and second, built only the 2.5 source from the src.rpm without applying the patches that come with it.  Without the patches, the performance is very good.

Comment 3 Kevin Traynor 2016-11-16 18:10:44 UTC
From the parts of the long email chain I was on: the partner has also tested 2.6.1 built from source and performance was good. Assuming all is OK with the testing environments, this suggests the patches applied in openvswitch.spec, or some build procedure difference, is the root cause.

If we can't reproduce, they could rule out a build difference by applying the patches manually and building from source. If the degradation is still present, they could bisect through the patches to identify the offender.

Comment 5 Flavio Leitner 2016-11-17 18:30:58 UTC
Hi,

This update covers both the ipv4_frag issue and the performance issue:

The correct symbol name is ipv4_frag_reassemble(), which is part of the DPDK librte IP Frag library.  That library shouldn't be used by OVS at all. It might be used by the DPDK bond PMD when balancing with layer3+4, but I don't think that is the case here, is it?

So maybe the symbol resolution is a red herring, or I am missing something. Could you attach gdb to the PMD thread, set a breakpoint on ipv4_frag_reassemble, and then capture the backtrace when it gets called?
Make sure you have the -debuginfo package installed.
For instance:
# gdb /usr/sbin/ovs-vswitchd -p $(pidof ovs-vswitchd)
[...]
Reading symbols from /usr/sbin/ovs-vswitchd...Reading symbols from /usr/lib/debug/usr/sbin/ovs-vswitchd.debug...done.
done.
Attaching to program: /usr/sbin/ovs-vswitchd, process 2928
[...]

// Adding the breakpoint:
(gdb) b ipv4_frag_reassemble
Breakpoint 1 at 0x7fc4a402ee30: file /usr/src/debug/openvswitch-2.5.0/dpdk-2.2.0/lib/librte_ip_frag/rte_ipv4_reassembly.c, line 45.

// Let it continue:
(gdb) continue
Continuing.

Now reproduce the issue while watching perf.  If the symbol is there but gdb doesn't stop, then perf isn't resolving the symbol correctly; otherwise please grab the stack trace:

With another breakpoint (ixgbe_recv_pkts_vec) just as an example:
Breakpoint 3, ixgbe_recv_pkts_vec (rx_queue=0x7fc31a5d5280, rx_pkts=0x7fc48b3f4850, nb_pkts=32)
    at /usr/src/debug/openvswitch-2.5.0/dpdk-2.2.0/drivers/net/ixgbe/ixgbe_rxtx_vec.c:426
426             return _recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
(gdb)

// Getting the backtrace (bt):
(gdb) bt
#0  ixgbe_recv_pkts_vec (rx_queue=0x7fc31a5d5280, rx_pkts=0x7fc48b3f4850, nb_pkts=32)
    at /usr/src/debug/openvswitch-2.5.0/dpdk-2.2.0/drivers/net/ixgbe/ixgbe_rxtx_vec.c:426
#1  0x00007fc4a416a2ba in rte_eth_rx_burst (nb_pkts=32, rx_pkts=0x7fc48b3f4850, queue_id=0, port_id=1 '\001')
    at /usr/src/debug/openvswitch-2.5.0/dpdk-2.2.0/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:2510
#2  netdev_dpdk_rxq_recv (rxq_=<optimized out>, packets=0x7fc48b3f4850, c=0x7fc48b3f484c) at lib/netdev-dpdk.c:1092
#3  0x00007fc4a40df881 in netdev_rxq_recv (rx=<optimized out>, buffers=buffers@entry=0x7fc48b3f4850, cnt=cnt@entry=0x7fc48b3f484c)
    at lib/netdev.c:654
#4  0x00007fc4a40bfa56 in dp_netdev_process_rxq_port (pmd=pmd@entry=0x7fc4a46a7a30, rxq=<optimized out>, port=<optimized out>, 
    port=<optimized out>) at lib/dpif-netdev.c:2594
#5  0x00007fc4a40bfdd9 in pmd_thread_main (f_=0x7fc4a46a7a30) at lib/dpif-netdev.c:2725
#6  0x00007fc4a4121e96 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:340
#7  0x00007fc4a32b7dc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007fc4a26af1cd in clone () from /lib64/libc.so.6


That gives us a lot of information about the issue.
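For an unattended capture, the same breakpoint-and-backtrace sequence can be driven in gdb batch mode. This is only a sketch (it assumes the debuginfo package is installed, and attaching will briefly stop the PMD threads); the command is printed via a helper rather than executed, so the snippet itself is harmless:

```shell
# Sketch: batch-mode equivalent of the interactive gdb session above.
# run() prints the command; drop it to actually attach (needs root and
# the openvswitch-debuginfo package; pauses the PMD threads briefly).
run() { echo "+ $*"; }

run gdb -batch -p '$(pidof ovs-vswitchd)' \
    -ex 'break ipv4_frag_reassemble' \
    -ex 'continue' \
    -ex 'backtrace' \
    -ex 'detach'
```

The `-ex` options replay exactly the interactive steps shown above, so the backtrace lands on stdout where it can be copied into the bug.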


Another approach is to disable support for librte_ip_frag and the bond PMD.  For that you just need to add the lines marked '+' below (without the '+') to the spec file:

 setconf CONFIG_RTE_LIBRTE_CRYPTODEV n
 setconf CONFIG_RTE_LIBRTE_MBUF_OFFLOAD n
 
+# Turn off frag
+setconf CONFIG_RTE_LIBRTE_IP_FRAG n
+setconf CONFIG_RTE_LIBRTE_PMD_BOND n
+
 make V=1 O=%{dpdktarget} %{?_smp_mflags}

and then rebuild the srpm.
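The spec edit above can be applied mechanically. A sketch using sed on a miniature stand-in for the real spec file (the stand-in contents are illustrative; only the two setconf lines come from the instructions above):

```shell
# Sketch: insert the two extra setconf lines after the existing ones.
# The miniature file below stands in for the real openvswitch.spec.
spec=openvswitch.spec
cat > "$spec" <<'EOF'
setconf CONFIG_RTE_LIBRTE_CRYPTODEV n
setconf CONFIG_RTE_LIBRTE_MBUF_OFFLOAD n

make V=1 O=%{dpdktarget} %{?_smp_mflags}
EOF

# Append the new lines right after the MBUF_OFFLOAD setconf line.
sed -i \
    -e '/CONFIG_RTE_LIBRTE_MBUF_OFFLOAD/a # Turn off frag' \
    -e '/CONFIG_RTE_LIBRTE_MBUF_OFFLOAD/a setconf CONFIG_RTE_LIBRTE_IP_FRAG n' \
    -e '/CONFIG_RTE_LIBRTE_MBUF_OFFLOAD/a setconf CONFIG_RTE_LIBRTE_PMD_BOND n' \
    "$spec"

cat "$spec"
# then rebuild the srpm, e.g.: rpmbuild -ba openvswitch.spec
```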


OK, regarding the performance: it is hard to say anything at the moment because we don't have reliable perf output yet, unfortunately.  It does work in my lab, though.

However, we can try to bisect which patch is causing the issue. The first one to take out is the big update in openvswitch-2.5-branch.patch.  Just comment out %patch11 in the spec as below and then rebuild the srpm:

 %patch3 -p1
 %patch4 -p1
 %patch10 -p1 -R
#%patch11 -p1
 %patch20 -p1
 %patch21 -p1
 %patch22 -p1

Another candidate is patch#20, 0001-dpif-Allow-adding-ukeys-for-same-flow-by-different-p.patch.

Most of the other patches are initialization/systemd related, so not related to performance at all.

The RPM uses a common set of CPU flags/optimizations in order for the RPM to be usable with a range of CPUs, but even so it should not give a noticeable performance impact.

Thanks!
fbl

Comment 6 Anita Tragler 2016-11-17 21:19:41 UTC
This is affecting the Affirmed Networks vEPC VNF; they are seeing 30-40% better performance with OVS 2.5 built from source without the patches.
They are using Mobile traffic profile with IXIA Breakingpoint stateful traffic generator.

We do not have the customer traffic profile or an L4 traffic test tool, but we can try to recreate this issue with raw GTP (UDP) traffic, bisect each patch individually, and verify the performance degradation. Do we know the packet sizes being used to trigger IP fragmentation?

We would like to urgently address this issue in time for the FDP release in December. The errata needs to go out next week, Nov 25th? This cannot wait another 6 weeks.

Comment 7 Aaron Conole 2016-12-02 18:15:16 UTC
Can we also get the DPDK configuration options they used to build, and which DPDK they built against (was it the 2.2 release?)  The DPDK configuration might differ, and I want to ensure our configurations are the same.  The customer reports issues only when using the RPM build, but the patches applied are almost exclusively to Red Hat utilities.

Additionally, it seems the compilation flags we use to build are as follows:

  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic

I don't know what redhat-hardened-cc1 will do, but I believe it adds '-fPIE' to the build, as well.  Is it possible for the customer to use these build flags and try the build out?
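One way to compare the two builds directly: with -grecord-gcc-switches, the compile flags end up in the DWARF producer string and can be read back out of the binary. A sketch (the binary path is an example, and it needs the debuginfo present; the command degrades gracefully when it is not):

```shell
# Sketch: recover the GCC switches recorded in a binary built with
# -grecord-gcc-switches (path is illustrative; requires debug info).
bin=/usr/sbin/ovs-vswitchd
readelf --debug-dump=info "$bin" 2>/dev/null \
    | grep -m1 DW_AT_producer \
    || echo "no recorded flags found in $bin"
```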


Finally, perhaps there are some PCAP files that can be captured in either direction from outside the ovs/dpdk boundary, ex:


  +-------------+       +----------+       +------+
  | traffic-gen | <---> | OVS+dpdk | <---> | vEPC |
  +-------------+   ^   +----------+       +------+
   capture here  ---+

Then we could play that traffic back and observe the OVS behavior.  It isn't the same as running the full vEPC, but if we have the traffic in both directions, we could replay it with something like tcpreplay, as appropriate, and watch the perf data.  Thoughts?
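The capture-and-replay idea can be sketched as commands. Interface names, the replay rate, and pcap paths below are placeholders, not details from this report; the commands are printed via a helper rather than executed, since running them needs root and the real interfaces:

```shell
# Sketch of the capture-then-replay workflow described above.
# run() prints the commands; drop it to execute for real (needs root).
run() { echo "+ $*"; }

run tcpdump -i dpdk-facing0 -w towards-vepc.pcap   # capture at the OVS/DPDK boundary
run tcpdump -i dpdk-facing1 -w from-vepc.pcap
# ...drive the traffic-gen <-> vEPC workload, then stop both captures...
run tcpreplay --intf1=dpdk-facing0 --mbps=1000 towards-vepc.pcap
run perf top -p '$(pidof ovs-vswitchd)'
```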

Comment 9 Kevin Traynor 2016-12-12 16:53:03 UTC
There is now also an OVS 2.6.1 package available in brew. It has not been tested by QE yet.

https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=528193

Comment 10 Rashid Khan 2016-12-13 20:37:31 UTC
(In reply to bmichalo from comment #8)
> I have provided Affirmed with RPMs with several suspect patches removed:
> 
> http://people.redhat.com/dshaks/openvswitch-devel-2.5.0-22p21.git20160727.el7.x86_64.rpm
> http://people.redhat.com/dshaks/openvswitch-2.5.0-22p21.git20160727.el7.x86_64.rpm
> http://people.redhat.com/dshaks/openvswitch-debuginfo-2.5.0-22p21.git20160727.el7.x86_64.rpm
> 
> waiting for feedback regarding performance.

Any feedback from Affirmed?

Comment 11 bmichalo 2016-12-13 20:43:48 UTC
(In reply to Rashid Khan from comment #10)
> Any feedback from Affirmed?


I have not heard anything yet.

Comment 14 jamieblatt 2017-01-04 12:48:50 UTC
(In reply to bmichalo from comment #8)
> I have provided Affirmed with RPMs with several suspect patches removed:
> [...]
> waiting for feedback regarding performance.

When I try to use these RPMs, one of them gives an error on install:

rpm -iv openvswitch-2.5.0-22p21.git20160727.el7.x86_64.rpm
error: Failed dependencies:
        libatomic.so.1()(64bit) is needed by openvswitch-2.5.0-22p21.git20160727.el7.x86_64

even though I have this lib
/usr/bin/usr/libatomic.so.1
/usr/lib64/libatomic.so.1

Comment 15 Aaron Conole 2017-01-04 15:07:47 UTC
You may need to either 
   yum install openvswitch-2.5.0-22p21.git20160727.el7.x86_64.rpm

or
   yum install libatomic

I don't know what other dependencies may be needed.  I'm not sure what /usr/bin/usr/libatomic.so.1 is - can you confirm you didn't custom build your gcc package (and you aren't using a 3rd party gcc)?
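A quick way to see which libatomic the system actually has (a sketch; the /usr/bin/usr path mentioned above is not a standard library location, so it is worth checking what the loader and RPM database really know about):

```shell
# Sketch: check where the dynamic loader finds libatomic and which
# package, if any, provides it (both commands degrade gracefully).
ldconfig -p 2>/dev/null | grep libatomic || echo "libatomic not in the ld cache"
rpm -q libatomic 2>/dev/null || echo "libatomic package not installed (or not an RPM system)"
```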

Comment 16 bmichalo 2017-01-04 15:10:17 UTC
Please try:

yum install libatomic-*.x86_64

and then install the RPMs

Comment 17 jamieblatt 2017-01-04 19:05:39 UTC
(In reply to bmichalo from comment #16)
> Please try:
> 
> yum install libatomic-*.x86_64
> 
> and then install the RPMs

an older version was already installed.

I removed it and reinstalled.
I can now install RPM.
Will proceed with testing tomorrow.

Comment 18 jamieblatt 2017-01-11 12:56:53 UTC
(In reply to jamieblatt from comment #17)
> [...]
> Will proceed with testing tomorrow.

I finally was able to get call generator from QA to test this.

End result: only getting 6 GB of throughput using 1024 MSS packets (without packet loss). Expecting at least 8 GB.

When looking at perf top, I do not see ipv4_frag_reassemble() any more.

here is one of the PMD cores

   34.21%    34.06%  ovs-vswitchd        [.] rte_vhost_enqueue_burst
+   21.43%    12.28%  ovs-vswitchd        [.] dp_netdev_process_rxq_port.isra.22
+    7.66%     7.63%  ovs-vswitchd        [.] _recv_raw_pkts_vec
+    7.53%     7.50%  ovs-vswitchd        [.] netdev_dpdk_rxq_recv
+    6.94%     1.72%  ovs-vswitchd        [.] dp_netdev_input__
+    6.57%     6.55%  ovs-vswitchd        [.] emc_processing.constprop.23
+    6.18%     0.00%  [unknown]           [.] 0000000000000000
+    6.16%     6.07%  ovs-vswitchd        [.] __netdev_dpdk_vhost_send
+    4.73%     0.49%  ovs-vswitchd        [.] fast_path_processing
+    4.18%     4.17%  ovs-vswitchd        [.] miniflow_extract
+    3.65%     3.63%  ovs-vswitchd        [.] dpcls_lookup
+    1.71%     0.29%  libc-2.17.so        [.] clock_gettime
+    1.62%     0.00%  [kernel]            [k] ksm_scan_thread
+    1.57%     0.66%  [kernel]            [k] ksm_do_scan
+    1.40%     1.39%  [vdso]              [.] __vdso_clock_gettime
+    1.36%     1.36%  ovs-vswitchd        [.] rte_pktmbuf_free
+    1.31%     1.31%  ovs-vswitchd        [.] pmd_thread_main
+    0.98%     0.98%  ovs-vswitchd        [.] cmap_find_batch
+    0.92%     0.91%  ovs-vswitchd        [.] netdev_rxq_recv
+    0.84%     0.84%  ovs-vswitchd        [.] non_atomic_ullong_add
+    0.80%     0.68%  ovs-vswitchd        [.] odp_execute_actions
+    0.79%     0.68%  ovs-vswitchd        [.] emc_insert
+    0.69%     0.69%  ovs-vswitchd        [.] dp_execute_cb
+    0.69%     0.68%  ovs-vswitchd        [.] packet_batch_update
+    0.65%     0.65%  libc-2.17.so        [.] __memcmp_sse4_1
+    0.65%     0.00%  [unknown]           [.] 0x00450008070022c5
+    0.62%     0.48%  [kernel]            [k] follow_page_mask
+    0.62%     0.62%  ovs-vswitchd        [.] cmap_find
+    0.47%     0.00%  [unknown]           [.] 0x0000000000000035
+    0.39%     0.39%  ovs-vswitchd        [.] time_timespec__
+    0.35%     0.00%  [unknown]           [.] 0x00000048b6f14000
+    0.35%     0.02%  [kernel]            [k] apic_timer_interrupt
+    0.34%     0.34%  ovs-vswitchd        [.] dp_netdev_lookup_port
+    0.32%     0.00%  [unknown]           [.] 0x00000048c6944000
+    0.30%     0.30%  ovs-vswitchd        [.] netdev_send
+    0.27%     0.00%  [kernel]            [k] ret_from_fork
+    0.27%     0.00%  [kernel]            [k] kthread
+    0.25%     0.25%  ovs-vswitchd        [.] ixgbe_recv_pkts_vec
+    0.25%     0.24%  ovs-vswitchd        [.] time_msec
+    0.24%     0.01%  [kernel]            [k] local_apic_timer_interrupt
+    0.22%     0.00%  [unknown]           [.] 0x00000048b6e60000
+    0.18%     0.01%  [kernel]            [k] __hrtimer_run_queues
+    0.17%     0.00%  [unknown]           [.] 0x0000004880001129
+    0.17%     0.17%  ovs-vswitchd        [.] __popcountdi2
+    0.16%     0.00%  [kernel]            [k] smp_apic_timer_interrupt
+    0.16%     0.00%  [kernel]            [k] tick_sched_timer
+    0.15%     0.00%  [kernel]            [k] hrtimer_interrupt
+    0.14%     0.14%  ovs-vswitchd        [.] netdev_dpdk_vhost_send
+    0.13%     0.13%  libpthread-2.17.so  [.] pthread_once
+    0.11%     0.11%  [kernel]            [k] _raw_spin_lock
+    0.10%     0.10%  libc-2.17.so        [.] __memset_sse2
+    0.10%     0.00%  [unknown]           [.] 0x0000042200000063
+    0.10%     0.00%  [unknown]           [.] 0x000000487800123

Comment 19 Aaron Conole 2017-01-13 16:14:39 UTC
(In reply to bmichalo from comment #8)
> I have provided Affirmed with RPMs with several suspect patches removed:
> 
> http://people.redhat.com/dshaks/openvswitch-devel-2.5.0-22p21.git20160727.
> el7.x86_64.rpm
> http://people.redhat.com/dshaks/openvswitch-2.5.0-22p21.git20160727.el7.
> x86_64.rpm
> http://people.redhat.com/dshaks/openvswitch-debuginfo-2.5.0-22p21.
> git20160727.el7.x86_64.rpm
> 
> waiting for feedback regarding performance.

Can you put up the srpm or explain what the patches you removed here are?

Comment 20 bmichalo 2017-01-17 23:28:20 UTC
(In reply to jamieblatt from comment #18)
> End result: only getting 6 GB of throughput using 1024 MSS packets (without packet loss). Expecting at least 8 GB.
> [...]
Is that 8 Mpps bidirectional (4 Mpps in each direction?), or 8 Mpps unidirectional?

Comment 21 jamieblatt 2017-01-18 13:16:14 UTC
(In reply to bmichalo from comment #20)
> Is that 8 Mpps bidirectional (4 Mpps in each direction?), or 8 Mpps unidirectional?
Actually, in my latest test with the DPDK-OVS 2.5 code from the openvswitch git site, I get an overall throughput of 8.8 Gbps.
I was just using the overall system throughput of 8.8 Gbps to describe the differences. Our vEPC has two 10G ports connected to a load generator. One port has rx 8.1 Gbps and tx 1 Gbps, while the other port has rx 700 Mbps and tx 7.8 Gbps.
So the overall rx and tx is 8.8 Gbps.  The avg frame rate for each port is 920000 frames/sec bidirectional.

Also, our VMs are distributed between two hosts, so there is another east-west 10G port on each host carrying around 9200000 frames/sec bidirectional between the hosts.

Once I finish all my testing I will send out results in a doc.

Comment 22 Flavio Leitner 2017-01-18 14:09:31 UTC
> here is one of the PMD cores
[...]

The vswitch side looks as expected now. However, this is not expected:

> +    1.62%     0.00%  [kernel]            [k] ksm_scan_thread
> +    1.57%     0.66%  [kernel]            [k] ksm_do_scan
> +    0.62%     0.48%  [kernel]            [k] follow_page_mask

The KSM thread should not be running on a PMD core; otherwise there will be context switches and extra load that will impact the vswitch's performance.
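To confirm whether KSM is the culprit, its run state can be read (and scanning stopped) through the standard sysfs interface. A sketch; the write requires root, and whether KSM should stay off is a host-tuning decision, not something this report settles:

```shell
# Sketch: report the KSM run state (1 = scanning active); prints
# "absent" if the kernel was built without KSM support.
ksm_state=$(cat /sys/kernel/mm/ksm/run 2>/dev/null || echo absent)
echo "KSM run state: $ksm_state"
# To stop scanning (as root):  echo 0 > /sys/kernel/mm/ksm/run
# Better still, keep housekeeping threads off the PMD cores entirely,
# e.g. with isolcpus= on the kernel command line.
```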

Comment 23 Aaron Conole 2017-01-24 19:17:06 UTC
I think it might happen if the hugepages are being broken up for sharing under a low-memory condition. If that happens again, we can take a dump of /proc/meminfo and maybe get some information.
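The suggested /proc/meminfo dump can be narrowed to just the hugepage counters (these are standard Linux fields, nothing specific to this setup):

```shell
# Sketch: show only the hugepage-related counters; if HugePages_Free
# drops to 0 under load, the pool is exhausted and splitting/sharing
# pressure becomes plausible.
grep Huge /proc/meminfo
```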

It does seem like our compile flags differ.  For instance, our user space is compiled with hardening and as a position-independent executable, and this would seem to have an effect compared with compiling without these flags.

Comment 29 Aaron Conole 2017-04-18 15:18:01 UTC
I haven't been able to reproduce the issue initially reported in this BZ (ipv4_frag_reassemble()).  Additionally, I provided the customer with a number of RPMs, one of which had no changes other than the compilation flags.  The issue hasn't resurfaced since the initial report.

There is a performance-related bug tracked by https://bugzilla.redhat.com/show_bug.cgi?id=1414939, so I'm closing this bug.