Bug 1364898
| Summary: | openvswitch multi-queues packet drop | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Eyal Dannon <edannon> | ||||||||||
| Component: | openvswitch | Assignee: | Panu Matilainen <pmatilai> | ||||||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Christian Trautman <ctrautma> | ||||||||||
| Severity: | high | Docs Contact: | |||||||||||
| Priority: | high | ||||||||||||
| Version: | 7.2 | CC: | aloughla, apevec, atheurer, atragler, chrisw, edannon, fbaudin, fleitner, jhsiao, oblaut, osabart, qding, ralongi, rhos-maint, rkhan, srevivo, yrachman, zshi | ||||||||||
| Target Milestone: | rc | ||||||||||||
| Target Release: | --- | ||||||||||||
| Hardware: | Unspecified | ||||||||||||
| OS: | Unspecified | ||||||||||||
| Whiteboard: | |||||||||||||
| Fixed In Version: | openvswitch-2.5.0-10.git20160727.el7fdb | Doc Type: | If docs needed, set a value | ||||||||||
| Doc Text: | Story Points: | --- | |||||||||||
| Clone Of: | Environment: | ||||||||||||
| Last Closed: | 2017-01-12 17:55:00 UTC | Type: | Bug | ||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||
| Documentation: | --- | CRM: | |||||||||||
| Verified Versions: | Category: | --- | |||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
| Embargoed: | |||||||||||||
| Bug Depends On: | |||||||||||||
| Bug Blocks: | 1383822 | ||||||||||||
| Attachments: |
Hi,

The queue selection happens after the device selection. So, if the DP is sending packets to a kernel device, that would slow down regardless of the number of queues. The number of queues doesn't affect the flow processing, so most probably (I am guessing here) the NORMAL action is sending packets to a kernel device during the MQ test.

Could you please confirm?
Thanks,
fbl

Here are a few data points on different versions of OVS. All tests use vhostuser with 2 queues and testpmd from dpdk-2.2 in the VM. The VM has 2 virtio-net interfaces (2 queues each). Network traffic is uni-directional, incoming on the first virtio-net interface and outgoing on the second.
from git, 'master' branch, commit 884d541fdffe44714eb9381fb92f8f4e1cfcf668:
9.71 Mpps
from git, 'branch-2.5' branch, commit 884d541fdffe44714eb9381fb92f8f4e1cfcf668:
0.19 Mpps
from openvswitch-2.5.0-5.git20160628.el7fdb RPM:
0.19 Mpps
The following data was captured when using openvswitch-2.5.0-5.git20160628.el7fdb RPM:
# ovs-vsctl show
178f0d9a-476e-484c-9295-9b5cdce77bee
Bridge "ovsbr0"
Port "ovsbr0"
Interface "ovsbr0"
type: internal
Port "vhost-user1"
Interface "vhost-user1"
type: dpdkvhostuser
Port "dpdk0"
Interface "dpdk0"
type: dpdk
Bridge "ovsbr1"
Port "vhost-user2"
Interface "vhost-user2"
type: dpdkvhostuser
Port "dpdk1"
Interface "dpdk1"
type: dpdk
Port "ovsbr1"
Interface "ovsbr1"
type: internal
# ovs-appctl dpif-netdev/pmd-stats-show
main thread:
emc hits:0
megaflow hits:0
miss:0
lost:0
polling cycles:36966792 (100.00%)
processing cycles:0 (0.00%)
pmd thread numa_id 1 core_id 3:
emc hits:0
megaflow hits:0
miss:0
lost:0
polling cycles:1459645150576 (100.00%)
processing cycles:0 (0.00%)
pmd thread numa_id 1 core_id 13:
emc hits:136409971
megaflow hits:0
miss:1
lost:0
polling cycles:1726800368791 (96.01%)
processing cycles:71854002242 (3.99%)
avg cycles per packet: 13185.65 (1798654371033/136409972)
avg processing cycles per packet: 526.75 (71854002242/136409972)
pmd thread numa_id 1 core_id 11:
emc hits:0
megaflow hits:0
miss:136961497
lost:0
polling cycles:764825337869 (26.87%)
processing cycles:2081526883669 (73.13%)
avg cycles per packet: 20782.13 (2846352221538/136961497)
avg processing cycles per packet: 15197.90 (2081526883669/136961497)
pmd thread numa_id 1 core_id 7:
emc hits:0
megaflow hits:0
miss:0
lost:0
polling cycles:1526268846359 (100.00%)
processing cycles:0 (0.00%)
pmd thread numa_id 1 core_id 1:
emc hits:0
megaflow hits:0
miss:0
lost:0
polling cycles:1435175002203 (100.00%)
processing cycles:0 (0.00%)
pmd thread numa_id 1 core_id 5:
emc hits:0
megaflow hits:0
miss:0
lost:0
polling cycles:1651619686286 (100.00%)
processing cycles:0 (0.00%)
pmd thread numa_id 1 core_id 9:
emc hits:136423434
megaflow hits:510
miss:1
lost:0
polling cycles:1595946437782 (92.07%)
processing cycles:137473699762 (7.93%)
avg cycles per packet: 12706.13 (1733420137544/136423945)
avg processing cycles per packet: 1007.69 (137473699762/136423945)
pmd thread numa_id 1 core_id 15:
emc hits:0
megaflow hits:0
miss:136875679
lost:0
polling cycles:724736574935 (26.24%)
processing cycles:2036738640385 (73.76%)
avg cycles per packet: 20175.06 (2761475215320/136875679)
avg processing cycles per packet: 14880.21 (2036738640385/136875679)
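The "avg cycles per packet" figures above translate directly into per-thread packet rates: rate = clock / cycles-per-packet. A quick sanity check (a sketch; it assumes the ~2.6 GHz clock reported by lscpu later in this bug and that the counters are CPU cycles), using the totals from cores 13 and 11 above, lands right around the observed 0.19 Mpps:

```python
# Convert "avg cycles per packet" from dpif-netdev/pmd-stats-show
# into a per-thread packet rate, assuming a ~2.6 GHz clock.
CPU_HZ = 2.6e9

def max_mpps(avg_cycles_per_packet):
    """Packets/s a single PMD thread sustains at this cycle cost, in Mpps."""
    return CPU_HZ / avg_cycles_per_packet / 1e6

# core 13: mostly EMC hits, 13185.65 total (polling + processing) cycles/pkt
# core 11: every packet is a miss, 20782.13 total cycles/pkt
print(f"core 13: {max_mpps(13185.65):.3f} Mpps")
print(f"core 11: {max_mpps(20782.13):.3f} Mpps")
```

Both come out between 0.12 and 0.20 Mpps, i.e. the miss-heavy threads (cores 11 and 15) are spending so many cycles per packet that the pipeline cannot exceed the 0.19 Mpps measured.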
(In reply to Flavio Leitner from comment #2)
> The queue selection happens after the device selection. So, if the DP is
> sending packets to a kernel device that would slow down regardless of the
> number of queues.
>
> The number of queues doesn't affect the flow processing, so most probably (I
> am guessing here) the action NORMAL is sending packets to a kernel device
> during the MQ test.
>
> Could you please confirm?

In my tests I have 0 kernel interfaces; the output of ovs-vsctl is in the previous comment. More stats below. I believe the drops on ovsbr0 occur when the PMD thread is waiting for virtio ring free space and cannot pull dpdk0 rx packets. Once the max no-drop rate is reached (0.19 Mpps), those rx drops should stop.
# ovs-ofctl dump-flows ovsbr0
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=1805.065s, table=0, n_packets=273385442, n_bytes=16403126520, idle_age=447, in_port=1 actions=output:2
cookie=0x0, duration=1805.061s, table=0, n_packets=0, n_bytes=0, idle_age=1805, in_port=2 actions=output:1
# ovs-ofctl dump-flows ovsbr1
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=1808.175s, table=0, n_packets=0, n_bytes=0, idle_age=1808, in_port=1 actions=output:2
cookie=0x0, duration=1808.171s, table=0, n_packets=273285651, n_bytes=16397139060, idle_age=450, in_port=2 actions=output:1
# ovs-ofctl dump-ports ovsbr0
OFPST_PORT reply (xid=0x2): 3 ports
port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
tx pkts=0, bytes=0, drop=0, errs=0, coll=0
port 1: rx pkts=273385442, bytes=16403126520, drop=2364216847, errs=0, frame=?, over=?, crc=?
tx pkts=0, bytes=0, drop=0, errs=0, coll=?
port 2: rx pkts=0, bytes=0, drop=?, errs=0, frame=?, over=?, crc=?
tx pkts=273385442, bytes=16403126520, drop=0, errs=?, coll=?
# ovs-ofctl dump-ports ovsbr1
OFPST_PORT reply (xid=0x2): 3 ports
port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
tx pkts=0, bytes=0, drop=0, errs=0, coll=0
port 1: rx pkts=0, bytes=0, drop=0, errs=0, frame=?, over=?, crc=?
tx pkts=273285644, bytes=16397138640, drop=0, errs=0, coll=?
port 2: rx pkts=273285651, bytes=16397139060, drop=?, errs=0, frame=?, over=?, crc=?
tx pkts=0, bytes=0, drop=0, errs=?, coll=?
Another test, still using the openvswitch-2.5.0-5.git20160628.el7fdb RPM, but now single-queue. The only changes are:
- The VM has "queues=1" in the XML.
- The command "ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=2" is -not- used.
- The PMD mask command used is "ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=AA", whereas the 2-queue test used the "AAAA" mask.

This single-queue test gets 4.8 Mpps.

Created attachment 1189421 [details]
perf callgraph
Created attachment 1189422 [details]
perf callgraph
Created attachment 1189423 [details]
perf callgraph
Created attachment 1189424 [details]
perf callgraph
I have re-run 2-queue with 8 PMD threads to gather some perf call-graph data. The PMD threads have the following assignments:

# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 1 core_id 1: port: vhost-user1 queue-id: 0
pmd thread numa_id 1 core_id 3: port: vhost-user1 queue-id: 1
pmd thread numa_id 1 core_id 9: port: dpdk1 queue-id: 0
pmd thread numa_id 1 core_id 13: port: vhost-user2 queue-id: 0
pmd thread numa_id 1 core_id 7: port: dpdk0 queue-id: 1
pmd thread numa_id 1 core_id 11: port: dpdk1 queue-id: 1
pmd thread numa_id 1 core_id 15: port: vhost-user2 queue-id: 1
pmd thread numa_id 1 core_id 5: port: dpdk0 queue-id: 0

So, the PMD threads on cpus 5 and 7 are polling dpdk0, where the packets come in, and the PMD threads on cpus 13 and 15 are polling the 2nd virtio interface, where testpmd in the VM is transmitting packets. There is a lot of futex and rescheduling-interrupt work going on, which seems out of place for PMD threads. /proc/sched_debug also shows a ridiculous number of task switches for the PMD threads.

BTW, I can reproduce this problem with no KVM or vhostuser at all. All that is needed is to use multiple queues for the physical DPDK devices. I get about the same throughput as the vhostuser tests, around 0.3 Mpps.

Hi,

I had an opportunity to look at Andrew's test env and indeed there is something wrong there. It seems that the PMD moved from 100% busy in %usr to almost 100% busy in %sys, indicating the use of syscalls. One possible reason is logging, and looking at ovs-vswitchd.log we can see tons of error messages from the PMD regarding problems with UFIDs. Most probably that would explain the low traffic rate and the higher %sys usage.

In order to debug this further I built the same setup in my test lab (two DPDK NICs on an ovsbr), but fortunately or unfortunately it works. My next step is to use Andrew's script to see if I can reproduce; otherwise I will consider some patches to get additional information.

Flavio, any progress with this?
I believe there was some concern with "upcall"? Would it help if I tested with a limited number of flows? Or is there a trace/logging facility we can use to understand the code path of the PMD? Hi Andrew,
The issue seems to be related to ukeys, so the PMD thread is flooding the log with error messages. The question is why that is happening.
---8<---
2016-08-11T14:50:10.898Z|00010|ofproto_dpif_upcall(pmd53)|WARN|Dropped 27250866 log messages in last 60 seconds (most recently, 0 seconds ago) due to excessive rate
2016-08-11T14:50:10.898Z|00011|ofproto_dpif_upcall(pmd53)|WARN|upcall_cb failure: ukey installation fails
---8<---
That message comes from upcall_cb() and looks like ukey_install_start() is failing. The only reason for that is if there is a conflicting ukey already installed:
...
if (upcall.ukey && !ukey_install(udpif, upcall.ukey)) {
VLOG_WARN_RL(&rl, "upcall_cb failure: ukey installation fails");
error = ENOSPC;
}
...
static bool
ukey_install(struct udpif *udpif, struct udpif_key *ukey)
{
return ukey_install_start(udpif, ukey) && ukey_install_finish(ukey, 0);
}
ukey_install_finish() always returns true in this case, so the only remaining cause is in the function below:
static bool
ukey_install_start(struct udpif *udpif, struct udpif_key *new_ukey)
{
    bool locked = false;
    ...
    old_ukey = ukey_lookup(udpif, &new_ukey->ufid);
    if (old_ukey) {
        ...
    } else {
        ...
        locked = true;
    }
    ...
    return locked;
}
Basically, there is a ukey already installed for the flow even though it is coming from a flow miss, oops!
You can confirm the issue looking at 'ovs-appctl coverage/show' and checking if the counter 'handler_duplicate_upcall' is increasing.
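The failure mode can be sketched in a few lines: if the ukey table is keyed by the UFID alone, and two PMD threads take the upcall path for the same flow, the second installation attempt finds a conflicting entry and fails on every packet, exactly as in ukey_install_start() above. Keying per PMD lets each thread keep its own entry. (This is an illustrative Python toy model, not the real OVS data structures; install() stands in for ukey_install_start().)

```python
# Toy model of the ukey table. With a ufid-only key, the second PMD
# thread that sees the same flow fails to install its ukey and logs
# "upcall_cb failure: ukey installation fails" for every miss.

def install(table, key):
    """Mimics ukey_install_start(): fail if a conflicting entry exists."""
    if key in table:
        return False          # old_ukey found -> installation fails
    table[key] = object()     # stand-in for the real udpif_key
    return True

ufid = 0xDEADBEEF             # same flow seen by two PMD threads

# Before: the key is the ufid alone.
before = {}
assert install(before, ufid)            # PMD on core 11 wins
assert not install(before, ufid)        # PMD on core 15 fails every time

# After: the key includes the installing PMD's id.
after = {}
assert install(after, (11, ufid))
assert install(after, (15, ufid))       # now succeeds
print("per-pmd ukeys avoid the conflict")
```

This matches the symptom of both dpdk0 queues showing only misses in pmd-stats-show: the losing PMD thread never gets a ukey installed, so its packets keep taking the slow path.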
This commit on branch master looks interesting:
commit 5f2ccb1c0d3b01b775ba03c89a486cae7b720275
Author: Ilya Maximets <i.maximets>
Date: Wed Feb 3 14:31:43 2016 +0300
dpif: Allow adding ukeys for same flow by different pmds.
Could you please cherry-pick that patch to your branch-2.5 and see if it fixes the issue? It applies cleanly on top of branch-2.5.
For instance:
[fleitner@plex ovs]$ git checkout branch-2.5
Switched to branch 'branch-2.5'
Your branch is up-to-date with 'origin/branch-2.5'.
[fleitner@plex ovs]$ git cherry-pick 5f2ccb1c0d3b01b775ba03c89a486cae7b720275
[branch-2.5 f4eadd7] dpif: Allow adding ukeys for same flow by different pmds.
Author: Ilya Maximets <i.maximets>
Date: Wed Feb 3 14:31:43 2016 +0300
1 file changed, 13 insertions(+), 10 deletions(-)
[fleitner@plex ovs]$
Thanks,
fbl
Excellent find! Yes, I will retest with that patch. I am traveling the next 4 days, but I should be able to test this somewhere in that time. I should have a result either 8/26 or 8/29.

Flavio, I applied this patch and ran 2-queue for dpdk devices (10Gb); the 64-byte packet rate went from 0.59 to 14.8 Mpps.

Wow, that is huge. Andrew, can you please apply that patch one more time and maybe we will reach 29.6 Mpps? :-) :-)

Flavio, is that patch already part of our Y-stream and FDP-B?

To clarify, this was not a vhostuser test, just dpdk (ixgbe pmd).

> Flavio is that patch already part of our Y-stream and FDP-B ?
No, it needs to be backported and packaged.
Fixed in openvswitch-2.5.0-10.git20160727.el7fdb.

Verified in QA using Andrew's systems.
Using 2.5.0-09, the test reports 0.643 Mpps:
[PARAMETERS] startRate: 14.636205 nrFlows: 1024 frameSize: 64 runBidirec: false searchRunTime: 30 validationRunTime: 30 acceptableLossPct: 1.000000 ports: 1,2
[REPORT]Device 0->1: Tx frames: 19290222 Rx Frames: 19290215 frame loss: 7, 0.000036% Rx Mpps: 0.642987
[REPORT] total: Tx frames: 19290222 Rx Frames: 19290215 frame loss: 7, 0.000036% Tx Mpps: 0.642987 Rx Mpps: 0.642987
[root@perf102 MoonGen]#
[root@perf104 ~]# ovs-ofctl dump-flows ovsbr0
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=668.996s, table=0, n_packets=286868825, n_bytes=17212129500, idle_age=150, in_port=1 actions=output:2
cookie=0x0, duration=668.994s, table=0, n_packets=0, n_bytes=0, idle_age=668, in_port=2 actions=output:1
[root@perf104 ~]# ovs-ofctl dump-ports ovsbr0
OFPST_PORT reply (xid=0x2): 3 ports
port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
tx pkts=0, bytes=0, drop=0, errs=0, coll=0
port 1: rx pkts=286868826, bytes=17212129560, drop=1103580108, errs=0, frame=?, over=?, crc=?
tx pkts=0, bytes=0, drop=0, errs=0, coll=?
port 2: rx pkts=0, bytes=0, drop=0, errs=0, frame=?, over=?, crc=?
tx pkts=286868817, bytes=17212129020, drop=0, errs=0, coll=?
Using 2.5.0-10, the results improve to 14.714 Mpps:
[PARAMETERS] startRate: 14.527923 nrFlows: 1024 frameSize: 64 runBidirec: false searchRunTime: 30 validationRunTime: 30 acceptableLossPct: 1.000000 ports: 1,2
[REPORT]Device 0->1: Tx frames: 441403074 Rx Frames: 441403018 frame loss: 56, 0.000013% Rx Mpps: 14.713635
[REPORT] total: Tx frames: 441403074 Rx Frames: 441403018 frame loss: 56, 0.000013% Tx Mpps: 14.713637 Rx Mpps: 14.713635
[root@perf104 ~]# ovs-ofctl dump-flows ovsbr0
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=232.058s, table=0, n_packets=1243456262, n_bytes=74607375720, idle_age=105, in_port=1 actions=output:2
cookie=0x0, duration=232.057s, table=0, n_packets=0, n_bytes=0, idle_age=232, in_port=2 actions=output:1
[root@perf104 ~]# ovs-ofctl dump-ports ovsbr0
OFPST_PORT reply (xid=0x2): 3 ports
port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
tx pkts=0, bytes=0, drop=0, errs=0, coll=0
port 1: rx pkts=1243456263, bytes=74607375780, drop=0, errs=0, frame=?, over=?, crc=?
tx pkts=0, bytes=0, drop=0, errs=0, coll=?
port 2: rx pkts=0, bytes=0, drop=0, errs=0, frame=?, over=?, crc=?
tx pkts=1243456171, bytes=74607370260, drop=0, errs=0, coll=?
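As a cross-check, the reported Mpps and loss figures follow from the frame counts in the [REPORT] lines and the 30-second validation run (a sketch; it assumes the measurement window equals validationRunTime exactly, so the rates match only to about three decimal places):

```python
# Cross-check the MoonGen reports: Rx Mpps should be roughly
# Rx frames / (validationRunTime * 1e6), per the [PARAMETERS] lines.
RUN_TIME_S = 30

def mpps(frames):
    return frames / RUN_TIME_S / 1e6

def loss_pct(tx_frames, rx_frames):
    return (tx_frames - rx_frames) / tx_frames * 100

# 2.5.0-09 run: reported 0.642987 Mpps, 0.000036% loss
print(f"{mpps(19290215):.6f} Mpps, {loss_pct(19290222, 19290215):.6f}% loss")

# 2.5.0-10 run: reported 14.713635 Mpps, 0.000013% loss
print(f"{mpps(441403018):.6f} Mpps, {loss_pct(441403074, 441403018):.6f}% loss")
```

Both runs agree with the reported numbers, confirming the roughly 23x improvement between 2.5.0-09 and 2.5.0-10.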
And sanity tests:
- http://pkgs.devel.redhat.com/cgit/tests/kernel/tree/networking/openvswitch/topo with beaker job https://beaker.engineering.redhat.com/jobs/1477037
- http://pkgs.devel.redhat.com/cgit/tests/kernel/tree/networking/openvswitch/of_rules with beaker job https://beaker.engineering.redhat.com/jobs/1478664

Perf104 info for future reference if needed. This was the machine used to verify the bug.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
Stepping: 2
CPU MHz: 2601.000
BogoMIPS: 5204.11
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47
0000:86:00.0 'Ethernet Converged Network Adapter X520-Q1' if=p1p3 drv=ixgbe unused=vfio-pci
0000:86:00.1 'Ethernet Converged Network Adapter X520-Q1' if=p1p4 drv=ixgbe unused=vfio-pci
ovs-vsctl show
Bridge "ovsbr0"
Port "dpdk0"
Interface "dpdk0"
type: dpdk
Port "ovsbr0"
Interface "ovsbr0"
type: internal
Port "dpdk1"
Interface "dpdk1"
type: dpdk
ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=2
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=AA
Performance testing results for 2.5.0-10: https://docs.google.com/spreadsheets/d/1Oi_bhCiXAfKEh2ZJWnuFhp7p77k-habds6jDmf7iNFg/edit#gid=454534690

Sorry, did not post the viewable link: https://docs.google.com/a/redhat.com/spreadsheets/d/1Oi_bhCiXAfKEh2ZJWnuFhp7p77k-habds6jDmf7iNFg/edit?usp=sharing

Beaker regression jobs for vanilla OVS:
- Comprehensive functional regression: https://beaker.engineering.redhat.com/jobs/1478254
- Basic RFC2544: https://beaker.engineering.redhat.com/jobs/1478255
- Basic performance check (ixgbe): https://beaker.engineering.redhat.com/jobs/1481771
- Basic performance check (bnx2x): https://beaker.engineering.redhat.com/jobs/1482081

All tests passed except for one scenario related to the bnx2x driver, UDP stream over IPv6, and only in one specific VM-to-VM topology. Details in this log: http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2016/08/14820/1482081/3028253/45144889/bnx2x_mtu9000_ovs_test_result_1472694011.log

I will attempt to reproduce manually, but it is not a major/blocking issue in any case.
Description of problem:
Massive packet drop to the kernel with multi-queue configuration over openvswitch 2.5 (DPDK)

Version-Release number of selected component (if applicable):
openvswitch-2.5.0-5.git20160628.el7fdb.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up the environment with OVS 2.5
2. Configure multi-queue: ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=2
3. Generate packets

Actual results:
Packets dropped to the kernel, getting around 0.30 Mpps

[root@panther03 ~]# ovs-ofctl dump-ports br-int
OFPST_PORT reply (xid=0x2): 3 ports
port 1: rx pkts=12497271, bytes=799825344, drop=0, errs=0, frame=0, over=0, crc=0
tx pkts=0, bytes=0, drop=0, errs=0, coll=0
port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
tx pkts=12494007, bytes=799716096, drop=12497271, errs=0, coll=0
port 2: rx pkts=0, bytes=0, drop=?, errs=0, frame=?, over=?, crc=?
tx pkts=0, bytes=0, drop=12497271, errs=?, coll=?

Expected results:
Supposed to get ~8 Mpps while using 2 queues

Additional info:
Tracking bug