Bug 1360431
| Summary: | OVS Remove PMD latency on seq_mutex | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Flavio Leitner <fleitner> |
| Component: | openvswitch | Assignee: | Flavio Leitner <fleitner> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Jean-Tsung Hsiao <jhsiao> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.3 | CC: | atheurer, atragler, fleitner, jhsiao, jniu, xzhou |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | openvswitch-2.5.0-7.git20160727.el7fdb | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-01-12 17:50:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1342229 | | |
*** Bug 1299635 has been marked as a duplicate of this bug. ***

Hi Jean,

On an OVS-DPDK Phy-to-Phy environment, measure the packet latency. Before the patch you will see significant spikes in latency, and no spikes after the patch.

https://github.com/fleitner/XenaPythonLib/blob/master/examples/latency.py

(In reply to Flavio Leitner from comment #7)
> Hi Jean,
>
> On an OVS-DPDK Phy-to-Phy environment, measure the packet latency.
> Before the patch you will see significant spikes in latency and no spikes
> after the patch.
>
> https://github.com/fleitner/XenaPythonLib/blob/master/examples/latency.py

Hi Flavio,

Please see the numbers attached below. There is not much difference between ovs-2.5.0-7 and ovs-2.5.0-5. Each has a 500+ us max latency, but the average latencies are about the same. Please advise.

Thanks!
Jean

== ovs-2.5.0-5 ==

```
Test 1  Mon Sep 12 00:15:40 EDT 2016
INFO:root:Packet: Using sample packet
INFO:root:XenaSocket: Connected
INFO:root:XenaManager: Logged succefully
INFO:root:XenaPort: 1/0 starting traffic
INFO:root:XenaPort: 1/0 stopping traffic
Average latency: 3599.00 ns
Max latency: 530192.00 ns
Min latency: 0.00 ns

Test 2  Mon Sep 12 00:21:11 EDT 2016
INFO:root:Packet: Using sample packet
INFO:root:XenaSocket: Connected
INFO:root:XenaManager: Logged succefully
INFO:root:XenaPort: 1/0 starting traffic
INFO:root:XenaPort: 1/0 stopping traffic
Average latency: 3593.00 ns
Max latency: 21485.00 ns
Min latency: 0.00 ns

Test 3  Mon Sep 12 00:26:44 EDT 2016
INFO:root:Packet: Using sample packet
INFO:root:XenaSocket: Connected
INFO:root:XenaManager: Logged succefully
INFO:root:XenaPort: 1/0 starting traffic
INFO:root:XenaPort: 1/0 stopping traffic
Average latency: 3591.00 ns
Max latency: 21926.00 ns
Min latency: 0.00 ns
```

== ovs-2.5.0-7 ==

```
Test 1  Mon Sep 12 00:40:28 EDT 2016
INFO:root:Packet: Using sample packet
INFO:root:XenaSocket: Connected
INFO:root:XenaManager: Logged succefully
INFO:root:XenaPort: 1/0 starting traffic
INFO:root:XenaPort: 1/0 stopping traffic
Average latency: 3604.00 ns
Max latency: 566185.00 ns
Min latency: 0.00 ns

Test 2  Mon Sep 12 00:46:00 EDT 2016
INFO:root:Packet: Using sample packet
INFO:root:XenaSocket: Connected
INFO:root:XenaManager: Logged succefully
INFO:root:XenaPort: 1/0 starting traffic
INFO:root:XenaPort: 1/0 stopping traffic
Average latency: 3607.00 ns
Max latency: 18767.00 ns
Min latency: 0.00 ns

Test 3  Mon Sep 12 00:51:32 EDT 2016
INFO:root:Packet: Using sample packet
INFO:root:XenaSocket: Connected
INFO:root:XenaManager: Logged succefully
INFO:root:XenaPort: 1/0 starting traffic
INFO:root:XenaPort: 1/0 stopping traffic
Average latency: 3604.00 ns
Max latency: 15183.00 ns
Min latency: 0.00 ns
```
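The per-test numbers can be compared mechanically. Below is a small hypothetical helper (not part of XenaPythonLib; every name in it is made up for illustration) that pulls the latency lines out of logs like the ones above and reports the worst case:

```python
import re

# Hypothetical helper: summarize "Average/Max/Min latency: N ns" lines
# from test logs like the ones quoted above.
LATENCY_LINE = re.compile(r"(Average|Max|Min) latency:\s*([\d.]+)\s*ns")

def parse_latencies(log_text):
    """Return {'Average': [ns...], 'Max': [ns...], 'Min': [ns...]}."""
    stats = {"Average": [], "Max": [], "Min": []}
    for kind, value in LATENCY_LINE.findall(log_text):
        stats[kind].append(float(value))
    return stats

# Example with the ovs-2.5.0-5 figures quoted above:
sample = """\
Average latency: 3599.00 ns
Max latency: 530192.00 ns
Average latency: 3593.00 ns
Max latency: 21485.00 ns
Average latency: 3591.00 ns
Max latency: 21926.00 ns
"""
stats = parse_latencies(sample)
print("worst-case Max latency: %.0f ns" % max(stats["Max"]))  # 530192
```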
```
[root@netqe5 dpdk-multique-scripts]# cat ovs_config_dpdk0_dpdk1.sh
ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=4
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xaa0000
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 dpdk0 \
    -- set interface dpdk0 type=dpdk ofport_request=10
ovs-vsctl add-port ovsbr0 dpdk1 \
    -- set interface dpdk1 type=dpdk ofport_request=11
ovs-ofctl del-flows ovsbr0
ovs-ofctl add-flow ovsbr0 in_port=10,actions=output:11
ovs-ofctl add-flow ovsbr0 in_port=11,actions=output:10
ovs-ofctl dump-flows ovsbr0
echo 0 > /proc/sys/kernel/watchdog
```

```
[root@netqe5 jhsiao]# cat set_non_pmd_cpu_range.sh
NON_PMD_CPU_RANGE="13,15"
OVS_PID="$(pidof ovs-vswitchd)"
for pid in $(ps -e -T | grep ${OVS_PID} | grep -v 'pmd' | awk '{ print $2 }')
do
    taskset -p -c ${NON_PMD_CPU_RANGE} ${pid}
done

tuna -t ovs-vswitchd -CP
```
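For reference, pmd-cpu-mask is a hexadecimal CPU bitmask; decoding the mask used above is plain bit arithmetic, nothing OVS-specific:

```python
# Decode an OVS pmd-cpu-mask into the CPU numbers it selects.
mask = 0xAA0000
cpus = [bit for bit in range(mask.bit_length()) if mask >> bit & 1]
print(cpus)  # [17, 19, 21, 23] -- the PMD CPUs; 13,15 get the other threads
```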
*** Verification Procedure ***

As latency can be impacted by many factors, Flavio suggested the following verification procedure. The idea is to get everything set up with traffic running, then attach perf to record scheduling events on a PMD CPU (a short script automating the perf steps follows the backtrace below):
1. Get everything ready.
2. Push traffic and wait a couple of seconds for it to stabilize.
3. Attach perf:
   ```
   # perf record -e sched:sched_switch -g -C <PMD CPU> sleep 10
   ```
4. perf will exit after 10 seconds.
5. Review the report:
   ```
   # perf report -g --no-children --stdio
   ```
You have reproduced the issue if you see this backtrace:
```
 50.00%  pmd36  [kernel.kallsyms]  [k] __schedule
         |
         ---__schedule
            schedule
            futex_wait_queue_me
            futex_wait
            do_futex
            sys_futex
            tracesys
            __lll_lock_wait
```
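The check can be scripted. Below is a minimal sketch that runs steps 3-5 and scans the report for the futex_wait chain; it assumes perf is installed and the script runs as root on the machine under test, and the function name is hypothetical:

```python
#!/usr/bin/env python
# Hypothetical helper: automate steps 3-5 of the procedure above.
# It records sched_switch events on one PMD CPU for 10 seconds, then
# scans the perf report for the futex_wait backtrace that indicates
# the PMD thread blocked on seq_mutex.
import subprocess
import sys

def pmd_blocked_on_futex(pmd_cpu):
    # Steps 3 and 4: record scheduling events on the PMD CPU;
    # perf exits with the 10-second sleep and writes ./perf.data.
    subprocess.check_call(
        ["perf", "record", "-e", "sched:sched_switch", "-g",
         "-C", str(pmd_cpu), "sleep", "10"])
    # Step 5: render the report and look for the telltale call chain.
    report = subprocess.check_output(
        ["perf", "report", "-g", "--no-children", "--stdio"])
    return b"futex_wait" in report

if __name__ == "__main__":
    cpu = int(sys.argv[1])  # a PMD CPU number, e.g. 17 for mask 0xaa0000
    if pmd_blocked_on_futex(cpu):
        print("Reproduced: PMD thread scheduled out in futex_wait.")
    else:
        print("No futex_wait backtrace: PMD thread did not block.")
```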
*** Verification ***
Ran "python latency.py" with ovs-2.5.0-5 saw the mutex trace for all pmd cores.
But, with ovs-2.5.0-7 there were no such trace for all pmd cores.
Thus, the fix has been verified.
Description of problem:

Current openvswitch-2.5.0-5.git20160628.el7fdb causes latency spikes on seq_mutex. The following upstream patch fixes the issue:

```
commit 2e6a1eae96615cab458757d62c95ce9993df7202
Author: Flavio Leitner <fbl>
Date:   Tue Jul 5 10:33:38 2016 -0300

    dpif-netdev: Remove PMD latency on seq_mutex

    The PMD thread needs to keep processing RX queues in order to
    achieve maximum throughput. It also needs to sweep emc cache and
    quiesce which use seq_mutex. That mutex can eventually block the
    PMD thread causing latency spikes and affecting the throughput.

    Since there is no requirement for running those tasks at a
    specific time, this patch extend seq API to allow tentative
    locking instead.

    Reported-by: Karl Rister <krister>
    Co-authored-by: Karl Rister <krister>
    Signed-off-by: Flavio Leitner <fbl>
    Signed-off-by: Daniele Di Proietto <diproiettod>
```
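To make the "tentative locking" idea concrete, here is a minimal sketch of the pattern in Python. This is not the OVS seq API (which is C); every name below is hypothetical and stands in for the corresponding piece of the PMD loop described in the commit message.

```python
import threading

seq_mutex = threading.Lock()  # stands in for OVS's global seq_mutex

def sweep_emc_cache():
    pass  # placeholder for the EMC sweep mentioned in the commit message

def quiesce():
    pass  # placeholder for the quiesce step mentioned in the commit message

def pmd_iteration(rx_queues, housekeeping_due):
    """One turn of a busy loop modeled on the PMD thread: RX polling
    must never block, so housekeeping runs only if the lock is free."""
    for queue in rx_queues:
        queue.poll()  # hypothetical: process packets from one RX queue
    if housekeeping_due():
        # Tentative lock: acquire(blocking=False) returns immediately
        # rather than sleeping on a contended mutex (the futex_wait seen
        # in the perf backtrace above). On failure, the work is simply
        # deferred to a later iteration instead of stalling packet RX.
        if seq_mutex.acquire(blocking=False):
            try:
                sweep_emc_cache()
                quiesce()
            finally:
                seq_mutex.release()
```

Because the EMC sweep and the quiesce point have no deadline, skipping them whenever the mutex happens to be busy trades a slightly later cleanup for bounded per-iteration latency, which is the trade-off the patch makes.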