Bug 1344787
Summary:          [FdBeta] latency regression
Product:          Red Hat Enterprise Linux 7
Component:        openvswitch-dpdk
Status:           CLOSED CURRENTRELEASE
Severity:         high
Priority:         high
Version:          7.4
Target Milestone: rc
Target Release:   ---
Hardware:         x86_64
OS:               Linux
Fixed In Version: openvswitch-2.5.0-21.git20160727.el7fdb
Doc Type:         If docs needed, set a value
Reporter:         Karl Rister <krister>
Assignee:         Flavio Leitner <fleitner>
QA Contact:       ovs-qe
CC:               aloughla, atheurer, atragler, fleitner, jean-mickael.guerin, krister, kzhang, vincent.jardin
Last Closed:      2017-01-12 15:51:35 UTC
Type:             Bug
Bug Blocks:       1397481 (view as bug list)
Description
Karl Rister
2016-06-10 19:44:56 UTC
It looks like the problematic code that I previously identified has been removed in the following commit:

    commit b59cc14e032da370021794bfd1bdd3d67e88a9a3
    Author: Ilya Maximets <i.maximets>
    Date:   Mon Jun 27 16:28:16 2016 +0300

        netdev-dpdk: Use instant sending instead of queueing of packets.

        Current implementarion of TX packet's queueing is broken in several ways:

        * TX queue flushing implemented on receive assumes that all core_id-s
          are sequential and starts from zero. This may lead to situation when
          packets will stuck in queue forever and, also, this influences on
          latency.

        * For a long time flushing logic depends on uninitialized
          'txq_needs_locking', because it usually calculated after
          'netdev_dpdk_alloc_txq' but used inside of this function for
          initialization of 'flush_tx'.

        Testing shows no performance difference with and without queueing.

        Lets remove queueing at all because it doesn't work properly now and
        also does not increase performance.

        Signed-off-by: Ilya Maximets <i.maximets>
        Acked-by: Daniele Di Proietto <diproiettod>

I don't see the particular latency issue that I had previously identified on upstream, but there is something wrong, as bidirectional latency is currently quite uneven:

Current upstream:
    Device 0 -> 1:  6.567602 Mpps   17.8454 usec average latency
    Device 1 -> 0:  6.567602 Mpps  203.0460 usec average latency

openvswitch-dpdk-2.4.0-0.10346.git97bab959.1.el7:
    Device 0 -> 1:  6.388642 Mpps   19.2421 usec average latency
    Device 1 -> 0:  6.388642 Mpps   18.8085 usec average latency

I am seeing the same problem in openvswitch-2.5.0-10.git20160727.el7fdb.x86_64.rpm. We are running tests with 0.002% and 0% loss requirements, and with that RPM I can only get 0.5506 Mpps in each direction while passing at 0.002% loss, and I can't pass even a single run at 0% loss with a very short test (60 second search, 120 second validation). If I apply my hacked fix I instantly get much better behavior, with the 0.002% loss case reaching 6.7937 Mpps each way and the 0% loss case reaching 4.9410 Mpps in each direction.
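The first failure mode the quoted commit message describes (flush-on-receive assuming core ids are sequential and start from zero) can be illustrated with a small, hypothetical simulation. None of these names are OVS code; this is only a sketch of the broken assumption:

```python
# Hypothetical sketch (not OVS code) of the broken flush-on-receive logic:
# the flusher walks tx queues assuming queue index == a zero-based,
# sequential core id.

def flush_on_receive(tx_queues, n_cores_assumed):
    """Flush queues 0..n_cores_assumed-1, mirroring the broken assumption
    that every PMD thread runs on one of those sequential cores."""
    sent = []
    for core_id in range(n_cores_assumed):
        sent.extend(tx_queues.get(core_id, []))
        tx_queues[core_id] = []
    return sent

# PMD threads actually pinned to non-sequential cores 1 and 3 (as happens
# with real CPU pinning), each queueing one packet on "its" queue index.
tx_queues = {1: ["pkt-from-core-1"], 3: ["pkt-from-core-3"]}
sent = flush_on_receive(tx_queues, n_cores_assumed=2)
# Only indexes 0 and 1 are flushed; core 3's packet stays stuck in its
# queue -- the stuck-packet / latency behavior the commit describes.
```

With the hack's unconditional `flush_tx = true` (or upstream's removal of queueing entirely), no packet has to wait on this per-core flush pass.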
This is my hacked fix:

    --- lib/netdev-dpdk.c.orig	2016-07-19 11:59:58.644824282 -0400
    +++ lib/netdev-dpdk.c	2016-07-19 12:01:27.031887289 -0400
    @@ -571,16 +571,7 @@
         for (i = 0; i < n_txqs; i++) {
             int numa_id = ovs_numa_get_numa_id(i);
     
    -        if (!netdev->txq_needs_locking) {
    -            /* Each index is considered as a cpu core id, since there should
    -             * be one tx queue for each cpu core. If the corresponding core
    -             * is not on the same numa node as 'netdev', flags the
    -             * 'flush_tx'. */
    -            netdev->tx_q[i].flush_tx = netdev->socket_id == numa_id;
    -        } else {
    -            /* Queues are shared among CPUs. Always flush */
    -            netdev->tx_q[i].flush_tx = true;
    -        }
    +        netdev->tx_q[i].flush_tx = true;
     
             /* Initialize map for vhost devices. */
             netdev->tx_q[i].map = OVS_VHOST_QUEUE_MAP_UNKNOWN;

Hi, Karl. Would you be able to provide me a system where I can reproduce and experiment with this? Thanks. Cascardo.

Hi Cascardo, I can give you some time on my test environment, but I only have one such environment, so it can't be open ended. I'll drop you an email that you can reply to when you are ready to use it. Karl

Hi, Karl. Does the latest version on branch 2.6 solve the issue for you? I noticed commit b59cc14e032da370021794bfd1bdd3d67e88a9a3 ("netdev-dpdk: Use instant sending instead of queueing of packets.", Ilya Maximets, quoted in full in the description). Hope that helps. Cascardo.

Hi Cascardo, I identified that commit back in comment 2 as being the upstream change that fixed this particular problem. My recollection is that the change was quite a bit more pervasive than my simple hack. I haven't been able to test upstream lately since I'm backlogged on RH build testing and other runs. Karl

Sorry, I missed that. For some reason, I thought that commit referred to something else. But you also mentioned uneven bidirectional latency. Was that on branch 2.6 with that commit applied? If so, we have two problems here:

1) The latency problem on NUMA systems on branch 2.5.
2) Uneven bidirectional latency on branch 2.6.

Is that correct? Cascardo.

Yes, although I have only seen the uneven bidirectional latency that one time. Before looking into that, I think we would need to reproduce it on the current 2.6 branch/master (and there are a lot of higher priority tasks piling up at the moment). The NUMA latency issue has been replicated on multiple source trees/branches, including the RH-built fast datapath RPMs. I think this is the most pressing issue I have seen at the moment. Karl

Created attachment 1203436 [details]
netdev-dpdk: Use instant sending instead of queueing
Hi, Karl.
This is a backport of that commit to 2.5. Can you try it? If it passes your tests, we can try to push this backport upstream, as it is closer to what is in master. Otherwise, we can try to argue that your patch is worth applying.
Thanks.
Cascardo.
Hi Cascardo,

Sorry for the delay. I had some long-running tests that needed to complete, and then some issues with the test bed had to be resolved before I could test your patch.

Your patch looks good from my testing; it is actually a little better than mine, with lower latency. There is a slight degradation in the achieved throughput, but that can be a noisy statistic for this workload (a binary search to find the highest throughput with <= 0.002% packet loss over a 5 minute run, followed by a 15 minute validation).

Baseline (openvswitch-2.5.0-10.git20160727.el7fdb):
    Device 0->1:  Throughput: 6.8683 Mpps  Min. Latency: 39.9780 usec  Avg. Latency: 61.1226 usec  Max. Latency: 89.1110 usec
    Device 1->0:  Throughput: 6.8683 Mpps  Min. Latency: 41.0660 usec  Avg. Latency: 58.7778 usec  Max. Latency: 89.4720 usec

My hack fix:
    Device 0->1:  Throughput: 6.5106 Mpps  Min. Latency: 11.0850 usec  Avg. Latency: 15.1282 usec  Max. Latency: 31.4210 usec
    Device 1->0:  Throughput: 6.5106 Mpps  Min. Latency: 13.2320 usec  Avg. Latency: 22.9893 usec  Max. Latency: 37.9040 usec

Your fix:
    Device 0->1:  Throughput: 6.3941 Mpps  Min. Latency: 10.5410 usec  Avg. Latency: 14.1309 usec  Max. Latency: 28.9880 usec
    Device 1->0:  Throughput: 6.3941 Mpps  Min. Latency: 11.9780 usec  Avg. Latency: 18.0692 usec  Max. Latency: 29.5200 usec

Karl

Thanks, Karl, for your testing. Sent the backport upstream; waiting for feedback or for it to be applied to branch 2.5. http://openvswitch.org/pipermail/dev/2016-October/080954.html Cascardo.

Flavio, just a reminder that Daniele should be pinged about applying this upstream. Cascardo.

Patch is committed upstream: https://github.com/openvswitch/ovs/commit/a1b2fc05e6a260c6580fa7a40d02cd7dbf2989ea
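For reference, the loss-constrained binary search Karl describes (find the highest offered rate whose measured loss stays at or below a threshold, then validate it with a longer run) can be sketched roughly as below. `measure_loss_pct` is a stand-in for a real traffic-generator run; none of these names come from any tool mentioned in this bug:

```python
def find_max_throughput(measure_loss_pct, lo_mpps, hi_mpps,
                        max_loss_pct=0.002, tol_mpps=0.01):
    """Binary-search the highest offered rate (in Mpps) whose measured
    packet loss stays at or below max_loss_pct.  In a real harness each
    probe would be a full timed run (e.g. 5 minutes) against the DUT."""
    best = None
    while hi_mpps - lo_mpps > tol_mpps:
        rate = (lo_mpps + hi_mpps) / 2
        if measure_loss_pct(rate) <= max_loss_pct:
            best = rate          # run passed: try a higher rate
            lo_mpps = rate
        else:
            hi_mpps = rate       # run failed: back off
    return best

# Toy stand-in: pretend loss is zero up to a 6.5 Mpps "knee" and heavy
# beyond it, so the search converges to just under 6.5 Mpps.
fake_loss = lambda rate: 0.0 if rate <= 6.5 else 1.0
peak = find_max_throughput(fake_loss, 0.0, 10.0)
```

A validation pass (the 15 minute run in Karl's workload) would then re-measure loss at `peak` before accepting it as the reported throughput.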