Bug 1616001
Summary: [RFE] Add option to OVS 2.9 for legacy RXQ assignment to cores
Product: Red Hat Enterprise Linux 7
Component: openvswitch
Version: 7.5
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Andreas Karis <akaris>
Assignee: Kevin Traynor <ktraynor>
QA Contact: liting <tli>
CC: akaris, atelang, atragler, cfields, ctrautma, echaudro, fbaudin, gmuthukr, ktraynor, lmanasko, marjones, ovs-qe, psahoo, qding, rkhan, tredaelli
Target Milestone: rc
Keywords: FutureFeature
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openvswitch-2.9.0-56.el7fdp.1
Doc Type: Enhancement
Doc Text: With this update, the pmd-rxq-assign configuration option has been added for assigning Rx queues to Poll Mode Driver (PMD) cores. This allows users to select a round-robin assignment.
Clones: 1631797 (view as bug list)
Bug Blocks: 1628669
Last Closed: 2018-11-05 14:59:03 UTC
Type: Bug
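For context on the Doc Text above: on builds that include this enhancement, the assignment mode is selected through other_config in the Open_vSwitch table, and the resulting queue placement can be inspected with ovs-appctl. A minimal usage sketch (the cycle-based default and exact behaviour may differ by build):

~~~
# Select round-robin Rx queue to PMD assignment (legacy-style ordering):
ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=roundrobin

# Switch back to the cycle-based assignment:
ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=cycles

# Inspect which PMD core polls which Rx queue:
ovs-appctl dpif-netdev/pmd-rxq-show
~~~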
Description
Andreas Karis
2018-08-14 17:13:42 UTC
Created attachment 1475918 [details]
SRPM with patch 1
Created attachment 1475919 [details]
SRPM with patch 2
Test results without the test build (OVS 2.9 standard behavior):

First VM:
~~~
Link State     :       <UP-10000-FD>     ----TotalRate----
Pkts/s Max/Rx  :   12608444/12582488   12608444/12582488
       Max/Tx  :     6402208/6398400     6402208/6398400
MBits/s Rx/Tx  :           8455/4299           8455/4299
~~~

Second VM:
~~~
Link State     :       <UP-10000-FD>     ----TotalRate----
Pkts/s Max/Rx  :     6402635/6401132     6402635/6401132
       Max/Tx  :   14308416/14305600   14308416/14305600
MBits/s Rx/Tx  :           4301/9613           4301/9613
~~~

Test results with the test build (OVS 2.9 with OVS 2.6.1 behavior), openvswitch-2.9.0-19.el7.1.02143196.x86_64.tar.gz:

First VM:
~~~
Flags:Port     :     P-----R--------:0
Link State     :       <UP-10000-FD>     ----TotalRate----
Pkts/s Max/Rx  :   12199750/12189416   12199750/12189416
       Max/Tx  :   14315520/12945376   14315520/12945376
MBits/s Rx/Tx  :           8191/8699           8191/8699
~~~

Second VM:
~~~
Flags:Port     :     P-----R--------:0
Link State     :       <UP-10000-FD>     ----TotalRate----
Pkts/s Max/Rx  :   14203632/12227512   14203632/12227512
       Max/Tx  :   12965600/12958912   12965600/12958912
~~~

Notice the increased RX performance.

Created an RPM build with:
~~~
rpmbuild -ba ../SPECS/openvswitch.spec --without check 2>&1 | tee /root/buildlog.txt
~~~
and updated the spec file to include the patch.

And patch:
~~~
[root@undercloud-r430 ~]# cat rpmbuild/SOURCES/0012-revert-rxq-ordering.patch
--- openvswitch-2.9.0.orig/lib/dpif-netdev.c	2018-02-19 17:49:44.867277147 -0500
+++ openvswitch-2.9.0.modif/lib/dpif-netdev.c	2018-08-02 16:44:57.916173757 -0400
@@ -3453,25 +3453,35 @@
 /* Returns the next pmd from the numa node in
  * incrementing or decrementing order. */
 static struct dp_netdev_pmd_thread *
-rr_numa_get_pmd(struct rr_numa *numa)
+rr_numa_get_pmd(struct rr_numa *numa,bool always_increment)
 {
     int numa_idx = numa->cur_index;
 
-    if (numa->idx_inc == true) {
+    if (always_increment) {
         /* Incrementing through list of pmds. */
         if (numa->cur_index == numa->n_pmds-1) {
             /* Reached the last pmd. */
-            numa->idx_inc = false;
+            numa->cur_index = 0;
         } else {
             numa->cur_index++;
         }
     } else {
-        /* Decrementing through list of pmds. */
-        if (numa->cur_index == 0) {
-            /* Reached the first pmd. */
-            numa->idx_inc = true;
+        if (numa->idx_inc == true) {
+            /* Incrementing through list of pmds. */
+            if (numa->cur_index == numa->n_pmds-1) {
+                /* Reached the last pmd. */
+                numa->idx_inc = false;
+            } else {
+                numa->cur_index++;
+            }
         } else {
-            numa->cur_index--;
+            /* Decrementing through list of pmds. */
+            if (numa->cur_index == 0) {
+                /* Reached the first pmd. */
+                numa->idx_inc = true;
+            } else {
+                numa->cur_index--;
+            }
         }
     }
     return numa->pmds[numa_idx];
@@ -3581,11 +3591,11 @@
         }
     }
 
-    if (n_rxqs > 1) {
+    //if (n_rxqs > 1) {
         /* Sort the queues in order of the processing cycles
          * they consumed during their last pmd interval. */
-        qsort(rxqs, n_rxqs, sizeof *rxqs, compare_rxq_cycles);
-    }
+        // qsort(rxqs, n_rxqs, sizeof *rxqs, compare_rxq_cycles);
+    //}
 
     rr_numa_list_populate(dp, &rr);
     /* Assign the sorted queues to pmds in round robin. */
@@ -3605,7 +3615,7 @@
                         netdev_rxq_get_queue_id(rxqs[i]->rx));
                 continue;
             }
-            rxqs[i]->pmd = rr_numa_get_pmd(non_local_numa);
+            rxqs[i]->pmd = rr_numa_get_pmd(non_local_numa,true);
             VLOG_WARN("There's no available (non-isolated) pmd thread "
                       "on numa node %d. Queue %d on port \'%s\' will "
                       "be assigned to the pmd on core %d "
@@ -3614,7 +3624,7 @@
                       netdev_rxq_get_name(rxqs[i]->rx),
                       rxqs[i]->pmd->core_id, rxqs[i]->pmd->numa_id);
         } else {
-            rxqs[i]->pmd = rr_numa_get_pmd(numa);
+            rxqs[i]->pmd = rr_numa_get_pmd(numa,true);
             VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
                       "rx queue %d (measured processing cycles %"PRIu64").",
                       rxqs[i]->pmd->core_id, numa_id,
~~~

In my lab with the standard build:
~~~
[root@overcloud-compute-0 ~]# grep 'on numa node 0 assigned port' /var/log/openvswitch/ovs-vswitchd.log | tail -n 9
2018-08-02T22:26:44.776Z|00299|dpif_netdev|INFO|Core 2 on numa node 0 assigned port 'vhu6e4076aa-c2' rx queue 1 (measured processing cycles 518187).
2018-08-02T22:26:44.776Z|00300|dpif_netdev|INFO|Core 8 on numa node 0 assigned port 'dpdk0' rx queue 0 (measured processing cycles 223997).
2018-08-02T22:26:44.776Z|00301|dpif_netdev|INFO|Core 6 on numa node 0 assigned port 'vhu6e4076aa-c2' rx queue 0 (measured processing cycles 0).
2018-08-02T22:26:44.776Z|00302|dpif_netdev|INFO|Core 6 on numa node 0 assigned port 'vhu6e4076aa-c2' rx queue 2 (measured processing cycles 0).
2018-08-02T22:26:44.776Z|00303|dpif_netdev|INFO|Core 8 on numa node 0 assigned port 'vhu6e4076aa-c2' rx queue 3 (measured processing cycles 0).
2018-08-02T22:26:44.776Z|00304|dpif_netdev|INFO|Core 2 on numa node 0 assigned port 'vhu6e4076aa-c2' rx queue 4 (measured processing cycles 0).
2018-08-02T22:26:44.776Z|00305|dpif_netdev|INFO|Core 2 on numa node 0 assigned port 'vhu6e4076aa-c2' rx queue 5 (measured processing cycles 0).
2018-08-02T22:26:44.776Z|00306|dpif_netdev|INFO|Core 8 on numa node 0 assigned port 'vhu6e4076aa-c2' rx queue 6 (measured processing cycles 0).
2018-08-02T22:26:44.776Z|00307|dpif_netdev|INFO|Core 6 on numa node 0 assigned port 'vhu6e4076aa-c2' rx queue 7 (measured processing cycles 0).
~~~

In my lab with the test build openvswitch-2.9.0-19.el7.1.02143196.x86_64:
~~~
[root@overcloud-compute-0 ~]# grep 'on numa node 0 assigned port' /var/log/openvswitch/ovs-vswitchd.log | tail -n 9
2018-08-02T22:30:57.750Z|00091|dpif_netdev|INFO|Core 6 on numa node 0 assigned port 'dpdk0' rx queue 0 (measured processing cycles 0).
2018-08-02T22:30:57.750Z|00092|dpif_netdev|INFO|Core 8 on numa node 0 assigned port 'vhu6e4076aa-c2' rx queue 0 (measured processing cycles 0).
2018-08-02T22:30:57.750Z|00093|dpif_netdev|INFO|Core 2 on numa node 0 assigned port 'vhu6e4076aa-c2' rx queue 1 (measured processing cycles 0).
2018-08-02T22:30:57.750Z|00094|dpif_netdev|INFO|Core 6 on numa node 0 assigned port 'vhu6e4076aa-c2' rx queue 2 (measured processing cycles 0).
2018-08-02T22:30:57.750Z|00095|dpif_netdev|INFO|Core 8 on numa node 0 assigned port 'vhu6e4076aa-c2' rx queue 3 (measured processing cycles 0).
2018-08-02T22:30:57.750Z|00096|dpif_netdev|INFO|Core 2 on numa node 0 assigned port 'vhu6e4076aa-c2' rx queue 4 (measured processing cycles 0).
2018-08-02T22:30:57.750Z|00097|dpif_netdev|INFO|Core 6 on numa node 0 assigned port 'vhu6e4076aa-c2' rx queue 5 (measured processing cycles 0).
2018-08-02T22:30:57.750Z|00098|dpif_netdev|INFO|Core 8 on numa node 0 assigned port 'vhu6e4076aa-c2' rx queue 6 (measured processing cycles 0).
2018-08-02T22:30:57.750Z|00099|dpif_netdev|INFO|Core 2 on numa node 0 assigned port 'vhu6e4076aa-c2' rx queue 7 (measured processing cycles 0).
~~~

Looks like I can easily revert to the old behavior (at least more or less).
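To make the difference in the two assignment orders concrete, here is a small standalone model (illustration only, not OVS code): with three PMD cores, the OVS 2.9 forward-then-backward walk visits cores as 0,1,2,2,1,0,..., while the legacy always-increment walk visits 0,1,2,0,1,2,..., which is what the patch above restores. The three-PMD, nine-queue setup mirrors the lab above (cores 2, 6, 8).

~~~
#include <stdio.h>
#include <stdbool.h>

/* Toy model of rr_numa_get_pmd(): returns the index of the PMD to use.
 * With always_increment the walk wraps 0,1,2,0,1,2,... (legacy 2.6 style);
 * otherwise it bounces 0,1,2,2,1,0,... (OVS 2.9 style). */
static int
next_pmd(int n_pmds, int *cur, bool *inc, bool always_increment)
{
    int idx = *cur;

    if (always_increment) {
        *cur = (*cur == n_pmds - 1) ? 0 : *cur + 1;
    } else if (*inc) {
        if (*cur == n_pmds - 1) {
            *inc = false;               /* reached the last pmd, turn around */
        } else {
            (*cur)++;
        }
    } else {
        if (*cur == 0) {
            *inc = true;                /* reached the first pmd, turn around */
        } else {
            (*cur)--;
        }
    }
    return idx;
}

int
main(void)
{
    const int n_pmds = 3, n_rxqs = 9;

    for (int legacy = 0; legacy <= 1; legacy++) {
        int cur = 0;
        bool inc = true;

        printf("%s:", legacy ? "legacy round robin" : "OVS 2.9 up/down");
        for (int q = 0; q < n_rxqs; q++) {
            printf(" rxq%d->pmd%d", q, next_pmd(n_pmds, &cur, &inc, legacy));
        }
        printf("\n");
    }
    return 0;
}
~~~

Combined with the cycle-based sort being skipped, the always-increment order roughly reproduces the 2.6-style placement seen in the test-build log above.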
Now, if I can figure out a way to pass the correct struct smap into the method, I could even make this configurable, e.g.:

~~~
[root@undercloud-r430 SOURCES]# cat 0012-revert-rxq-ordering.patch
--- openvswitch-2.9.0.orig/lib/dpif-netdev.c	2018-02-19 17:49:44.867277147 -0500
+++ openvswitch-2.9.0.modif/lib/dpif-netdev.c	2018-08-02 17:30:14.868626933 -0400
@@ -3453,25 +3453,35 @@
 /* Returns the next pmd from the numa node in
  * incrementing or decrementing order. */
 static struct dp_netdev_pmd_thread *
-rr_numa_get_pmd(struct rr_numa *numa)
+rr_numa_get_pmd(struct rr_numa *numa,bool always_increment)
 {
     int numa_idx = numa->cur_index;
 
-    if (numa->idx_inc == true) {
+    if (always_increment) {
         /* Incrementing through list of pmds. */
         if (numa->cur_index == numa->n_pmds-1) {
             /* Reached the last pmd. */
-            numa->idx_inc = false;
+            numa->cur_index = 0;
         } else {
             numa->cur_index++;
         }
     } else {
-        /* Decrementing through list of pmds. */
-        if (numa->cur_index == 0) {
-            /* Reached the first pmd. */
-            numa->idx_inc = true;
+        if (numa->idx_inc == true) {
+            /* Incrementing through list of pmds. */
+            if (numa->cur_index == numa->n_pmds-1) {
+                /* Reached the last pmd. */
+                numa->idx_inc = false;
+            } else {
+                numa->cur_index++;
+            }
         } else {
-            numa->cur_index--;
+            /* Decrementing through list of pmds. */
+            if (numa->cur_index == 0) {
+                /* Reached the first pmd. */
+                numa->idx_inc = true;
+            } else {
+                numa->cur_index--;
+            }
         }
     }
     return numa->pmds[numa_idx];
@@ -3581,7 +3591,9 @@
         }
     }
 
-    if (n_rxqs > 1) {
+    bool legacy_rxq_sort = smap_get_bool(const struct smap *smap, "legacy_rxq_sort", false)
+
+    if (!(legacy_rxq_sort) && n_rxqs > 1) {
         /* Sort the queues in order of the processing cycles
          * they consumed during their last pmd interval. */
         qsort(rxqs, n_rxqs, sizeof *rxqs, compare_rxq_cycles);
@@ -3605,7 +3617,7 @@
                         netdev_rxq_get_queue_id(rxqs[i]->rx));
                 continue;
             }
-            rxqs[i]->pmd = rr_numa_get_pmd(non_local_numa);
+            rxqs[i]->pmd = rr_numa_get_pmd(non_local_numa,legacy_rxq_sort);
             VLOG_WARN("There's no available (non-isolated) pmd thread "
                       "on numa node %d. Queue %d on port \'%s\' will "
                       "be assigned to the pmd on core %d "
@@ -3614,7 +3626,7 @@
                       netdev_rxq_get_name(rxqs[i]->rx),
                       rxqs[i]->pmd->core_id, rxqs[i]->pmd->numa_id);
         } else {
-            rxqs[i]->pmd = rr_numa_get_pmd(numa);
+            rxqs[i]->pmd = rr_numa_get_pmd(numa,legacy_rxq_sort);
             VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
                       "rx queue %d (measured processing cycles %"PRIu64").",
                       rxqs[i]->pmd->core_id, numa_id,
~~~

But I don't know how to get `const struct smap *smap`.

Is this about the rebalancing added to 2.9, or only the initial assignment? What about the other_config:pmd-rxq-affinity option — can't they use this to manually mimic the 2.6 behaviour? I'll find some time next week to dig through the code changes to better understand your request. In the meantime, can you add some details on why you see the performance differences, and how you test this? I assume it has to do with how queues and traffic are distributed.

The feature cannot be backported to older LTS releases, so I'm not sure what we can do.
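On the open question above about how to reach the smap: the following is only a sketch of one possible approach, not the patch that eventually shipped (the delivered fix exposes the choice as other_config:pmd-rxq-assign instead). In OVS, other_config from the Open_vSwitch table reaches the userspace datapath through dpif_netdev_set_config(), so a boolean could be parsed there with smap_get_bool() and stored on struct dp_netdev for the Rx queue scheduling code to read later. The legacy-rxq-sort key and the legacy_rxq_sort field are hypothetical names.

~~~
/* Hypothetical sketch only -- field and key names are invented.
 * 1) Add a flag to struct dp_netdev in lib/dpif-netdev.c:
 *        bool legacy_rxq_sort;
 * 2) Parse it where other_config already arrives as an smap: */
static int
dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
{
    struct dp_netdev *dp = get_dp_netdev(dpif);

    /* ... existing parsing of pmd-cpu-mask and friends ... */

    /* Read the (hypothetical) knob; defaults to the new 2.9 behaviour. */
    dp->legacy_rxq_sort = smap_get_bool(other_config,
                                        "legacy-rxq-sort", false);
    return 0;
}

/* 3) In the scheduling function (rxq_scheduling() in this version), which
 *    already has access to dp, consult the flag instead of hard-coding it:
 *        bool legacy = dp->legacy_rxq_sort;
 *        if (!legacy && n_rxqs > 1) {
 *            qsort(rxqs, n_rxqs, sizeof *rxqs, compare_rxq_cycles);
 *        }
 *        ...
 *        rxqs[i]->pmd = rr_numa_get_pmd(numa, legacy);
 */
~~~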
Ran the vsperf CI performance testing on openvswitch-2.9.0-70.el7fdp with and without pmd-rxq-assign=roundrobin; the case with pmd-rxq-assign=roundrobin works well. The details are as follows.

Job link (without pmd-rxq-assign=roundrobin): https://beaker.engineering.redhat.com/jobs/2874931
Result link: https://docs.google.com/spreadsheets/d/1FR6Bs6BMANmDC_eEaqDetV-A_ZWd_SV7-J2P1UaoSNQ/edit#gid=1295184293

Job link (with pmd-rxq-assign=roundrobin): https://beaker.engineering.redhat.com/jobs/2973629
Result link: https://docs.google.com/spreadsheets/d/1cxLlI5fmtwp-qfCD6ep2h07rWI7ThqTosU2MhhDWCz8/edit#gid=1319507937

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3500