Bug 1401433
| Summary: | Vhost tx batching | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | jason wang <jasowang> |
| Component: | kernel | Assignee: | Wei <wexu> |
| kernel sub component: | Virtualization | QA Contact: | Quan Wenli <wquan> |
| Status: | CLOSED ERRATA | Docs Contact: | Yehuda Zimmerman <yzimmerm> |
| Severity: | unspecified | | |
| Priority: | high | CC: | ailan, chayang, juzhang, michen, mtessun, pezhang, weliao, wexu, wquan |
| Version: | 7.4 | Keywords: | FutureFeature |
| Target Milestone: | rc | | |
| Target Release: | 7.4 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-3.10.0-670.el7 | Doc Type: | No Doc Update |
| Doc Text: | | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-02 04:53:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1283257, 1352741 | | |
| Bug Blocks: | 1395265, 1414627, 1445257 | | |
Description
jason wang
2016-12-05 09:10:15 UTC
In net-next: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=e3e37e701713731b22f8cebfa1f5deed455cad8a

Downstream test result on my laptop:

Before:
tap2 RX 1564831 pkts/s, RX Dropped: 0 pkts/s
tap1 TX 2180650 pkts/s, TX Dropped: 1677842 pkts/s

After:
tap2 RX 1582509 pkts/s, RX Dropped: 0 pkts/s
tap1 TX 2232357 pkts/s, TX Dropped: 1702915 pkts/s

It is a bit complicated: I posted two versions, and v2 changed nothing except the comments, which disturbed the maintainer quite a bit given the feedback for the other BZs I had done (usually we only need to tweak v1). I asked, probably last week, to skip v2 and go back to v1, but have not had any feedback so far. I will ping the maintainer to confirm whether that process is acceptable, and see whether I should ask a reviewer to review it or post a new series.

This is a performance improvement and does not need specific documentation.

Patch(es) committed on kernel repository and an interim kernel build is undergoing testing.

Patch(es) available on kernel-3.10.0-670.el7.

Hi Wenli,

Could you help to do a performance test?

Thanks,
Xiyue

(In reply to xiywang from comment #11)
> Hi Wenli,
>
> Could you help to do a performance test?
>
> Thanks,
> Xiyue

OK, I will test it tomorrow.

Hi jason,

There is no module parameter named rx_batched in 3.10.0-670.el7.x86_64, so how can I check that vhost tx batching is in effect? I did not see any tx pps difference between rx_batched=0 and rx_batched=16.

# modinfo tun
filename:       /lib/modules/3.10.0-670.el7.x86_64/kernel/drivers/net/tun.ko.xz
alias:          devname:net/tun
alias:          char-major-10-200
license:        GPL
author:         (C) 1999-2004 Max Krasnyansky <maxk>
description:    Universal TUN/TAP device driver
rhelversion:    7.4
srcversion:     E0353EFA774E5AFD2FFCFD1
depends:
intree:         Y
vermagic:       3.10.0-670.el7.x86_64 SMP mod_unload modversions
signer:         Red Hat Enterprise Linux kernel signing key
sig_key:        69:FC:97:DA:41:C9:5D:8E:B0:F5:C4:10:8F:59:71:A9:DC:53:14:E9
sig_hashalgo:   sha256

(In reply to Quan Wenli from comment #13)
> There is no module parameter named rx_batched in 3.10.0-670.el7.x86_64, so how
> can I check that vhost tx batching is in effect? I did not see any tx pps
> difference between rx_batched=0 and rx_batched=16.

You need to enable it through:

ethtool -C tap0 rx-frames N

Thanks

And it is better to test the VM2VM case.
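As a side note, a minimal sketch of enabling and checking the setting from the host follows; tap0 and the value 64 are only example values, and the tap device name depends on the actual setup:

# Enable tx batching by setting the rx-frames coalescing parameter on the tap device.
ethtool -C tap0 rx-frames 64
# Setting rx-frames back to 0 disables batching again.
# Confirm the value actually took effect:
ethtool -c tap0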
Hi jason, wei,

Please check the following performance results: pps increases from rx-frames=1 up to rx-frames=128, but drops again with rx-frames=256. Is that expected?

Steps:
1. Boot 2 VMs on the same bridge.
2. Run pktgen.sh on device eth0 on vm1; make sure eth0's MAC address of vm2 is assigned in the pktgen.sh script.
3. Gather the pps result on vm2.

rx-frames   pkts/s
---------+---------
0          311290
1          311195
4          313300
16         315542
64         328584
128        329697
256        312774   ----> drop compared to rx-frames=128

(In reply to Quan Wenli from comment #15)
> Please check the following performance results: pps increases from rx-frames=1
> up to rx-frames=128, but drops again with rx-frames=256. Is that expected?

Interesting. In my setup with 3.10.0-671.el7.x86_64:

rx-frames 0,   0.63 Mpps
rx-frames 64,  0.99 Mpps (+57%)
rx-frames 256, 0.99 Mpps (+57%)

Have you pinned all threads to one NUMA node during testing?

Thanks
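The bug does not record how the threads were pinned; the lines below are only a sketch of one common way to keep the guest and its vhost thread on a single NUMA node. The node number, the CPU list 0-7, and the qemu PID 12345 are placeholders, not values from this bug:

# Start the guest with vCPUs and memory bound to NUMA node 0 (placeholder node).
numactl --cpunodebind=0 --membind=0 qemu-kvm ...
# Pin the matching vhost worker thread (a kernel thread named vhost-<qemu pid>)
# to the CPUs of the same node; 0-7 and 12345 are placeholders.
taskset -pc 0-7 $(pgrep vhost-12345)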
(In reply to jason wang from comment #16)
> Interesting. In my setup with 3.10.0-671.el7.x86_64:
>
> rx-frames 0,   0.63 Mpps
> rx-frames 64,  0.99 Mpps (+57%)
> rx-frames 256, 0.99 Mpps (+57%)
>
> Have you pinned all threads to one NUMA node during testing?
>
> Thanks

With all threads pinned to one NUMA node there is only a slight improvement, nothing obvious, and no pps drop with 256 rx-frames. I checked with "ethtool -c tap0" every time; the rx-frames setting is indeed in effect.

rx-frames 0,   330543
rx-frames 64,  334737
rx-frames 256, 334277
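The thread does not show how the pps numbers on vm2 were gathered; one minimal way to sample them on the receiving guest is sketched below, assuming the receiving interface is eth0:

# Print received packets per second on eth0 once a second (run inside vm2).
while true; do
    rx1=$(awk '/eth0:/ {print $3}' /proc/net/dev)
    sleep 1
    rx2=$(awk '/eth0:/ {print $3}' /proc/net/dev)
    echo "RX $((rx2 - rx1)) pkts/s"
done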
(In reply to jason wang from comment #16)
> Interesting. In my setup with 3.10.0-671.el7.x86_64:
>
> rx-frames 0,   0.63 Mpps
> rx-frames 64,  0.99 Mpps (+57%)
> rx-frames 256, 0.99 Mpps (+57%)

I tried with your image, where the guest runs a 4.10.0+ kernel, and the performance goes up to 0.5 Mpps; I then also tried the latest upstream 4.11.0-rc5+, and the performance is still 0.33 Mpps.

So there is no performance difference between a rhel7.4 guest and the latest upstream guest, but there seems to be an existing regression between 4.10.0+ and 4.11.0-rc5+ upstream.

(In reply to Quan Wenli from comment #18)
> I tried with your image, where the guest runs a 4.10.0+ kernel, and the
> performance goes up to 0.5 Mpps; I then also tried the latest upstream
> 4.11.0-rc5+, and the performance is still 0.33 Mpps.
>
> So there is no performance difference between a rhel7.4 guest and the latest
> upstream guest, but there seems to be an existing regression between 4.10.0+
> and 4.11.0-rc5+ upstream.

Can you try net.git or linux.git? My image uses net-next, which is in fact a development tree.

Thanks
(In reply to jason wang from comment #20)
> Can you try net.git or linux.git? My image uses net-next, which is in fact a
> development tree.
>
> Thanks

Tried again with a guest kernel-4.11.0-rc5+ from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git; the result is still bad, 0.33 Mpps.

So is it an upstream bug? May I open a bug for tracking it and close this bug?

After checking again, I found there is no regression upstream; the root cause of the pps difference between 4.10 and 4.11 is a different parameter in pktgen.sh.

1. With both dst (IP) and dst_mac enabled, the pps performance was the minimum, 0.25.

2. With only dst_mac enabled, the pps performance was in the middle, 0.32, which is what I got with the 4.11-rc5+ kernel.

3. With only dst (IP) enabled, the pps performance was the maximum, 0.5, which is what I got with the 4.10 kernel.

So there is regression in upstream.

And for this bug, with only dst (IP), the pps performance is indeed improved by enlarging rx-frames:

rx-frames 0,  0.50
rx-frames 1,  0.53
rx-frames 4,  0.56
rx-frames 64, 0.64

Based on the above, changing the status to verified.
> After checking again, I found there is no regression upstream; the root
> cause of the pps difference between 4.10 and 4.11 is a different parameter
> in pktgen.sh.
>
> 1. With both dst (IP) and dst_mac enabled, the pps performance was the
> minimum, 0.25.
>
> 2. With only dst_mac enabled, the pps performance was in the middle, 0.32,
> which is what I got with the 4.11-rc5+ kernel.
>
> 3. With only dst (IP) enabled, the pps performance was the maximum, 0.5,
> which is what I got with the 4.10 kernel.
>
> So there is regression in upstream.

Should be: there is no regression in upstream.
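The pktgen.sh script itself is not attached to this bug, so the fragment below is only a sketch of how the dst and dst_mac options discussed above are usually set through the kernel pktgen /proc interface; the interface name, thread number, IP address, MAC address, and packet count are placeholders:

# Sketch of a pktgen run on vm1; eth0, kpktgend_0, 192.168.1.2 and
# 52:54:00:00:00:02 are placeholder values.
modprobe pktgen

pgthread=/proc/net/pktgen/kpktgend_0
echo "rem_device_all" > $pgthread
echo "add_device eth0" > $pgthread

pgdev=/proc/net/pktgen/eth0
echo "count 10000000" > $pgdev
echo "pkt_size 60" > $pgdev
echo "delay 0" > $pgdev
# The comparison above found the highest pps with only "dst" set;
# setting "dst_mac" as well lowered it in that test.
echo "dst 192.168.1.2" > $pgdev
echo "dst_mac 52:54:00:00:00:02" > $pgdev

echo "start" > /proc/net/pktgen/pgctrl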
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842