Bug 1401433 - Vhost tx batching
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: 7.4
Assigned To: Wei
QA Contact: Quan Wenli
Docs Contact: Yehuda Zimmerman
Keywords: FutureFeature
Depends On: 1283257 1352741
Blocks: 1395265 1414627 1445257
 
Reported: 2016-12-05 04:10 EST by jason wang
Modified: 2017-08-02 00:53 EDT
CC: 9 users

See Also:
Fixed In Version: kernel-3.10.0-670.el7
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-02 00:53:19 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:1842 normal SHIPPED_LIVE Important: kernel security, bug fix, and enhancement update 2017-08-01 14:22:09 EDT

Description jason wang 2016-12-05 04:10:15 EST
Description of problem:

Upstream will support vhost tx batching, which can batch several tx packets before submitting them to the host stack.

For testing (a sketch of the procedure follows this list):
- modprobe tun rx_batched=0
- run pktgen/l2fwd in the guest and measure pps
- modprobe tun rx_batched=16
- run pktgen/l2fwd in the guest and measure pps again
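A minimal sketch of the host-side steps, assuming the patched tun module exposes the rx_batched parameter as described above; reloading the module requires that no tun/tap devices are currently open:

modprobe -r tun                    # unload tun (no tap devices may be open)
modprobe tun rx_batched=0          # batching disabled
# ... start the guest, run pktgen/l2fwd inside it, record pps ...
modprobe -r tun
modprobe tun rx_batched=16         # batch up to 16 packets
# ... repeat the same pktgen/l2fwd run and compare pps ...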

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 2 Wei 2017-03-15 05:58:34 EDT
Downstream test result on my laptop:
Before:
tap2 RX  1564831 pkts/s RX Dropped: 0 pkts/s
tap1 TX  2180650 pkts/s TX Dropped: 1677842 pkts/s

After:
tap2 RX  1582509 pkts/s RX Dropped: 0 pkts/s
tap1 TX  2232357 pkts/s TX Dropped: 1702915 pkts/s
Comment 4 Wei 2017-05-08 00:24:01 EDT
It is a bit complicated: I posted two versions upstream, and v2 changed nothing
except the comments, which confused the maintainer quite a bit. Based on the
feedback for other BZs I have done, we usually only need to tweak v1, so last
week I commented on the thread to skip v2 and go back to v1, but I have not
received feedback so far.

I will ping the maintainer to confirm whether this process is acceptable, and see whether I should ask a reviewer to review it or post a new series.
Comment 6 Wei 2017-05-19 09:54:39 EDT
This is a performance improvement, which does not need specific documentation.
Comment 7 Rafael Aquini 2017-05-19 19:22:26 EDT
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing
Comment 9 Rafael Aquini 2017-05-22 09:54:12 EDT
Patch(es) available on kernel-3.10.0-670.el7
Comment 11 xiywang 2017-05-22 22:30:59 EDT
Hi Wenli,

Could you help to do a performance test?

Thanks,
Xiyue
Comment 12 Quan Wenli 2017-05-22 23:27:58 EDT
(In reply to xiywang from comment #11)
> Hi Wenli,
> 
> Could you help to do a performance test?
> 
> Thanks,
> Xiyue

OK, I will test it tomorrow.
Comment 13 Quan Wenli 2017-05-23 05:54:23 EDT
Hi Jason,

There is no module parameter named rx_batched in 3.10.0-670.el7.x86_64. How can I check that vhost tx batching is working? In fact, I did not see any tx pps difference between rx_batched=0 and rx_batched=16.

# modinfo tun
filename:       /lib/modules/3.10.0-670.el7.x86_64/kernel/drivers/net/tun.ko.xz
alias:          devname:net/tun
alias:          char-major-10-200
license:        GPL
author:         (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
description:    Universal TUN/TAP device driver
rhelversion:    7.4
srcversion:     E0353EFA774E5AFD2FFCFD1
depends:        
intree:         Y
vermagic:       3.10.0-670.el7.x86_64 SMP mod_unload modversions 
signer:         Red Hat Enterprise Linux kernel signing key
sig_key:        69:FC:97:DA:41:C9:5D:8E:B0:F5:C4:10:8F:59:71:A9:DC:53:14:E9
sig_hashalgo:   sha256
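For reference, a quick way to double-check whether the loaded tun module exposes a given parameter (a generic sketch, not specific to this build):

ls /sys/module/tun/parameters/ 2>/dev/null   # parameters of the currently loaded module, if any
modinfo -p tun                               # parameters declared by the module on disk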
Comment 14 jason wang 2017-05-23 23:26:56 EDT
(In reply to Quan Wenli from comment #13)
> Hi Jason,
>
> There is no module parameter named rx_batched in 3.10.0-670.el7.x86_64. How
> can I check that vhost tx batching is working? In fact, I did not see any tx
> pps difference between rx_batched=0 and rx_batched=16.
> [modinfo tun output snipped]

You need to enable it through:

ethtool -C tap0 rx-frames N

Thanks

It is also better to test it in a VM2VM case.
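For example, a minimal sketch (tap0 stands for whichever tap device backs the guest NIC, and 64 is just one of the batch sizes under test):

ethtool -C tap0 rx-frames 64      # enable batching of up to 64 packets on the tap device
ethtool -c tap0 | grep rx-frames  # confirm the setting took effect
ethtool -C tap0 rx-frames 0       # set it back to 0 to disable batching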
Comment 15 Quan Wenli 2017-05-24 03:25:04 EDT
Hi Jason, Wei,

Please check the following performance results: pps increases from rx-frames=1 up to rx-frames=128, but drops with rx-frames=256. Is this expected?



Steps:
1. Boot 2 VMs on the same bridge.
2. Run pktgen.sh on device eth0 on vm1; make sure eth0's MAC address on vm2 is the one assigned in the pktgen.sh script (a minimal sketch of the pktgen setup follows the table below).
3. Gather the pps result on vm2.

 rx-frames      pkts/s
-----------+--------------+
     0         311290
-----------+--------------+
     1         311195
-----------+--------------+
     4         313300
-----------+--------------+
    16         315542
-----------+--------------+
    64         328584
-----------+--------------+
   128         329697
-----------+--------------+
   256         312774      ----------> drop compared to rx-frames=128
-----------+--------------+
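For reference, a minimal sketch of the pktgen configuration run inside vm1 (the device name, packet count, and destination IP/MAC are placeholders; the actual pktgen.sh may differ):

modprobe pktgen
echo "rem_device_all"  > /proc/net/pktgen/kpktgend_0
echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
echo "count 10000000"  > /proc/net/pktgen/eth0
echo "pkt_size 60"     > /proc/net/pktgen/eth0
echo "dst 192.168.122.92"        > /proc/net/pktgen/eth0   # vm2's IP (placeholder)
echo "dst_mac 52:54:00:12:34:56" > /proc/net/pktgen/eth0   # vm2's eth0 MAC (placeholder)
echo "start" > /proc/net/pktgen/pgctrl
# on vm2, read the rx pps, e.g. from /proc/net/dev sampled once per second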
Comment 16 jason wang 2017-05-24 06:51:44 EDT
(In reply to Quan Wenli from comment #15)
> Hi Jason, Wei,
>
> Please check the following performance results: pps increases from
> rx-frames=1 up to rx-frames=128, but drops with rx-frames=256. Is this
> expected?
> [steps and results table snipped]

Interesting. In my setup with 3.10.0-671.el7.x86_64:

rx-frames 0,   0.63Mpps
rx-frames 64,  0.99Mpps (+57%)
rx-frames 256, 0.99Mpps (+57%)

Have you pinned all threads to one NUMA node during testing?
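For example, a sketch of what I mean by pinning (node 0, the CPU list 0-7, and <guest-name> are placeholders for your actual topology and VM):

# pin the guest's vcpu threads and the vhost kernel threads to the CPUs of node 0
for pid in $(pgrep -f "qemu-kvm.*<guest-name>"); do
    taskset -a -c -p 0-7 "$pid"
done
for pid in $(pgrep vhost); do
    taskset -c -p 0-7 "$pid"
done
# alternatively, start the guest under: numactl --cpunodebind=0 --membind=0 qemu-kvm ...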

Thanks
Comment 17 Quan Wenli 2017-05-25 05:03:30 EDT
(In reply to jason wang from comment #16)
> Have you pinned all threads to one NUMA node during testing?

I pinned all threads to one NUMA node; there is only a slight improvement, nothing obvious, and no pps drop with 256 rx-frames. I used "ethtool -c tap0" to check each time, and the rx-frames setting was indeed in effect.

rx-frames 0,    330543
rx-frames 64,   334737
rx-frames 256,  334277


Comment 18 Quan Wenli 2017-05-31 23:33:39 EDT
(In reply to jason wang from comment #16)
> Interesting. In my setup with 3.10.0-671.el7.x86_64:
>
> rx-frames 0,   0.63Mpps
> rx-frames 64,  0.99Mpps (+57%)
> rx-frames 256, 0.99Mpps (+57%)

I tried with your image, where the guest is running a 4.10.0+ kernel, and the performance goes up to 0.5 Mpps. I then also tried the latest upstream 4.11.0-rc5+, and the performance is still 0.33 Mpps.

So there is no performance difference between the RHEL 7.4 guest and the latest upstream guest, but there seems to be an existing regression between 4.10.0+ and 4.11.0-rc5+ upstream.
Comment 20 jason wang 2017-06-08 00:09:40 EDT
(In reply to Quan Wenli from comment #18)
> I tried with your image, where the guest is running a 4.10.0+ kernel, and the
> performance goes up to 0.5 Mpps. I then also tried the latest upstream
> 4.11.0-rc5+, and the performance is still 0.33 Mpps.
>
> So there is no performance difference between the RHEL 7.4 guest and the
> latest upstream guest, but there seems to be an existing regression between
> 4.10.0+ and 4.11.0-rc5+ upstream.

Can you try net.git or linux.git? My image uses net-next, which is in fact a development tree.

Thanks
Comment 21 Quan Wenli 2017-06-12 02:31:31 EDT
(In reply to jason wang from comment #20)
> Can you try net.git or linux.git? My image uses net-next, which is in fact a
> development tree.

I tried again with a guest kernel 4.11.0-rc5+ built from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git; the result is still bad, at 0.33 Mpps.

So is it an upstream bug? May I open a new bug to track it and close this one?
Comment 22 Quan Wenli 2017-06-15 02:51:57 EDT
(In reply to Quan Wenli from comment #21)
> I tried again with a guest kernel 4.11.0-rc5+ built from
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git; the result
> is still bad, at 0.33 Mpps.
>
> So is it an upstream bug? May I open a new bug to track it and close this
> one?

After checking again, I found there is no regression upstream; the root cause of the pps difference between 4.10 and 4.11 is the different parameters used in pktgen.sh (the three variants are sketched after this list).

1. With both dst (IP) and dst_mac enabled, the pps performance was the minimum, at 0.25 Mpps.

2. With only dst_mac enabled, the pps performance was in the middle, at 0.32 Mpps, which is what I got with the 4.11-rc5+ kernel.

3. With only dst (IP) enabled, the pps performance was the maximum, at 0.5 Mpps, which is what I got with the 4.10 kernel.
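For clarity, a sketch of what the three pktgen.sh variants set (address values are placeholders):

# variant 1: both destination IP and MAC set
echo "dst 192.168.122.92"        > /proc/net/pktgen/eth0
echo "dst_mac 52:54:00:12:34:56" > /proc/net/pktgen/eth0
# variant 2: only dst_mac set
echo "dst_mac 52:54:00:12:34:56" > /proc/net/pktgen/eth0
# variant 3: only dst (IP) set
echo "dst 192.168.122.92"        > /proc/net/pktgen/eth0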

So there is regression in upstream.

And for this bug, with only dst (IP), the pps performance was indeed improved by enlarging rx-frames.

rx-frames 0,    0.50 Mpps
rx-frames 1,    0.53 Mpps
rx-frames 4,    0.56 Mpps
rx-frames 64,   0.64 Mpps


Based on the above, I am changing it to VERIFIED.
Comment 23 Quan Wenli 2017-06-19 01:17:18 EDT
> So there is regression in upstream.

Correction: there is no regression in upstream.
Comment 25 errata-xmlrpc 2017-08-02 00:53:19 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842
