1401433 – Vhost tx batching

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1401433 - Vhost tx batching

Summary: Vhost tx batching

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	7.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	unspecified
Target Milestone:	rc
Target Release:	7.4
Assignee:	Wei
QA Contact:	Quan Wenli
Docs Contact:	Yehuda Zimmerman
URL:
Whiteboard:
Depends On:	1283257 1352741
Blocks:	1395265 1414627 1445257
TreeView+	depends on / blocked

Reported:	2016-12-05 09:10 UTC by jason wang
Modified:	2017-08-02 04:53 UTC (History)
CC List:	9 users (show)
Fixed In Version:	kernel-3.10.0-670.el7
Doc Type:	No Doc Update
Doc Text:	undefined
Clone Of:
Environment:
Last Closed:	2017-08-02 04:53:19 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:1842	0	normal	SHIPPED_LIVE	Important: kernel security, bug fix, and enhancement update	2017-08-01 18:22:09 UTC

Description jason wang 2016-12-05 09:10:15 UTC

Description of problem:

Upstream will suport vhost tx batching, which can batching several tx packets before submitting to host stack.

For testing:
- modprobe tun rx_batched=0
- run pktgen/l2fwd in guest measure pps
- modprobe tun rx_batched=16
- run pktgen/l2fwd in guest measure pps

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 jason wang 2017-01-19 03:48:38 UTC

In net-next:

https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=e3e37e701713731b22f8cebfa1f5deed455cad8a

Comment 2 Wei 2017-03-15 09:58:34 UTC

Downstream test result on my laptop:
Before:
tap2 RX  1564831 pkts/s RX Dropped: 0 pkts/s
tap1 TX  2180650 pkts/s TX Dropped: 1677842 pkts/s

After:
tap2 RX  1582509 pkts/s RX Dropped: 0 pkts/s
tap1 TX  2232357 pkts/s TX Dropped: 1702915 pkts/s

Comment 4 Wei 2017-05-08 04:24:01 UTC

It is a bit complicated because I posted 2 versions and the v2 didn't touch any 
change rather than comment part which disturbs the maintainer quite much due to 
the feedback for other BZs I had done, usually we only need tweak v1, and I
commented it to skip v2 and go back to v1 possibly last week but haven't got 
feedback so far.

I will ping the maintainer to make sure if the process is acceptable or not, and see if I should ask reviewer to review it or post a new series.

Comment 6 Wei 2017-05-19 13:54:39 UTC

This is a performance improvement which doesn't need a specific document for it.

Comment 7 Rafael Aquini 2017-05-19 23:22:26 UTC

Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 9 Rafael Aquini 2017-05-22 13:54:12 UTC

Patch(es) available on kernel-3.10.0-670.el7

Comment 11 xiywang 2017-05-23 02:30:59 UTC

Hi Wenli,

Could you help to do performance test?

Thanks,
Xiyue

Comment 12 Quan Wenli 2017-05-23 03:27:58 UTC

(In reply to xiywang from comment #11)
> Hi Wenli,
> 
> Could you help to do performance test?
> 
> Thanks,
> Xiyue

Ok, I will test it tmr.

Comment 13 Quan Wenli 2017-05-23 09:54:23 UTC

Hi, jason

There is no parm named rx_batched with 3.10.0-670.el7.x86_64, how to check vhost tx batching valid, actually I did not see any tx pps difference between rx_batched=0 and rx_batched=16. 

# modinfo tun
filename:       /lib/modules/3.10.0-670.el7.x86_64/kernel/drivers/net/tun.ko.xz
alias:          devname:net/tun
alias:          char-major-10-200
license:        GPL
author:         (C) 1999-2004 Max Krasnyansky <maxk>
description:    Universal TUN/TAP device driver
rhelversion:    7.4
srcversion:     E0353EFA774E5AFD2FFCFD1
depends:        
intree:         Y
vermagic:       3.10.0-670.el7.x86_64 SMP mod_unload modversions 
signer:         Red Hat Enterprise Linux kernel signing key
sig_key:        69:FC:97:DA:41:C9:5D:8E:B0:F5:C4:10:8F:59:71:A9:DC:53:14:E9
sig_hashalgo:   sha256

Comment 14 jason wang 2017-05-24 03:26:56 UTC

(In reply to Quan Wenli from comment #13)
> Hi, jason
> 
> There is no parm named rx_batched with 3.10.0-670.el7.x86_64, how to check
> vhost tx batching valid, actually I did not see any tx pps difference
> between rx_batched=0 and rx_batched=16. 
> 
> # modinfo tun
> filename:      
> /lib/modules/3.10.0-670.el7.x86_64/kernel/drivers/net/tun.ko.xz
> alias:          devname:net/tun
> alias:          char-major-10-200
> license:        GPL
> author:         (C) 1999-2004 Max Krasnyansky <maxk>
> description:    Universal TUN/TAP device driver
> rhelversion:    7.4
> srcversion:     E0353EFA774E5AFD2FFCFD1
> depends:        
> intree:         Y
> vermagic:       3.10.0-670.el7.x86_64 SMP mod_unload modversions 
> signer:         Red Hat Enterprise Linux kernel signing key
> sig_key:        69:FC:97:DA:41:C9:5D:8E:B0:F5:C4:10:8F:59:71:A9:DC:53:14:E9
> sig_hashalgo:   sha256

You need enable it through:

ethtool -C tap0 rx-frames N

Thanks

And better to test it on VM2VM case.

Comment 15 Quan Wenli 2017-05-24 07:25:04 UTC

Hi, jason, wei 

Please check following performance results, pps is increased with 1 rx-frames to 128 rx-frames, but with performance drop with 256 rx-frames. is it expected?



Ｓteps: 
1. boot 2 vms in same bridge. 
2. run pktgen.sh on device eth0 on vm1. make sure the eth0's mac address on vm2 assigned in pktgen.sh script. 
3. gather pps result on vm2. 

 rx-frames      pkts/s
-----------+--------------+
     0         311290
-----------+--------------+
     1         311195
-----------+--------------+
     4         313300
-----------+--------------+
    16         315542
-----------+--------------+
    64         328584
-----------+--------------+
   128         329697
-----------+--------------+
   256         312774      ----------> drop compared rx-frames=128
-----------+--------------+

Comment 16 jason wang 2017-05-24 10:51:44 UTC

(In reply to Quan Wenli from comment #15)
> Hi, jason, wei 
> 
> Please check following performance results, pps is increased with 1
> rx-frames to 128 rx-frames, but with performance drop with 256 rx-frames. is
> it expected?
> 
> 
> 
> Ｓteps: 
> 1. boot 2 vms in same bridge. 
> 2. run pktgen.sh on device eth0 on vm1. make sure the eth0's mac address on
> vm2 assigned in pktgen.sh script. 
> 3. gather pps result on vm2. 
> 
>  rx-frames      pkts/s
> -----------+--------------+
>      0         311290
> -----------+--------------+
>      1         311195
> -----------+--------------+
>      4         313300
> -----------+--------------+
>     16         315542
> -----------+--------------+
>     64         328584
> -----------+--------------+
>    128         329697
> -----------+--------------+
>    256         312774      ----------> drop compared rx-frames=128
> -----------+--------------+

Interesting, in my setup with 3.10.0-671.el7.x86_64.

rx-frames 0,   0.63Mpps
rx-frames 64,  0.99Mpps (+57%)
rx-frames 256, 0.99Mpps (+57%)

Have you pinned all threads in one numa nodes during testing?

Thanks

Comment 17 Quan Wenli 2017-05-25 09:03:30 UTC

(In reply to jason wang from comment #16)
> (In reply to Quan Wenli from comment #15)
> > Hi, jason, wei 
> > 
> > Please check following performance results, pps is increased with 1
> > rx-frames to 128 rx-frames, but with performance drop with 256 rx-frames. is
> > it expected?
> > 
> > 
> > 
> > Ｓteps: 
> > 1. boot 2 vms in same bridge. 
> > 2. run pktgen.sh on device eth0 on vm1. make sure the eth0's mac address on
> > vm2 assigned in pktgen.sh script. 
> > 3. gather pps result on vm2. 
> > 
> >  rx-frames      pkts/s
> > -----------+--------------+
> >      0         311290
> > -----------+--------------+
> >      1         311195
> > -----------+--------------+
> >      4         313300
> > -----------+--------------+
> >     16         315542
> > -----------+--------------+
> >     64         328584
> > -----------+--------------+
> >    128         329697
> > -----------+--------------+
> >    256         312774      ----------> drop compared rx-frames=128
> > -----------+--------------+
> 
> Interesting, in my setup with 3.10.0-671.el7.x86_64.
> 
> rx-frames 0,   0.63Mpps
> rx-frames 64,  0.99Mpps (+57%)
> rx-frames 256, 0.99Mpps (+57%)
> 
> Have you pinned all threads in one numa nodes during testing?
> 

Pinned all thread in one numa node, just slight improvement not obviously and not pps drops with 256 rx-frames. use "ethtool -c tap0" to check everytime, the rx-frames indeed valid.

rx-frames 0,    330543
rx-frames 64,   334737
rx-frames 256,  334277


> Thanks

Comment 18 Quan Wenli 2017-06-01 03:33:39 UTC

(In reply to jason wang from comment #16)
> (In reply to Quan Wenli from comment #15)
> > Hi, jason, wei 
> > 
> > Please check following performance results, pps is increased with 1
> > rx-frames to 128 rx-frames, but with performance drop with 256 rx-frames. is
> > it expected?
> > 
> > 
> > 
> > Ｓteps: 
> > 1. boot 2 vms in same bridge. 
> > 2. run pktgen.sh on device eth0 on vm1. make sure the eth0's mac address on
> > vm2 assigned in pktgen.sh script. 
> > 3. gather pps result on vm2. 
> > 
> >  rx-frames      pkts/s
> > -----------+--------------+
> >      0         311290
> > -----------+--------------+
> >      1         311195
> > -----------+--------------+
> >      4         313300
> > -----------+--------------+
> >     16         315542
> > -----------+--------------+
> >     64         328584
> > -----------+--------------+
> >    128         329697
> > -----------+--------------+
> >    256         312774      ----------> drop compared rx-frames=128
> > -----------+--------------+
> 
> Interesting, in my setup with 3.10.0-671.el7.x86_64.
> 
> rx-frames 0,   0.63Mpps
> rx-frames 64,  0.99Mpps (+57%)
> rx-frames 256, 0.99Mpps (+57%)
> 
> Have you pinned all threads in one numa nodes during testing?
> 
> Thanks

I tried with your image which guest is using 4.10.0+ kernel, the performance is up to 0.5Mpps, then I also tried with latest upstream 4.11.0-rc5+, the performance is still 0.33Mpps.

So no performance difference between rhel7.4 guest and latest upstream guest,but it seems an existed regression issue between 4.10.0+ and 4.11.0+rc5+ in upstream.

Comment 20 jason wang 2017-06-08 04:09:40 UTC

(In reply to Quan Wenli from comment #18)
> (In reply to jason wang from comment #16)
> > (In reply to Quan Wenli from comment #15)
> > > Hi, jason, wei 
> > > 
> > > Please check following performance results, pps is increased with 1
> > > rx-frames to 128 rx-frames, but with performance drop with 256 rx-frames. is
> > > it expected?
> > > 
> > > 
> > > 
> > > Ｓteps: 
> > > 1. boot 2 vms in same bridge. 
> > > 2. run pktgen.sh on device eth0 on vm1. make sure the eth0's mac address on
> > > vm2 assigned in pktgen.sh script. 
> > > 3. gather pps result on vm2. 
> > > 
> > >  rx-frames      pkts/s
> > > -----------+--------------+
> > >      0         311290
> > > -----------+--------------+
> > >      1         311195
> > > -----------+--------------+
> > >      4         313300
> > > -----------+--------------+
> > >     16         315542
> > > -----------+--------------+
> > >     64         328584
> > > -----------+--------------+
> > >    128         329697
> > > -----------+--------------+
> > >    256         312774      ----------> drop compared rx-frames=128
> > > -----------+--------------+
> > 
> > Interesting, in my setup with 3.10.0-671.el7.x86_64.
> > 
> > rx-frames 0,   0.63Mpps
> > rx-frames 64,  0.99Mpps (+57%)
> > rx-frames 256, 0.99Mpps (+57%)
> > 
> > Have you pinned all threads in one numa nodes during testing?
> > 
> > Thanks
> 
> I tried with your image which guest is using 4.10.0+ kernel, the performance
> is up to 0.5Mpps, then I also tried with latest upstream 4.11.0-rc5+, the
> performance is still 0.33Mpps.
> 
> So no performance difference between rhel7.4 guest and latest upstream
> guest,but it seems an existed regression issue between 4.10.0+ and
> 4.11.0+rc5+ in upstream.

Can you try net.git or linux.git. My image use net-next which is in fact a development tree.

Thanks

Comment 21 Quan Wenli 2017-06-12 06:31:31 UTC

(In reply to jason wang from comment #20)
> (In reply to Quan Wenli from comment #18)
> > (In reply to jason wang from comment #16)
> > > (In reply to Quan Wenli from comment #15)
> > > > Hi, jason, wei 
> > > > 
> > > > Please check following performance results, pps is increased with 1
> > > > rx-frames to 128 rx-frames, but with performance drop with 256 rx-frames. is
> > > > it expected?
> > > > 
> > > > 
> > > > 
> > > > Ｓteps: 
> > > > 1. boot 2 vms in same bridge. 
> > > > 2. run pktgen.sh on device eth0 on vm1. make sure the eth0's mac address on
> > > > vm2 assigned in pktgen.sh script. 
> > > > 3. gather pps result on vm2. 
> > > > 
> > > >  rx-frames      pkts/s
> > > > -----------+--------------+
> > > >      0         311290
> > > > -----------+--------------+
> > > >      1         311195
> > > > -----------+--------------+
> > > >      4         313300
> > > > -----------+--------------+
> > > >     16         315542
> > > > -----------+--------------+
> > > >     64         328584
> > > > -----------+--------------+
> > > >    128         329697
> > > > -----------+--------------+
> > > >    256         312774      ----------> drop compared rx-frames=128
> > > > -----------+--------------+
> > > 
> > > Interesting, in my setup with 3.10.0-671.el7.x86_64.
> > > 
> > > rx-frames 0,   0.63Mpps
> > > rx-frames 64,  0.99Mpps (+57%)
> > > rx-frames 256, 0.99Mpps (+57%)
> > > 
> > > Have you pinned all threads in one numa nodes during testing?
> > > 
> > > Thanks
> > 
> > I tried with your image which guest is using 4.10.0+ kernel, the performance
> > is up to 0.5Mpps, then I also tried with latest upstream 4.11.0-rc5+, the
> > performance is still 0.33Mpps.
> > 
> > So no performance difference between rhel7.4 guest and latest upstream
> > guest,but it seems an existed regression issue between 4.10.0+ and
> > 4.11.0+rc5+ in upstream.
> 
> Can you try net.git or linux.git. My image use net-next which is in fact a
> development tree.
> 
> Thanks

Tried again with guest kernel-4.11.0-rc5+ from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git, the result is still bad which is 0.33Mpps. 

So it's a upstream bug ? May I open one bug for tracking it and close this bug ?

Comment 22 Quan Wenli 2017-06-15 06:51:57 UTC

(In reply to Quan Wenli from comment #21)
> (In reply to jason wang from comment #20)
> > (In reply to Quan Wenli from comment #18)
> > > (In reply to jason wang from comment #16)
> > > > (In reply to Quan Wenli from comment #15)
> > > > > Hi, jason, wei 
> > > > > 
> > > > > Please check following performance results, pps is increased with 1
> > > > > rx-frames to 128 rx-frames, but with performance drop with 256 rx-frames. is
> > > > > it expected?
> > > > > 
> > > > > 
> > > > > 
> > > > > Ｓteps: 
> > > > > 1. boot 2 vms in same bridge. 
> > > > > 2. run pktgen.sh on device eth0 on vm1. make sure the eth0's mac address on
> > > > > vm2 assigned in pktgen.sh script. 
> > > > > 3. gather pps result on vm2. 
> > > > > 
> > > > >  rx-frames      pkts/s
> > > > > -----------+--------------+
> > > > >      0         311290
> > > > > -----------+--------------+
> > > > >      1         311195
> > > > > -----------+--------------+
> > > > >      4         313300
> > > > > -----------+--------------+
> > > > >     16         315542
> > > > > -----------+--------------+
> > > > >     64         328584
> > > > > -----------+--------------+
> > > > >    128         329697
> > > > > -----------+--------------+
> > > > >    256         312774      ----------> drop compared rx-frames=128
> > > > > -----------+--------------+
> > > > 
> > > > Interesting, in my setup with 3.10.0-671.el7.x86_64.
> > > > 
> > > > rx-frames 0,   0.63Mpps
> > > > rx-frames 64,  0.99Mpps (+57%)
> > > > rx-frames 256, 0.99Mpps (+57%)
> > > > 
> > > > Have you pinned all threads in one numa nodes during testing?
> > > > 
> > > > Thanks
> > > 
> > > I tried with your image which guest is using 4.10.0+ kernel, the performance
> > > is up to 0.5Mpps, then I also tried with latest upstream 4.11.0-rc5+, the
> > > performance is still 0.33Mpps.
> > > 
> > > So no performance difference between rhel7.4 guest and latest upstream
> > > guest,but it seems an existed regression issue between 4.10.0+ and
> > > 4.11.0+rc5+ in upstream.
> > 
> > Can you try net.git or linux.git. My image use net-next which is in fact a
> > development tree.
> > 
> > Thanks
> 
> Tried again with guest kernel-4.11.0-rc5+ from
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git, the result
> is still bad which is 0.33Mpps. 
> 
> So it's a upstream bug ? May I open one bug for tracking it and close this
> bug ?

After check again, I found there is no regression in upstream, the root cause for regression pps between 4.10 to 4.11 is the different param in pktgen.sh. 

１. Both enabled dst (IP) and dst_mac, the pps performance was minium with 0.25. 

2. Only enabled dst_mac, the pps performance was middle with 0.32 which I got with 4.11-rc5+ kernel.

3. Only enabled dst(IP), the pps performance was maxium with 0.5 which I got with 4.10 kernel.

So there is regression in upstream.

And for this bug with only dst (IP), the pps performance was indeed improved with enlarging rx-frames. 

rx-frames 0,    0.50
rx-frames 1,    0.53
rx-frames 4,    0.56
rx-frames 64,   0.64


Base on above, change it to verified.

Comment 23 Quan Wenli 2017-06-19 05:17:18 UTC

> 
> After check again, I found there is no regression in upstream, the root
> cause for regression pps between 4.10 to 4.11 is the different param in
> pktgen.sh. 
> 
> １. Both enabled dst (IP) and dst_mac, the pps performance was minium with
> 0.25. 
> 
> 2. Only enabled dst_mac, the pps performance was middle with 0.32 which I
> got with 4.11-rc5+ kernel.
> 
> 3. Only enabled dst(IP), the pps performance was maxium with 0.5 which I got
> with 4.10 kernel.
> 
> So there is regression in upstream.

Should be no regression in upstream

Comment 25 errata-xmlrpc 2017-08-02 04:53:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842

Note You need to log in before you can comment on or make changes to this bug.