Bug 1352741
| Summary: | tx array support in tun | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 7 | Reporter: | jason wang <jasowang> |
| Component: | kernel | Assignee: | Wei <wexu> |
| kernel sub component: | Networking | QA Contact: | xiywang |
| Status: | CLOSED ERRATA | Docs Contact: | Jiri Herrmann <jherrman> |
| Severity: | unspecified | | |
| Priority: | high | CC: | ailan, atragler, chayang, huding, jasowang, jpirko, juzhang, kzhang, mst, mtessun, tgraf, weliao, wexu, wquan, xiywang |
| Version: | 7.3 | Keywords: | FutureFeature |
| Target Milestone: | rc | | |
| Target Release: | 7.4 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-3.10.0-656.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-01 20:15:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1395265, 1401433, 1414006 | | |
Description

jason wang, 2016-07-05 01:20:22 UTC
Note for QE: since this touches tun, it would be better to test something like a VPN to make sure it does not break anything. Thanks.

Move back to ASSIGNED.

The percentage of improvement for RHEL 7 is the same as Jason mentioned in the upstream commit log. There is a pps gap between the Beaker machine and Jason's upstream data, and my development machine is about 5x slower:

Beaker: 48 cores, E5-4650 v3 @ 2.10GHz / 30M L3 cache, 128G DDR4 2133MHz
  Before:   ~119283 pps
  After:    ~140998 pps
  Upstream: ~221630 pps

Mine: 4 cores, i5-6500 CPU @ 3.20GHz / 6M L3 cache, 16G DDR4 2133
  Upstream: ~150000 pps

This may be caused by hardware platform differences.

(In reply to Wei from comment #6)
> This maybe caused by hardware platform difference.

What's your networking configuration and qemu command line?

I'm sending packets (pktgen) from the local host to the tap interface directly. The guest is running l2fwd with the uio driver. All thread (vhost, guest vcpu) bindings are correct. My qemu command line:

./x86_64-softmmu/qemu-system-x86_64 /vm-tmp/uio-fedora-22-guest-DMAR-tmpfs.qcow2 \
  -netdev tap,id=hn1,script=/etc/qemu-ifup-wei,vhost=on -device virtio-net-pci,netdev=hn1,mac=52:54:00:11:22:10 \
  -netdev tap,id=hn2,script=/etc/qemu-ifup-private1,vhost=on -device virtio-net-pci,netdev=hn2,mac=52:54:00:11:22:12 \
  -netdev tap,id=hn3,script=/etc/qemu-ifup-private2,vhost=on -device virtio-net-pci,netdev=hn3,mac=52:54:00:11:22:13 \
  -enable-kvm -vnc 0.0.0.0:2 -smp 3 -m 10G -cpu qemu64,+ssse3,+sse4.1,+sse4.2 -serial stdio -machine q35

pktgen log and tap statistics:

[root@hp-bl660cgen9-01 home]# ./pktgen-thread1.sh -i tap1 -d 192.169.1.102 -m 52:54:00:11:22:12
Running... ctrl^C to stop
Done
Result device: tap1
Params: count 100000000 min_pkt_size: 60 max_pkt_size: 60
Result: OK: 173386108(c173373857+d12251) usec, 100000000 (60byte,0frags)
576747pps 276Mb/sec (276838560bps) errors: 0

tap1 (Rx):
[root@hp-bl660cgen9-01 home]# ./05-calc-pps.sh tap1
tap1 TX 141440 pkts/s TX Dropped: 386656 pkts/s
tap1 RX 0 pkts/s RX Dropped: 0 pkts/s

tap2 (Tx):
[root@hp-bl660cgen9-01 home]# ./05-calc-pps.sh tap2
tap2 TX 0 pkts/s TX Dropped: 0 pkts/s
tap2 RX 140998 pkts/s RX Dropped: 0 pkts/s
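The 05-calc-pps.sh helper shown above is not attached to this bug. For anyone reproducing these numbers, here is a minimal sketch of how such a per-interface pps counter could work, assuming it simply diffs the kernel's /sys/class/net statistics over one second (an assumption, not the actual script):

```sh
#!/bin/bash
# Hypothetical reconstruction of a pps counter like 05-calc-pps.sh
# (the real script is not attached to this bug). Samples the standard
# per-interface statistics, sleeps one second, samples again, and
# prints the per-second deltas.
DEV=${1:?usage: $0 <interface>}
STATS=/sys/class/net/$DEV/statistics

read_counters() {
    TX=$(cat "$STATS/tx_packets")
    RX=$(cat "$STATS/rx_packets")
    TXD=$(cat "$STATS/tx_dropped")
    RXD=$(cat "$STATS/rx_dropped")
}

read_counters
OTX=$TX; ORX=$RX; OTXD=$TXD; ORXD=$RXD
sleep 1
read_counters

echo "$DEV TX $((TX - OTX)) pkts/s TX Dropped: $((TXD - OTXD)) pkts/s"
echo "$DEV RX $((RX - ORX)) pkts/s RX Dropped: $((RXD - ORXD)) pkts/s"
```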
Wei, can you try not using l2fwd in the guest (just let the kernel drop the packets in the guest) and post the result here? That's what I test for the tx array. Thanks.

I tried benchmarks on different platforms and got different performance, all with upstream code:

Beaker Server1: 48 cores, E5-4650 v3 @ 2.10GHz / 30M L3 cache: ~250k pps
T450s laptop: 4 cores, i7-5600U CPU @ 2.60GHz / 4M L3 cache: ~500k pps
Desktop: 4 cores, i5-6500 CPU @ 3.20GHz / 6M L3 cache: ~1.5M pps

The performance gap was caused by DMA debugging in the kernel config, which I had generated from my desktop. I did a new round of tests and also got another server in Beaker to try with the RHEL config. Here are the updated numbers with an upstream kernel on both host and guest:

Beaker Server1: 16 cores, E5-5530 @ 2.4GHz / 8M L3 cache: ~1.2M pps
Beaker Server2: 48 cores, E5-4650 v3 @ 2.10GHz / 30M L3 cache: ~1.4M pps
T450s laptop: 4 cores, i7-5600U CPU @ 2.60GHz / 4M L3 cache: ~1.5M pps
Desktop: 4 cores, i5-6500 CPU @ 3.20GHz / 6M L3 cache: ~2M pps

RHEL 7.4 performance data.

Test environment:
Beaker Server1: 16 cores, E5-5530 @ 2.4GHz / 8M L3 cache
Guest kernel: 4.9 upstream
Running dpdk in uio mode in the guest; sending packets to the tap device directly with pktgen on the host.

pps:
before: ~0.97 Mpps
after: ~1.16 Mpps
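The host-side "pktgen to the tap device" step is driven through the in-kernel pktgen module. The following is a hedged sketch, not the actual pktgen-thread1.sh used in the runs above; the device name, count, packet size, and addresses follow the earlier log, and everything else is an illustrative assumption:

```sh
#!/bin/bash
# Illustrative pktgen setup against a tap device (assumed layout;
# the real pktgen-thread1.sh is not attached to this bug).
modprobe pktgen

pgset() { echo "$2" > "/proc/net/pktgen/$1"; }

pgset kpktgend_0 "rem_device_all"        # detach devices from thread 0
pgset kpktgend_0 "add_device tap1"       # bind tap1 to kernel thread 0

pgset tap1 "count 100000000"             # packets to send (0 = forever)
pgset tap1 "pkt_size 60"                 # 60-byte frames, as in the log
pgset tap1 "dst 192.169.1.102"           # destination IP from the log
pgset tap1 "dst_mac 52:54:00:11:22:12"   # guest virtio-net MAC

pgset pgctrl "start"                     # blocks until count is reached
cat /proc/net/pktgen/tap1                # per-device result summary
```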
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing.

Patch(es) available on kernel-3.10.0-656.el7.

Hi Jiri, it is good to keep this out of the release notes, because this bz is a performance improvement rather than a new feature.

Functional test, results as below.

host & guest: 3.10.0-663.el7.x86_64, qemu-kvm-rhev-2.9.0-2.el7.x86_64

1. Boot up a guest:

/usr/libexec/qemu-kvm -name rhel7.4 -cpu IvyBridge -m 4096 -realtime mlock=off -smp 4 \
  -drive file=/home/rhel7.4.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,snapshot=off \
  -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 \
  -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,queues=2 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a1:d0:5f,vectors=6,mq=on,host_mtu=9000 \
  -monitor stdio -device qxl-vga,id=video0 -serial unix:/tmp/console,server,nowait -vnc :1 \
  -spice port=5900,disable-ticketing

2. Install pkcs11-helper and openvpn in the guest from brewweb.

3. Install redhat-internal-cert and redhat-internal-openvpn-profiles in the guest from https://redhat.service-now.com/rh_ess/kb_view.do?sysparm_article=KB0005424

4. Run 'openvpn --config /etc/openvpn/ovpn-bne-udp.conf' in the guest (since the pek2 VPN server could not be connected).

5. Check the tun device in the guest:

# ifconfig
redhat0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST> mtu 1360
        inet 10.64.54.50 netmask 255.255.254.0 destination 10.64.54.50
        inet6 fe80::ba22:5534:6b03:e397 prefixlen 64 scopeid 0x20<link>
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 100 (UNSPEC)
        RX packets 112 bytes 14230 (13.8 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 106 bytes 49263 (48.1 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

6. Ping a Red Hat internal website from the guest:

# ping mail.corp.redhat.com -c 3 -I redhat0
PING mail.corp.redhat.com (10.4.203.66) from 10.64.54.22 redhat0: 56(84) bytes of data.
64 bytes from mail.corp.redhat.com (10.4.203.66): icmp_seq=1 ttl=247 time=352 ms
64 bytes from mail.corp.redhat.com (10.4.203.66): icmp_seq=2 ttl=247 time=352 ms
64 bytes from mail.corp.redhat.com (10.4.203.66): icmp_seq=3 ttl=247 time=353 ms

--- mail.corp.redhat.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 352.395/352.637/353.078/0.312 ms

7. Ping a Red Hat internal host from the guest:

# ping 10.64.54.20 -c 3 -I redhat0
PING 10.64.54.20 (10.64.54.20) from 10.64.54.22 redhat0: 56(84) bytes of data.
64 bytes from 10.64.54.20: icmp_seq=1 ttl=63 time=363 ms
From 10.64.54.1 icmp_seq=2 Redirect Host(New nexthop: 10.64.54.20)
From 10.64.54.1: icmp_seq=2 Redirect Host(New nexthop: 10.64.54.20)
64 bytes from 10.64.54.20: icmp_seq=2 ttl=63 time=361 ms

--- 10.64.54.20 ping statistics ---
2 packets transmitted, 2 received, +1 errors, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 361.664/362.629/363.594/0.965 ms

And there is no error message in host or guest dmesg.

Hi Jason,

According to https://bugzilla.redhat.com/show_bug.cgi?id=1352741#c20, see step 7: I used openvpn with bne-udp.conf in both guest and host, since pek2 could not be connected currently. But while pinging the host from the guest through tun, I got some 'Redirect Host...' error messages. I'm not sure whether they are caused by using the bne openvpn server or by this bz. Could you help take a look?

Thanks,
Xiyue

(In reply to xiywang from comment #21)
> But while I ping host from guest using tun, I got some err msg of 'Redirect Host...'.

That does not look like a problem with this bug. To be safe, do you see this on 655? Thanks.

Tested on 3.10.0-655.el7.x86_64: same behavior, so it should not be an issue related to this bug.

# ping 10.64.242.69
PING 10.64.242.69 (10.64.242.69) 56(84) bytes of data.
64 bytes from 10.64.242.69: icmp_seq=1 ttl=63 time=188 ms
From 10.64.242.1 icmp_seq=2 Redirect Host(New nexthop: 10.64.242.69)
From 10.64.242.1: icmp_seq=2 Redirect Host(New nexthop: 10.64.242.69)
64 bytes from 10.64.242.69: icmp_seq=2 ttl=63 time=191 ms
From 10.64.242.1 icmp_seq=3 Redirect Host(New nexthop: 10.64.242.69)
From 10.64.242.1: icmp_seq=3 Redirect Host(New nexthop: 10.64.242.69)
64 bytes from 10.64.242.69: icmp_seq=3 ttl=63 time=187 ms
From 10.64.242.1 icmp_seq=4 Redirect Host(New nexthop: 10.64.242.69)
From 10.64.242.1: icmp_seq=4 Redirect Host(New nexthop: 10.64.242.69)
64 bytes from 10.64.242.69: icmp_seq=4 ttl=63 time=189 ms
From 10.64.242.1 icmp_seq=5 Redirect Host(New nexthop: 10.64.242.69)
From 10.64.242.1: icmp_seq=5 Redirect Host(New nexthop: 10.64.242.69)
64 bytes from 10.64.242.69: icmp_seq=5 ttl=63 time=188 ms
^C
--- 10.64.242.69 ping statistics ---
5 packets transmitted, 5 received, +4 errors, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 187.904/188.865/191.128/1.317 ms
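For reference, the 'Redirect Host' lines above are ping reporting ICMP redirects sent by the first-hop router when it advertises a better next hop; they reflect routing topology, not a tun tx error. A quick, hedged way to double-check that on the guest (the interface name redhat0 and target address come from the logs above; the sysctl checks are generic):

```sh
# Confirm which next hop the guest chose for the target.
ip route get 10.64.54.20

# Check whether the guest honors ICMP redirects at all.
sysctl net.ipv4.conf.all.accept_redirects
sysctl net.ipv4.conf.redhat0.accept_redirects

# Optionally keep the guest from acting on redirects during the test.
sysctl -w net.ipv4.conf.all.accept_redirects=0
```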
Hi Wenli,

Could you help to do a performance test?

Thanks,
Xiyue

(In reply to xiywang from comment #24)
> Could you help to do performance test?

The tx performance in tun indeed improves with kernel-3.10.0-656.

Steps:
1. Run pktgen against the tap device on the host.
2. Gather the pps result on the guest.

kernel       | pkts/s
-------------+---------
3.10.0-655   | 977662
3.10.0-656   | 1041984

Verified at both the functional level and the performance level. Set to Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842