Bug 1309826

Summary: Throughput of 4 concurrent netperf/TCP_STREAM test streams from a 4-queue vhost-net guest to the host degraded during 24+ hours of testing
Product: Red Hat Enterprise Linux 7
Reporter: Jean-Tsung Hsiao <jhsiao>
Component: openvswitch
Assignee: Flavio Leitner <fleitner>
Status: CLOSED WORKSFORME
QA Contact: Jean-Tsung Hsiao <jhsiao>
Severity: high
Docs Contact:
Priority: high
Version: 7.2
CC: aloughla, atragler, jhsiao, kzhang, mleitner, qding, rcain, rkhan
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-06-13 01:20:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jean-Tsung Hsiao 2016-02-18 18:58:17 UTC
Description of problem: Throughput of 4 concurrent netperf/TCP_STREAM test streams from a 4-queue vhost-net guest to the host degraded during 24+ hours of testing.

Initially, the aggregate throughput was above 50 Gb/s, but it started to degrade and dropped below 40 Gb/s around the 12-hour mark. Eventually, it went down to a little above 30 Gb/s.

Version-Release number of selected component (if applicable):

Linux netqe5.knqe.lab.eng.bos.redhat.com 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

libvirt-1.2.17-13.el7_2.2.x86_64
 
openvswitch-dpdk-2.4.0-0.10346.git97bab959.2.el7_2.x86_64.rpm

How reproducible: reproducible


Steps to Reproduce: 
1. Configure an ordinary OVS bridge, ovsbr0, on the host.

2. Configure a vhost-net guest with four CPU cores and four queues.
 
 <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='3'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='7'/>
    <vcpupin vcpu='3' cpuset='9'/>
  </cputune>

    <interface type='bridge'>
      <mac address='52:54:00:b7:44:50'/>
      <source bridge='ovsbr0'/>
      <virtualport type='openvswitch'>
        <parameters interfaceid='dff11f03-5ae5-4c61-9e52-bb8df9950d5d'/>
      </virtualport>
      <model type='virtio'/>
      <driver name='vhost' queues='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
 
3. Start the guest.
4. Add internal port int0 to ovsbr0.
5. Add IP addresses to eth0 in the guest and to int0 on the host.
6. Run 4 concurrent netperf/TCP_STREAM test streams from eth0 to int0.
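
The bridge, internal port, and addressing steps above could look roughly like the commands below. This is only a sketch under assumptions: the guest address 172.16.3.106 and the /24 prefix are hypothetical (only the host-side address 172.16.3.105 appears in this report, as the netperf target), and the `ethtool -L` call assumes the four virtio-net queues still have to be enabled by hand inside the guest. The helper names `run`, `host_setup`, and `guest_setup` are made up for illustration. By default the functions only print the commands; set DRY_RUN=0 to execute them as root.

```shell
#!/bin/bash
# Sketch only: prints the setup commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Echo the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

host_setup() {
    # Ordinary (kernel datapath) OVS bridge.
    run ovs-vsctl add-br ovsbr0
    # Internal port int0 on the bridge.
    run ovs-vsctl add-port ovsbr0 int0 -- set Interface int0 type=internal
    # 172.16.3.105 is the netperf target in the test script; /24 is an assumption.
    run ip addr add 172.16.3.105/24 dev int0
    run ip link set int0 up
}

guest_setup() {
    # Assumption: enable all four virtio-net queue pairs inside the guest.
    run ethtool -L eth0 combined 4
    # Hypothetical guest address on the same subnet as int0.
    run ip addr add 172.16.3.106/24 dev eth0
    run ip link set eth0 up
}
```

Calling `host_setup` on the host and `guest_setup` in the guest prints the command list, which can be reviewed before rerunning with DRY_RUN=0.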

Below is a script that would last 16 hours.

[root@localhost jhsiao]# cat run_netperf_tcp_mq_4_taskset.sh
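# Each iteration runs four 300-second netperf streams; 192 iterations take about 16 hours.
for i in {1..192}
do
    MPSTAT=/tmp/mpstat."$i"
    LOG=netperf_tcp_T."$i"
    # Collect CPU statistics locally and on 192.168.122.1 (90 samples, 3 s apart).
    (sleep 15; mpstat -P ALL 3 90 > "$MPSTAT") &
    ssh 192.168.122.1 "sleep 15; mpstat -P ALL 3 90 > $MPSTAT" &
    echo "Test $i"
    # One netperf stream pinned to each of the four guest CPUs.
    taskset -c 0 netperf -H 172.16.3.105 -l 300 > "$LOG.0" 2>&1 &
    taskset -c 1 netperf -H 172.16.3.105 -l 300 > "$LOG.1" 2>&1 &
    taskset -c 2 netperf -H 172.16.3.105 -l 300 > "$LOG.2" 2>&1 &
    taskset -c 3 netperf -H 172.16.3.105 -l 300 > "$LOG.3" 2>&1 &
    sleep 15
    # Take a one-second perf sample of each CPU while the streams are running.
    perf record -g -o "screenshot-0.$i" -C 0 sleep 1
    perf record -g -o "screenshot-1.$i" -C 1 sleep 1
    perf record -g -o "screenshot-2.$i" -C 2 sleep 1
    perf record -g -o "screenshot-3.$i" -C 3 sleep 1
    # Wait for all background jobs before starting the next iteration.
    wait
done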
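
The script writes one log per stream (netperf_tcp_T.<iteration>.<stream>), so the aggregate throughput of an iteration is the sum over its four logs. One possible way to compute it is sketched below. It assumes the default netperf TCP_STREAM report, in which the single result line has five numeric fields and the last one is the throughput in 10^6 bits/s. The function names `stream_throughput` and `iteration_total` are hypothetical and not part of the original test.

```shell
#!/bin/bash
# Sketch: sum the throughput of the four netperf logs of one iteration.
# Assumes the default netperf TCP_STREAM report format (result line with
# five fields, first field numeric, throughput in the fifth field).

stream_throughput() {
    # Print the throughput found in one netperf log file ($1).
    awk 'NF == 5 && $1 ~ /^[0-9]+$/ { print $5 }' "$1"
}

iteration_total() {
    # Print the aggregate throughput for iteration $1 (logs in directory $2,
    # default: current directory).
    local i="$1" dir="${2:-.}" q
    for q in 0 1 2 3; do
        stream_throughput "$dir/netperf_tcp_T.$i.$q"
    done | awk '{ sum += $1 } END { printf "%.1f\n", sum }'
}
```

For example, `for i in {1..192}; do iteration_total "$i"; done > totals.txt` would give one aggregate value per iteration, which makes a drift over time easy to spot.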


Actual results:
The aggregate throughput dropped from above 50 Gb/s to below 40 Gb/s in about 12 hours.

Expected results:
No degradation.

Additional info:

Comment 2 Jean-Tsung Hsiao 2016-02-19 13:55:56 UTC
Correction: openvswitch used is:

openvswitch-2.4.0-1.el7.x86_64

Comment 3 Flavio Leitner 2016-05-26 19:28:48 UTC
Jean,
Could you see if 2.5 still has this issue?

Thanks,
fbl

Comment 4 Jean-Tsung Hsiao 2016-05-26 20:29:39 UTC
(In reply to Flavio Leitner from comment #3)
> Jean,
> Could you see if 2.5 still has this issue?
> 
> Thanks,
> fbl

Are you referring to the -4 version?

http://download.eng.bos.redhat.com/brewroot/packages/openvswitch-dpdk/2.5.0/4.el7/x86_64/openvswitch-dpdk-2.5.0-4.el7.x86_64.rpm

Comment 5 Jean-Tsung Hsiao 2016-05-26 20:38:36 UTC
(In reply to Flavio Leitner from comment #3)
> Jean,
> Could you see if 2.5 still has this issue?
> 
> Thanks,
> fbl

I'll work on it.

Comment 6 Jean-Tsung Hsiao 2016-05-27 20:27:59 UTC
(In reply to Flavio Leitner from comment #3)
> Jean,
> Could you see if 2.5 still has this issue?
> 
> Thanks,
> fbl

Hi Flavio,

After repeating the original test, the resulting data confirms that 2.5 doesn't have this issue.

Please see below.

Thanks!

Jean

min/ave/max of the set of 192 throughputs: 44189.1 / 57587.3 / 61627.4
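
How this summary was produced is not stated in the report. A minimal sketch of one way to get min/avg/max from a list of per-iteration aggregate throughputs (one value per line) is shown below; the function name `summarize` is hypothetical.

```shell
#!/bin/bash
# Sketch: print "min / avg / max" for numeric values read from stdin,
# one value per line.

summarize() {
    awk '
        { v = $1 + 0 }                      # force numeric interpretation
        NR == 1 { min = v; max = v }
        { if (v < min) min = v; if (v > max) max = v; sum += v }
        END { if (NR > 0) printf "%.1f / %.1f / %.1f\n", min, sum / NR, max }
    '
}
```

Usage: `summarize < totals.txt`, where totals.txt is an assumed file holding the 192 aggregate values.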
 
Related software:

openvswitch-2.5.0-3.el7.x86_64

Linux netqe6.knqe.lab.eng.bos.redhat.com 3.10.0-327.13.1.el7.x86_64 #1 SMP Mon Feb 29 13:22:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux