Bug 1293233 - el7: multiqueue virt-io does not show performance benefit
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 7.3
Assigned To: jason wang
QA Contact: Virtualization Bugs
Depends On:
Blocks: 1309274 962749 1258206
Reported: 2015-12-21 02:26 EST by Dan Kenigsberg
Modified: 2016-07-03 22:40 EDT
CC List: 16 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-07-03 02:20:36 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Dan Kenigsberg 2015-12-21 02:26:04 EST
Description of problem:


Version-Release number of selected component (if applicable):
host kernel 3.10.0-229.24.2.el7.x86_64
guest kernel 3.10.0-327.el7.x86_64

How reproducible:
100%


Steps to Reproduce:
1. Connect two VMs (2 vCPUs each) to a bridge via virtio.
2. Set the number of queues to 2 (#queue=2).
3. Run `ethtool -L eth1 combined 2` on both guests.
4. Set MTU=64000 on both guests.
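A minimal bash sketch of the guest-side part of steps 3-4. The interface name eth1 and the values come from the steps above; the `setup_guest` and `run` helper names and the DRY_RUN switch (print the commands instead of running them) are hypothetical additions for illustration. Using `ip link` to set the MTU is an assumption, since the report does not say how the MTU was changed. Step 2 (the queue count on the host side) is not covered.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN is set, else run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Guest-side steps 3-4 (run as root inside each guest).
# Defaults follow the report: eth1, 2 queue pairs, MTU 64000.
setup_guest() {
    local dev="${1:-eth1}" queues="${2:-2}" mtu="${3:-64000}"
    # Enable the extra queue pairs; this fails if the device was not
    # created with at least $queues queues on the host side (step 2).
    run ethtool -L "$dev" combined "$queues"
    # Raise the MTU (comment 4 also raised it on the host tap devices).
    run ip link set dev "$dev" mtu "$mtu"
}
```

For example, `DRY_RUN=1 setup_guest eth1 2 64000` only prints the two commands.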

Actual results:
Throughput stays at 19.3 Gbits/sec, just as with a single queue.

Expected results:
A drastic improvement in throughput.
Comment 1 jason wang 2015-12-21 02:59:43 EST
Can QE reproduce this on a recent kernel/qemu-kvm-{rhev}?
Comment 2 juzhang 2015-12-21 03:07:16 EST
(In reply to jason wang from comment #1)
> Can QE reproduce this on a recent kernel/qemu-kvm-{rhev}?

Hi Wenli and Yanhui,

Could you handle this issue?

Best Regards,
Junyi
Comment 3 Dan Kenigsberg 2015-12-21 07:58:53 EST
I had qemu-kvm-ev-2.3.0-29.1.el7.x86_64 installed
Comment 4 Quan Wenli 2015-12-24 05:04:33 EST
Tried it on my host; it can be reproduced.

3.10.0-327.5.1.el7.x86_64 (host/guest)
qemu-kvm-rhev-2.3.0-31.el7_2.5.x86_64
Netperf version 2.6.0 

When MTU=1500 (default), performance with a single queue (MQ not enabled) is 17.6 Gbits/sec, and performance with queues=2 (MQ enabled) is 19.7 Gbits/sec.

When MTU=64000 is set on both guests and on both taps on the host, performance with a single queue (MQ not enabled) is 20.8 Gbits/sec, and performance with queues=2 (MQ enabled) is 21.1 Gbits/sec.
Comment 7 jason wang 2016-01-08 00:11:26 EST
(In reply to Quan Wenli from comment #4)
> Tried it on my host; it can be reproduced.
> 
> 3.10.0-327.5.1.el7.x86_64 (host/guest)
> qemu-kvm-rhev-2.3.0-31.el7_2.5.x86_64
> Netperf version 2.6.0 
> 
> When MTU=1500 (default), performance with a single queue (MQ not enabled)
> is 17.6 Gbits/sec, and performance with queues=2 (MQ enabled) is 19.7 Gbits/sec.
> 
> When MTU=64000 is set on both guests and on both taps on the host, performance
> with a single queue (MQ not enabled) is 20.8 Gbits/sec, and performance with
> queues=2 (MQ enabled) is 21.1 Gbits/sec.

Quick test on upstream shows no such issue. Is this a regression?
Comment 8 Quan Wenli 2016-01-11 03:47:06 EST
(In reply to jason wang from comment #7)
> Quick test on upstream shows no such issue. Is this a regression?

My machines are currently busy running 6.8 tests; I will check whether it's a regression in 3 days.
Comment 9 Quan Wenli 2016-01-15 00:02:54 EST
All the machines in the China lab are being migrated. I will try it once I get the machines back.
Comment 10 jason wang 2016-01-21 04:16:56 EST
(In reply to Quan Wenli from comment #4)
> Tried it on my host; it can be reproduced.
> 
> 3.10.0-327.5.1.el7.x86_64 (host/guest)
> qemu-kvm-rhev-2.3.0-31.el7_2.5.x86_64
> Netperf version 2.6.0 
> 
> When MTU=1500 (default), performance with a single queue (MQ not enabled)
> is 17.6 Gbits/sec, and performance with queues=2 (MQ enabled) is 19.7 Gbits/sec.
> 
> When MTU=64000 is set on both guests and on both taps on the host, performance
> with a single queue (MQ not enabled) is 20.8 Gbits/sec, and performance with
> queues=2 (MQ enabled) is 21.1 Gbits/sec.

How many sessions did you use in your testing? I remember Dan used 2 instances of iperf in parallel; am I right, Dan?
Comment 11 Quan Wenli 2016-01-21 04:22:23 EST
Tested in pinned mode: vhost threads and vCPUs pinned to unique physical CPUs on the host.

Host: kernel-3.10.0-327.9.1.el7.x86_64.rpm
One netperf session run in the guest, e.g.: netperf -H $netserver_ip -l 60 -D 1

+-----------+--------------------+-----------------------------------+
| Guest     | Non-MQ (Gbits/sec) | MQ, 2 queues, 2 vCPUs (Gbits/sec) |
+-----------+--------------------+-----------------------------------+
| upstream  | 20                 | 19.7                              |
| 327.9.1   | 19.9               | 19.8                              |
| 327.5.1   | 19.9               | 19.8                              |
| 327       | 19.8               | 19.8                              |
| 229       | 16.9               | 17.2                              |
| 113       | 17.1               | 17.0                              |
+-----------+--------------------+-----------------------------------+

So, I think it's not a regression.

Btw, I also switched the host kernel to upstream, but the host always crashes when netperf runs in the guest.
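The "pinned mode" used above binds each vhost thread and each vCPU thread to its own host CPU. A minimal bash sketch using `taskset -pc <cpu> <tid>`; how the thread IDs are found (for example vhost threads by name with pgrep, vCPU threads from the QEMU monitor) is left to the caller, and the `pin_threads` name, the "tid:cpu" argument format and the DRY_RUN switch are hypothetical.

```bash
#!/usr/bin/env bash
# Pin host threads to dedicated CPUs. Arguments are "tid:cpu" pairs.
# With DRY_RUN set, the taskset commands are printed instead of run.
pin_threads() {
    local pair tid cpu
    for pair in "$@"; do
        tid="${pair%%:*}"   # part before the colon: thread ID
        cpu="${pair##*:}"   # part after the colon: host CPU
        if [ -n "${DRY_RUN:-}" ]; then
            echo "taskset -pc $cpu $tid"
        else
            # -p: act on an existing task ID; -c: take a CPU list.
            taskset -pc "$cpu" "$tid"
        fi
    done
}
```

For example, `DRY_RUN=1 pin_threads 101:2 102:3` prints one taskset command per thread.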
Comment 12 Dan Kenigsberg 2016-01-22 03:05:34 EST
(In reply to jason wang from comment #10)

> How many sessions did you use in your testing? I remember Dan used 2 instances of
> iperf in parallel; am I right, Dan?

I believe we used a single iperf client process, with -P 2. But it does not really matter: we would like to know how we can gain anything from using multiqueue.
Comment 13 jason wang 2016-01-22 03:15:48 EST
(In reply to Dan Kenigsberg from comment #12)
> I believe we used a single iperf client process, with -P 2. But it does not
> really matter: we would like to know how we can gain anything from using
> multiqueue.

Well, you need to produce more than one flow in order to gain an improvement from multiqueue. If only one flow is used, then in the default configuration only one queue is used, even if multiqueue is enabled.
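A toy bash model of the point above, not the kernel's actual algorithm: each flow (identified here by an address/port string) is hashed to a queue index, so every packet of one flow lands on the same queue and a single flow can never use more than one. `cksum` is a stand-in for the real flow hash, the function names and tuple format are hypothetical, and the real queue selection in the guest driver and the host tap device differs in detail; the model keeps only the one-flow-one-queue property.

```bash
#!/usr/bin/env bash
# Toy model only: cksum stands in for the kernel's flow hash.
# Map one flow (an address/port tuple string) to a queue index.
queue_for_flow() {
    local tuple="$1" nq="$2" h
    h=$(printf '%s' "$tuple" | cksum | cut -d' ' -f1)
    echo $(( h % nq ))
}

# Number of distinct queues touched by the given flows.
queues_used() {
    local nq="$1" f count
    shift
    count=$(for f in "$@"; do queue_for_flow "$f" "$nq"; done | sort -u | wc -l)
    echo $(( count ))
}
```

With one flow, `queues_used 2 <flow>` is always 1 no matter how many packets are sent. With two or more flows the hash may spread them over both queues, but in this model two flows can also collide on the same queue, so a two-flow test is not guaranteed to use both.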
Comment 14 jason wang 2016-01-22 03:17:27 EST
(In reply to Quan Wenli from comment #11)
> Tested in pinned mode: vhost threads and vCPUs pinned to unique physical CPUs
> on the host.
> 
> Host: kernel-3.10.0-327.9.1.el7.x86_64.rpm
> One netperf session run in the guest, e.g.: netperf -H $netserver_ip -l 60 -D 1
> 
> +-----------+--------------------+-----------------------------------+
> | Guest     | Non-MQ (Gbits/sec) | MQ, 2 queues, 2 vCPUs (Gbits/sec) |
> +-----------+--------------------+-----------------------------------+
> | upstream  | 20                 | 19.7                              |
> | 327.9.1   | 19.9               | 19.8                              |
> | 327.5.1   | 19.9               | 19.8                              |
> | 327       | 19.8               | 19.8                              |
> | 229       | 16.9               | 17.2                              |
> | 113       | 17.1               | 17.0                              |
> +-----------+--------------------+-----------------------------------+
> 
> So, I think it's not a regression.

That's expected since only one session is used.

> 
> Btw, I also switched the host kernel to upstream, but the host always crashes
> when netperf runs in the guest.

Can you please retest with more than one session (e.g. 2)? My test on upstream shows a 1.6x-1.8x improvement in this case with 2 queues.

Thanks
Comment 15 Dan Kenigsberg 2016-01-23 11:19:30 EST
(In reply to jason wang from comment #13)

> Well, you need to produce more than one flow in order to gain an improvement from
> multiqueue.

Doesn't -P 2 do the trick?

(In reply to jason wang from comment #14)

> 
> Can you please retest with more than one session (e.g. 2)? My test on
> upstream shows a 1.6x-1.8x improvement in this case with 2 queues.

And what about downstream el7? Would you be kind enough to share the source and client command line used to demonstrate this improvement? We have tried several command lines per your suggestion and found no improvement, but I'm certain that Quan Wenli would be happy to prove us wrong.
Comment 16 jason wang 2016-01-24 21:38:10 EST
(In reply to Dan Kenigsberg from comment #15)
> (In reply to jason wang from comment #13)
> 
> > Well, you need to produce more than one flow in order to gain an improvement from
> > multiqueue.
> 
> Doesn't -P 2 do the trick?

Yes, but if I read the test results correctly, Wenli used only 1 netperf session; that's why I'm asking for a retest with more than 1.

> 
> (In reply to jason wang from comment #14)
> 
> > 
> > Can you please retest with more than one session (e.g. 2)? My test on
> > upstream shows a 1.6x-1.8x improvement in this case with 2 queues.
> 
> And what about downstream el7?

Haven't found time to do this.

> Would you be kind enough to share the source and
> client command line used to demonstrate this improvement?

It's a rather simple command line:

$qemu_path $img_path -netdev tap,id=hn0,queues=$queues,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hn0,mq=on,vectors=32,mac=$mac -drive file=$img_path,id=img1,if=none,format=qcow2 -enable-kvm $@ -vnc :11 -m 4G 
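The same command line wrapped as a bash function that takes the queue count as a parameter; `qemu_path`, `img_path` and `mac` remain placeholders as in the comment, and `build_qemu_cmd` is a hypothetical name. The comment hard-codes vectors=32; the sketch instead derives 2*queues+2 MSI-X vectors (one per RX and TX virtqueue, plus one for the control virtqueue and one for configuration changes). That is the commonly cited sizing for multiqueue virtio-net, but it is an assumption here, not something stated in this bug. The function only prints the command.

```bash
#!/usr/bin/env bash
# Print a QEMU command line for a multiqueue virtio-net guest.
# Placeholders as in the bug: qemu_path, img_path, mac.
build_qemu_cmd() {
    local qemu_path="$1" img_path="$2" mac="$3" queues="${4:-2}"
    # Assumed sizing: 2 vectors per queue pair + control vq + config.
    local vectors=$(( 2 * queues + 2 ))
    local -a cmd=(
        "$qemu_path" "$img_path"
        -netdev "tap,id=hn0,queues=$queues,vhost=on,script=/etc/qemu-ifup"
        -device "virtio-net-pci,netdev=hn0,mq=on,vectors=$vectors,mac=$mac"
        -drive "file=$img_path,id=img1,if=none,format=qcow2"
        -enable-kvm -vnc :11 -m 4G
    )
    echo "${cmd[*]}"
}
```

With 4 queues, for instance, this yields vectors=10.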


> We have tried
> several command lines per your suggestion, and found no improvement, but I'm
> certain that Quan Wenli would be happy to prove us wrong.

I do think it's an el7 bug :). But I suspect it is a regression, so I'm asking QE for a baseline to bisect against.

Thanks
Comment 17 Quan Wenli 2016-01-25 01:52:38 EST
(In reply to jason wang from comment #14)
> Can you please retest with more than one session (e.g. 2)? My test on
> upstream shows a 1.6x-1.8x improvement in this case with 2 queues.
> 
> Thanks

Yes, I tried with 2 netperf sessions. Throughput is around 20 Gbits/sec without MQ and around 34 Gbits/sec with MQ, which matches your result.
Comment 18 Quan Wenli 2016-01-25 02:56:42 EST
(In reply to Quan Wenli from comment #17)
> Yes, I tried with 2 netperf sessions. Throughput is around 20 Gbits/sec without
> MQ and around 34 Gbits/sec with MQ, which matches your result.

Also tried iperf with -P 2 and got a similar result:
Non-MQ: 20 Gbits/sec, MQ: 38.5 Gbits/sec
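The speedups implied by comments 17 and 18 work out to 34 / 20 = 1.7x (netperf, 2 sessions) and 38.5 / 20 ≈ 1.9x (iperf -P 2), within or slightly above the 1.6x-1.8x reported for upstream in comment 14. A small bash/awk helper for the same ratio; the `speedup` name is hypothetical.

```bash
#!/usr/bin/env bash
# Ratio of multiqueue throughput to single-queue throughput,
# rounded to one decimal place (both inputs in the same unit, e.g. Gbits/sec).
speedup() {
    awk -v mq="$1" -v base="$2" 'BEGIN { printf "%.1f\n", mq / base }'
}

speedup 34 20    # netperf, 2 sessions (comment 17)
speedup 38.5 20  # iperf -P 2 (comment 18)
```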
Comment 19 jason wang 2016-01-27 21:49:21 EST
Dan:

It looks like QE could not reproduce the problem. Can you reproduce the issue in another environment/setup? I would also like to check your setup again if it is available to me.

Thanks
Comment 20 Dan Kenigsberg 2016-01-28 03:09:15 EST
Would you please share your client and server netperf command line?

None of the ones that we have tried showed any benefit for mq, so I'd like to imitate you as closely as possible.
Comment 21 Quan Wenli 2016-01-28 03:34:06 EST
(In reply to Dan Kenigsberg from comment #20)
> Would you please share your client and server netperf command line?
> 
> None of the ones that we have tried showed any benefit for mq, so I'd like
> to imitate you as closely as possible.

Of course. 

Run "netserver" on the first guest, then run "for i in `seq 2`; do netperf -H ip_of_netserver -l 60 -D 1 & done" on the second guest.
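The recipe above as a bash function to run on the second guest, with netserver already started on the first. Each background netperf process opens its own TCP connection, so N sessions give N flows. `run_sessions`, the default of 2 sessions and the DRY_RUN switch are hypothetical; the netperf options are the ones quoted in the comment (-l 60 for a 60-second test, -D 1 for interim results roughly every second).

```bash
#!/usr/bin/env bash
# Start N concurrent netperf sessions (one TCP flow each) against the
# netserver guest, then wait for all of them to finish.
run_sessions() {
    local server_ip="$1" sessions="${2:-2}" i
    for i in $(seq "$sessions"); do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "netperf -H $server_ip -l 60 -D 1"
        else
            netperf -H "$server_ip" -l 60 -D 1 &
        fi
    done
    # Block until every background session has exited; the aggregate
    # throughput is the sum of the per-session results netperf prints.
    wait
}
```

For example, `run_sessions 192.0.2.10 2` reproduces the two-session test.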
Comment 23 jason wang 2016-07-01 01:34:13 EDT
Deferring to 7.4.

Dan:

Do you still see this?

Thanks
Comment 24 Dan Kenigsberg 2016-07-03 02:20:36 EDT
I'll reopen it if and when we have the resources to chase the performance benefit again and fail to see it.
Comment 25 jason wang 2016-07-03 22:40:39 EDT
*** Bug 1272311 has been marked as a duplicate of this bug. ***
