1293233 – el7: multiqueue virt-io does not show performance benefit

Bug 1293233 - el7: multiqueue virt-io does not show performance benefit

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	7.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	7.3
Assignee:	jason wang
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	962749 1258206 1309274

Reported:	2015-12-21 07:26 UTC by Dan Kenigsberg
Modified:	2016-07-04 02:40 UTC
CC List:	16 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-07-03 06:20:36 UTC
Target Upstream Version:
Embargoed:

Description Dan Kenigsberg 2015-12-21 07:26:04 UTC

Description of problem:


Version-Release number of selected component (if applicable):
host kernel 3.10.0-229.24.2.el7.x86_64
guest kernel 3.10.0-327.el7.x86_64

How reproducible:
100%


Steps to Reproduce:
1. Connect two VMs (2 vCPUs each) to a bridge via virtio.
2. Set the number of queues to 2 (#queue=2).
3. Run `ethtool -L eth1 combined 2` on both guests.
4. Set MTU=64000 on both guests.
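
Steps 3 and 4 can be scripted inside each guest. A minimal Python sketch, assuming the interface is eth1 as in step 3 and that `ethtool` and `ip` are installed in the guest. The helper name `guest_setup_commands` is hypothetical, and the commands are only built and printed here, not executed:

```python
def guest_setup_commands(iface="eth1", queues=2, mtu=64000):
    """Return the per-guest commands for steps 3 and 4 as argument lists."""
    return [
        # Step 3: use `queues` combined channels on the virtio-net device.
        ["ethtool", "-L", iface, "combined", str(queues)],
        # Step 4: raise the MTU on the same interface.
        ["ip", "link", "set", "dev", iface, "mtu", str(mtu)],
    ]


for cmd in guest_setup_commands():
    print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` as root; `ethtool -l eth1` then shows whether the channel count took effect.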

Actual results:
Throughput stays at 19.3 Gbits/sec, the same as with a single queue.

Expected results:
A drastic improvement in throughput.

Comment 1 jason wang 2015-12-21 07:59:43 UTC

Can QE reproduce this on a recent kernel/qemu-kvm-{rhev} ?

Comment 2 juzhang 2015-12-21 08:07:16 UTC

(In reply to jason wang from comment #1)
> Can QE reproduce this on a recent kernel/qemu-kvm-{rhev} ?

Hi Wenli and Yanhui,

Could you handle this issue?

Best Regards,
Junyi

Comment 3 Dan Kenigsberg 2015-12-21 12:58:53 UTC

I had qemu-kvm-ev-2.3.0-29.1.el7.x86_64 installed

Comment 4 Quan Wenli 2015-12-24 10:04:33 UTC

Tried it on my host; it can be reproduced.

3.10.0-327.5.1.el7.x86_64 (host/guest)
qemu-kvm-rhev-2.3.0-31.el7_2.5.x86_64
Netperf version 2.6.0 

When MTU=1500 (default), performance with a single queue (mq not enabled) is 17.6 Gbits/sec, and performance with queues=2 (mq enabled) is 19.7 Gbits/sec.

With MTU=64000 set on both guests and on the two tap devices on the host, performance with a single queue (mq not enabled) is 20.8 Gbits/sec, and performance with queues=2 (mq enabled) is 21.1 Gbits/sec.

Comment 7 jason wang 2016-01-08 05:11:26 UTC

(In reply to Quan Wenli from comment #4)
> Tried it on my host; it can be reproduced.
> 
> 3.10.0-327.5.1.el7.x86_64 (host/guest)
> qemu-kvm-rhev-2.3.0-31.el7_2.5.x86_64
> Netperf version 2.6.0
> 
> When MTU=1500 (default), performance with a single queue (mq not enabled) is
> 17.6 Gbits/sec, and performance with queues=2 (mq enabled) is 19.7 Gbits/sec.
> 
> With MTU=64000 set on both guests and on the two tap devices on the host,
> performance with a single queue (mq not enabled) is 20.8 Gbits/sec, and
> performance with queues=2 (mq enabled) is 21.1 Gbits/sec.

Quick test on upstream shows no such issue. Is this a regression?

Comment 8 Quan Wenli 2016-01-11 08:47:06 UTC

(In reply to jason wang from comment #7)
> Quick test on upstream shows no such issue. Is this a regression?

My machines are running 6.8 tests; I will check whether it's a regression in 3 days.

Comment 9 Quan Wenli 2016-01-15 05:02:54 UTC

All the machines in the China lab are being migrated. I will try it once I get the machines back.

Comment 10 jason wang 2016-01-21 09:16:56 UTC

(In reply to Quan Wenli from comment #4)
> Tried it on my host; it can be reproduced.
> 
> 3.10.0-327.5.1.el7.x86_64 (host/guest)
> qemu-kvm-rhev-2.3.0-31.el7_2.5.x86_64
> Netperf version 2.6.0
> 
> When MTU=1500 (default), performance with a single queue (mq not enabled) is
> 17.6 Gbits/sec, and performance with queues=2 (mq enabled) is 19.7 Gbits/sec.
> 
> With MTU=64000 set on both guests and on the two tap devices on the host,
> performance with a single queue (mq not enabled) is 20.8 Gbits/sec, and
> performance with queues=2 (mq enabled) is 21.1 Gbits/sec.

How many sessions did you use in your testing? I remember Dan used 2 instances of iperf in parallel; am I right, Dan?

Comment 11 Quan Wenli 2016-01-21 09:22:23 UTC

Tested in pinned mode: vhost threads and vCPUs pinned to unique pCPUs on the host.

Host: kernel-3.10.0-327.9.1.el7.x86_64.rpm
One netperf session run on the guest like: netperf -H $netserver_ip -l 60 -D 1

+-----------+--------------------+---------------------------------------+
| Guest     | Non-MQ (Gbits/sec) | MQ with 2 queues, 2 vCPUs (Gbits/sec) |
+-----------+--------------------+---------------------------------------+
| upstream  | 20                 | 19.7                                  |
| 327.9.1   | 19.9               | 19.8                                  |
| 327.5.1   | 19.9               | 19.8                                  |
| 327       | 19.8               | 19.8                                  |
| 229       | 16.9               | 17.2                                  |
| 113       | 17.1               | 17.0                                  |
+-----------+--------------------+---------------------------------------+

So, I think it's not a regression.

Btw, I also switched the host kernel to upstream, but the host always crashed when netperf was running on the guest.

Comment 12 Dan Kenigsberg 2016-01-22 08:05:34 UTC

(In reply to jason wang from comment #10)

> How many sessions did you use in your testing? I remember Dan used 2
> instances of iperf in parallel; am I right, Dan?

I believe we used a single iperf client process, with -P 2. But it does not really matter: we would like to know how we can gain anything from using multiqueue.

Comment 13 jason wang 2016-01-22 08:15:48 UTC

(In reply to Dan Kenigsberg from comment #12)
> (In reply to jason wang from comment #10)
> 
> > How many sessions did you use in your testing? I remember Dan used 2
> > instances of iperf in parallel; am I right, Dan?
> 
> I believe we used a single iperf client process, with -P 2. But it does not
> really matter: we would like to know how we can gain anything from using
> multiqueue.

Well, you need to produce more than one flow in order to gain an improvement from multiqueue. If only one flow is used, then in the default configuration only one queue will be used even if multiqueue is enabled.
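
To illustrate why one flow stays on one queue, here is a toy model, not the kernel's actual algorithm, in which the queue is picked from a hash of the flow's addresses and ports. The names `Flow` and `pick_queue`, the example addresses and ports, and the use of CRC32 as the hash are all assumptions made for illustration.

```python
import zlib
from collections import namedtuple

# Hypothetical flow description: the usual TCP/UDP 4-tuple.
Flow = namedtuple("Flow", "src_ip src_port dst_ip dst_port")


def pick_queue(flow, num_queues):
    """Map a flow to a queue index via a stand-in hash (CRC32 here)."""
    key = f"{flow.src_ip}:{flow.src_port}-{flow.dst_ip}:{flow.dst_port}"
    return zlib.crc32(key.encode()) % num_queues


single = Flow("192.0.2.1", 40000, "192.0.2.2", 12865)
# Every packet of the same flow hashes to the same queue, so a single
# session keeps one queue busy no matter how many queues exist.
print({pick_queue(single, 2) for _ in range(1000)})

# Several flows (different source ports) can land on different queues.
flows = [single._replace(src_port=40000 + i) for i in range(8)]
print([pick_queue(f, 2) for f in flows])
```

The first print always shows a set with one element; whether the eight flows in the second print spread evenly depends on the hash, which is also true of real flow steering.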

Comment 14 jason wang 2016-01-22 08:17:27 UTC

(In reply to Quan Wenli from comment #11)
> Tested in pinned mode: vhost threads and vCPUs pinned to unique pCPUs on the host.
> 
> Host: kernel-3.10.0-327.9.1.el7.x86_64.rpm
> One netperf session run on the guest like: netperf -H $netserver_ip -l 60 -D 1
> 
> +-----------+--------------------+---------------------------------------+
> | Guest     | Non-MQ (Gbits/sec) | MQ with 2 queues, 2 vCPUs (Gbits/sec) |
> +-----------+--------------------+---------------------------------------+
> | upstream  | 20                 | 19.7                                  |
> | 327.9.1   | 19.9               | 19.8                                  |
> | 327.5.1   | 19.9               | 19.8                                  |
> | 327       | 19.8               | 19.8                                  |
> | 229       | 16.9               | 17.2                                  |
> | 113       | 17.1               | 17.0                                  |
> +-----------+--------------------+---------------------------------------+
> 
> So, I think it's not a regression.

That's expected since only one session is used.

> 
> Btw, I also switched the host kernel to upstream, but the host always
> crashed when netperf was running on the guest.

Can you please retest with more than one session (e.g. 2)? My test on upstream shows a 1.6x-1.8x improvement in this case for 2q.

Thanks

Comment 15 Dan Kenigsberg 2016-01-23 16:19:30 UTC

(In reply to jason wang from comment #13)

> Well, you need to produce more than one flow in order to gain an improvement
> from multiqueue.

Doesn't -P 2 do the trick?

(In reply to jason wang from comment #14)

> 
> Can you please retest with more than one session (e.g. 2)? My test on
> upstream shows a 1.6x-1.8x improvement in this case for 2q.

And what about downstream el7? Would you be so kind as to share the source and client command line used to demonstrate this improvement? We have tried several command lines per your suggestion, and found no improvement, but I'm certain that Quan Wenli would be happy to prove us wrong.

Comment 16 jason wang 2016-01-25 02:38:10 UTC

(In reply to Dan Kenigsberg from comment #15)
> (In reply to jason wang from comment #13)
> 
> > Well, you need to produce more than one flow in order to gain an improvement
> > from multiqueue.
> 
> Doesn't -P 2 do the trick?

Yes, but if I read the test result correctly, Wenli only used 1 session of netperf, and that's why I'm asking for a retest with more than 1.

> 
> (In reply to jason wang from comment #14)
> 
> > 
> > Can you please retest with more than one session (e.g. 2)? My test on
> > upstream shows a 1.6x-1.8x improvement in this case for 2q.
> 
> And what about downstream el7?

Haven't found time to do this.

> Would you be so kind as to share the source and client command line used to
> demonstrate this improvement?

Rather simple cli:

$qemu_path $img_path -netdev tap,id=hn0,queues=$queues,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hn0,mq=on,vectors=32,mac=$mac -drive file=$img_path,id=img1,if=none,format=qcow2 -enable-kvm $@ -vnc :11 -m 4G 


> We have tried several command lines per your suggestion, and found no
> improvement, but I'm certain that Quan Wenli would be happy to prove us wrong.

I do think it's a bug in el7 :). But I suspect it is a regression, so I'm asking QE for a baseline to do a bisection.

Thanks
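
A note on the `vectors=32` value in the QEMU command line in this comment: the upstream KVM multiqueue guidance, as far as I understand it, is 2N+2 MSI-X vectors for N queues (one per TX queue, one per RX queue, one for configuration changes, one for the control virtqueue). A small sketch of that arithmetic; the function name is hypothetical:

```python
def msix_vectors_for(queues):
    """Vectors for N queue pairs: N TX + N RX + 1 config + 1 control."""
    return 2 * queues + 2


for n in (1, 2, 4, 15):
    print(f"queues={n}: vectors={msix_vectors_for(n)}")
```

By this rule `vectors=32` covers up to 15 queues, so it should not be what limits a queues=2 setup.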

Comment 17 Quan Wenli 2016-01-25 06:52:38 UTC

(In reply to jason wang from comment #14)
> Can you please retest with more than one session (e.g. 2)? My test on
> upstream shows a 1.6x-1.8x improvement in this case for 2q.

Yes, tried with 2 sessions of netperf. Throughput is around 20 Gb/s for non-mq and around 34 Gb/s with mq. It's the same as your result.

Comment 18 Quan Wenli 2016-01-25 07:56:42 UTC

(In reply to Quan Wenli from comment #17)
> Yes, tried with 2 sessions of netperf. Throughput is around 20 Gb/s for
> non-mq and around 34 Gb/s with mq. It's the same as your result.

Also tried iperf with -P 2 and got the same result.
Non-mq: 20 Gbits/sec, MQ: 38.5 Gbits/sec
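
For comparison with the 1.6x-1.8x figure quoted earlier, the reported throughputs can be turned into speedup ratios. A minimal sketch; the `speedup` helper and the dictionary labels are made up for illustration, and the numbers are the approximate ones reported in this thread:

```python
def speedup(single_queue_gbps, multi_queue_gbps):
    """Ratio of multiqueue to single-queue throughput."""
    return multi_queue_gbps / single_queue_gbps


# Figures reported in this thread (Gbits/sec): (non-mq, mq).
results = {
    "netperf, 1 session (guest 327.9.1)": (19.9, 19.8),
    "netperf, 2 sessions": (20.0, 34.0),
    "iperf -P 2": (20.0, 38.5),
}

for name, (non_mq, mq) in results.items():
    print(f"{name}: {speedup(non_mq, mq):.2f}x")
```

With one session the ratio is just under 1, while the two-flow runs come out at roughly 1.7x and 1.9x.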

Comment 19 jason wang 2016-01-28 02:49:21 UTC

Dan:

Looks like QE could not reproduce the problem. So the question is: can you reproduce the issue in another environment/setup? I will also check the setup again if it is available to me.

Thanks

Comment 20 Dan Kenigsberg 2016-01-28 08:09:15 UTC

Would you please share your client and server netperf command line?

None of the ones that we have tried showed any benefit for mq, so I'd like to imitate you as closely as possible.

Comment 21 Quan Wenli 2016-01-28 08:34:06 UTC

(In reply to Dan Kenigsberg from comment #20)
> Would you please share your client and server netperf command line?
> 
> None of the ones that we have tried showed any benefit for mq, so I'd like
> to imitate you as closely as possible.

Of course. 

Run "netserver" on the first guest. Run "for i in `seq 2`; do netperf -H ip_of_netserver -l 60 -D 1 & done" on the second guest.
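
The same two-session client run can be driven from Python. A rough sketch, assuming `netperf` is on the PATH of the second guest and `netserver` is already running on the first; the function names are hypothetical and only the `-H`, `-l` and `-D` options from the command above are used:

```python
import subprocess


def netperf_cmd(server_ip, duration_s=60, interval_s=1):
    """Build the netperf command line quoted above."""
    return ["netperf", "-H", server_ip, "-l", str(duration_s),
            "-D", str(interval_s)]


def run_parallel(server_ip, sessions=2):
    """Start `sessions` netperf clients at once and collect their output."""
    procs = [
        subprocess.Popen(netperf_cmd(server_ip), stdout=subprocess.PIPE,
                         text=True)
        for _ in range(sessions)
    ]
    # communicate() waits for each client and returns (stdout, stderr).
    return [p.communicate()[0] for p in procs]
```

Calling `run_parallel("ip_of_netserver")` starts both clients before waiting on either, so the two flows overlap for the whole 60 seconds, which is the condition for multiqueue to show a benefit.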

Comment 23 jason wang 2016-07-01 05:34:13 UTC

Defer to 7.4.

Dan:

Do you still see this?

Thanks

Comment 24 Dan Kenigsberg 2016-07-03 06:20:36 UTC

I'll have it reopened if and when we have the resources to chase the performance benefit again and still fail to see it.

Comment 25 jason wang 2016-07-04 02:40:39 UTC

*** Bug 1272311 has been marked as a duplicate of this bug. ***
