Bug 1293233
| Summary: | el7: multiqueue virt-io does not show performance benefit | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Dan Kenigsberg <danken> |
| Component: | kernel | Assignee: | jason wang <jasowang> |
| kernel sub component: | NIC Drivers | QA Contact: | Virtualization Bugs <virt-bugs> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | medium | ||
| Priority: | unspecified | CC: | ailan, danken, huding, jasowang, juzhang, mmucha, network-qe, sherold, virt-bugs, weliao, wquan, xfu, yama, ykaul, zhanghm.zhm, zhanghongming |
| Version: | 7.1 | ||
| Target Milestone: | rc | ||
| Target Release: | 7.3 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-07-03 06:20:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 962749, 1258206, 1309274 | ||
Description
Dan Kenigsberg
2015-12-21 07:26:04 UTC
Can QE reproduce this on a recent kernel/qemu-kvm-{rhev} ?
(In reply to jason wang from comment #1)
> Can QE reproduce this on a recent kernel/qemu-kvm-{rhev} ?

Hi Wenli and Yanhui,

Could you handle this issue?

Best Regards,
Junyi

I had qemu-kvm-ev-2.3.0-29.1.el7.x86_64 installed.

Tried it on my host. It can be reproduced.

3.10.0-327.5.1.el7.x86_64 (host/guest)
qemu-kvm-rhev-2.3.0-31.el7_2.5.x86_64
Netperf version 2.6.0

When MTU=1500 (default), performance with single queue (no mq enabled) is 17.6 Gbits/sec, and performance with queues=2 (mq enabled) is 19.7 Gbits/sec.

When MTU=64000 is set on both the guest and the two taps on the host, performance with single queue (no mq enabled) is 20.8 Gbits/sec, and performance with queues=2 (mq enabled) is 21.1 Gbits/sec.

(In reply to Quan Wenli from comment #4)
> Tried it on my host. It can be reproduced.
>
> 3.10.0-327.5.1.el7.x86_64 (host/guest)
> qemu-kvm-rhev-2.3.0-31.el7_2.5.x86_64
> Netperf version 2.6.0
>
> When MTU=1500 (default), performance with single queue (no mq enabled) is 17.6
> Gbits/sec, and performance with queues=2 (mq enabled) is 19.7 Gbits/sec.
>
> When MTU=64000 is set on both the guest and the two taps on the host, performance
> with single queue (no mq enabled) is 20.8 Gbits/sec, and performance with
> queues=2 (mq enabled) is 21.1 Gbits/sec.

Quick test on upstream shows no such issue. Is this a regression?

(In reply to jason wang from comment #7)
> (In reply to Quan Wenli from comment #4)
> > Tried it on my host. It can be reproduced.
> >
> > 3.10.0-327.5.1.el7.x86_64 (host/guest)
> > qemu-kvm-rhev-2.3.0-31.el7_2.5.x86_64
> > Netperf version 2.6.0
> >
> > When MTU=1500 (default), performance with single queue (no mq enabled) is 17.6
> > Gbits/sec, and performance with queues=2 (mq enabled) is 19.7 Gbits/sec.
> >
> > When MTU=64000 is set on both the guest and the two taps on the host, performance
> > with single queue (no mq enabled) is 20.8 Gbits/sec, and performance with
> > queues=2 (mq enabled) is 21.1 Gbits/sec.
>
> Quick test on upstream shows no such issue. Is this a regression?
My machines are running 6.8 tests. I will check whether it's a regression in 3 days.

All the machines in the China lab are being migrated. I will try it once I get the machines.

(In reply to Quan Wenli from comment #4)
> Tried it on my host. It can be reproduced.
>
> 3.10.0-327.5.1.el7.x86_64 (host/guest)
> qemu-kvm-rhev-2.3.0-31.el7_2.5.x86_64
> Netperf version 2.6.0
>
> When MTU=1500 (default), performance with single queue (no mq enabled) is 17.6
> Gbits/sec, and performance with queues=2 (mq enabled) is 19.7 Gbits/sec.
>
> When MTU=64000 is set on both the guest and the two taps on the host, performance
> with single queue (no mq enabled) is 20.8 Gbits/sec, and performance with
> queues=2 (mq enabled) is 21.1 Gbits/sec.

How many sessions did you use in your testing? I remember Dan used 2 instances of iperf in parallel, am I right, Dan?

Tested in pin mode: vhost/vCPUs pinned to unique pCPUs on the host.

Host: kernel-3.10.0-327.9.1.el7.x86_64.rpm
One netperf session run on guest like: netperf -H $netserver_ip -l 60 -D 1

+----------+--------------------+----------------------------+
| Guest    | Non-MQ (Gbits/sec) | MQ with 2 queues (2 vCPUs) |
+----------+--------------------+----------------------------+
| upstream | 20                 | 19.7                       |
| 327.9.1  | 19.9               | 19.8                       |
| 327.5.1  | 19.9               | 19.8                       |
| 327      | 19.8               | 19.8                       |
| 229      | 16.9               | 17.2                       |
| 113      | 17.1               | 17.0                       |
+----------+--------------------+----------------------------+

So, I think it's not a regression.

Btw, I also switched the host kernel to upstream, but the host always crashed when netperf was running on the guest.

(In reply to jason wang from comment #10)
> How many sessions did you use in your testing? I remember Dan used 2 instances of
> iperf in parallel, am I right, Dan?

I believe we used a single iperf client process, with -P 2. But it does not really matter: we would like to know how we can gain anything from using multi-queues.
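The single-session rows in the table above show no gain from mq, which is consistent with a multiqueue NIC spreading traffic per flow rather than per packet. The following is only an illustrative sketch of that idea: `pick_queue` and `queues_used` are hypothetical helpers, `zlib.crc32` is a stand-in for whatever flow hash the kernel actually uses, and the addresses and ports are made up.

```python
import zlib

def pick_queue(flow, num_queues):
    """Map a (src_ip, src_port, dst_ip, dst_port) flow to a queue index.

    Assumption: crc32 is only a stand-in for the real flow hash.
    """
    key = "|".join(str(part) for part in flow).encode()
    return zlib.crc32(key) % num_queues

def queues_used(flows, num_queues):
    """Return the set of queue indexes that the given flows land on."""
    return {pick_queue(flow, num_queues) for flow in flows}

# One netperf session is one TCP flow: every packet hashes to the same queue.
one_flow = [("10.0.0.2", 40000, "10.0.0.1", 12865)]
# Many sessions are many flows, which can spread over both queues.
many_flows = [("10.0.0.2", 40000 + i, "10.0.0.1", 12865) for i in range(64)]

print("queues used by 1 flow:  ", len(queues_used(one_flow, 2)))
print("queues used by 64 flows:", len(queues_used(many_flows, 2)))
```

Under this model a single flow can never use more than one queue, regardless of how many queues are configured. Two flows may still hash onto the same queue, so a two-stream run is not guaranteed to exercise both queues every time.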
(In reply to Dan Kenigsberg from comment #12)
> (In reply to jason wang from comment #10)
>
> > How many sessions did you use in your testing? I remember Dan used 2 instances of
> > iperf in parallel, am I right, Dan?
>
> I believe we used a single iperf client process, with -P 2. But it does not
> really matter: we would like to know how we can gain anything from using
> multi-queues.

Well, you need to produce more than one flow in order to gain improvement from multiqueue. If only one flow is used, then in the default configuration only one queue will be used even if multiqueue is enabled.

(In reply to Quan Wenli from comment #11)
> Tested in pin mode: vhost/vCPUs pinned to unique pCPUs on the host.
>
> Host: kernel-3.10.0-327.9.1.el7.x86_64.rpm
> One netperf session run on guest like: netperf -H $netserver_ip -l 60 -D 1
>
> +----------+--------------------+----------------------------+
> | Guest    | Non-MQ (Gbits/sec) | MQ with 2 queues (2 vCPUs) |
> +----------+--------------------+----------------------------+
> | upstream | 20                 | 19.7                       |
> | 327.9.1  | 19.9               | 19.8                       |
> | 327.5.1  | 19.9               | 19.8                       |
> | 327      | 19.8               | 19.8                       |
> | 229      | 16.9               | 17.2                       |
> | 113      | 17.1               | 17.0                       |
> +----------+--------------------+----------------------------+
>
> So, I think it's not a regression.

That's expected since only one session is used.

> Btw, I also switched the host kernel to upstream, but the host always crashed
> when netperf was running on the guest.

Can you please retest with more than one session (e.g. 2)? My test on upstream shows a 1.6x-1.8x improvement in this case for 2q.

Thanks

(In reply to jason wang from comment #13)
> Well, you need to produce more than one flow in order to gain improvement from
> multiqueue.

Doesn't -P 2 do the trick?

(In reply to jason wang from comment #14)
> Can you please retest with more than one session (e.g. 2)? My test on
> upstream shows a 1.6x-1.8x improvement in this case for 2q.

And what about downstream el7?
Would you be kind to share the source and client command line used to demonstrate this improvement? We have tried several command lines per your suggestion, and found no improvement, but I'm certain that Quan Wenli would be happy to prove us wrong.

(In reply to Dan Kenigsberg from comment #15)
> (In reply to jason wang from comment #13)
>
> > Well, you need to produce more than one flow in order to gain improvement from
> > multiqueue.
>
> Doesn't -P 2 do the trick?

Yes, but if I read the test result correctly, Wenli only used 1 session of netperf, and that's why I'm asking for a retest with more than 1.

> (In reply to jason wang from comment #14)
>
> > Can you please retest with more than one session (e.g. 2)? My test on
> > upstream shows a 1.6x-1.8x improvement in this case for 2q.
>
> And what about downstream el7?

Haven't found time to do this.

> Would you be kind to share the source and
> client command line used to demonstrate this improvement?

Rather simple cli:

$qemu_path $img_path -netdev tap,id=hn0,queues=$queues,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hn0,mq=on,vectors=32,mac=$mac -drive file=$img_path,id=img1,if=none,format=qcow2 -enable-kvm $@ -vnc :11 -m 4G

> We have tried
> several command lines per your suggestion, and found no improvement, but I'm
> certain that Quan Wenli would be happy to prove us wrong.

I do think it's a bug in el7 :). But I suspect it is a regression, so I'm asking QE for a baseline to bisect from.

Thanks

(In reply to jason wang from comment #14)
> (In reply to Quan Wenli from comment #11)
> > Tested in pin mode: vhost/vCPUs pinned to unique pCPUs on the host.
> >
> > Host: kernel-3.10.0-327.9.1.el7.x86_64.rpm
> > One netperf session run on guest like: netperf -H $netserver_ip -l 60 -D 1
> >
> > +----------+--------------------+----------------------------+
> > | Guest    | Non-MQ (Gbits/sec) | MQ with 2 queues (2 vCPUs) |
> > +----------+--------------------+----------------------------+
> > | upstream | 20                 | 19.7                       |
> > | 327.9.1  | 19.9               | 19.8                       |
> > | 327.5.1  | 19.9               | 19.8                       |
> > | 327      | 19.8               | 19.8                       |
> > | 229      | 16.9               | 17.2                       |
> > | 113      | 17.1               | 17.0                       |
> > +----------+--------------------+----------------------------+
> >
> > So, I think it's not a regression.
>
> That's expected since only one session is used.
>
> > Btw, I also switched the host kernel to upstream, but the host always crashed
> > when netperf was running on the guest.
>
> Can you please retest with more than one session (e.g. 2)? My test on
> upstream shows a 1.6x-1.8x improvement in this case for 2q.
>
> Thanks

Yes, tried with 2 sessions of netperf. Throughput is around 20 Gb/s for non-mq and around 34 Gb/s with mq. It's the same as your result.

(In reply to Quan Wenli from comment #17)
> (In reply to jason wang from comment #14)
> > (In reply to Quan Wenli from comment #11)
> > > Tested in pin mode: vhost/vCPUs pinned to unique pCPUs on the host.
> > >
> > > Host: kernel-3.10.0-327.9.1.el7.x86_64.rpm
> > > One netperf session run on guest like: netperf -H $netserver_ip -l 60 -D 1
> > >
> > > +----------+--------------------+----------------------------+
> > > | Guest    | Non-MQ (Gbits/sec) | MQ with 2 queues (2 vCPUs) |
> > > +----------+--------------------+----------------------------+
> > > | upstream | 20                 | 19.7                       |
> > > | 327.9.1  | 19.9               | 19.8                       |
> > > | 327.5.1  | 19.9               | 19.8                       |
> > > | 327      | 19.8               | 19.8                       |
> > > | 229      | 16.9               | 17.2                       |
> > > | 113      | 17.1               | 17.0                       |
> > > +----------+--------------------+----------------------------+
> > >
> > > So, I think it's not a regression.
> >
> > That's expected since only one session is used.
> >
> > > Btw, I also switched the host kernel to upstream, but the host always crashed
> > > when netperf was running on the guest.
> >
> > Can you please retest with more than one session (e.g. 2)? My test on
> > upstream shows a 1.6x-1.8x improvement in this case for 2q.
> >
> > Thanks
>
> Yes, tried with 2 sessions of netperf. Throughput is around 20 Gb/s for
> non-mq and around 34 Gb/s with mq. It's the same as your result.

Also tried iperf with -P 2 and got the same result. Non-mq: 20 Gbits/sec, MQ: 38.5 Gbits/sec.

Dan:

Looks like QE could not reproduce the problem. So the question is: can you reproduce the issue in another environment/setup? I can also check the setup again if it is available to me.

Thanks

Would you please share your client and server netperf command line?

None of the ones that we have tried showed any benefit for mq, so I'd like to imitate you as closely as possible.

(In reply to Dan Kenigsberg from comment #20)
> Would you please share your client and server netperf command line?
>
> None of the ones that we have tried showed any benefit for mq, so I'd like
> to imitate you as closely as possible.

Of course.

Run "netserver" on the first guest.
Run "for i in `seq 2`; do netperf -H ip_of_netserver -l 60 -D 1 & done" on the second guest.

Defer to 7.4.

Dan:

Do you still see this?

Thanks

I'll have it reopened if and when we have the resources to chase the performance benefit again, and fail to see it.

*** Bug 1272311 has been marked as a duplicate of this bug. ***
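
For reference, the speedups implied by the figures reported in this thread can be worked out directly. This is plain arithmetic on the numbers quoted in the comments (several of which were given as approximate); the labels and the `speedup` helper are ours, and nothing here is output from netperf or iperf.

```python
# Throughput figures in Gbits/sec as reported in the comments: (non-mq, mq).
results = {
    "netperf, 1 session (327.9.1 guest)": (19.9, 19.8),
    "netperf, 2 sessions": (20.0, 34.0),
    "iperf -P 2": (20.0, 38.5),
}

def speedup(non_mq, mq):
    """Ratio of multiqueue throughput to single-queue throughput."""
    return mq / non_mq

for label, (non_mq, mq) in results.items():
    print(f"{label}: {speedup(non_mq, mq):.1f}x")
```

The two-session netperf ratio (34 / 20 = 1.7) falls inside the 1.6x-1.8x range reported for upstream with 2 queues, while the single-session ratio stays at about 1.0.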