Bug 1263591
| Summary: | Guest network works abnormally (ping out or netperf test fails) when using multiple queues of the virtio-net-pci macvtap | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Gu Nini <ngu> |
| Component: | qemu-kvm-rhev | Assignee: | Laurent Vivier <lvivier> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.2 | CC: | bugproxy, danken, dgibson, gkurz, hannsj_uhl, huding, juzhang, knoel, lvivier, michal.skrivanek, michen, mrezanin, ngu, qzhang, thuth, virt-maint, xfu, xuhan, xuma, zhengtli |
| Target Milestone: | rc | | |
| Target Release: | 7.3 | | |
| Hardware: | ppc64le | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-2.6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-07 20:38:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1288337, 1359843 | | |
Description (Gu Nini, 2015-09-16 09:07:29 UTC)
I have tried to start the guest with '-netdev tap,id=hostnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on,queues=4 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:c4:e7:16,vectors=10,mq=on' instead of the macvtap on another Power PC host; with the same steps, the bug does not occur.

This seems to be an "endian" issue, as a ppc64le guest works well with a ppc64le host.

Host kernel: 3.10.0-315.el7.ppc64le
Guest kernel: 3.10.0-315.el7.ppc64le
Qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-23.el7.ppc64le

I'm not able to reproduce it with a ppc64 guest and a ppc64le host. Could you check with the latest releases, please?

Host kernel: 3.10.0-316.el7.ppc64le
Guest kernel: 3.10.0-316.el7.ppc64
Qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-23.el7.ppc64le

(In reply to Laurent Vivier from comment #4)
> I'm not able to reproduce it with ppc64 guest and ppc64le host.
>
> Could you check with the latest releases, please ?
>
> Host kernel: 3.10.0-316.el7.ppc64le
> Guest kernel: 3.10.0-316.el7.ppc64
> Qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-23.el7.ppc64le

It's a pity that I have no available host today. In my later tests the ping was sometimes OK, but when I ran 'netperf -H 10.16.67.19 -l 300' it failed. So could you run the netperf test against an external host in the same way as the ping test? If you still cannot reproduce the bug, I will try it on the latest releases. I have changed the bug summary accordingly.

Not yet set as exception or blocker, and I don't see an immediate cause to do so. Therefore, bumping to 7.3.

I'm able to reproduce the bug with netperf, thanks. It is not specific to the RHEL kernel/qemu, as I have been able to reproduce it with an upstream kernel/qemu (host/guest kernel 4.2 and qemu 2.4.50, commit 5fdb467). The endianness is set only for the first tap device; this is why it cannot work when there is more than one tap device. The fix is as simple as this:

```diff
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -311,12 +311,11 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         goto err;
     }
 
-    r = vhost_net_set_vnet_endian(dev, ncs[0].peer, true);
-    if (r < 0) {
-        goto err;
-    }
-
     for (i = 0; i < total_queues; i++) {
+        r = vhost_net_set_vnet_endian(dev, ncs[i].peer, true);
+        if (r < 0) {
+            goto err;
+        }
         vhost_net_set_vq_index(get_vhost_net(ncs[i].peer), i * 2);
     }
 
```

But work is currently ongoing to move this into the vnet backend. I need to investigate more to see how to integrate this change into it.

I've put my series on hold. Please proceed with your fix.

Patch sent: http://patchwork.ozlabs.org/patch/567120/

Hi Laurent,

This patch got 3 R-b tags on qemu-devel@... Is there anything that prevents upstream acceptance?

Thanks.

--
Greg

(In reply to Greg Kurz from comment #13)
> Hi Laurent,
>
> This patch got 3 R-b tags on qemu-devel@... Is there anything that prevents
> upstream acceptance ?

No, but as long as it is not in a maintainer branch, or better in the master branch, we cannot be sure it will be taken.

Now upstream:

a407644 net: set endianness on all backend devices

This bug was accidentally moved from POST to MODIFIED via an error in automation; please contact mmccune with any questions.

According to comment #0, I can reproduce the problem with the packages below:

Host kernel: 3.10.0-418.el7.ppc64le
qemu-kvm-rhev-2.5.0-4.el7
Guest kernel: 3.10.0-418.el7.ppc64

After upgrading qemu-kvm-rhev to qemu-kvm-rhev-2.6.0-4.el7, the problem did not show up, so I am moving the bug to VERIFIED. Details below.
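For context on the fd redirections that appear in the qemu-kvm command line in step 2 below: a macvtap interface is exposed as a character device /dev/tapN (N being the ifindex of the interface), and each additional open of that device provides one more queue. The sketch below illustrates how such a multi-queue setup can be wired up; the interface name, tap index, and fd numbers are assumptions mirroring the steps that follow, not an exact transcript of the verification host.

```sh
# Illustrative sketch only: the interface name (macvtap0), the tap index (7) and
# the fd numbers (678-681) are assumptions mirroring the verification steps below.

# A macvtap interface created with 'ip link add ... type macvtap' shows up as the
# character device /dev/tapN, where N is the ifindex of the interface:
cat /sys/class/net/macvtap0/ifindex    # e.g. prints 7, so the device is /dev/tap7

# Each open of /dev/tapN provides one queue. The shell redirections N<>file open
# the device read/write on fd N for the qemu-kvm process, and the four fds are
# handed to the vhost backend via fds=...; with mq=on and 4 queue pairs,
# vectors=10 covers 2 MSI-X vectors per queue pair plus config and control.
# (Other guest options -- machine type, memory, disks -- are omitted here.)
/usr/libexec/qemu-kvm \
    -device virtio-net-pci,netdev=macvtap0,mac=c2:ac:d3:c7:c4:0f,vectors=10,mq=on \
    -netdev tap,id=macvtap0,vhost=on,fds=678:679:680:681 \
    678<>/dev/tap7 679<>/dev/tap7 680<>/dev/tap7 681<>/dev/tap7
```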
1. Add a macvtap device on the host:

```
[root@ibm-p8-rhevm-10 test]# ip link add link enP3p9s0f0 name macvtap0 type macvtap mode bridge
[root@ibm-p8-rhevm-10 test]# ip link set macvtap0 address c2:ac:d3:c7:c4:0f up
```

2. Boot the guest with the tap device:

```
/usr/libexec/qemu-kvm ... \
    -device virtio-net-pci,any_layout=on,netdev=macvtap0,mac=c2:ac:d3:c7:c4:0f,id=net1,vectors=10,mq=on \
    678<>/dev/tap7 679<>/dev/tap7 680<>/dev/tap7 681<>/dev/tap7 \
    -netdev tap,id=macvtap0,vhost=on,fds=678:679:680:681 ...
```

3. After the guest boots up, check the multi-queue (mq) configuration:

```
[root@dhcp71-27 ~]# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX: 0
TX: 0
Other: 0
Combined: 4
Current hardware settings:
RX: 0
TX: 0
Other: 0
Combined: 1
```

Ping an external host IP:

```
[root@dhcp71-27 ~]# ping 10.16.67.19
PING 10.16.67.19 (10.16.67.19) 56(84) bytes of data.
64 bytes from 10.16.67.19: icmp_seq=1 ttl=61 time=0.295 ms
64 bytes from 10.16.67.19: icmp_seq=2 ttl=61 time=0.256 ms
64 bytes from 10.16.67.19: icmp_seq=3 ttl=61 time=0.252 ms
64 bytes from 10.16.67.19: icmp_seq=4 ttl=61 time=0.250 ms
64 bytes from 10.16.67.19: icmp_seq=5 ttl=61 time=0.246 ms
64 bytes from 10.16.67.19: icmp_seq=6 ttl=61 time=0.263 ms
```

4. Stop the ping process and change the mq configuration:

```
[root@dhcp71-27 ~]# ethtool -L eth0 combined 2
```

5. Start pinging out again:

```
[root@dhcp71-27 ~]# ping 10.16.67.19
PING 10.16.67.19 (10.16.67.19) 56(84) bytes of data.
64 bytes from 10.16.67.19: icmp_seq=1 ttl=61 time=0.321 ms
64 bytes from 10.16.67.19: icmp_seq=2 ttl=61 time=0.292 ms
64 bytes from 10.16.67.19: icmp_seq=3 ttl=61 time=0.270 ms
64 bytes from 10.16.67.19: icmp_seq=4 ttl=61 time=0.269 ms
64 bytes from 10.16.67.19: icmp_seq=5 ttl=61 time=0.272 ms
....
64 bytes from 10.16.67.19: icmp_seq=128 ttl=61 time=0.252 ms
64 bytes from 10.16.67.19: icmp_seq=129 ttl=61 time=0.262 ms
64 bytes from 10.16.67.19: icmp_seq=130 ttl=61 time=0.265 ms
64 bytes from 10.16.67.19: icmp_seq=131 ttl=61 time=0.271 ms
64 bytes from 10.16.67.19: icmp_seq=132 ttl=61 time=0.247 ms
64 bytes from 10.16.67.19: icmp_seq=133 ttl=61 time=0.262 ms
```

This step ran for quite a long time, and all the packets were transmitted successfully.

6. Stop the ping process and change the mq setting back to one:

```
[root@dhcp71-27 ~]# ethtool -L eth0 combined 1
```

7. Ping out again:

```
[root@dhcp71-27 ~]# ping 10.16.67.19
PING 10.16.67.19 (10.16.67.19) 56(84) bytes of data.
64 bytes from 10.16.67.19: icmp_seq=1 ttl=61 time=0.314 ms
64 bytes from 10.16.67.19: icmp_seq=2 ttl=61 time=0.213 ms
64 bytes from 10.16.67.19: icmp_seq=3 ttl=61 time=0.219 ms
64 bytes from 10.16.67.19: icmp_seq=4 ttl=61 time=0.245 ms
64 bytes from 10.16.67.19: icmp_seq=5 ttl=61 time=0.219 ms
64 bytes from 10.16.67.19: icmp_seq=6 ttl=61 time=0.209 ms
64 bytes from 10.16.67.19: icmp_seq=7 ttl=61 time=0.209 ms
```

The result is good.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html