Bug 1263591
| Summary: | Guest network works abnormally (ping out or netperf test fails) when using multiple queues of the virtio-net-pci macvtap | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Gu Nini <ngu> |
| Component: | qemu-kvm-rhev | Assignee: | Laurent Vivier <lvivier> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.2 | CC: | bugproxy, danken, dgibson, gkurz, hannsj_uhl, huding, juzhang, knoel, lvivier, michal.skrivanek, michen, mrezanin, ngu, qzhang, thuth, virt-maint, xfu, xuhan, xuma, zhengtli |
| Target Milestone: | rc | | |
| Target Release: | 7.3 | | |
| Hardware: | ppc64le | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-2.6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-07 20:38:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1288337, 1359843 | | |
Description (Gu Nini, 2015-09-16 09:07:29 UTC)
I have tried to start the guest with '-netdev tap,id=hostnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on,queues=4 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:c4:e7:16,vectors=10,mq=on' instead of the macvtap on another Power PC host; with the same steps, the bug does not occur.

This seems to be an "endian" issue, as a ppc64le guest works well with a ppc64le host.

Host kernel: 3.10.0-315.el7.ppc64le
Guest kernel: 3.10.0-315.el7.ppc64le
Qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-23.el7.ppc64le

I'm not able to reproduce it with a ppc64 guest and a ppc64le host. Could you check with the latest releases, please?

Host kernel: 3.10.0-316.el7.ppc64le
Guest kernel: 3.10.0-316.el7.ppc64
Qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-23.el7.ppc64le

(In reply to Laurent Vivier from comment #4)
> I'm not able to reproduce it with ppc64 guest and ppc64le host.
>
> Could you check with the latest releases, please ?
>
> Host kernel: 3.10.0-316.el7.ppc64le
> Guest kernel: 3.10.0-316.el7.ppc64
> Qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-23.el7.ppc64le

It's a pity that I have no available host today. In my later tests the ping was sometimes OK, but when I ran 'netperf -H 10.16.67.19 -l 300' it failed. So could you run the netperf test against an external host in the same way as the ping test? If you still cannot reproduce the bug, I will try it on the latest releases. I have changed the bug summary accordingly.

Not yet set as exception or blocker, and I don't see an immediate cause to do so. Therefore, bumping to 7.3.

I'm able to reproduce the bug with netperf, thanks. It is not specific to the RHEL kernel/qemu, as I have been able to reproduce it with an upstream kernel/qemu (host/guest kernel 4.2 and qemu 2.4.50, commit 5fdb467). The endianness is set only for the first tap device; this is why it cannot work when there is more than one tap device. The fix is as simple as this:

```diff
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -311,12 +311,11 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         goto err;
     }
 
-    r = vhost_net_set_vnet_endian(dev, ncs[0].peer, true);
-    if (r < 0) {
-        goto err;
-    }
-
     for (i = 0; i < total_queues; i++) {
+        r = vhost_net_set_vnet_endian(dev, ncs[i].peer, true);
+        if (r < 0) {
+            goto err;
+        }
         vhost_net_set_vq_index(get_vhost_net(ncs[i].peer), i * 2);
     }
 
```

But work is currently ongoing to move this into the vnet backend. I need to investigate more to see how to integrate this change into it.

I've put my series on hold. Please proceed with your fix.

Patch sent: http://patchwork.ozlabs.org/patch/567120/

Hi Laurent,

This patch got 3 R-b tags on qemu-devel@... Is there anything that prevents upstream acceptance?

Thanks.

--
Greg

(In reply to Greg Kurz from comment #13)
> Hi Laurent,
>
> This patch got 3 R-b tags on qemu-devel@... Is there anything that prevents
> upstream acceptance ?

No, but as long as it is not in a maintainer branch, or better in the master branch, we cannot be sure it will be taken.

Now upstream:

a407644 net: set endianness on all backend devices

This bug was accidentally moved from POST to MODIFIED via an error in automation; please contact mmccune with any questions.

According to comment #0, I can reproduce the problem with the packages below:

Host kernel: 3.10.0-418.el7.ppc64le
qemu-kvm-rhev-2.5.0-4.el7
Guest kernel: 3.10.0-418.el7.ppc64

After upgrading qemu-kvm-rhev to qemu-kvm-rhev-2.6.0-4.el7, the problem did not show up, so I am moving the bug to VERIFIED. Details below.
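For context on the fd redirections that appear in the qemu-kvm command line in step 2 below: a macvtap interface is exposed as a character device /dev/tapN (N being the ifindex of the interface), and each additional open of that device provides one more queue. The sketch below illustrates how such a multi-queue setup can be wired up; the interface name, tap index, and fd numbers are assumptions mirroring the steps that follow, not an exact transcript of the verification host.

```sh
# Illustrative sketch only: the interface name (macvtap0), the tap index (7) and
# the fd numbers (678-681) are assumptions mirroring the verification steps below.

# A macvtap interface created with 'ip link add ... type macvtap' shows up as the
# character device /dev/tapN, where N is the ifindex of the interface:
cat /sys/class/net/macvtap0/ifindex    # e.g. prints 7, so the device is /dev/tap7

# Each open of /dev/tapN provides one queue. The shell redirections N<>file open
# the device read/write on fd N for the qemu-kvm process, and the four fds are
# handed to the vhost backend via fds=...; with mq=on and 4 queue pairs,
# vectors=10 covers 2 MSI-X vectors per queue pair plus config and control.
# (Other guest options -- machine type, memory, disks -- are omitted here.)
/usr/libexec/qemu-kvm \
    -device virtio-net-pci,netdev=macvtap0,mac=c2:ac:d3:c7:c4:0f,vectors=10,mq=on \
    -netdev tap,id=macvtap0,vhost=on,fds=678:679:680:681 \
    678<>/dev/tap7 679<>/dev/tap7 680<>/dev/tap7 681<>/dev/tap7
```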
1. Add a macvtap device on the host:

```
[root@ibm-p8-rhevm-10 test]# ip link add link enP3p9s0f0 name macvtap0 type macvtap mode bridge
[root@ibm-p8-rhevm-10 test]# ip link set macvtap0 address c2:ac:d3:c7:c4:0f up
```

2. Boot the guest with the tap device:

```
/usr/libexec/qemu-kvm ... \
    -device virtio-net-pci,any_layout=on,netdev=macvtap0,mac=c2:ac:d3:c7:c4:0f,id=net1,vectors=10,mq=on \
    678<>/dev/tap7 679<>/dev/tap7 680<>/dev/tap7 681<>/dev/tap7 \
    -netdev tap,id=macvtap0,vhost=on,fds=678:679:680:681 ...
```

3. After the guest boots up, check the multi-queue (mq) configuration:

```
[root@dhcp71-27 ~]# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX: 0
TX: 0
Other: 0
Combined: 4
Current hardware settings:
RX: 0
TX: 0
Other: 0
Combined: 1
```

Ping an external host IP:

```
[root@dhcp71-27 ~]# ping 10.16.67.19
PING 10.16.67.19 (10.16.67.19) 56(84) bytes of data.
64 bytes from 10.16.67.19: icmp_seq=1 ttl=61 time=0.295 ms
64 bytes from 10.16.67.19: icmp_seq=2 ttl=61 time=0.256 ms
64 bytes from 10.16.67.19: icmp_seq=3 ttl=61 time=0.252 ms
64 bytes from 10.16.67.19: icmp_seq=4 ttl=61 time=0.250 ms
64 bytes from 10.16.67.19: icmp_seq=5 ttl=61 time=0.246 ms
64 bytes from 10.16.67.19: icmp_seq=6 ttl=61 time=0.263 ms
```

4. Stop the ping process and change the mq configuration:

```
[root@dhcp71-27 ~]# ethtool -L eth0 combined 2
```

5. Start pinging out again:

```
[root@dhcp71-27 ~]# ping 10.16.67.19
PING 10.16.67.19 (10.16.67.19) 56(84) bytes of data.
64 bytes from 10.16.67.19: icmp_seq=1 ttl=61 time=0.321 ms
64 bytes from 10.16.67.19: icmp_seq=2 ttl=61 time=0.292 ms
64 bytes from 10.16.67.19: icmp_seq=3 ttl=61 time=0.270 ms
64 bytes from 10.16.67.19: icmp_seq=4 ttl=61 time=0.269 ms
64 bytes from 10.16.67.19: icmp_seq=5 ttl=61 time=0.272 ms
....
64 bytes from 10.16.67.19: icmp_seq=128 ttl=61 time=0.252 ms
64 bytes from 10.16.67.19: icmp_seq=129 ttl=61 time=0.262 ms
64 bytes from 10.16.67.19: icmp_seq=130 ttl=61 time=0.265 ms
64 bytes from 10.16.67.19: icmp_seq=131 ttl=61 time=0.271 ms
64 bytes from 10.16.67.19: icmp_seq=132 ttl=61 time=0.247 ms
64 bytes from 10.16.67.19: icmp_seq=133 ttl=61 time=0.262 ms
```

This step ran for quite a long time, and all the packets were transmitted successfully.

6. Stop the ping process and change the mq setting back to one:

```
[root@dhcp71-27 ~]# ethtool -L eth0 combined 1
```

7. Ping out again:

```
[root@dhcp71-27 ~]# ping 10.16.67.19
PING 10.16.67.19 (10.16.67.19) 56(84) bytes of data.
64 bytes from 10.16.67.19: icmp_seq=1 ttl=61 time=0.314 ms
64 bytes from 10.16.67.19: icmp_seq=2 ttl=61 time=0.213 ms
64 bytes from 10.16.67.19: icmp_seq=3 ttl=61 time=0.219 ms
64 bytes from 10.16.67.19: icmp_seq=4 ttl=61 time=0.245 ms
64 bytes from 10.16.67.19: icmp_seq=5 ttl=61 time=0.219 ms
64 bytes from 10.16.67.19: icmp_seq=6 ttl=61 time=0.209 ms
64 bytes from 10.16.67.19: icmp_seq=7 ttl=61 time=0.209 ms
```

The result is good.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html