Bug 1306466
| Summary: | Concurrent VM to VM netperf/UDP_STREAM tests over multi-queue eth0 failed due to "No route to host" | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jean-Tsung Hsiao <jhsiao> |
| Component: | openvswitch | Assignee: | Eric Garver <egarver> |
| Status: | CLOSED CANTFIX | QA Contact: | Jean-Tsung Hsiao <jhsiao> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.2 | CC: | aloughla, atragler, jhsiao, kzhang, mleitner, rcain, rkhan |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-09-22 19:15:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jean-Tsung Hsiao 2016-02-10 22:38:42 UTC
Can you please confirm that it is using NON-DPDK OVS? Since that was with 2.4, could you please double check with OVS 2.5? Thanks!

Jean-Tsung, please see Flavio's questions in comment 2.

(In reply to Flavio Leitner from comment #2)
> Can you please confirm that it is using NON-DPDK OVS?
Yes, it is using NON-DPDK OVS.
> Since that was with 2.4, could you please double check with OVS 2.5?
Will do.
> Thanks!

Ran the reproducer for only 45 seconds, and three out of four tests failed:

[root@localhost jhsiao]# cat log.45.0 log.45.1 log.45.2 log.45.3
netperf: send_omni: send_data failed: No route to host
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.3.200 () port 0 AF_INET
send_data: data send error: errno 113
netperf: send_omni: send_data failed: No route to host
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.3.200 () port 0 AF_INET
send_data: data send error: errno 113
netperf: send_omni: send_data failed: No route to host
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.3.200 () port 0 AF_INET
send_data: data send error: errno 113
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.3.200 () port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   45.00       3745197      0    43615.33
212992           45.00          4688              54.59

Can you describe the network setup a bit more? What are the guest and host IPs? Is there a router between the hosts? Please replace OVS with a native linux bridge and re-run the test. That will possibly rule out OVS.

To repeat what we discussed during our IRC session:

VM -- host(OVS) -- host(OVS) -- VM

The OVS is a non-DPDK OVS. The NICs are connected back to back; there are no other switches or routers along the way. Traffic runs between eth0 (192.168.3.110) on one VM and eth0 (192.168.3.120) on the other. The issue is that when pumping 64K UDP_STREAM traffic over 4 queues, there is probably too much data traffic for the destination to absorb.

Jean,

After looking at your setup today this looks to be caused by ARPs getting missed/dropped, which causes temporary route loss.

I was able to reproduce the problem on your setup. While doing so I was capturing ARP traffic on the host. The "No route to host" messages from netperf in the VM coincided with the ARP traffic captured on the host.

To verify this was the cause I added static ARP entries in both VMs:

$ ip neigh change <ip> lladdr <mac> nud permanent dev eth0

Then I ran the reproducer for an hour in a loop and never saw the issue.

You could try it with a native linux bridge instead of OVS out of curiosity. But I think the answer is: the queues are overloaded and we start missing/dropping ARPs, so the route goes down.
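As a concrete illustration of the static-entry workaround above, here is a minimal sketch with the placeholders filled in; the IP/MAC pairs are taken from the arp and ip neigh output later in this report and must be replaced with the actual values from each VM:

# On the client VM (192.168.3.100): pin the server's MAC so a missed ARP cannot drop the route.
ip neigh change 192.168.3.200 lladdr 52:54:00:db:fa:e2 nud permanent dev eth0

# On the server VM (192.168.3.200): pin the client's MAC the same way.
ip neigh change 192.168.3.100 lladdr 52:54:00:44:44:7d nud permanent dev eth0

# Verify on each side; the pinned entry should show as PERMANENT.
ip neigh show dev eth0

Note that `ip neigh change` only modifies an existing neighbor entry; if the entry is not present yet, `ip neigh replace` creates it with the same arguments.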
(In reply to Eric Garver from comment #8)
> Jean,
>
> After looking at your setup today this looks to be caused by ARPs getting
> missed/dropped, which causes temporary route loss.
>
> I was able to reproduce the problem on your setup. While doing so I was
> capturing ARP traffic on the host. The "No route to host" messages from
> netperf in the VM coincided with the ARP traffic captured on the host.
>
> To verify this was the cause I added static ARP entries in both VMs.
>
> $ ip neigh change <ip> lladdr <mac> nud permanent dev eth0
>
> Then I ran the reproducer for an hour in a loop and never saw the issue.
>
> You could try it with a native linux bridge instead of OVS out of curiosity.
> But I think the answer is: the queues are overloaded and we start
> missing/dropping ARPs, so the route goes down.

Hi Eric,

I still saw the same issue when running 8 concurrent tests for 300 seconds.

[root@localhost ~]# cat log.1
netperf: send_omni: send_data failed: No route to host
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.3.200 () port 0 AF_INET
send_data: data send error: errno 113

The other 7 logs show the same thing.

The ARP entry you added is still there:

[root@localhost ~]# arp -n
Address        HWtype  HWaddress          Flags Mask  Iface
192.168.122.1  ether   52:54:00:1b:f9:f4  C           eth1
192.168.3.200  ether   52:54:00:db:fa:e2  C           eth0

[root@localhost ~]# arp -n
Address        HWtype  HWaddress          Flags Mask  Iface
192.168.122.1  ether   52:54:00:1b:30:22  C           eth1
192.168.3.100  ether   52:54:00:44:44:7d  CM          eth0

I had deleted it on the client side (which is the side that matters). I did this to verify it failed again.

I re-added the permanent entry. Please try again.

[root@localhost ~]# ip neigh
192.168.122.1 dev eth1 lladdr 52:54:00:1b:f9:f4 REACHABLE
192.168.3.200 dev eth0 lladdr 52:54:00:db:fa:e2 STALE

[root@localhost ~]# ip neigh change 192.168.3.200 lladdr 52:54:00:db:fa:e2 nud permanent dev eth0
[root@localhost ~]# ip neigh
192.168.122.1 dev eth1 lladdr 52:54:00:1b:f9:f4 REACHABLE
192.168.3.200 dev eth0 lladdr 52:54:00:db:fa:e2 PERMANENT

(In reply to Eric Garver from comment #10)
> I had deleted it on the client side (which is the side that matters). I did
> this to verify it failed again.
>
> I re-added the permanent entry. Please try again.
>
> [root@localhost ~]# ip neigh
> 192.168.122.1 dev eth1 lladdr 52:54:00:1b:f9:f4 REACHABLE
> 192.168.3.200 dev eth0 lladdr 52:54:00:db:fa:e2 STALE
>
> [root@localhost ~]# ip neigh change 192.168.3.200 lladdr 52:54:00:db:fa:e2
> nud permanent dev eth0
> [root@localhost ~]# ip neigh
> 192.168.122.1 dev eth1 lladdr 52:54:00:1b:f9:f4 REACHABLE
> 192.168.3.200 dev eth0 lladdr 52:54:00:db:fa:e2 PERMANENT

Eric,

Now all 8 concurrent tests completed successfully.

Thanks!

Jean

(In reply to Eric Garver from comment #8)
> Jean,
>
> You could try it with a native linux bridge instead of OVS out of curiosity.

Yes, the same issue happened with a linux bridge when running 8 concurrent tests for 300 seconds.

> But I think the answer is: the queues are overloaded and we start
> missing/dropping ARPs, so the route goes down.

The default netperf/UDP_STREAM message size is 64K, so the queues are overloaded with 8 concurrent tests.

Jean,

Currently the client side is making use of the 4 queues for TX. On the server/RX side all the traffic appears to hit the same queue, so we're not taking advantage of the multiple queues at all.

Two things to try:

1) Try starting multiple netserver processes, each bound to a separate IP address (use -L option), then start a client for each netserver. So 4 netserver and 4 netperf. The different IP addresses may allow load balancing on the RX queue for the server side.

2) Try toggling RPS so the packets are hashed to different queues in software.

# echo f > /sys/class/net/eth0/queues/rx-*/rps_cpus
(In reply to Eric Garver from comment #13)
> Jean,
>
> Currently the client side is making use of the 4 queues for TX. On the
> server/RX side all the traffic appears to hit the same queue, so we're not
> taking advantage of the multiple queues at all.
>
> Two things to try:
>
> 1) Try starting multiple netserver processes, each bound to a separate IP
> address (use -L option), then start a client for each netserver. So 4
> netserver and 4 netperf. The different IP addresses may allow load balancing
> on the RX queue for the server side.
>
> 2) Try toggling RPS so the packets are hashed to different queues in
> software.
>
> # echo f > /sys/class/net/eth0/queues/rx-*/rps_cpus

Hi Eric,

I see your point, as I have done that kind of exercise before using different interfaces.

For the OVS-DPDK bonding testing I configured each guest with an OVS with 64 ports, each having its own subnet like 172.16.i.0/24 where i = 1, 2, 3, ..., 64. Thus, all four CPUs were equally/fully utilized as concurrency went up to 4, 8, 32 and 64. Consequently, ARP drops happened when the UDP_STREAM message size was 64K --- typically at 32 and 64 concurrency.

You have identified the root cause --- ARPs dropped due to heavy UDP_STREAM traffic. We can close it for now if you want.

Thanks!

Jean

(In reply to Jean-Tsung Hsiao from comment #14)
> For the OVS-DPDK bonding testing I configured each guest with an OVS with
> 64 ports, each having its own subnet like 172.16.i.0/24 where i = 1, 2, 3, ...,
> 64. Thus, all four CPUs were equally/fully utilized as concurrency went up
> to 4, 8, 32 and 64. Consequently, ARP drops happened when the UDP_STREAM
> message size was 64K --- typically at 32 and 64 concurrency.

I see. This is slightly different.

I think RPS will hash on the TCP/UDP ports as well as SIP/DIP. So it may still be worth trying the single interface with multi-queue. You just need to configure RPS as I indicated in comment 13 option #2.

> You have identified the root cause --- ARPs dropped due to heavy UDP_STREAM
> traffic. We can close it for now if you want.

I'm fine with closing this BZ as CANTFIX.

(In reply to Eric Garver from comment #15)
> I think RPS will hash on the TCP/UDP ports as well as SIP/DIP. So it may
> still be worth trying the single interface with multi-queue. You just need
> to configure RPS as I indicated in comment 13 option #2.

Hi Eric,

I'll try the RPS scheme when I have a chance.

Thanks!

Jean

> > You have identified the root cause --- ARPs dropped due to heavy UDP_STREAM
> > traffic. We can close it for now if you want.
>
> I'm fine with closing this BZ as CANTFIX.

*** Bug 1358026 has been marked as a duplicate of this bug. ***
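For anyone revisiting the RPS scheme from comment 13 option #2, a quick sketch of how one might check whether receive processing is actually being spread across CPUs and queues inside the guest; this assumes a virtio-net eth0, and the exact interrupt names can vary by kernel and device:

# One row per online CPU; the first (hex) column counts packets processed by that CPU.
cat /proc/net/softnet_stat

# Per-queue interrupt counts for the guest's virtio-net RX queues.
grep 'virtio.*input' /proc/interrupts

# Current RPS CPU mask for each RX queue (non-zero means RPS is enabled for that queue).
grep . /sys/class/net/eth0/queues/rx-*/rps_cpus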