Bug 1417143

Summary: Failover to the inactive port of a team interface is delayed by around 20 seconds.
Product: Red Hat Enterprise Linux 7
Reporter: Fani Orestiadou <forestia>
Component: libteam
Assignee: Xin Long <lxin>
Status: CLOSED NOTABUG
QA Contact: Network QE <network-qe>
Severity: medium
Priority: medium
Version: 7.2
CC: aloughla, lxin
Target Milestone: rc
Hardware: Unspecified
OS: Linux
Last Closed: 2017-03-09 15:27:08 UTC
Type: Bug
Bug Blocks: 1420851    

Description Fani Orestiadou 2017-01-27 09:51:56 UTC
Description of problem:

A team interface is configured in active-backup mode with the arp_ping link watch and link_watch.missed_max set to 1.

When testing failover, it takes about 20 seconds for the previously inactive port to start handling the traffic.

$ cat sos_commands/teamd/teamdctl_team1_config_dump 
{
    "device": "team1",
    "link_watch": {
        "interval": 5000,
        "missed_max": 1,
        "name": "arp_ping",
        "source_host": "10.254.35.11",
        "target_host": "10.254.35.1",
        "validate_active": true
    },
    "mcast_rejoin": {
        "count": 1
    },
    "notify_peers": {
        "count": 1
    },
    "ports": {
        "eno3": {
            "prio": 100
        },
        "ens2f2": {
            "prio": 90
        }
    },
    "runner": {
        "hwaddr_policy": "only_active",
        "name": "activebackup"
    }
}

ARP traffic at the time of the failover (the delay is visible at frame number 5):

$ tshark -r ens2f2.pcap "arp" -T fields -e frame.number -e frame.time_delta -e eth.src -e eth.dst -e col.Info
1	0.000000000	94:57:a5:59:20:e6	ff:ff:ff:ff:ff:ff	Who has 10.254.35.1?  Tell 10.254.35.11
2	5.000003000	94:57:a5:59:20:e6	ff:ff:ff:ff:ff:ff	Who has 10.254.35.1?  Tell 10.254.35.11
3	4.999989000	94:57:a5:59:20:e6	ff:ff:ff:ff:ff:ff	Who has 10.254.35.1?  Tell 10.254.35.11
4	5.000007000	94:57:a5:59:20:e6	ff:ff:ff:ff:ff:ff	Who has 10.254.35.1?  Tell 10.254.35.11
5	18.035781000	28:80:23:b3:98:96	ff:ff:ff:ff:ff:ff	Who has 10.254.35.1?  Tell 10.254.35.11 (duplicate use of 10.254.35.11 detected!)
6	0.001094000	00:00:5e:00:01:23	28:80:23:b3:98:96	10.254.35.1 is at 00:00:5e:00:01:23 (duplicate use of 10.254.35.11 detected!)
7	4.999073000	94:57:a5:59:20:e6	ff:ff:ff:ff:ff:ff	Who has 10.254.35.1?  Tell 10.254.35.11 (duplicate use of 10.254.35.11 detected!)
8	0.000126000	94:57:a5:59:20:e6	ff:ff:ff:ff:ff:ff	Gratuitous ARP for 10.254.35.11 (Request)
110	0.028338000	94:57:a5:59:20:e6	ff:ff:ff:ff:ff:ff	Who has 10.254.35.1?  Tell 10.254.35.11
111	0.000537000	00:00:5e:00:01:23	94:57:a5:59:20:e6	10.254.35.1 is at 00:00:5e:00:01:23
212	0.017603000	94:57:a5:59:20:e6	ff:ff:ff:ff:ff:ff	Who has 10.254.35.1?  Tell 10.254.35.11
213	0.000552000	00:00:5e:00:01:23	94:57:a5:59:20:e6	10.254.35.1 is at 00:00:5e:00:01:23
314	0.007123000	94:57:a5:59:20:e6	ff:ff:ff:ff:ff:ff	Who has 10.254.35.1?  Tell 10.254.35.11
315	0.000525000	00:00:5e:00:01:23	94:57:a5:59:20:e6	10.254.35.1 is at 00:00:5e:00:01:23
354	0.067746000	94:57:a5:59:20:e6	ff:ff:ff:ff:ff:ff	Gratuitous ARP for 10.254.35.11 (Request)
355	0.000018000	94:57:a5:59:20:e6	ff:ff:ff:ff:ff:ff	Gratuitous ARP for 10.254.35.11 (Request)
358	4.999622000	94:57:a5:59:20:e6	ff:ff:ff:ff:ff:ff	Who has 10.254.35.1?  Tell 10.254.35.11

Version-Release number of selected component (if applicable):
$ cat etc/redhat-release 
Red Hat Enterprise Linux Server release 7.2 (Maipo)

teamd-1.17-7.el7_2.x86_64   
libteam-1.17-7.el7_2.x86_64

How reproducible:
Configure an active-backup team interface with the arp_ping link watch and missed_max set to 1. Start tcpdump on both ports, unplug the active one, and wait until the inactive port starts handling the traffic (see the sketch below).
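
For illustration, a minimal sketch of one way to set this up with teamd directly. The interface names and addresses are taken from this report, but the exact commands used in the reporter's environment are not shown in the bug, so treat the steps as an assumption:

$ cat > team1.conf <<'EOF'
{
    "device": "team1",
    "runner": {"name": "activebackup", "hwaddr_policy": "only_active"},
    "link_watch": {
        "name": "arp_ping",
        "interval": 5000,
        "missed_max": 1,
        "source_host": "10.254.35.11",
        "target_host": "10.254.35.1",
        "validate_active": true
    }
}
EOF
$ teamd -d -t team1 -f team1.conf                        # start teamd daemonized on team1
$ ip link set eno3 down && teamdctl team1 port add eno3  # ports must be down before adding
$ ip link set ens2f2 down && teamdctl team1 port add ens2f2
$ ip addr add 10.254.35.11/24 dev team1 && ip link set team1 up
$ tcpdump -ni eno3 arp & tcpdump -ni ens2f2 arp &        # watch both ports
$ ip link set eno3 down                                  # simulate unplugging the active port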

Actual results:
A delay of about 20 seconds before the second port is able to handle the traffic.

Expected results:
A smaller delay.

Additional info:
For comparison, the same test on a bond interface shows the inactive slave starting to handle the traffic in less than 5 seconds.

Please let me know in case you need any further details. 

Thank you

Comment 2 Xin Long 2017-02-05 12:01:03 UTC
The 20 s failover time is expected. Note that you did not set link_watch.send_always = true for arp_ping. That means only the active port sends ARP requests: the active port checks whether it is alive via the ARP reply, while the backup port checks via the ARP request forwarded back by the switch.

So when you disable the active port:
1st 5s timeout: missed++, missed = 1; with "missed_max": 1, no action yet.
2nd 5s timeout: missed++, missed = 2; both ports go down as missed > missed_max.
                Since no port is alive, force_send = true is set for both ports.
3rd 5s timeout: both ports send ARP requests.
4th 5s timeout: the backup port gets the ARP reply, becomes the active port, and
                sets the old active port's force_send back to false. Only the new
                active port sends ARP requests again, as before.

After that the failover process is finished.
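
In other words, with "interval": 5000 and "missed_max": 1, the worst case is roughly 4 intervals × 5 s = 20 s before the backup port takes over, which matches the delay observed in the capture above.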

If you want the failover to complete faster, please try setting link_watch.send_always = true, for example as in the sketch below.
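
For reference, a minimal sketch of the reported configuration with send_always added (device, port, and address values are copied from the config dump above; this is an illustration, not a config taken from the customer's system):

{
    "device": "team1",
    "link_watch": {
        "name": "arp_ping",
        "interval": 5000,
        "missed_max": 1,
        "source_host": "10.254.35.11",
        "target_host": "10.254.35.1",
        "validate_active": true,
        "send_always": true
    },
    "ports": {
        "eno3": {"prio": 100},
        "ens2f2": {"prio": 90}
    },
    "runner": {
        "name": "activebackup",
        "hwaddr_policy": "only_active"
    }
}

With send_always enabled, the backup port sends its own ARP requests instead of waiting for requests forwarded by the switch, so it can detect link liveness on its own.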

Thanks.

Comment 3 Fani Orestiadou 2017-02-05 12:07:55 UTC
Hello, 

Thank you for your reply. 
Actually, we have tried setting the send_always option to true, but then we face a different issue on the switch side.

1. If hwaddr_policy is set to same_all, there is MAC address flapping on the switch:

MAC Address Table:
First time:
9457-a559-20e6   3335       Learned          GE1/0/18                 Y
Second time:
9457-a559-20e6   3335       Learned          XGE1/0/25                Y

[root@srv-as1-01 ~]# ping -i0.1 -W0.1 10.254.51.11
PING 10.254.51.11 (10.254.51.11) 56(84) bytes of data.
64 bytes from 10.254.51.11: icmp_seq=37 ttl=61 time=0.525 ms
64 bytes from 10.254.51.11: icmp_seq=38 ttl=61 time=0.495 ms
64 bytes from 10.254.51.11: icmp_seq=39 ttl=61 time=0.503 ms  <---
64 bytes from 10.254.51.11: icmp_seq=83 ttl=61 time=0.501 ms  <---
64 bytes from 10.254.51.11: icmp_seq=84 ttl=61 time=0.505 ms

2. If hwaddr_policy is set to only_active, frequent changes in the ARP table of the switch (an L3 switch) cause pauses in the traffic flow (5 seconds each time, matching "interval": 5000):

First time:
10.254.35.11    2880-23b3-9896 3335       GE1/0/18                 20    D
Second time:
10.254.35.11    9457-a559-20e6 3335       XGE1/0/25                20    D
Third time:
10.254.35.11    2880-23b3-9896 3335       GE1/0/18                 20    D

[root@srv-as1-01 ~]# ping 10.254.51.11
PING 10.254.51.11 (10.254.51.11) 56(84) bytes of data.
64 bytes from 10.254.51.11: icmp_seq=3 ttl=61 time=0.551 ms
64 bytes from 10.254.51.11: icmp_seq=8 ttl=61 time=0.552 ms
64 bytes from 10.254.51.11: icmp_seq=13 ttl=61 time=0.561 ms

In my opinion this is also a bug related to the send_always option, unless I am missing something here.

Thank you

Comment 4 Xin Long 2017-02-05 12:39:03 UTC
(In reply to Fani Orestiadou from comment #3)
> 1. If hwaddr_policy is set to same_all, there is MAC address flapping on the
> switch:
I think it can't be avoided :(

> 2. If hwaddr_policy is set to only_active, frequent changes in the ARP table
> of the switch (an L3 switch) cause pauses in the traffic flow (5 seconds each
> time, matching "interval": 5000):
> 
> First time:
> 10.254.35.11    2880-23b3-9896 3335       GE1/0/18                 20    D
> Second time:
> 10.254.35.11    9457-a559-20e6 3335       XGE1/0/25                20    D
> Third time:
> 10.254.35.11    2880-23b3-9896 3335       GE1/0/18                 20    D
Can you try these two ways to work around it?

1. Remove "source_host": "10.254.35.11" :)
or
2. Make "link_watch" per port and set a different "source_host" address in each
   port's link_watch; see the sketch after this list.
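
A minimal sketch of option 2, using the port names from this report; the per-port source addresses 10.254.35.12 and 10.254.35.13 are placeholders chosen for illustration, not values from the customer's setup:

{
    "device": "team1",
    "runner": {"name": "activebackup", "hwaddr_policy": "only_active"},
    "ports": {
        "eno3": {
            "prio": 100,
            "link_watch": {
                "name": "arp_ping",
                "interval": 5000,
                "missed_max": 1,
                "source_host": "10.254.35.12",
                "target_host": "10.254.35.1"
            }
        },
        "ens2f2": {
            "prio": 90,
            "link_watch": {
                "name": "arp_ping",
                "interval": 5000,
                "missed_max": 1,
                "source_host": "10.254.35.13",
                "target_host": "10.254.35.1"
            }
        }
    }
}

With distinct source addresses, each port's probes are answered independently, so the switch does not see the same IP moving between two ports.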

> 
> [root@srv-as1-01 ~]# ping 10.254.51.11
> PING 10.254.51.11 (10.254.51.11) 56(84) bytes of data.
> 64 bytes from 10.254.51.11: icmp_seq=3 ttl=61 time=0.551 ms
> 64 bytes from 10.254.51.11: icmp_seq=8 ttl=61 time=0.552 ms
> 64 bytes from 10.254.51.11: icmp_seq=13 ttl=61 time=0.561 ms
> 
> And in my opinion this is also a BUG related to the send_always option,
> unless i am missing something here. 
> 
> Thank you

Comment 5 Fani Orestiadou 2017-02-08 08:54:05 UTC
Hello, 

OK, I tested removing source_host, and now there is no delay during failover and no issue on the switch side, but there is still an issue if the customer enables the validate_active option.
It seems that even though the ARP probes are sent and received, the team still increases the missed counter.

So having the below configuration:
TEAM_CONFIG='{"runner": {"name": "activebackup", "hwaddr_policy": "only_active"}, "link_watch": { "validate_active": true, "name": "arp_ping", "interval": 5000, "missed_max": 2, "target_host": "10.254.35.1", "send_always": true }}'

Testing the failover:
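
The snapshots below are trimmed teamdctl state output. The exact collection command is not shown in the report; a plausible reconstruction is:

$ while true; do
      date -u
      teamdctl team1 state dump | grep -E '"eno3"|"ens2f2"|"ifname"|"interval"|"missed|"active_port"'
      sleep 5
  done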

Tue Feb  7 14:59:09 UTC 2017
        "eno3": {
                "ifname": "eno3"
                        "interval": 5000,
                        "missed": 0,
                        "missed_max": 2,
        "ens2f2": {
                "ifname": "ens2f2"
                        "interval": 5000,
                        "missed": 0,
                        "missed_max": 2,
        "active_port": "eno3"
Tue Feb  7 14:59:15 UTC 2017
        "eno3": {
                "ifname": "eno3"
                        "interval": 5000,
                        "missed": 1,
                        "missed_max": 2,
        "ens2f2": {
                "ifname": "ens2f2"
                        "interval": 5000,
                        "missed": 0,
                        "missed_max": 2,
        "active_port": "eno3"
Tue Feb  7 14:59:20 UTC 2017
        "eno3": {
                "ifname": "eno3"
                        "interval": 5000,
                        "missed": 2,
                        "missed_max": 2,
        "ens2f2": {
                "ifname": "ens2f2"
                        "interval": 5000,
                        "missed": 0,
                        "missed_max": 2,
        "active_port": "eno3"

Tue Feb  7 14:59:25 UTC 2017
        "eno3": {
                "ifname": "eno3"
                        "interval": 5000,
                        "missed": 3,
                        "missed_max": 2,
        "ens2f2": {
                "ifname": "ens2f2"
                        "interval": 5000,
                        "missed": 0,
                        "missed_max": 2,
        "active_port": "ens2f2"    <---- failover occurs

Now the missed counter wrongly increases for the new active port:
Tue Feb  7 14:59:30 UTC 2017
        "eno3": {
                "ifname": "eno3"
                        "interval": 5000,
                        "missed": 4,
                        "missed_max": 2,
        "ens2f2": {
                "ifname": "ens2f2"
                        "interval": 5000,
                        "missed": 1,     <---- 
                        "missed_max": 2,
        "active_port": "ens2f2"
Tue Feb  7 14:59:35 UTC 2017
        "eno3": {
                "ifname": "eno3"
                        "interval": 5000,
                        "missed": 5,
                        "missed_max": 2,
        "ens2f2": {
                "ifname": "ens2f2"
                        "interval": 5000,
                        "missed": 2,      <-----
                        "missed_max": 2,
        "active_port": "ens2f2"
Tue Feb  7 14:59:41 UTC 2017
        "eno3": {
                "ifname": "eno3"
                        "interval": 5000,
                        "missed": 6,
                        "missed_max": 2,
        "ens2f2": {
                "ifname": "ens2f2"
                        "interval": 5000,
                        "missed": 3,      <------
                        "missed_max": 2,
        "active_port": ""


# tcpdump -nlpev -i ens2f2 arp
14:59:40.342416 28:80:23:b3:98:96 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.254.35.1 (Broadcast) tell 0.0.0.0, length 28
14:59:40.342906 00:00:5e:00:01:23 > Broadcast, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.254.35.1 is-at 00:00:5e:00:01:23, length 46


In the case where both validate_active and validate_inactive are enabled, the failover does not work at all.

Any idea why there is such behaviour? 

Thank you!

Comment 6 Xin Long 2017-02-08 11:00:55 UTC
(In reply to Fani Orestiadou from comment #5)
> It seems that even though the ARP probes are sent and received, the team
> still increases the missed counter.
> [snip: configuration and state snapshots quoted in full from comment #5]
I couldn't reproduce this issue in my environment; please ensure that target_host 10.254.35.1 is still reachable when the issue happens.

Can you also give team1's and all the ports' MAC addresses when this issue happens? (See the sketch below for one way to capture them.)
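
For reference, a quick way to capture those using standard sysfs paths (not a command from the original report):

# for i in team1 eno3 ens2f2; do printf '%s: ' "$i"; cat /sys/class/net/$i/address; done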

By the way, have you tried with both validate_active and validate_inactive set to false?


Comment 7 Xin Long 2017-03-09 15:27:08 UTC
Fani and I have resolved this issue in the customer's environment, so I am closing it.

Thanks Fani, nice collaboration!