Bug 1445499

Summary: team with link_watch = nsna_ping does not stay up
Product: Red Hat Enterprise Linux 7 Reporter: Amit Supugade <asupugad>
Component: libteam    Assignee: Xin Long <lxin>
Status: CLOSED ERRATA QA Contact: Rick Alongi <ralongi>
Severity: high Docs Contact:
Priority: high    
Version: 7.4    CC: aiyengar, atragler, lxin, ralongi, rkhan, sukulkar
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libteam-1.27-1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-10 18:48:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Amit Supugade 2017-04-25 19:44:41 UTC
Description of problem:
team with link_watch = nsna_ping does not stay up. The team interface tries to come up, stays up for a short time, then fails again.


Version-Release number of selected component (if applicable):
kernel-3.10.0-656.el7.x86_64
libteam-1.25-5.el7.x86_64
teamd-1.25-5.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
port0=enp7s0f0
port1=enp7s0f1
teamd -d -t team0 -c '{ "runner" : { "name": "roundrobin" },  "link_watch" : { "name": "nsna_ping", "interval": 500, "target_host": "2001::254" } }'
ip link set team0 up
teamdctl team0 port add $port0
teamdctl team0 port add $port1
ip a

Actual results:
team0 does not stay up

Expected results:
team0 should stay up

Additional info:

[root@sam ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp7s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 portid 0100000000000000000000333135384643 state UP qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
3: enp7s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 portid 0200000000000000000000333135384643 state UP qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
4: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether e4:11:5b:dd:e6:6c brd ff:ff:ff:ff:ff:ff
    inet 10.19.15.26/24 brd 10.19.15.255 scope global dynamic enp5s0f0
       valid_lft 85452sec preferred_lft 85452sec
    inet6 2620:52:0:130b:e611:5bff:fedd:e66c/64 scope global noprefixroute dynamic 
       valid_lft 2591958sec preferred_lft 604758sec
    inet6 fe80::e611:5bff:fedd:e66c/64 scope link 
       valid_lft forever preferred_lft forever
5: enp5s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether e4:11:5b:dd:e6:6d brd ff:ff:ff:ff:ff:ff
10: team0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
    inet6 2001::290:faff:fe8a:5bfa/64 scope global mngtmpaddr dynamic 
       valid_lft 2591796sec preferred_lft 604596sec
    inet6 fe80::290:faff:fe8a:5bfa/64 scope link 
       valid_lft forever preferred_lft forever

[root@sam ~]# teamdctl team0 state dump
{
    "ports": {
        "enp7s0f0": {
            "ifinfo": {
                "dev_addr": "00:90:fa:8a:5b:fa",
                "dev_addr_len": 6,
                "ifindex": 2,
                "ifname": "enp7s0f0"
            },
            "link": {
                "duplex": "half",
                "speed": 0,
                "up": true
            },
            "link_watches": {
                "list": {
                    "link_watch_0": {
                        "down_count": 20,
                        "init_wait": 0,
                        "interval": 500,
                        "missed": 6,
                        "missed_max": 3,
                        "name": "nsna_ping",
                        "target_host": "2001::254",
                        "up": false
                    }
                },
                "up": false
            }
        },
        "enp7s0f1": {
            "ifinfo": {
                "dev_addr": "00:90:fa:8a:5b:fa",
                "dev_addr_len": 6,
                "ifindex": 3,
                "ifname": "enp7s0f1"
            },
            "link": {
                "duplex": "half",
                "speed": 0,
                "up": true
            },
            "link_watches": {
                "list": {
                    "link_watch_0": {
                        "down_count": 20,
                        "init_wait": 0,
                        "interval": 500,
                        "missed": 6,
                        "missed_max": 3,
                        "name": "nsna_ping",
                        "target_host": "2001::254",
                        "up": false
                    }
                },
                "up": false
            }
        }
    },
    "setup": {
        "daemonized": true,
        "dbus_enabled": false,
        "debug_level": 0,
        "kernel_team_mode_name": "roundrobin",
        "pid": 21443,
        "pid_file": "/var/run/teamd/team0.pid",
        "runner_name": "roundrobin",
        "zmq_enabled": false
    },
    "team_device": {
        "ifinfo": {
            "dev_addr": "00:90:fa:8a:5b:fa",
            "dev_addr_len": 6,
            "ifindex": 10,
            "ifname": "team0"
        }
    }
}

Comment 2 Xin Long 2017-05-02 09:57:54 UTC
(In reply to Amit Supugade from comment #0)
> Description of problem:
> team with link_watch = nsna_ping does not stay up. team interfaces tries to
> come up it comes up for a short time and goes fails again
> 
...
> 2: enp7s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
> team0 portid 0100000000000000000000333135384643 state UP qlen 1000
>     link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
> 3: enp7s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
> team0 portid 0200000000000000000000333135384643 state UP qlen 1000
>     link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
These two NICs have no IPv6 link-local addresses, so there is no route for the NS packet. I think they are managed by NetworkManager while NM has no connections for them. Please try one of the following:

1. make them out of the NM's control
  # nmcli dev set enp7s0f0 managed no
  # nmcli dev set enp7s0f1 managed no

or
2. let NM ignore their IPV6
  # nmcli con add type ethernet ifname enp7s0f0 ipv6.method ignore
  # nmcli con add type ethernet ifname enp7s0f1 ipv6.method ignore

or
3. just disable NM for this test.

Comment 4 Amit Supugade 2017-05-04 15:01:37 UTC
Hi Xin,
I tried it by disabling NetworkManager and it still fails.

LOG-
[root@sam ~]# systemctl stop NetworkManager
[root@sam ~]# systemctl disable NetworkManager
Removed symlink /etc/systemd/system/multi-user.target.wants/NetworkManager.service.
Removed symlink /etc/systemd/system/dbus-org.freedesktop.NetworkManager.service.
Removed symlink /etc/systemd/system/dbus-org.freedesktop.nm-dispatcher.service.
[root@sam ~]# systemctl status NetworkManager
● NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; disabled; vendor preset: enabled)
   Active: inactive (dead) since Thu 2017-05-04 10:57:21 EDT; 10s ago
     Docs: man:NetworkManager(8)
 Main PID: 824 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/NetworkManager.service
           └─884 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-enp5s0f0.pid -lf /var/l...

May 04 10:51:08 localhost.localdomain NetworkManager[824]: <info>  [1493909468.2944] policy: set 'enp5s0f0' (enp...DNS
May 04 10:51:08 localhost.localdomain NetworkManager[824]: <info>  [1493909468.3195] device (enp5s0f0): Activati...ed.
May 04 10:51:08 localhost.localdomain NetworkManager[824]: <info>  [1493909468.3211] manager: NetworkManager sta...BAL
May 04 10:51:08 localhost.localdomain NetworkManager[824]: <info>  [1493909468.3527] policy: set-hostname: set h...up)
May 04 10:51:09 sam.knqe.lab.eng.bos.redhat.com NetworkManager[824]: <info>  [1493909469.0762] manager: startup c...te
May 04 10:51:10 sam.knqe.lab.eng.bos.redhat.com NetworkManager[824]: <info>  [1493909470.2102] policy: set 'enp5s...NS
May 04 10:57:21 sam.knqe.lab.eng.bos.redhat.com systemd[1]: Stopping Network Manager...
May 04 10:57:21 sam.knqe.lab.eng.bos.redhat.com NetworkManager[824]: <info>  [1493909841.8774] caught SIGTERM, sh...y.
May 04 10:57:21 sam.knqe.lab.eng.bos.redhat.com NetworkManager[824]: <info>  [1493909841.9240] exiting (success)
May 04 10:57:21 sam.knqe.lab.eng.bos.redhat.com systemd[1]: Stopped Network Manager.
Hint: Some lines were ellipsized, use -l to show in full.

[root@sam ~]# port0=enp7s0f0
[root@sam ~]# port1=enp7s0f1
[root@sam ~]# teamd -d -t team0 -c '{ "runner" : { "name": "roundrobin" },  "link_watch" : { "name": "nsna_ping", "interval": 500, "target_host": "2001::254" } }'
This program is not intended to be run as root.
[root@sam ~]# ip link set team0 up
[root@sam ~]# teamdctl team0 port add $port0
[root@sam ~]# teamdctl team0 port add $port1
[root@sam ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether e4:11:5b:dd:e6:6c brd ff:ff:ff:ff:ff:ff
    inet 10.19.15.26/24 brd 10.19.15.255 scope global dynamic enp5s0f0
       valid_lft 85975sec preferred_lft 85975sec
    inet6 2620:52:0:130b:e611:5bff:fedd:e66c/64 scope global noprefixroute dynamic 
       valid_lft 2591577sec preferred_lft 604377sec
    inet6 fe80::e611:5bff:fedd:e66c/64 scope link 
       valid_lft forever preferred_lft forever
3: enp7s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 portid 0100000000000000000000333135384643 state UP qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
    inet6 fe80::e4bb:31ff:fe93:ddf9/64 scope link tentative 
       valid_lft forever preferred_lft forever
4: enp5s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether e4:11:5b:dd:e6:6d brd ff:ff:ff:ff:ff:ff
5: enp7s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 portid 0200000000000000000000333135384643 state UP qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
    inet6 fe80::290:faff:fe8a:5bfa/64 scope link tentative 
       valid_lft forever preferred_lft forever
7: team0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
    inet6 fe80::290:faff:fe8a:5bfa/64 scope link tentative dadfailed 
       valid_lft forever preferred_lft forever

Comment 5 Xin Long 2017-05-05 09:10:59 UTC
I can see many "missed" counts in your environment. Before starting teamd, can you check whether the "target_host" is reachable from your host with:
  ndisc6 2001::254 enp7s0f0
  ndisc6 2001::254 enp7s0f1

If that works, I think something in your switch is altering the NS/NA packets when forwarding them. Can I check your environment, or please provide the packets you captured on enp7s0f0 and enp7s0f1?

Thanks.

Comment 7 Xin Long 2017-05-06 10:52:42 UTC
As we expected, the switch indeed does something differently from Linux:
1. the IPv6 NS packet's DSCP from your switch is 0xc, while it is 0x0 in Linux
2. the IPv6 NS packet's source address from your switch is the link-local address (fe80::de38:e1ff:fe9c:4d41) instead of the global target address (2001::254), which Linux uses

These two differences cause teamd to fail to validate the NS packet, so it never processes it.

I will post the following fix upstream:

--- a/teamd/teamd_lw_nsna_ping.c
+++ b/teamd/teamd_lw_nsna_ping.c
@@ -247,11 +247,11 @@ static int lw_nsnap_receive(struct lw_psr_port_priv *psr_ppriv)
                return err;

        /* check IPV6 header */
-       if (nap.ip6h.ip6_vfc != 0x60 /* IPV6 */ ||
+       if ((nap.ip6h.ip6_vfc & 0xf0) != 0x60 /* IPV6 */ ||
            nap.ip6h.ip6_plen != htons(sizeof(nap) - sizeof(nap.ip6h)) ||
            nap.ip6h.ip6_nxt != IPPROTO_ICMPV6 ||
            nap.ip6h.ip6_hlim != 255 /* Do not route */ ||
-           memcmp(&nap.ip6h.ip6_src, &nsnap_ppriv->dst.sin6_addr,
+           memcmp(&nap.nah.nd_na_target, &nsnap_ppriv->dst.sin6_addr,
                   sizeof(struct in6_addr)))
                return 0;


btw, is your switch/router Cisco or something else?

Thanks.

Comment 10 Amit Supugade 2017-05-11 13:45:24 UTC
Hi Xin, 
Test failed on RHEL-7.3.
As we discussed, it could be due to a switch software update. Removing Regression tag. Thanks!

Comment 17 errata-xmlrpc 2018-04-10 18:48:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1011