Red Hat Bugzilla – Bug 1445499
team with link_watch = nsna_ping does not stay up
Last modified: 2018-04-10 14:49:20 EDT
Description of problem:
team with link_watch = nsna_ping does not stay up. The team interface tries to come up, stays up for a short time, then fails again.

Version-Release number of selected component (if applicable):
kernel-3.10.0-656.el7.x86_64
libteam-1.25-5.el7.x86_64
teamd-1.25-5.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
port0=enp7s0f0
port1=enp7s0f1
teamd -d -t team0 -c '{ "runner" : { "name": "roundrobin" }, "link_watch" : { "name": "nsna_ping", "interval": 500, "target_host": "2001::254" } }'
ip link set team0 up
teamdctl team0 port add $port0
teamdctl team0 port add $port1
ip a

Actual results:
team0 does not stay up

Expected results:
team0 should stay up

Additional info:
[root@sam ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp7s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 portid 0100000000000000000000333135384643 state UP qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
3: enp7s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 portid 0200000000000000000000333135384643 state UP qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
4: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether e4:11:5b:dd:e6:6c brd ff:ff:ff:ff:ff:ff
    inet 10.19.15.26/24 brd 10.19.15.255 scope global dynamic enp5s0f0
       valid_lft 85452sec preferred_lft 85452sec
    inet6 2620:52:0:130b:e611:5bff:fedd:e66c/64 scope global noprefixroute dynamic
       valid_lft 2591958sec preferred_lft 604758sec
    inet6 fe80::e611:5bff:fedd:e66c/64 scope link
       valid_lft forever preferred_lft forever
5: enp5s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether e4:11:5b:dd:e6:6d brd ff:ff:ff:ff:ff:ff
10: team0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
    inet6 2001::290:faff:fe8a:5bfa/64 scope global mngtmpaddr dynamic
       valid_lft 2591796sec preferred_lft 604596sec
    inet6 fe80::290:faff:fe8a:5bfa/64 scope link
       valid_lft forever preferred_lft forever

[root@sam ~]# teamdctl team0 state dump
{
    "ports": {
        "enp7s0f0": {
            "ifinfo": {
                "dev_addr": "00:90:fa:8a:5b:fa",
                "dev_addr_len": 6,
                "ifindex": 2,
                "ifname": "enp7s0f0"
            },
            "link": {
                "duplex": "half",
                "speed": 0,
                "up": true
            },
            "link_watches": {
                "list": {
                    "link_watch_0": {
                        "down_count": 20,
                        "init_wait": 0,
                        "interval": 500,
                        "missed": 6,
                        "missed_max": 3,
                        "name": "nsna_ping",
                        "target_host": "2001::254",
                        "up": false
                    }
                },
                "up": false
            }
        },
        "enp7s0f1": {
            "ifinfo": {
                "dev_addr": "00:90:fa:8a:5b:fa",
                "dev_addr_len": 6,
                "ifindex": 3,
                "ifname": "enp7s0f1"
            },
            "link": {
                "duplex": "half",
                "speed": 0,
                "up": true
            },
            "link_watches": {
                "list": {
                    "link_watch_0": {
                        "down_count": 20,
                        "init_wait": 0,
                        "interval": 500,
                        "missed": 6,
                        "missed_max": 3,
                        "name": "nsna_ping",
                        "target_host": "2001::254",
                        "up": false
                    }
                },
                "up": false
            }
        }
    },
    "setup": {
        "daemonized": true,
        "dbus_enabled": false,
        "debug_level": 0,
        "kernel_team_mode_name": "roundrobin",
        "pid": 21443,
        "pid_file": "/var/run/teamd/team0.pid",
        "runner_name": "roundrobin",
        "zmq_enabled": false
    },
    "team_device": {
        "ifinfo": {
            "dev_addr": "00:90:fa:8a:5b:fa",
            "dev_addr_len": 6,
            "ifindex": 10,
            "ifname": "team0"
        }
    }
}
(In reply to Amit Supugade from comment #0)
> Description of problem:
> team with link_watch = nsna_ping does not stay up. team interfaces tries to
> come up it comes up for a short time and goes fails again
> ...
> 2: enp7s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
> team0 portid 0100000000000000000000333135384643 state UP qlen 1000
> link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
> 3: enp7s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
> team0 portid 0200000000000000000000333135384643 state UP qlen 1000
> link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff

These two NICs have no IPv6 link-local addresses, so there is no route for the NS packet. I think they are managed by NetworkManager while NM has no connections for them. Please try one of the following:

1. Take them out of NM's control:
# nmcli dev set enp7s0f0 managed no
# nmcli dev set enp7s0f1 managed no

2. Let NM ignore their IPv6:
# nmcli con add type ethernet ifname enp7s0f0 ipv6.method ignore
# nmcli con add type ethernet ifname enp7s0f1 ipv6.method ignore

3. Simply disable NM for this test.
Hi Xin,

I tried it with NetworkManager disabled and it still fails.

LOG:
[root@sam ~]# systemctl stop NetworkManager
[root@sam ~]# systemctl disable NetworkManager
Removed symlink /etc/systemd/system/multi-user.target.wants/NetworkManager.service.
Removed symlink /etc/systemd/system/dbus-org.freedesktop.NetworkManager.service.
Removed symlink /etc/systemd/system/dbus-org.freedesktop.nm-dispatcher.service.
[root@sam ~]# systemctl status NetworkManager
● NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; disabled; vendor preset: enabled)
   Active: inactive (dead) since Thu 2017-05-04 10:57:21 EDT; 10s ago
     Docs: man:NetworkManager(8)
 Main PID: 824 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/NetworkManager.service
           └─884 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-enp5s0f0.pid -lf /var/l...

May 04 10:51:08 localhost.localdomain NetworkManager[824]: <info> [1493909468.2944] policy: set 'enp5s0f0' (enp...DNS
May 04 10:51:08 localhost.localdomain NetworkManager[824]: <info> [1493909468.3195] device (enp5s0f0): Activati...ed.
May 04 10:51:08 localhost.localdomain NetworkManager[824]: <info> [1493909468.3211] manager: NetworkManager sta...BAL
May 04 10:51:08 localhost.localdomain NetworkManager[824]: <info> [1493909468.3527] policy: set-hostname: set h...up)
May 04 10:51:09 sam.knqe.lab.eng.bos.redhat.com NetworkManager[824]: <info> [1493909469.0762] manager: startup c...te
May 04 10:51:10 sam.knqe.lab.eng.bos.redhat.com NetworkManager[824]: <info> [1493909470.2102] policy: set 'enp5s...NS
May 04 10:57:21 sam.knqe.lab.eng.bos.redhat.com systemd[1]: Stopping Network Manager...
May 04 10:57:21 sam.knqe.lab.eng.bos.redhat.com NetworkManager[824]: <info> [1493909841.8774] caught SIGTERM, sh...y.
May 04 10:57:21 sam.knqe.lab.eng.bos.redhat.com NetworkManager[824]: <info> [1493909841.9240] exiting (success)
May 04 10:57:21 sam.knqe.lab.eng.bos.redhat.com systemd[1]: Stopped Network Manager.
Hint: Some lines were ellipsized, use -l to show in full.

[root@sam ~]# port0=enp7s0f0
[root@sam ~]# port1=enp7s0f1
[root@sam ~]# teamd -d -t team0 -c '{ "runner" : { "name": "roundrobin" }, "link_watch" : { "name": "nsna_ping", "interval": 500, "target_host": "2001::254" } }'
This program is not intended to be run as root.
[root@sam ~]# ip link set team0 up
[root@sam ~]# teamdctl team0 port add $port0
[root@sam ~]# teamdctl team0 port add $port1
[root@sam ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether e4:11:5b:dd:e6:6c brd ff:ff:ff:ff:ff:ff
    inet 10.19.15.26/24 brd 10.19.15.255 scope global dynamic enp5s0f0
       valid_lft 85975sec preferred_lft 85975sec
    inet6 2620:52:0:130b:e611:5bff:fedd:e66c/64 scope global noprefixroute dynamic
       valid_lft 2591577sec preferred_lft 604377sec
    inet6 fe80::e611:5bff:fedd:e66c/64 scope link
       valid_lft forever preferred_lft forever
3: enp7s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 portid 0100000000000000000000333135384643 state UP qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
    inet6 fe80::e4bb:31ff:fe93:ddf9/64 scope link tentative
       valid_lft forever preferred_lft forever
4: enp5s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether e4:11:5b:dd:e6:6d brd ff:ff:ff:ff:ff:ff
5: enp7s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 portid 0200000000000000000000333135384643 state UP qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
    inet6 fe80::290:faff:fe8a:5bfa/64 scope link tentative
       valid_lft forever preferred_lft forever
7: team0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
    inet6 fe80::290:faff:fe8a:5bfa/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
I can see many "missed" counts in your env. Before starting teamd, can you check whether the "target_host" is reachable from your host with:

ndisc6 2001::254 enp7s0f0
ndisc6 2001::254 enp7s0f1

If that works well, I think something in your switch is changing the NS/NA packets when forwarding them. Can I check on your env, or please provide the packets you captured on enp7s0f0 and enp7s0f1?

Thanks.
As we expected, the switch indeed does something different from Linux:

1. The IPv6 NS packet's DSCP from your switch is 0xc, while it is 0x0 in Linux.
2. The IPv6 NS packet's source addr from your switch is the link-local addr (fe80::de38:e1ff:fe9c:4d41) instead of the global target addr (2001::254), which is what Linux uses.

These two differences caused teamd to fail to validate the NS packet and not process it. I will post the following fix upstream:

--- a/teamd/teamd_lw_nsna_ping.c
+++ b/teamd/teamd_lw_nsna_ping.c
@@ -247,11 +247,11 @@ static int lw_nsnap_receive(struct lw_psr_port_priv *psr_ppriv)
 		return err;

 	/* check IPV6 header */
-	if (nap.ip6h.ip6_vfc != 0x60 /* IPV6 */ ||
+	if ((nap.ip6h.ip6_vfc & 0xf0) != 0x60 /* IPV6 */ ||
 	    nap.ip6h.ip6_plen != htons(sizeof(nap) - sizeof(nap.ip6h)) ||
 	    nap.ip6h.ip6_nxt != IPPROTO_ICMPV6 ||
 	    nap.ip6h.ip6_hlim != 255 /* Do not route */ ||
-	    memcmp(&nap.ip6h.ip6_src, &nsnap_ppriv->dst.sin6_addr,
+	    memcmp(&nap.nah.nd_na_target, &nsnap_ppriv->dst.sin6_addr,
 		   sizeof(struct in6_addr)))
 		return 0;

btw, is your switch/router Cisco or something else? Thanks.
upstream fix: https://github.com/jpirko/libteam/commit/9a9fbff3e75f78cbff76e9dbd1cfa0a05fd1f120 https://github.com/jpirko/libteam/commit/49c1de9b67a5a26f120294743d206f3a9286a314
Hi Xin,

The test also failed on RHEL-7.3. As we discussed, it could be because of a switch software update. Removing the Regression tag. Thanks!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1011