Red Hat Bugzilla – Bug 1445499
team with link_watch = nsna_ping does not stay up
Last modified: 2018-04-10 14:49:20 EDT
Description of problem:
team with link_watch = nsna_ping does not stay up. The team interface tries to come up, stays up for a short time, then fails again.

Version-Release number of selected component (if applicable):
kernel-3.10.0-656.el7.x86_64
libteam-1.25-5.el7.x86_64
teamd-1.25-5.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
port0=enp7s0f0
port1=enp7s0f1
teamd -d -t team0 -c '{ "runner" : { "name": "roundrobin" }, "link_watch" : { "name": "nsna_ping", "interval": 500, "target_host": "2001::254" } }'
ip link set team0 up
teamdctl team0 port add $port0
teamdctl team0 port add $port1
ip a

Actual results:
team0 does not stay up

Expected results:
team0 should stay up

Additional info:
[root@sam ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp7s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 portid 0100000000000000000000333135384643 state UP qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
3: enp7s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 portid 0200000000000000000000333135384643 state UP qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
4: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether e4:11:5b:dd:e6:6c brd ff:ff:ff:ff:ff:ff
    inet 10.19.15.26/24 brd 10.19.15.255 scope global dynamic enp5s0f0
       valid_lft 85452sec preferred_lft 85452sec
    inet6 2620:52:0:130b:e611:5bff:fedd:e66c/64 scope global noprefixroute dynamic
       valid_lft 2591958sec preferred_lft 604758sec
    inet6 fe80::e611:5bff:fedd:e66c/64 scope link
       valid_lft forever preferred_lft forever
5: enp5s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether e4:11:5b:dd:e6:6d brd ff:ff:ff:ff:ff:ff
10: team0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
    inet6 2001::290:faff:fe8a:5bfa/64 scope global mngtmpaddr dynamic
       valid_lft 2591796sec preferred_lft 604596sec
    inet6 fe80::290:faff:fe8a:5bfa/64 scope link
       valid_lft forever preferred_lft forever

[root@sam ~]# teamdctl team0 state dump
{
    "ports": {
        "enp7s0f0": {
            "ifinfo": {
                "dev_addr": "00:90:fa:8a:5b:fa",
                "dev_addr_len": 6,
                "ifindex": 2,
                "ifname": "enp7s0f0"
            },
            "link": {
                "duplex": "half",
                "speed": 0,
                "up": true
            },
            "link_watches": {
                "list": {
                    "link_watch_0": {
                        "down_count": 20,
                        "init_wait": 0,
                        "interval": 500,
                        "missed": 6,
                        "missed_max": 3,
                        "name": "nsna_ping",
                        "target_host": "2001::254",
                        "up": false
                    }
                },
                "up": false
            }
        },
        "enp7s0f1": {
            "ifinfo": {
                "dev_addr": "00:90:fa:8a:5b:fa",
                "dev_addr_len": 6,
                "ifindex": 3,
                "ifname": "enp7s0f1"
            },
            "link": {
                "duplex": "half",
                "speed": 0,
                "up": true
            },
            "link_watches": {
                "list": {
                    "link_watch_0": {
                        "down_count": 20,
                        "init_wait": 0,
                        "interval": 500,
                        "missed": 6,
                        "missed_max": 3,
                        "name": "nsna_ping",
                        "target_host": "2001::254",
                        "up": false
                    }
                },
                "up": false
            }
        }
    },
    "setup": {
        "daemonized": true,
        "dbus_enabled": false,
        "debug_level": 0,
        "kernel_team_mode_name": "roundrobin",
        "pid": 21443,
        "pid_file": "/var/run/teamd/team0.pid",
        "runner_name": "roundrobin",
        "zmq_enabled": false
    },
    "team_device": {
        "ifinfo": {
            "dev_addr": "00:90:fa:8a:5b:fa",
            "dev_addr_len": 6,
            "ifindex": 10,
            "ifname": "team0"
        }
    }
}
(In reply to Amit Supugade from comment #0)
> Description of problem:
> team with link_watch = nsna_ping does not stay up. team interfaces tries to
> come up it comes up for a short time and goes fails again
> ...
> 2: enp7s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
> team0 portid 0100000000000000000000333135384643 state UP qlen 1000
> link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
> 3: enp7s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
> team0 portid 0200000000000000000000333135384643 state UP qlen 1000
> link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff

These two NICs have no IPv6 link-local addresses, so there is no route for the NS packet. I think they are managed by NetworkManager while NM has no connections for them. Please try one of the following:

1. Take them out of NM's control:
# nmcli dev set enp7s0f0 managed no
# nmcli dev set enp7s0f1 managed no

2. Let NM ignore their IPv6:
# nmcli con add type ethernet ifname enp7s0f0 ipv6.method ignore
# nmcli con add type ethernet ifname enp7s0f1 ipv6.method ignore

3. Simply disable NM for this test.
Hi Xin,

I tried it with NetworkManager disabled and it still fails.

LOG:
[root@sam ~]# systemctl stop NetworkManager
[root@sam ~]# systemctl disable NetworkManager
Removed symlink /etc/systemd/system/multi-user.target.wants/NetworkManager.service.
Removed symlink /etc/systemd/system/dbus-org.freedesktop.NetworkManager.service.
Removed symlink /etc/systemd/system/dbus-org.freedesktop.nm-dispatcher.service.
[root@sam ~]# systemctl status NetworkManager
● NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; disabled; vendor preset: enabled)
   Active: inactive (dead) since Thu 2017-05-04 10:57:21 EDT; 10s ago
     Docs: man:NetworkManager(8)
 Main PID: 824 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/NetworkManager.service
           └─884 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-enp5s0f0.pid -lf /var/l...

May 04 10:51:08 localhost.localdomain NetworkManager[824]: <info> [1493909468.2944] policy: set 'enp5s0f0' (enp...DNS
May 04 10:51:08 localhost.localdomain NetworkManager[824]: <info> [1493909468.3195] device (enp5s0f0): Activati...ed.
May 04 10:51:08 localhost.localdomain NetworkManager[824]: <info> [1493909468.3211] manager: NetworkManager sta...BAL
May 04 10:51:08 localhost.localdomain NetworkManager[824]: <info> [1493909468.3527] policy: set-hostname: set h...up)
May 04 10:51:09 sam.knqe.lab.eng.bos.redhat.com NetworkManager[824]: <info> [1493909469.0762] manager: startup c...te
May 04 10:51:10 sam.knqe.lab.eng.bos.redhat.com NetworkManager[824]: <info> [1493909470.2102] policy: set 'enp5s...NS
May 04 10:57:21 sam.knqe.lab.eng.bos.redhat.com systemd[1]: Stopping Network Manager...
May 04 10:57:21 sam.knqe.lab.eng.bos.redhat.com NetworkManager[824]: <info> [1493909841.8774] caught SIGTERM, sh...y.
May 04 10:57:21 sam.knqe.lab.eng.bos.redhat.com NetworkManager[824]: <info> [1493909841.9240] exiting (success)
May 04 10:57:21 sam.knqe.lab.eng.bos.redhat.com systemd[1]: Stopped Network Manager.
Hint: Some lines were ellipsized, use -l to show in full.

[root@sam ~]# port0=enp7s0f0
[root@sam ~]# port1=enp7s0f1
[root@sam ~]# teamd -d -t team0 -c '{ "runner" : { "name": "roundrobin" }, "link_watch" : { "name": "nsna_ping", "interval": 500, "target_host": "2001::254" } }'
This program is not intended to be run as root.
[root@sam ~]# ip link set team0 up
[root@sam ~]# teamdctl team0 port add $port0
[root@sam ~]# teamdctl team0 port add $port1
[root@sam ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether e4:11:5b:dd:e6:6c brd ff:ff:ff:ff:ff:ff
    inet 10.19.15.26/24 brd 10.19.15.255 scope global dynamic enp5s0f0
       valid_lft 85975sec preferred_lft 85975sec
    inet6 2620:52:0:130b:e611:5bff:fedd:e66c/64 scope global noprefixroute dynamic
       valid_lft 2591577sec preferred_lft 604377sec
    inet6 fe80::e611:5bff:fedd:e66c/64 scope link
       valid_lft forever preferred_lft forever
3: enp7s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 portid 0100000000000000000000333135384643 state UP qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
    inet6 fe80::e4bb:31ff:fe93:ddf9/64 scope link tentative
       valid_lft forever preferred_lft forever
4: enp5s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether e4:11:5b:dd:e6:6d brd ff:ff:ff:ff:ff:ff
5: enp7s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 portid 0200000000000000000000333135384643 state UP qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
    inet6 fe80::290:faff:fe8a:5bfa/64 scope link tentative
       valid_lft forever preferred_lft forever
7: team0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
    link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
    inet6 fe80::290:faff:fe8a:5bfa/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
I can see many "missed" counts in your env. Before starting teamd, can you check whether the "target_host" is reachable from your host with:

ndisc6 2001::254 enp7s0f0
ndisc6 2001::254 enp7s0f1

If that works well, I think something in your switch is changing the NS/NA packets when forwarding them. Can I check on your env, or please provide the packets you captured on enp7s0f0 and enp7s0f1?

Thanks.
As we expected, the switch indeed does something different from Linux:

1. The IPv6 NS packet's DSCP from your switch is 0xc, while it is 0x0 in Linux.
2. The IPv6 NS packet's source addr from your switch is the link-local addr (fe80::de38:e1ff:fe9c:4d41) instead of the global target addr (2001::254), which is what Linux uses.

These two differences caused teamd to fail to validate the NS packet and not process it. I will post the following fix upstream:

--- a/teamd/teamd_lw_nsna_ping.c
+++ b/teamd/teamd_lw_nsna_ping.c
@@ -247,11 +247,11 @@ static int lw_nsnap_receive(struct lw_psr_port_priv *psr_ppriv)
 		return err;

 	/* check IPV6 header */
-	if (nap.ip6h.ip6_vfc != 0x60 /* IPV6 */ ||
+	if ((nap.ip6h.ip6_vfc & 0xf0) != 0x60 /* IPV6 */ ||
 	    nap.ip6h.ip6_plen != htons(sizeof(nap) - sizeof(nap.ip6h)) ||
 	    nap.ip6h.ip6_nxt != IPPROTO_ICMPV6 ||
 	    nap.ip6h.ip6_hlim != 255 /* Do not route */ ||
-	    memcmp(&nap.ip6h.ip6_src, &nsnap_ppriv->dst.sin6_addr,
+	    memcmp(&nap.nah.nd_na_target, &nsnap_ppriv->dst.sin6_addr,
 		   sizeof(struct in6_addr)))
 		return 0;

btw, is your switch/router Cisco or something else? Thanks.
upstream fix: https://github.com/jpirko/libteam/commit/9a9fbff3e75f78cbff76e9dbd1cfa0a05fd1f120 https://github.com/jpirko/libteam/commit/49c1de9b67a5a26f120294743d206f3a9286a314
Hi Xin,

The test also failed on RHEL-7.3. As we discussed, it could be because of a switch software update. Removing the Regression tag. Thanks!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1011