Bug 1445499
Summary: | team with link_watch = nsna_ping does not stay up | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Amit Supugade <asupugad> |
Component: | libteam | Assignee: | Xin Long <lxin> |
Status: | CLOSED ERRATA | QA Contact: | Rick Alongi <ralongi> |
Severity: | high | Docs Contact: | |
Priority: | high | | |
Version: | 7.4 | CC: | aiyengar, atragler, lxin, ralongi, rkhan, sukulkar |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Fixed In Version: | libteam-1.27-1.el7 | Doc Type: | If docs needed, set a value |
Last Closed: | 2018-04-10 18:48:40 UTC | Type: | Bug |
Description
Amit Supugade
2017-04-25 19:44:41 UTC
(In reply to Amit Supugade from comment #0)
> Description of problem:
> team with link_watch = nsna_ping does not stay up. team interfaces tries to
> come up it comes up for a short time and goes fails again
> ...
> 2: enp7s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
> team0 portid 0100000000000000000000333135384643 state UP qlen 1000
>     link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
> 3: enp7s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
> team0 portid 0200000000000000000000333135384643 state UP qlen 1000
>     link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff

These two NICs have no IPv6 link-local addresses, so there is no route for the NS packets. I think they are managed by NM, but NM has no connections for them. Please try one of the following:

1. Take them out of NM's control:

    # nmcli dev set enp7s0f0 managed no
    # nmcli dev set enp7s0f1 managed no

2. Let NM ignore their IPv6:

    # nmcli con add type ethernet ifname enp7s0f0 ipv6.method ignore
    # nmcli con add type ethernet ifname enp7s0f1 ipv6.method ignore

3. Just disable NM for this test.

Hi Xin,
I tried it by disabling NetworkManager and it still fails.

LOG-

    [root@sam ~]# systemctl stop NetworkManager
    [root@sam ~]# systemctl disable NetworkManager
    Removed symlink /etc/systemd/system/multi-user.target.wants/NetworkManager.service.
    Removed symlink /etc/systemd/system/dbus-org.freedesktop.NetworkManager.service.
    Removed symlink /etc/systemd/system/dbus-org.freedesktop.nm-dispatcher.service.
    [root@sam ~]# systemctl status NetworkManager
    ● NetworkManager.service - Network Manager
       Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; disabled; vendor preset: enabled)
       Active: inactive (dead) since Thu 2017-05-04 10:57:21 EDT; 10s ago
         Docs: man:NetworkManager(8)
     Main PID: 824 (code=exited, status=0/SUCCESS)
       CGroup: /system.slice/NetworkManager.service
               └─884 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-enp5s0f0.pid -lf /var/l...

    May 04 10:51:08 localhost.localdomain NetworkManager[824]: <info> [1493909468.2944] policy: set 'enp5s0f0' (enp...DNS
    May 04 10:51:08 localhost.localdomain NetworkManager[824]: <info> [1493909468.3195] device (enp5s0f0): Activati...ed.
    May 04 10:51:08 localhost.localdomain NetworkManager[824]: <info> [1493909468.3211] manager: NetworkManager sta...BAL
    May 04 10:51:08 localhost.localdomain NetworkManager[824]: <info> [1493909468.3527] policy: set-hostname: set h...up)
    May 04 10:51:09 sam.knqe.lab.eng.bos.redhat.com NetworkManager[824]: <info> [1493909469.0762] manager: startup c...te
    May 04 10:51:10 sam.knqe.lab.eng.bos.redhat.com NetworkManager[824]: <info> [1493909470.2102] policy: set 'enp5s...NS
    May 04 10:57:21 sam.knqe.lab.eng.bos.redhat.com systemd[1]: Stopping Network Manager...
    May 04 10:57:21 sam.knqe.lab.eng.bos.redhat.com NetworkManager[824]: <info> [1493909841.8774] caught SIGTERM, sh...y.
    May 04 10:57:21 sam.knqe.lab.eng.bos.redhat.com NetworkManager[824]: <info> [1493909841.9240] exiting (success)
    May 04 10:57:21 sam.knqe.lab.eng.bos.redhat.com systemd[1]: Stopped Network Manager.
    Hint: Some lines were ellipsized, use -l to show in full.
    [root@sam ~]# port0=enp7s0f0
    [root@sam ~]# port1=enp7s0f1
    [root@sam ~]# teamd -d -t team0 -c '{ "runner" : { "name": "roundrobin" }, "link_watch" : { "name": "nsna_ping", "interval": 500, "target_host": "2001::254" } }'
    This program is not intended to be run as root.
    [root@sam ~]# ip link set team0 up
    [root@sam ~]# teamdctl team0 port add $port0
    [root@sam ~]# teamdctl team0 port add $port1
    [root@sam ~]# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
        link/ether e4:11:5b:dd:e6:6c brd ff:ff:ff:ff:ff:ff
        inet 10.19.15.26/24 brd 10.19.15.255 scope global dynamic enp5s0f0
           valid_lft 85975sec preferred_lft 85975sec
        inet6 2620:52:0:130b:e611:5bff:fedd:e66c/64 scope global noprefixroute dynamic
           valid_lft 2591577sec preferred_lft 604377sec
        inet6 fe80::e611:5bff:fedd:e66c/64 scope link
           valid_lft forever preferred_lft forever
    3: enp7s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 portid 0100000000000000000000333135384643 state UP qlen 1000
        link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
        inet6 fe80::e4bb:31ff:fe93:ddf9/64 scope link tentative
           valid_lft forever preferred_lft forever
    4: enp5s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
        link/ether e4:11:5b:dd:e6:6d brd ff:ff:ff:ff:ff:ff
    5: enp7s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 portid 0200000000000000000000333135384643 state UP qlen 1000
        link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
        inet6 fe80::290:faff:fe8a:5bfa/64 scope link tentative
           valid_lft forever preferred_lft forever
    7: team0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
        link/ether 00:90:fa:8a:5b:fa brd ff:ff:ff:ff:ff:ff
        inet6 fe80::290:faff:fe8a:5bfa/64 scope link tentative dadfailed
           valid_lft forever preferred_lft forever

I can see many "missed" events in your environment. Before starting teamd, can you check whether "target_host" responds from your host with:

    ndisc6 2001::254 enp7s0f0
    ndisc6 2001::254 enp7s0f1

If that works, I think something in your switch is changing the NS/NA packets when forwarding them. Can I check on your environment, or can you please provide the packets you captured on enp7s0f0 and enp7s0f1? Thanks.

As we expected, the switch does indeed behave differently from Linux:

1. The DSCP of the IPv6 NS packet from your switch is 0xc, whereas it is 0x0 in Linux.
2. The source address of the IPv6 NS packet from your switch is the link-local address (fe80::de38:e1ff:fe9c:4d41), whereas in Linux it is the global target address (2001::254).

These two differences caused teamd to fail validation and not process the packet. I will post the following fix upstream:

    --- a/teamd/teamd_lw_nsna_ping.c
    +++ b/teamd/teamd_lw_nsna_ping.c
    @@ -247,11 +247,11 @@ static int lw_nsnap_receive(struct lw_psr_port_priv *psr_ppriv)
     		return err;
     	/* check IPV6 header */
    -	if (nap.ip6h.ip6_vfc != 0x60 /* IPV6 */ ||
    +	if ((nap.ip6h.ip6_vfc & 0xf0) != 0x60 /* IPV6 */ ||
     	    nap.ip6h.ip6_plen != htons(sizeof(nap) - sizeof(nap.ip6h)) ||
     	    nap.ip6h.ip6_nxt != IPPROTO_ICMPV6 ||
     	    nap.ip6h.ip6_hlim != 255 /* Do not route */ ||
    -	    memcmp(&nap.ip6h.ip6_src, &nsnap_ppriv->dst.sin6_addr,
    +	    memcmp(&nap.nah.nd_na_target, &nsnap_ppriv->dst.sin6_addr,
     		   sizeof(struct in6_addr)))
     		return 0;

By the way, is your switch/router a Cisco or something else? Thanks.

upstream fix:
https://github.com/jpirko/libteam/commit/9a9fbff3e75f78cbff76e9dbd1cfa0a05fd1f120
https://github.com/jpirko/libteam/commit/49c1de9b67a5a26f120294743d206f3a9286a314

Hi Xin,
The test failed on RHEL-7.3. As we discussed, it could be because of a switch software update. Removing the Regression tag. Thanks!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2018:1011
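
For anyone reading this later, here is a minimal, self-contained sketch of why the original check rejected the switch's packets and the patched one accepts them. It is not the libteam code. The `struct na_packet` layout, the function names and the `make_na_packet` helper are assumptions made up for illustration; only the field names (`ip6h`, `nah`, `nd_na_target`) and the two changed comparisons are taken from the diff in this report. It assumes the Linux/glibc `<netinet/ip6.h>` and `<netinet/icmp6.h>` headers.

```c
#include <arpa/inet.h>
#include <netinet/icmp6.h>
#include <netinet/in.h>
#include <netinet/ip6.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout (illustration only): IPv6 header followed by the NA header. */
struct na_packet {
	struct ip6_hdr ip6h;
	struct nd_neighbor_advert nah;
};

/* Checks left unchanged by the patch: payload length, next header, hop limit. */
bool na_common_ok(const struct na_packet *nap)
{
	return nap->ip6h.ip6_plen ==
		       htons((uint16_t)(sizeof(*nap) - sizeof(nap->ip6h))) &&
	       nap->ip6h.ip6_nxt == IPPROTO_ICMPV6 &&
	       nap->ip6h.ip6_hlim == 255; /* 255 means the packet was not routed */
}

/* Pre-patch comparisons: whole first byte must be 0x60, source must be the target. */
bool na_matches_strict(const struct na_packet *nap, const struct in6_addr *target)
{
	return nap->ip6h.ip6_vfc == 0x60 && na_common_ok(nap) &&
	       memcmp(&nap->ip6h.ip6_src, target, sizeof(*target)) == 0;
}

/* Post-patch comparisons: only the version nibble, and the NA target field. */
bool na_matches_relaxed(const struct na_packet *nap, const struct in6_addr *target)
{
	return (nap->ip6h.ip6_vfc & 0xf0) == 0x60 && na_common_ok(nap) &&
	       memcmp(&nap->nah.nd_na_target, target, sizeof(*target)) == 0;
}

/* Hypothetical helper: build a packet with a given DSCP, source and NA target. */
struct na_packet make_na_packet(uint8_t dscp, const struct in6_addr *src,
				const struct in6_addr *target)
{
	struct na_packet p;

	memset(&p, 0, sizeof(p));
	/* Byte 0: version 6 in the high nibble, top 4 traffic-class bits below it. */
	p.ip6h.ip6_vfc = (uint8_t)(0x60 | (((dscp & 0x3f) << 2) >> 4));
	p.ip6h.ip6_plen = htons((uint16_t)sizeof(p.nah));
	p.ip6h.ip6_nxt = IPPROTO_ICMPV6;
	p.ip6h.ip6_hlim = 255;
	p.ip6h.ip6_src = *src;
	p.nah.nd_na_type = ND_NEIGHBOR_ADVERT;
	p.nah.nd_na_target = *target;
	return p;
}
```

With DSCP 0xc the traffic class is 0x30, so the first header byte becomes 0x63 rather than 0x60. A packet built with `make_na_packet(0xc, <link-local source>, <2001::254>)` therefore fails `na_matches_strict` on both changed comparisons and passes `na_matches_relaxed`.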