Bug 1393430
Summary: troubles on a loadbalance team with a link down port

| Product: | Red Hat Enterprise Linux 7 | Reporter: | Juanjo Villaplana <villapla> |
|---|---|---|---|
| Component: | libteam | Assignee: | Xin Long <lxin> |
| Status: | CLOSED ERRATA | QA Contact: | Amit Supugade <asupugad> |
| Severity: | medium | Priority: | medium |
| Version: | 7.3 | CC: | aloughla, asupugad, atragler, bgalvani, greg.bock, lxin, mleitner, sukulkar, thaller |
| Target Milestone: | rc | Target Release: | --- |
| Hardware: | All | OS: | Unspecified |
| Fixed In Version: | libteam-1.25-5.el7 | Type: | Bug |
| Last Closed: | 2017-08-01 23:07:21 UTC | Regression: | --- |
Description
Juanjo Villaplana 2016-11-09 14:33:47 UTC
pls try with "link_watch": {"name": "ethtool"} in team config.

Hi Xin,

"ethtool" seems to be the default "link_watch":

# teamdctl team-system state view -v
setup:
  runner: loadbalance
  kernel team mode: loadbalance
  D-BUS enabled: yes
  ZeroMQ enabled: no
  debug level: 0
  daemonized: no
  PID: 586
  PID file: /var/run/teamd/team-system.pid
ports:
  ens192
    ifindex: 3
    addr: 00:50:56:b8:3a:ae
    ethtool link: 0mbit/halfduplex/down
    link watches:
      link summary: down
      instance[link_watch_0]:
        name: ethtool
        link: down
        down count: 0
        link up delay: 0
        link down delay: 0
  ens224
    ifindex: 4
    addr: 00:50:56:b8:3a:ae
    ethtool link: 10000mbit/fullduplex/up
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
        link up delay: 0
        link down delay: 0

I forgot to mention this same setup *works fine* on a 7.2 with latest updates.

Adding "link_watch" to team config didn't help:

# nmcli connection add type team con-name team-system ifname team-system config '{"runner": {"name": "loadbalance"}, "link_watch": {"name": "ethtool"}}'
Connection 'team-system' (f7bebdfa-92c3-4d2c-b8a5-60592d342374) successfully added.
# nmcli connection modify team-system ipv4.method static ipv4.addresses 192.168.3.5/24 ipv6.method ignore
# nmcli connection add type team-slave con-name team-system-port1 ifname ens192 master team-system
Connection 'team-system-port1' (23098a5f-13ba-4a2e-bb58-132df3d0b2d3) successfully added.
# nmcli connection add type team-slave con-name team-system-port2 ifname ens224 master team-system
Connection 'team-system-port2' (d6185d97-f4f4-4ce5-b566-23e41f0f02f8) successfully added.
# ip link show ens192
3: ens192: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master team-system state DOWN mode DEFAULT qlen 1000
    link/ether 00:50:56:b8:3a:ae brd ff:ff:ff:ff:ff:ff
# ip link show ens224
4: ens224: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master team-system state UP mode DEFAULT qlen 1000
    link/ether 00:50:56:b8:3a:ae brd ff:ff:ff:ff:ff:ff
# nmcli connection show
NAME               UUID                                  TYPE            DEVICE
ens160             e887e0bb-710a-49c5-bbdd-d3bb0fa49853  802-3-ethernet  ens160
team-system        f7bebdfa-92c3-4d2c-b8a5-60592d342374  team            team-system
team-system-port1  23098a5f-13ba-4a2e-bb58-132df3d0b2d3  802-3-ethernet  ens192
team-system-port2  d6185d97-f4f4-4ce5-b566-23e41f0f02f8  802-3-ethernet  ens224
# ip addr show team-system
9: team-system: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 00:50:56:b8:3a:ae brd ff:ff:ff:ff:ff:ff
    inet6 fe80::cbb6:2208:23bb:3f86/64 scope link
       valid_lft forever preferred_lft forever
# ping -c 5 192.168.3.1
PING 192.168.3.1 (192.168.3.1) 56(84) bytes of data.
--- 192.168.3.1 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 3999ms

# teamnl team-system option | grep -E "link|enabled"
user_linkup_enabled (port:ens224) false
user_linkup (port:ens224) true
enabled (port:ens224) true
user_linkup_enabled (port:ens192) false
user_linkup (port:ens192) false
enabled (port:ens192) true
# teamdctl team-system state view -v
setup:
  runner: loadbalance
  kernel team mode: loadbalance
  D-BUS enabled: yes
  ZeroMQ enabled: no
  debug level: 0
  daemonized: no
  PID: 2444
  PID file: /var/run/teamd/team-system.pid
ports:
  ens192
    ifindex: 3
    addr: 00:50:56:b8:3a:ae
    ethtool link: 0mbit/halfduplex/down
    link watches:
      link summary: down
      instance[link_watch_0]:
        name: ethtool
        link: down
        down count: 0
        link up delay: 0
        link down delay: 0
  ens224
    ifindex: 4
    addr: 00:50:56:b8:3a:ae
    ethtool link: 10000mbit/fullduplex/up
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
        link up delay: 0
        link down delay: 0
# teamnl -p ens192 team-system setoption enabled false
# ping -c 5 192.168.3.1
PING 192.168.3.1 (192.168.3.1) 56(84) bytes of data.
64 bytes from 192.168.3.1: icmp_seq=1 ttl=64 time=0.630 ms
64 bytes from 192.168.3.1: icmp_seq=2 ttl=64 time=0.340 ms
64 bytes from 192.168.3.1: icmp_seq=3 ttl=64 time=0.303 ms
64 bytes from 192.168.3.1: icmp_seq=4 ttl=64 time=0.372 ms
64 bytes from 192.168.3.1: icmp_seq=5 ttl=64 time=0.382 ms

--- 192.168.3.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4000ms
rtt min/avg/max/mdev = 0.303/0.405/0.630/0.117 ms

(In reply to Juanjo Villaplana from comment #3)
> Hi Xin,
> ...
> I forgot to mention this same setup *works fine* on a 7.2 with latest
> updates.

commit a109ed9be629e6bc9618a8b31dd49028bdb242f1
Author: Ivan Vecera <ivecera>
Date:   Tue Mar 29 15:54:37 2016 -0400

    [net] add netnotifier event for upper device change

this one may be the cause; I will check it for sure, thanks.

(In reply to Juanjo Villaplana from comment #4)
> Adding "link_watch" to team config didn't help:
> ...
> # teamdctl team-system state view -v
> setup:
>   runner: loadbalance
>   kernel team mode: loadbalance
>   D-BUS enabled: yes
>   ZeroMQ enabled: no
>   debug level: 0
>   daemonized: no
>   PID: 2444
>   PID file: /var/run/teamd/team-system.pid
> ports:
>   ens192
>     ifindex: 3
>     addr: 00:50:56:b8:3a:ae
>     ethtool link: 0mbit/halfduplex/down
>     link watches:
>       link summary: down
>       instance[link_watch_0]:
>         name: ethtool
>         link: down
>         down count: 0
>         link up delay: 0
>         link down delay: 0
>   ens224
>     ifindex: 4
>     addr: 00:50:56:b8:3a:ae
>     ethtool link: 10000mbit/fullduplex/up
>     link watches:
>       link summary: up
>       instance[link_watch_0]:
>         name: ethtool
>         link: up
>         down count: 0
>         link up delay: 0
>         link down delay: 0

The strange thing is that these 2 ports' mac addrs in your box are still the same.

I tried as comment 4 said:

# nmcli connection add type team con-name team0 ifname team0 config '{"runner": {"name": "loadbalance"}, "link_watch": {"name": "ethtool"}}'
# nmcli connection add type team-slave con-name team0-port1 ifname eth1 master team0
# nmcli connection add type team-slave con-name team0-port2 ifname eth2 master team0
# nmcli connection modify team0 ipv4.method static ipv4.addresses 192.168.11.1/24 ipv6.method ignore
# ip link show eth1
# ip link show eth2
# nmcli connection show
# ip addr show team0

Waited 1-2 minutes... after NM restarted team0, eth1 and team0 had eth1's hwaddr, but eth2 had team0's old hwaddr.
Then I logged this and found that teamd works well: in the end it tried to set eth2 to the same hwaddr as eth1/team0's, but in the kernel a different hwaddr was actually applied, which might be the team0 hwaddr that NM-team had. Besides, the order of the hwaddr setting is also different from teamd's. I guess NM-team works as a middle layer between teamd and the kernel, but it sets eth2's hwaddr to the incorrect value, which means loadbalance doesn't work. I couldn't reproduce this issue with teamd/teamdctl alone.

Hi Thomas, can you pls help identify this issue, as it's probably an NM-team issue? Thanks.

Created attachment 1226437 [details]
Scripts to reproduce the issue
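The attached scripts themselves are not reproduced in the report, so purely as a reference, here is a minimal sketch of what such a veth/netns reproducer could look like. The names ns1, veth0, veth1, team0 and the 192.168.11.0/24 addresses come from the comments below; everything else (the exact commands, the second veth2/veth3 pair, the teamdctl port add calls) is an assumption and may differ from the real attachment:

#!/bin/sh
# Hypothetical reconstruction of the attached scripts; the real ones may differ.

# setup.sh: two veth pairs, one end of each moved into netns ns1;
# veth1 stays down so veth0 in the default namespace has NO-CARRIER.
ip netns add ns1
ip link add veth0 type veth peer name veth1
ip link add veth2 type veth peer name veth3
ip link set veth1 netns ns1
ip link set veth3 netns ns1
ip netns exec ns1 ip link set veth3 up      # only the veth2/veth3 link gets carrier

# team0.sh: a loadbalance team over veth0 and veth2 in the default namespace.
teamd -d -t team0 -c '{"runner": {"name": "loadbalance"}, "link_watch": {"name": "ethtool"}}'
teamdctl team0 port add veth0
teamdctl team0 port add veth2
ip link set veth0 up                        # comes up NO-CARRIER (peer veth1 is down)
ip link set veth2 up
ip link set team0 up
ip addr add 192.168.11.1/24 dev team0

# team1.sh would build the peer team from veth1/veth3 inside ns1 the same way,
# with address 192.168.11.3.

On a kernel without the fix discussed below, veth0 joins team0 enabled despite having no carrier, so part of the load-balanced traffic is sent out of the dead port, matching the failed pings in the transcripts.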
If NM sets different MAC addresses compared to teamd, that's because by default it restores the permanent MAC address of the device upon activation; in comment 0 I see that all the slaves and the team have the same MAC address, and this seems correct.

I don't think the issue is related to NetworkManager; indeed, I can reproduce it with teamd alone using the attached scripts:

# ./setup.sh
# ip netns exec ns1 ./team1.sh
# ./team0.sh
# ping 192.168.11.3
PING 192.168.11.3 (192.168.11.3) 56(84) bytes of data.
^C
--- 192.168.11.3 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms

# teamnl -p veth0 team0 setoption enabled false
# ping 192.168.11.3
PING 192.168.11.3 (192.168.11.3) 56(84) bytes of data.
64 bytes from 192.168.11.3: icmp_seq=1 ttl=64 time=0.068 ms
64 bytes from 192.168.11.3: icmp_seq=2 ttl=64 time=0.100 ms

(In reply to Beniamino Galvani from comment #8)
> If NM sets different MAC addresses compared to teamd, that's because by
> default it restores the permanent MAC address of the device upon activation;
> in comment 0 I see that all the slaves and the team have the same MAC
> address, and this seems correct.
>
> I don't think the issue is related to NetworkManager; indeed, I can
> reproduce it with teamd alone using the attached scripts:
>
> # ./setup.sh
> # ip netns exec ns1 ./team1.sh
> # ./team0.sh

Here, can you try to ping 192.168.11.1 in netns ns1 instead:

# ip netns exec ns1 ping 192.168.11.1

just to make sure it's the same as comment 0.

> # ping 192.168.11.3
> PING 192.168.11.3 (192.168.11.3) 56(84) bytes of data.
> ^C
> --- 192.168.11.3 ping statistics ---
> 2 packets transmitted, 0 received, 100% packet loss, time 999ms
>
> # teamnl -p veth0 team0 setoption enabled false
> # ping 192.168.11.3
> PING 192.168.11.3 (192.168.11.3) 56(84) bytes of data.
> 64 bytes from 192.168.11.3: icmp_seq=1 ttl=64 time=0.068 ms
> 64 bytes from 192.168.11.3: icmp_seq=2 ttl=64 time=0.100 ms

Besides, NM setting different MAC addresses compared to teamd is unexpected for team; with different MAC addresses, team wouldn't work well either.

(In reply to Xin Long from comment #9)
> Here, can you try to ping 192.168.11.1 in netns ns1 instead:
> # ip netns exec ns1 ping 192.168.11.1
>
> just to make sure it's the same as comment 0.

In order to simulate the link down, veth1@ns1 is brought down, which results in veth0 having no carrier. So, to reproduce the scenario in comment 0, the ping should be executed on the side with no carrier (that is, the default namespace), no?

(In reply to Xin Long from comment #10)
> Besides, NM setting different MAC addresses compared to teamd is
> unexpected for team; with different MAC addresses, team wouldn't work
> well either.

NM only changes the MAC address of interfaces before enslaving them to the team, so I don't think this could affect the team in any way.

(In reply to Beniamino Galvani from comment #11)
> (In reply to Xin Long from comment #9)
> > Here, can you try to ping 192.168.11.1 in netns ns1 instead:
> > # ip netns exec ns1 ping 192.168.11.1
> >
> > just to make sure it's the same as comment 0.
>
> In order to simulate the link down, veth1@ns1 is brought down, which results
> in veth0 having no carrier. So, to reproduce the scenario in comment 0, the
> ping should be executed on the side with no carrier (that is, the default
> namespace), no?

I got you now, you're right.
Thanks. A fix for the team driver is needed:

--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1233,7 +1233,8 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
 	port->index = -1;
 	list_add_tail_rcu(&port->list, &team->port_list);
-	team_port_enable(team, port);
+	if (netif_carrier_ok(port_dev))
+		team_port_enable(team, port);

(In reply to Beniamino Galvani from comment #12)
> (In reply to Xin Long from comment #10)
> > Besides, NM setting different MAC addresses compared to teamd is
> > unexpected for team; with different MAC addresses, team wouldn't work
> > well either.
>
> NM only changes the MAC address of interfaces before enslaving them to the
> team, so I don't think this could affect the team in any way.

This issue (not related to this bug) may be caused by a conflict between the current team0 and the team0 that NM creates by ifup later. I guess we have to avoid this with "systemctl restart network"; that makes sense to me, as the commands in comment 6 wrote the ifcfg-* files. What do you think?

(In reply to Xin Long from comment #14)
> (In reply to Beniamino Galvani from comment #12)
> > (In reply to Xin Long from comment #10)
> > > Besides, NM setting different MAC addresses compared to teamd is
> > > unexpected for team; with different MAC addresses, team wouldn't work
> > > well either.
> >
> > NM only changes the MAC address of interfaces before enslaving them to the
> > team, so I don't think this could affect the team in any way.
> This issue (not related to this bug) may be caused by a conflict between
> the current team0 and the team0 that NM creates by ifup later.

Which kind of conflict do you mean?

> I guess we have to avoid this with "systemctl restart network"; that makes
> sense to me, as the commands in comment 6 wrote the ifcfg-* files.

If I understand correctly, you suggest restarting the network after a change to the ifcfg file. Actually, this is not needed. When these steps are executed:

# nmcli connection add type team con-name team0 ifname team0 config '{"runner": {"name": "loadbalance"}, "link_watch": {"name": "ethtool"}}'
# nmcli connection add type team-slave con-name team0-port1 ifname eth1 master team0
# nmcli connection add type team-slave con-name team0-port2 ifname eth2 master team0
# nmcli connection modify team0 ipv4.method static ipv4.addresses 192.168.11.1/24

the first step also activates the connection. Any modification done later to the connection requires a re-activation, so you can simply add the following:

# nmcli connection up team0

to apply the changes. Alternatively, specify the ipv4 method and addresses directly in the first command.
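As a quick cross-check of what the patch changes (illustrative only; the team0/veth names carry over from the sketch above and are assumptions, not the QA procedure): on a patched kernel, a port that has no carrier when it is enslaved should now join in the disabled state, which is what the verification transcript below shows for enp7s0f0.

# On a fixed kernel, after enslaving a carrier-down port:
teamnl team0 option | grep enabled
# expected:
#   enabled (port:veth0) false   <- no carrier at team_port_add() time
#   enabled (port:veth2) true
# On an unfixed kernel, both ports report "enabled ... true" (see comment 4).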
[root@robin ~]# rpm -q libteam teamd
libteam-1.25-5.el7.x86_64
teamd-1.25-5.el7.x86_64
[root@robin ~]# uname -r
3.10.0-637.el7.x86_64
[root@robin ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp7s0f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq portid 0100000000000000000000363635384643 state DOWN qlen 1000
    link/ether 00:90:fa:8a:5c:7a brd ff:ff:ff:ff:ff:ff
3: enp7s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq portid 0200000000000000000000363635384643 state UP qlen 1000
    link/ether 00:90:fa:8a:5c:82 brd ff:ff:ff:ff:ff:ff
4: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether e4:11:5b:dd:d6:06 brd ff:ff:ff:ff:ff:ff
    inet 10.19.15.27/24 brd 10.19.15.255 scope global dynamic enp5s0f0
       valid_lft 55646sec preferred_lft 55646sec
    inet6 2620:52:0:130b:e611:5bff:fedd:d606/64 scope global noprefixroute dynamic
       valid_lft 2591710sec preferred_lft 604510sec
    inet6 fe80::e611:5bff:fedd:d606/64 scope link
       valid_lft forever preferred_lft forever
5: enp5s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether e4:11:5b:dd:d6:07 brd ff:ff:ff:ff:ff:ff
[root@robin ~]# nmcli connection add type team con-name team-system ifname team-system config '{"runner": {"name": "loadbalance"}}'
Connection 'team-system' (4edc41ca-638e-45e1-98d2-35aaf6d99a77) successfully added.
[root@robin ~]# nmcli connection modify team-system ipv4.method static ipv4.addresses 192.168.1.155/24 ipv6.method ignore
[root@robin ~]# nmcli connection add type team-slave con-name team-system-port1 ifname enp7s0f0 master team-system
Connection 'team-system-port1' (e45f727c-e76b-4cd1-9130-a70b02771c88) successfully added.
[root@robin ~]# nmcli connection add type team-slave con-name team-system-port2 ifname enp7s0f1 master team-system
Connection 'team-system-port2' (331f2ab8-fc71-406d-9014-3b415a8f877e) successfully added.
[root@robin ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp7s0f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master team-system portid 0100000000000000000000363635384643 state DOWN qlen 1000
    link/ether 06:17:a3:22:24:0e brd ff:ff:ff:ff:ff:ff
3: enp7s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team-system portid 0200000000000000000000363635384643 state UP qlen 1000
    link/ether 06:17:a3:22:24:0e brd ff:ff:ff:ff:ff:ff
4: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether e4:11:5b:dd:d6:06 brd ff:ff:ff:ff:ff:ff
    inet 10.19.15.27/24 brd 10.19.15.255 scope global dynamic enp5s0f0
       valid_lft 55626sec preferred_lft 55626sec
    inet6 2620:52:0:130b:e611:5bff:fedd:d606/64 scope global noprefixroute dynamic
       valid_lft 2591690sec preferred_lft 604490sec
    inet6 fe80::e611:5bff:fedd:d606/64 scope link
       valid_lft forever preferred_lft forever
5: enp5s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether e4:11:5b:dd:d6:07 brd ff:ff:ff:ff:ff:ff
6: team-system: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 06:17:a3:22:24:0e brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.21/24 brd 192.168.1.255 scope global dynamic team-system
       valid_lft 86399sec preferred_lft 86399sec
    inet6 fe80::d2f0:507e:7ae:54e8/64 scope link
       valid_lft forever preferred_lft forever
[root@robin ~]# nmcli conn show
NAME               UUID                                  TYPE            DEVICE
enp5s0f0           0d8fa60c-6adc-4f02-93b6-cdddec4f32b4  802-3-ethernet  enp5s0f0
team-system        4edc41ca-638e-45e1-98d2-35aaf6d99a77  team            team-system
team-system-port1  e45f727c-e76b-4cd1-9130-a70b02771c88  802-3-ethernet  enp7s0f0
team-system-port2  331f2ab8-fc71-406d-9014-3b415a8f877e  802-3-ethernet  enp7s0f1
enp5s0f1           f1343197-234b-4d74-98c2-ba0a8b7b7b28  802-3-ethernet  --
enp7s0f0           18f9a655-33da-46d2-812b-02db80edda60  802-3-ethernet  --
enp7s0f1           53da974f-b801-4f7d-bd9e-9a031cd13b55  802-3-ethernet  --
[root@robin ~]# ip addr show team-system
6: team-system: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 06:17:a3:22:24:0e brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.21/24 brd 192.168.1.255 scope global dynamic team-system
       valid_lft 86371sec preferred_lft 86371sec
    inet6 2001::875a:5013:bf28:e174/64 scope global noprefixroute dynamic
       valid_lft 2591973sec preferred_lft 604773sec
    inet6 fe80::d2f0:507e:7ae:54e8/64 scope link
       valid_lft forever preferred_lft forever
[root@robin ~]# teamnl team-system option | grep -E "link|enabled"
user_linkup_enabled (port:enp7s0f1) false
user_linkup (port:enp7s0f1) true
enabled (port:enp7s0f1) true
user_linkup_enabled (port:enp7s0f0) false
user_linkup (port:enp7s0f0) false
enabled (port:enp7s0f0) false
[root@robin ~]# ping -c 5 192.168.1.20
PING 192.168.1.20 (192.168.1.20) 56(84) bytes of data.
64 bytes from 192.168.1.20: icmp_seq=1 ttl=64 time=0.244 ms
64 bytes from 192.168.1.20: icmp_seq=2 ttl=64 time=0.101 ms
64 bytes from 192.168.1.20: icmp_seq=3 ttl=64 time=0.158 ms
64 bytes from 192.168.1.20: icmp_seq=4 ttl=64 time=0.096 ms
64 bytes from 192.168.1.20: icmp_seq=5 ttl=64 time=0.151 ms

--- 192.168.1.20 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.096/0.150/0.244/0.053 ms

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2201