Description of problem:
Port options "sticky, priority" are not respected during boot on an active-backup teaming configuration. I would like to use the equivalent of the bonding options primary=iface, primary_reselect=failure for my team. I need the ens224 interface to be the primary port; if its link is lost, ens192 should take over and remain the active port even when the link of ens224 recovers.

Using ports.PORTIFNAME.prio and ports.PORTIFNAME.sticky to set up the configuration I need:

[root@fastvm-rhel-7-8-171 ~]# teamdctl team0 config dump
{
    "device": "team0",
    "link_watch": {
        "name": "ethtool"
    },
    "mcast_rejoin": {
        "count": 1
    },
    "notify_peers": {
        "count": 1
    },
    "ports": {
        "ens192": {
            "prio": -100,
            "sticky": true
        },
        "ens224": {
            "prio": 100
        }
    },
    "runner": {
        "name": "activebackup"
    }
}

During boot the priority value is not respected if ens192 comes up first: it becomes the active port, and since the sticky option is enabled it remains the active one.

[root@fastvm-rhel-7-8-171 ~]# teamdctl team0 state
setup:
  runner: activebackup
ports:
  ens192
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
  ens224
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
runner:
  active port: ens192

Version-Release number of selected component (if applicable):
RHEL 7.8, 3.10.0-1127.13.1.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a team configuration and set the priority and sticky options on the ports
2. Reboot and check which port is the active one

Actual results:
The "backup" interface becomes the active port because it gets initialized first.

Expected results:
The interface with the higher priority should be the active one if its link is up. During boot, the sticky option should have lower precedence than the priority option.
Hi,

at boot NM first activates connections for physical interfaces, in order of ifindex (RHEL7) or interface name (RHEL8). Then it activates connections for virtual interfaces in order of autoconnect-priority, then timestamp of last activation, then UUID.

This also determines the order in which ports are attached to the team. Note that when the first team port is activated, the controlling team is also activated at the same time.
(In reply to Beniamino Galvani from comment #13)
> Hi,
>
> at boot NM first activates connections for physical interfaces, in order of
> ifindex (RHEL7) or interface name (RHEL8). Then it activates connections for
> virtual interfaces in order of autoconnect-priority, then timestamp of last
> activation, then UUID.
>
> This also determines the order in which ports are attached to the team. Note
> that when the first team port is activated, the controlling team is also
> activated at the same time.

I see, thanks Beniamino.

Do you think it's possible to activate/attach the team ports in the order of 'prio'? For example:

    "ports": {
        "ens192": {
            "prio": -100,
            "sticky": true
        },
        "ens224": {
            "prio": 100
        }
    },

would always make ens224 be activated/attached earlier than ens192. Only when the ports have the same prio would it fall back to the old ordering rules (ifindex/name...).
(In reply to Xin Long from comment #14)
> Do you think it's possible to activate/attach the team ports in the order of
> 'prio', like:

The current order is best-effort: it is true that NM activates interfaces ordered by interface name, but if an interface appears a bit later during boot (because the kernel takes longer to discover it, or because of udev), it will be activated last anyway, independently of the name. The same would happen for the priority.

For this reason, when you need a feature that depends on the order of activation (for example, a fixed MAC on the bridge/bond/team), it is always suggested to set that property directly in the connection profile (in the example, the cloned-mac-address property on the master). In other words, you should account for the fact that interfaces can come up in any order.

Okay, in most cases this problem is probably not visible, as all devices are discovered quickly before NM starts.

In these cases, changing the order of activation could break user expectations. It is quite common to rely on the fact that a bridge/bond/team inherits the MAC address of the first port added, which until now is the one with the alphabetically "lower" interface name. Changing the order means that the master will get a different MAC and potentially a different IP address.

Maybe we could add a new NM property for team devices, like "team.autoconnect-slaves-order", which specifies the order when the team itself has the property "connection.autoconnect-slaves" set. Then you would need to disable autoconnect for the team port connections, because the team will connect them when it is being activated.

Or, this ordering by priority could just be the default, as you said, and we could ignore the slight change in behavior. If users have a port with higher priority, they probably also want to have it added first.

What do others think?
(In reply to Beniamino Galvani from comment #15)
> (In reply to Xin Long from comment #14)
> > Do you think it's possible to activate/attach the team ports in the order
> > of 'prio', like:
>
> The current order is best-effort, because it is true that NM activates
> interfaces ordered by interface name, but if an interface appears a bit
> later during boot (because the kernel takes longer to discover it, or
> because of udev), it will be activated last anyway independently of
> the name. The same would happen for the priority.
>
> For this reason, when you need a feature that depends on the order of
> activation (for example, a fixed MAC on the bridge/bond/team), it is
> always suggested to set that property directly in the connection
> profile (in the example, the cloned-mac-address property on the
> master). In other words, you should account for the fact that
> interfaces can come up in any order.

True.

> Okay, in most cases this problem is probably not visible as all
> devices are discovered quickly before NM starts.
>
> In these cases, changing the order of activation could break the user
> expectation. It is quite common to rely on the fact that a
> bridge/bond/team inherits the MAC address of the first port added,
> which is until now the one with alphabetically "lower" interface
> name. Changing the order means that the master will get a different
> MAC and potentially a different IP address.
>
> Maybe we could add a new NM property for team devices, like
> "team.autoconnect-slaves-order" which specifies the order when the
> team itself has the property "connection.autoconnect-slaves" set. Then
> you would need to disable autoconnect for the team port connections,
> because the team will connect them when it's being activated.
>
> Or, this ordering by priority could be just the default as you said
> and we could ignore the slight change in behavior. If users have a
> port with higher priority they also probably want to have it added
> first.
>
> What do others think?

So now, when a port is activated/up, it will be added to the team device, right? Like:

    ifcfg-team0
    ifcfg-team0-port1
    ifcfg-team0-port2

When port1 is activated/up, it will get ${TEAM_MASTER} (team0) from ifcfg-team0-port1, then read ifcfg-team0, activate team0, and add port1 to team0.

When connection.autoconnect-slaves is set, after reading ifcfg-team0 and getting the ${autoconnect-slaves} list, NM reads each ifcfg-xxx and adds them one by one, right? That means each time the prio is changed in one ifcfg-xxx port file, all the ports' info has to be collected and ifcfg-team0 modified.

Maybe we should fix it in libteam: when a new port with a higher prio is added, that port would become the active one even if the current active port has the sticky option set. Not sure if that would break a user's expected behaviour.
> So now, when a port is activated/up, it will be added to the team device,
> right? Like:
>
>     ifcfg-team0
>     ifcfg-team0-port1
>     ifcfg-team0-port2
>
> When port1 is activated/up, it will get ${TEAM_MASTER} (team0) from
> ifcfg-team0-port1, then read ifcfg-team0, activate team0, and add
> port1 to team0.

Yes, the idea is right, with the exception that NM doesn't read ifcfg files as they are needed. It loads them all together at boot (or after a "nmcli connection reload") and only afterwards starts activating them.

> When connection.autoconnect-slaves is set, after reading ifcfg-team0 and
> getting the ${autoconnect-slaves} list, NM reads each ifcfg-xxx and adds
> them one by one, right? That means each time the prio is changed in one
> ifcfg-xxx port file, all the ports' info has to be collected and
> ifcfg-team0 modified.

When the master has autoconnect-slaves=1, NM builds a list of candidate ports (without reading the files again, because all the connections are already in memory) and starts them.

If the priority gets changed in one connection file, first the user needs to manually reload connections. But even after that, NM doesn't apply changes automatically; the user needs to reactivate the connections. When the user reactivates the master connection, NM will start all the port connections again based on the new order.

> Maybe we should fix it in libteam: when a new port with a higher prio
> is added, that port would become the active one even if the current
> active port has the sticky option set. Not sure if that would break a
> user's expected behaviour.

I can't tell for sure, but yeah, it's possible that it will break something.
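For reference, the reload-then-reactivate flow described above corresponds to these nmcli commands (a sketch assuming the master connection is named team0, as in this report):

```shell
# Make NM re-read all connection profiles after a file was edited on disk
# (e.g. after changing a port's "prio" in its ifcfg file or keyfile):
nmcli connection reload

# NM does not apply the change automatically; reactivating the master
# brings the port connections up again in the newly computed order
# (this relies on connection.autoconnect-slaves=1 on team0):
nmcli connection up team0
```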
(In reply to Beniamino Galvani from comment #17)
> > So now, when a port is activated/up, it will be added to the team device,
> > right? Like:
> >
> >     ifcfg-team0
> >     ifcfg-team0-port1
> >     ifcfg-team0-port2
> >
> > When port1 is activated/up, it will get ${TEAM_MASTER} (team0) from
> > ifcfg-team0-port1, then read ifcfg-team0, activate team0, and add
> > port1 to team0.
>
> Yes, the idea is right, with the exception that NM doesn't read ifcfg
> files as they are needed. It loads them all together at boot (or after
> a "nmcli connection reload") and only afterwards starts activating them.
>
> > When connection.autoconnect-slaves is set, after reading ifcfg-team0 and
> > getting the ${autoconnect-slaves} list, NM reads each ifcfg-xxx and adds
> > them one by one, right? That means each time the prio is changed in one
> > ifcfg-xxx port file, all the ports' info has to be collected and
> > ifcfg-team0 modified.
>
> When the master has autoconnect-slaves=1, NM builds a list of candidate
> ports (without reading the files again, because all the
> connections are already in memory) and starts them.

That sounds cool. So we will reorder the slave ports list according to their 'prio', and then start/enslave them one by one?

> If the priority gets changed in one connection file, first the user
> needs to manually reload connections. But even after that, NM doesn't
> apply changes automatically; the user needs to reactivate the
> connections. When the user reactivates the master connection, NM will
> start all the port connections again based on the new order.

Got it.

Is the 'team.autoconnect-slaves-order' you mentioned above a bool or a port list?
> So we will reorder the slave ports list according to their 'prio', and then
> start/enslave them one by one?

Yes, that's the idea.

> > If the priority gets changed in one connection file, first the user
> > needs to manually reload connections. But even after that, NM doesn't
> > apply changes automatically; the user needs to reactivate the
> > connections. When the user reactivates the master connection, NM will
> > start all the port connections again based on the new order.
> Got it.
>
> Is the 'team.autoconnect-slaves-order' you mentioned above a bool or a port
> list?

It can be, for example, an enum with values { DEFAULT, IFNAME, PRIORITY, ... }, where DEFAULT selects the value configured in NM.conf, if any, and otherwise means IFNAME. That way, you can define a global value for all connections.

Note that if we add this property, users who want this order at boot need to disable autoconnect for all the slave connections, so that the team connection will decide the right order when bringing them up. Otherwise, if slaves can autoconnect autonomously, the order in team.autoconnect-slaves-order will not be respected.
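Purely as an illustration of the proposal above — neither the property nor the config key below exists today; the names follow the enum suggested in this comment — usage might look like:

```shell
# Hypothetical per-connection setting (proposed, not implemented):
nmcli connection modify team0 team.autoconnect-slaves-order priority

# Hypothetical global default that the DEFAULT enum value would fall back
# to, e.g. in /etc/NetworkManager/NetworkManager.conf:
#   [connection]
#   team.autoconnect-slaves-order=priority
```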
(In reply to Beniamino Galvani from comment #19)
> > So we will reorder the slave ports list according to their 'prio', and
> > then start/enslave them one by one?
>
> Yes, that's the idea.
great!

> > > If the priority gets changed in one connection file, first the user
> > > needs to manually reload connections. But even after that, NM doesn't
> > > apply changes automatically; the user needs to reactivate the
> > > connections. When the user reactivates the master connection, NM will
> > > start all the port connections again based on the new order.
> > Got it.
> >
> > Is the 'team.autoconnect-slaves-order' you mentioned above a bool or a
> > port list?
>
> It can be, for example, an enum with values { DEFAULT, IFNAME, PRIORITY,
> ... }, where DEFAULT selects the value configured in NM.conf, if any,
> and otherwise means IFNAME. That way, you can define a global value
> for all connections.
Makes sense.

> Note that if we add this property, users who want this order at boot
> need to disable autoconnect for all the slave connections, so that the
> team connection will decide the right order when bringing them up.
> Otherwise, if slaves can autoconnect autonomously, the order in
> team.autoconnect-slaves-order will not be respected.
OK, I think this is the best way I could see by now.

Hi Fani, does this look good to you?
(In reply to Xin Long from comment #20)
> (In reply to Beniamino Galvani from comment #19)
> > > So we will reorder the slave ports list according to their 'prio', and
> > > then start/enslave them one by one?
> >
> > Yes, that's the idea.
> great!
>
> > > > If the priority gets changed in one connection file, first the user
> > > > needs to manually reload connections. But even after that, NM doesn't
> > > > apply changes automatically; the user needs to reactivate the
> > > > connections. When the user reactivates the master connection, NM will
> > > > start all the port connections again based on the new order.
> > > Got it.
> > >
> > > Is the 'team.autoconnect-slaves-order' you mentioned above a bool or a
> > > port list?
> >
> > It can be, for example, an enum with values { DEFAULT, IFNAME, PRIORITY,
> > ... }, where DEFAULT selects the value configured in NM.conf, if any,
> > and otherwise means IFNAME. That way, you can define a global value
> > for all connections.
> Makes sense.
>
> > Note that if we add this property, users who want this order at boot
> > need to disable autoconnect for all the slave connections, so that the
> > team connection will decide the right order when bringing them up.
> > Otherwise, if slaves can autoconnect autonomously, the order in
> > team.autoconnect-slaves-order will not be respected.
> OK, I think this is the best way I could see by now.
>
> Hi Fani, does this look good to you?

Hello Xin,

Sounds good to me!
Is there anything left to be done on this bz? From the last comments it seems it can be done with just NM config options.
(In reply to Marcelo Ricardo Leitner from comment #22)
> Is there anything left to be done on this bz?
> From the last comments it seems it can be done with just NM config options.

I think some change will be needed in the NM team source code, and I will park this with Beniamino and let him give it a try. Thanks.
To add to this request: could this new "autoconnect-slaves-order" property be made generic across all connection types that can have a secondary? We have a customer who is using bonding with "primary_reselect" to achieve what's described in this bug. That just happens to work for their specific system's boot order and network interfaces but, as said in comment 15, the device discovery order isn't designed or expected to be consistent, so this could really break at any time.
Hi Fani,

From my point of view, this feature does not fit well in RHEL 7 considering the late maintenance stage of RHEL 7, and we can work around it.

To work around this issue for bond/team/bridge/etc. where activation order matters, I would suggest (using bond0/eth1/eth2 as an example):

* Set `connection.autoconnect-slaves=no` for the bond0 connection.
* Set `connection.autoconnect=no` for the eth1 and eth2 connections.
* Create a NetworkManager dispatcher script for the bond0 up action: activate eth1 and eth2 in the preferred order according to the teamd/bond/bridge config.

Could you reach out to the customer with this workaround (after testing it in a lab) and convince them to change this bug to RHEL 8?

In RHEL 8/9, for this feature, my summary of the previous comments would be:

* Set eth1/eth2 to `connection.autoconnect=no`.
* Set eth1 to `connection.slave_activation_priority=100` as the primary interface (or maybe in more inclusive language, like `port_activation_priority`).
* Set eth2 to `connection.slave_activation_priority=0` as the backup interface.
* When bond0/team0 is activating, NM will honor the `slave_activation_priority` to reorder the auto activation of eth1/eth2.
Hello,

(In reply to Gris Ge from comment #28)
> Hi Fani,
>
> From my point of view, this feature does not fit well in RHEL 7 considering
> the late maintenance stage of RHEL 7, and we can work around it.
>
> To work around this issue for bond/team/bridge/etc. where activation order
> matters, I would suggest (using bond0/eth1/eth2 as an example):
> * Set `connection.autoconnect-slaves=no` for the bond0 connection.
> * Set `connection.autoconnect=no` for the eth1 and eth2 connections.
> * Create a NetworkManager dispatcher script for the bond0 up action:
>   activate eth1 and eth2 in the preferred order according to the
>   teamd/bond/bridge config.

Could you please pass me the dispatch script that you have in mind?

Thank you!
Hi Fani Orestiadou,

The script below demonstrates how it works. I have tested it in my CentOS 7 VM.

############################
#!/bin/bash

## Assuming eth2 is the preferred port and eth1 is the standby port.

nmcli c add type team ifname team0 connection.id team0 \
    team.config '{ "runner" : { "name" : "activebackup" }, "link_watch" : { "name" : "ethtool" } }' \
    ipv4.method disabled ipv6.method ignore \
    connection.autoconnect-slaves no connection.autoconnect yes

nmcli c add type ethernet ifname eth1 connection.id eth1 \
    connection.autoconnect no \
    connection.master team0 connection.slave-type team
nmcli c modify eth1 team.config '{ "prio" : -100, "sticky" : true }'

nmcli c add type ethernet ifname eth2 connection.id eth2 \
    connection.autoconnect no \
    connection.master team0 connection.slave-type team
nmcli c modify eth2 team.config '{ "prio" : 100 }'

echo '#!/bin/bash

IFACE_NAME=$1
ACTION=$2

if [ "$ACTION" = "up" ] && [ "$IFACE_NAME" = "team0" ]; then
    nmcli c up eth2
    nmcli c up eth1
fi' > /etc/NetworkManager/dispatcher.d/99-team0.sh

chmod 700 /etc/NetworkManager/dispatcher.d/99-team0.sh
chown root:root /etc/NetworkManager/dispatcher.d/99-team0.sh

nmcli c down team0
nmcli c up team0
sleep 5

teamdctl team0 config dump
teamdctl team0 state
############################

This dispatch script hard-codes the eth2/eth1 order. A better way would be to improve the dispatch script to parse team.config and reorder the activation. Of course, the ideal solution is for NetworkManager to order the auto-activation based on team.config.
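As a sketch of that suggested improvement — a dispatcher that derives the activation order from team.config instead of hard-coding eth2/eth1 — something like the following could work (assumptions: python3 is available on the host, the team is named team0, and the file is installed as /etc/NetworkManager/dispatcher.d/99-team0.sh as in the example above):

```shell
#!/bin/bash
# Sketch only: activate team ports in descending "prio" order, parsed
# from the live teamd config rather than hard-coded interface names.

# Print port names from a teamd config (read on stdin), sorted by
# descending "prio"; a missing "prio" counts as 0, teamd's default.
ports_by_prio() {
    python3 -c '
import json, sys
cfg = json.load(sys.stdin)
ports = cfg.get("ports", {})
for name in sorted(ports, key=lambda p: ports[p].get("prio", 0), reverse=True):
    print(name)
'
}

IFACE_NAME=$1
ACTION=$2

if [ "$ACTION" = "up" ] && [ "$IFACE_NAME" = "team0" ]; then
    # Highest-prio port first, so it becomes active before sticky
    # lower-prio ports are attached.
    for port in $(teamdctl team0 config dump | ports_by_prio); do
        nmcli c up "$port"
    done
fi
```

This assumes, as in the example above, that each port's nmcli connection id matches its interface name, so `nmcli c up "$port"` resolves correctly.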
(In reply to Gris Ge from comment #34)
> Hi Fani Orestiadou,
>
> The script below demonstrates how it works. I have tested it in my CentOS 7
> VM.
>
> ############################
> #!/bin/bash
>
> ## Assuming eth2 is the preferred port and eth1 is the standby port.
>
> nmcli c add type team ifname team0 connection.id team0 \
>     team.config '{ "runner" : { "name" : "activebackup" }, "link_watch" :
>     { "name" : "ethtool" } }' \
>     ipv4.method disabled ipv6.method ignore \
>     connection.autoconnect-slaves no connection.autoconnect yes
>
> nmcli c add type ethernet ifname eth1 connection.id eth1 \
>     connection.autoconnect no \
>     connection.master team0 connection.slave-type team
> nmcli c modify eth1 team.config '{ "prio" : -100, "sticky" : true }'
>
> nmcli c add type ethernet ifname eth2 connection.id eth2 \
>     connection.autoconnect no \
>     connection.master team0 connection.slave-type team
> nmcli c modify eth2 team.config '{ "prio" : 100 }'
>
> echo '#!/bin/bash
>
> IFACE_NAME=$1
> ACTION=$2
>
> if [ "$ACTION" = "up" ] && [ "$IFACE_NAME" = "team0" ]; then
>     nmcli c up eth2
>     nmcli c up eth1
> fi' > /etc/NetworkManager/dispatcher.d/99-team0.sh
>
> chmod 700 /etc/NetworkManager/dispatcher.d/99-team0.sh
> chown root:root /etc/NetworkManager/dispatcher.d/99-team0.sh
>
> nmcli c down team0
> nmcli c up team0
> sleep 5
>
> teamdctl team0 config dump
> teamdctl team0 state
> ############################
>
> This dispatch script hard-codes the eth2/eth1 order. A better way would be
> to improve the dispatch script to parse team.config and reorder the
> activation. Of course, the ideal solution is for NetworkManager to order
> the auto-activation based on team.config.

Thank you, I just tested it on RHEL 7 and it works for me too. I will contact the customer and report back as soon as I have their input.
> Could you reach out to the customer with this workaround (after testing it
> in a lab) and convince them to change this bug to RHEL 8?
>
> In RHEL 8/9, for this feature, my summary of the previous comments would be:
>
> * Set eth1/eth2 to `connection.autoconnect=no`.
> * Set eth1 to `connection.slave_activation_priority=100` as the primary
>   interface (or maybe in more inclusive language, like
>   `port_activation_priority`).
> * Set eth2 to `connection.slave_activation_priority=0` as the backup
>   interface.
> * When bond0/team0 is activating, NM will honor the
>   `slave_activation_priority` to reorder the auto activation of eth1/eth2.

Hello Gris,

I have agreed with the customer to change this feature to RHEL 8! Feel free to let me know if anything else is needed from our side at the moment.

Thank you,
Fani
Changing to RHEL 8 per customer's request. Will review it during RHEL 8.6 planning.