Bug 1920398 - RFE: Support activating team and bond ports in a specific order
Summary: RFE: Support activating team and bond ports in a specific order
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: beta
Target Release: 8.8
Assignee: Fernando F. Mancera
QA Contact: Desktop QE
Docs Contact: Marc Muehlfeld
URL:
Whiteboard:
Depends On:
Blocks: 2152304
Reported: 2021-01-26 09:02 UTC by Fani Orestiadou
Modified: 2023-03-15 07:28 UTC

Fixed In Version:
Doc Type: Known Issue
Doc Text:
.NetworkManager does not support activating bond and team ports in a specific order

NetworkManager activates interfaces alphabetically by interface names. However, if an interface appears later during the boot, for example, because the kernel needs more time to discover it, NetworkManager activates this interface later. NetworkManager does not support setting a priority on bond and team ports. Consequently, the order in which NetworkManager activates ports of these devices is not always predictable. To work around this problem, write a dispatcher script. For an example of such a script, see the corresponding link:https://bugzilla.redhat.com/show_bug.cgi?id=1920398#c34[comment] in the ticket.
Clone Of:
Clones: 2152304
Environment:
Last Closed:
Type: Feature Request
Target Upstream Version:


Links
System                              ID       Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker               NMT-17   0        None      None    None     2023-01-20 11:51:29 UTC
Red Hat Knowledge Base (Solution)   5980411  0        None      None    None     2021-04-21 00:48:09 UTC

Description Fani Orestiadou 2021-01-26 09:02:51 UTC
Description of problem: 
The port options "sticky" and "priority" are not respected during boot in an active-backup team configuration.

I would like the equivalent of the bonding options primary=iface and primary_reselect=failure for my team. I need the ens224 interface to be the primary port; if its link is lost, ens192 should take over and remain the active port even after the link of ens224 recovers. I used ports.PORTIFNAME.prio and ports.PORTIFNAME.sticky to set up this configuration:

[root@fastvm-rhel-7-8-171 ~]# teamdctl team0 config dump
{
    "device": "team0",
    "link_watch": {
        "name": "ethtool"
    },
    "mcast_rejoin": {
        "count": 1
    },
    "notify_peers": {
        "count": 1
    },
    "ports": {
        "ens192": {
            "prio": -100,
            "sticky": true
        },
        "ens224": {
            "prio": 100
        }
    },
    "runner": {
        "name": "activebackup"
    }
}

During boot, the priority value is not respected if ens192 comes up first.
It becomes the active port and, because the sticky option is enabled, it remains the active one.


[root@fastvm-rhel-7-8-171 ~]# teamdctl team0 state               
setup:
  runner: activebackup
ports:
  ens192
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
  ens224
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
runner:
  active port: ens192

Version-Release number of selected component (if applicable):
RHEL 7.8, kernel 3.10.0-1127.13.1.el7.x86_64

How reproducible:
Always


Steps to Reproduce:
1. Create team configuration and set priority and sticky options to the ports 
2. Reboot and check which port is the active one 

Actual results:
The "backup" interface becomes the active port because it gets initialized first

Expected results:
The interface with the higher priority should be the active one if its link is up. 
During boot, the sticky option should take lower precedence than the priority option.

Comment 13 Beniamino Galvani 2021-02-10 08:10:16 UTC
Hi,

at boot NM first activates connections for physical interfaces, in order of ifindex (RHEL7) or interface name (RHEL8). Then it activates connections for virtual interfaces in order of autoconnect-priority, then timestamp of last activation, then UUID.

This also determines the order in which ports are attached to the team. Note that when the first team port is activated, the controlling team is also activated at the same time.
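
As a rough illustration of the ordering of physical interfaces described above, the following sketch sorts a made-up list of "ifindex name" pairs both ways. The ifindex values and the helper names order_by_ifindex and order_by_ifname are assumptions for this example; this is not NetworkManager code.

```shell
#!/bin/bash
# Illustration only: mimic the described ordering of physical interfaces.
# Input on stdin: one "<ifindex> <ifname>" pair per line (values are made up).

# RHEL 7 behaviour as described above: order by ifindex.
order_by_ifindex() {
    LC_ALL=C sort -k1,1n | awk '{print $2}'
}

# RHEL 8 behaviour as described above: order by interface name.
order_by_ifname() {
    LC_ALL=C sort -k2,2 | awk '{print $2}'
}

# Hypothetical ifindex values for the two ports from the report.
ports='3 ens224
2 ens192'

printf '%s\n' "$ports" | order_by_ifindex   # ens192 is listed first
printf '%s\n' "$ports" | order_by_ifname    # ens192 is listed first
```

Under either rule ens192 is attached before ens224, which matches the behaviour in the description, provided both devices already exist when NetworkManager starts.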

Comment 14 Xin Long 2021-02-22 05:04:58 UTC
(In reply to Beniamino Galvani from comment #13)
> Hi,
> 
> at boot NM first activates connections for physical interfaces, in order of
> ifindex (RHEL7) or interface name (RHEL8). Then it activates connections for
> virtual interfaces in order of autoconnect-priority, then timestamp of last
> activation, then UUID.
> 
> This also determines the order in which ports are attached to the team. Note
> that when the first team port is activated, the controlling team is also
> activated at the same time.
I see, thanks Beniamino,

Do you think it's possible to activate/attach the team ports in the order of 'prio', like:

    "ports": {
        "ens192": {
            "prio": -100,
            "sticky": true
        },
        "ens224": {
            "prio": 100
        }
    },

so that ens224 is always activated/attached earlier than ens192?

Only when the ports have the same prio would NM fall back to the old ordering rules (ifindex/name...).

Comment 15 Beniamino Galvani 2021-02-22 08:22:53 UTC

(In reply to Xin Long from comment #14)
> Do you think it's possible to activate/attach the team ports in the order of
> 'prio', like:

The current order is best-effort, because it is true that NM activates
interfaces ordered by interface name, but if an interface appears a bit
later during boot (because the kernel takes longer to discover it, or
because of udev), it will be activated last anyway, regardless of
the name. The same would happen for the priority.

For this reason, when you need a feature that depends on the order of
activation (for example, a fixed MAC on the bridge/bond/team), it is
always suggested to set that property directly in the connection
profile (in the example, the cloned-mac-address property on the
master). In other words, you should account for the fact that
interfaces can come up in any order.
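
A minimal sketch of that recommendation, assuming a team profile named team0 and an example MAC address (both are placeholders): setting 802-3-ethernet.cloned-mac-address on the controller profile fixes the MAC regardless of which port is attached first. The helper names are made up for this example.

```shell
#!/bin/bash
# Sketch: pin the controller's MAC in its profile instead of relying on
# which port is attached first. Helper names are hypothetical.

# Return success if $1 looks like a colon-separated 48-bit MAC address.
is_valid_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Set the cloned MAC on a controller profile (for example team0).
# Not called here; it needs a running NetworkManager and root privileges.
pin_controller_mac() {
    is_valid_mac "$2" || { echo "invalid MAC: $2" >&2; return 1; }
    nmcli connection modify "$1" 802-3-ethernet.cloned-mac-address "$2"
}

# Example call (placeholder values):
#   pin_controller_mac team0 52:54:00:12:34:56
```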

Okay, in most cases this problem is probably not visible as all
devices are discovered quickly before NM starts.

In these cases, changing the order of activation could break the user
expectation. It is quite common to rely on the fact that a
bridge/bond/team inherits the MAC address of the first port added,
which is until now the one with alphabetically "lower" interface
name. Changing the order means that the master will get a different
MAC and potentially a different IP address.

Maybe we could add a new NM property for team devices, like
"team.autoconnect-slaves-order" which specifies the order when the
team itself has the property "connection.autoconnect-slaves" set. Then
you would need to disable autoconnect for the team ports connection,
because the team will connect them when it's being activated.

Or, this ordering by priority could be just the default as you said
and we could ignore the slight change in behavior. If users have a
port with higher priority they also probably want to have it added
first.

What do others think?

Comment 16 Xin Long 2021-02-23 10:52:15 UTC
(In reply to Beniamino Galvani from comment #15)
> 
> (In reply to Xin Long from comment #14)
> > Do you think it's possible to activate/attach the team ports in the order of
> > 'prio', like:
> 
> The current order is best-effort, because it is true that NM activates
> interfaces ordered by interface name, but if an interface appears a bit
> later during boot (because the kernel takes longer to discover it, or
> because of udev), it will be activated last anyway independently of
> the name. The same would happen for the priority.
> 
> For this reason, when you need a feature that depends on the order of
> activation (for example, a fixed MAC on the bridge/bond/team), it is
> always suggested to set that property directly in the connection
> profile (in the example, the cloned-mac-address property on the
> master). In other words, you should account for the fact that
> interfaces can come up in any order.
True.

> 
> Okay, in most cases this problem is probably not visible as all
> devices are discovered quickly before NM starts.
> 
> In these cases, changing the order of activation could break the user
> expectation. It is quite common to rely on the fact that a
> bridge/bond/team inherits the MAC address of the first port added,
> which is until now the one with alphabetically "lower" interface
> name. Changing the order means that the master will get a different
> MAC and potentially a different IP address.
> 
> Maybe we could add a new NM property for team devices, like
> "team.autoconnect-slaves-order" which specifies the order when the
> team itself has the property "connection.autoconnect-slaves" set. Then
> you would need to disable autoconnect for the team ports connection,
> because the team will connect them when it's being activated.
> 
> Or, this ordering by priority could be just the default as you said
> and we could ignore the slight change in behavior. If users have a
> port with higher priority they also probably want to have it added
> first.
> 
> What do others think?
So currently, when a port is activated/up, it is added to the team device, right? For example, with:

 ifcfg-team0  ifcfg-team0-port1  ifcfg-team0-port2

When port1 is activated/up, NM gets ${TEAM_MASTER} (team0) from ifcfg-team0-port1, then reads ifcfg-team0, activates team0, and adds port1 to team0.
With connection.autoconnect-slaves, NM would read ifcfg-team0, get the ${autoconnect-slaves} list, and then read each ifcfg-xxx and add the ports one by one, right?
That means that each time prio is changed in one ifcfg-xxx port file, it would be necessary to collect the info of all ports and modify ifcfg-team0.

Maybe we should fix it in libteam: when a new port with a higher prio is added, this port becomes the active one even if the current active port has the sticky option. I'm not sure whether that would break existing users' behaviour.

Comment 17 Beniamino Galvani 2021-02-23 13:30:30 UTC
> so now when the port is activated/up, it will be added to the team device right? like:

  ifcfg-team0  ifcfg-team0-port1  ifcfg-team0-port2

> when port1 is activated/up, and it will get ${TEAM_MASTER}(team0) from
> ifcfg-team0-port1, then read ifcfg-team0 and activate team0, and add
> port1 to team0.

Yes, the idea is right, except that NM doesn't read ifcfg files
as they are needed. It loads them all together at boot (or after a
"nmcli connection reload") and only then starts activating them.

> when there is connection.autoconnect-slaves, after
> reading ifcfg-team0 and get ${autoconnect-slaves} list, and read
> ifcfg-xxx and add xxx one by one, right?  that means each time when
> prio is changed in one ifcfg-xxx port, it needs to get all ports info
> and modify ifcfg-team0.

When the master has autoconnect-slaves=1, NM builds a list of candidate
ports (without reading the files again, because all the
connections are already in memory) and starts them.

If the priority gets changed in one connection file, first the user
needs to manually reload connections. But even after that, NM doesn't
apply the changes automatically; the user needs to reactivate the
connections. When the user reactivates the master connection, NM will
start all port connections again based on the new order.

> Maybe we should fix it in libteam: when a new port with higher prio
> is added, this port will become the active one even if the current
> active port has sticky option. Not sure if it will break a user's
> behaviour.

I can't tell for sure, but yeah, it's possible that it will break something.

Comment 18 Xin Long 2021-02-25 04:28:34 UTC
(In reply to Beniamino Galvani from comment #17)
> > so now when the port is activated/up, it will be added to the team device right? like:
> 
>   ifcfg-team0  ifcfg-team0-port1  ifcfg-team0-port2
> 
> > when port1 is activated/up, and it will get ${TEAM_MASTER}(team0) from
> > ifcfg-team0-port1, then read ifcfg-team0 and activate team0, and add
> > port1 to team0.
> 
> Yes, the idea is right. With the exception that NM doesn't read ifcfg
> as they are needed. It loads them all together at boot (or after a
> "nmcli connection reload") and only after it starts activating them.
> 
> > when there is connection.autoconnect-slaves, after
> > reading ifcfg-team0 and get ${autoconnect-slaves} list, and read
> > ifcfg-xxx and add xxx one by one, right?  that means each time when
> > prio is changed in one ifcfg-xxx port, it needs to get all ports info
> > and modify ifcfg-team0.
> 
> When the master has autoconnect-slaves=1, NM builds a list of candidate
> ports (without reading again the files, because all the
> connections are already in memory) and starts them.
That sounds cool.

So we will reorder the slave ports list according to their 'prio', and then start/enslave them one by one?

> 
> If the priority gets changed in one connection file, first the user
> needs to manually reload connections. But even after that, NM doesn't
> apply changes automatically, the user needs to reactivate the
> connections. When the user reactivates the master connection, NM will
> start again all ports connections based on the new order.
Got it.

Is the 'team.autoconnect-slaves-order' property you mentioned above a bool or a port list?

Comment 19 Beniamino Galvani 2021-02-25 16:23:42 UTC
> So we will reorder the slave ports list according to their 'prio', and then start/enslave them one by one?

Yes, that's the idea.

> > If the priority gets changed in one connection file, first the user
> > needs to manually reload connections. But even after that, NM doesn't
> > apply changes automatically, the user needs to reactivate the
> > connections. When the user reactivates the master connection, NM will
> > start again all ports connections based on the new order.
> Got it.
>
> 'team.autoconnect-slaves-order' you mentioned above is a bool or a port list?

It can be for example an enum with values { DEFAULT, IFNAME, PRIORITY
, ...}, where DEFAULT selects the value configured in NM.conf, if any,
otherwise it means IFNAME. In that way, you can define a global value
for all connections.

Note that if we add this property, users who want this order at boot
need to disable autoconnect for all the slave connections, so that the
team connection will decide the right order when bringing them up.
Otherwise if slaves can autoconnect autonomously, the order in
team.autoconnect-slaves-order will not be respected.
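
The fallback described above could be sketched as follows. Note that "team.autoconnect-slaves-order" is only a proposal in this thread, so the function name, the value spellings, and the way the global value is passed in are all hypothetical.

```shell
#!/bin/bash
# Sketch of the proposed lookup. Nothing here exists in NetworkManager;
# it only illustrates the DEFAULT -> NM.conf -> IFNAME fallback.

# $1: value from the connection profile (default, ifname, priority)
# $2: global value from NetworkManager.conf, may be empty
resolve_slaves_order() {
    conn_value=${1:-default}
    global_value=${2:-}
    if [ "$conn_value" != "default" ]; then
        echo "$conn_value"          # the profile sets an explicit order
    elif [ -n "$global_value" ] && [ "$global_value" != "default" ]; then
        echo "$global_value"        # fall back to the global setting
    else
        echo "ifname"               # nothing configured: keep current behaviour
    fi
}

resolve_slaves_order default ""          # prints: ifname
resolve_slaves_order default priority    # prints: priority
resolve_slaves_order ifname priority     # prints: ifname
```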

Comment 20 Xin Long 2021-02-26 03:04:31 UTC
(In reply to Beniamino Galvani from comment #19)
> > So we will reorder the slave ports list according to their 'prio', and then start/enslave them one by one?
> 
> Yes, that's the idea.
great!

> 
> > > If the priority gets changed in one connection file, first the user
> > > needs to manually reload connections. But even after that, NM doesn't
> > > apply changes automatically, the user needs to reactivate the
> > > connections. When the user reactivates the master connection, NM will
> > > start again all ports connections based on the new order.
> > Got it.
> >
> > 'team.autoconnect-slaves-order' you mentioned above is a bool or a port list?
> 
> It can be for example an enum with values { DEFAULT, IFNAME, PRIORITY
> , ...}, where DEFAULT selects the value configured in NM.conf, if any,
> otherwise it means IFNAME. In that way, you can define a global value
> for all connections.
Makes sense.

> 
> Note that if we add this property, users who want this order at boot
> need to disable autoconnect for all the slave connections, so that the
> team connection will decide the right order when bringing them up.
> Otherwise if slaves can autoconnect autonomously, the order in
> team.autoconnect-slaves-order will not be respected.
OK, I think this is the best approach I can see for now.

Hi, Fani, does this look good to you?

Comment 21 Fani Orestiadou 2021-02-26 07:57:45 UTC
(In reply to Xin Long from comment #20)
> (In reply to Beniamino Galvani from comment #19)
> > > So we will reorder the slave ports list according to their 'prio', and then start/enslave them one by one?
> > 
> > Yes, that's the idea.
> great!
> 
> > 
> > > > If the priority gets changed in one connection file, first the user
> > > > needs to manually reload connections. But even after that, NM doesn't
> > > > apply changes automatically, the user needs to reactivate the
> > > > connections. When the user reactivates the master connection, NM will
> > > > start again all ports connections based on the new order.
> > > Got it.
> > >
> > > 'team.autoconnect-slaves-order' you mentioned above is a bool or a port list?
> > 
> > It can be for example an enum with values { DEFAULT, IFNAME, PRIORITY
> > , ...}, where DEFAULT selects the value configured in NM.conf, if any,
> > otherwise it means IFNAME. In that way, you can define a global value
> > for all connections.
> make sense.
> 
> > 
> > Note that if we add this property, users who want this order at boot
> > need to disable autoconnect for all the slave connections, so that the
> > team connection will decide the right order when bringing them up.
> > Otherwise if slaves can autoconnect autonomously, the order in
> > team.autoconnect-slaves-order will not be respected.
> OK, I think this is the best way I could see by now.
> 
> Hi, Fani, does this look good to you?

Hello Xin, 

Sounds good to me!

Comment 22 Marcelo Ricardo Leitner 2021-03-29 15:00:36 UTC
Is there anything left to be done on this bz?
By last comments it seems it can be done with just NM config options.

Comment 23 Xin Long 2021-04-11 22:23:28 UTC
(In reply to Marcelo Ricardo Leitner from comment #22)
> Is there anything left to be done on this bz?
> By last comments it seems it can be done with just NM config options.

I think some changes will be needed in the NM-team source code. I will hand this over to Beniamino and let him give it a try.

Thanks.

Comment 24 Jamie Bainbridge 2021-04-21 00:06:28 UTC
To add to this request, could this new "autoconnect-slaves-order" property be generic across all connection types which can have a secondary?

We have a customer who is using bonding with "primary_reselect" to achieve what's described in this bug.

That just happens to work for their specific system's boot order and network interfaces but, as said in Comment 15, the device discovery order isn't designed or expected to be consistent so really this could break at any time.

Comment 28 Gris Ge 2021-06-24 04:21:17 UTC
Hi Fani,

From my point of view, this feature does not fit well in RHEL 7, considering the late maintenance stage of RHEL 7, and we can work around it.

To work around this issue for Bond/Team/Bridge/etc. where the activation order matters, I would suggest (using bond0/eth1/eth2 as an example):
 * Set `connection.autoconnect-slaves=no` for the bond0 connection.
 * Set `connection.autoconnect=no` for the eth1 and eth2 connections.
 * Create a NetworkManager dispatcher script for the bond0 up action: activate eth1 and eth2 in the preferred order according to the teamd/bond/bridge config.

Could you reach out to the customer with this workaround (after testing it in the lab) and convince them to move this bug to RHEL 8?

In RHEL 8/9, my summary of the previous comments for this feature would be:

 * Set eth1/eth2 to `connection.autoconnect=no`.
 * Set eth1 to `connection.slave_activation_priority=100` as the primary interface (or maybe use more inclusive language, like `port_activation_priority`).
 * Set eth2 to `connection.slave_activation_priority=0` as the backup interface.
 * When bond0/team0 is activating, NM will honor `slave_activation_priority` to order the automatic activation of eth1/eth2.

Comment 31 Fani Orestiadou 2021-06-28 09:44:58 UTC
Hello, 

(In reply to Gris Ge from comment #28)
> Hi Fani,
> 
> From my point of view, this feature does not fit well in RHEL 7 considering
> late maintenance stage of RHEL 7 and we can work around it.
> 
> To work around this issue for Bond/Team/Bridge/etc where activation order
> matters, I would suggest (using bond0/eth1/eth2 as example):
>  * Set `connection.autoconnect-slaves=no` for bond0 connection.
>  * Set `connection.autoconnect=no` for eth1 and eth2 connection.
>  * Create a NetworkManager dispatch script for bond0 up action: activate
> eth1 and eth2 in preferred order according to teamd/bond/bridge config.
> 

Could you please pass me the dispatch script that you have in mind?  

Thank you!

Comment 34 Gris Ge 2021-06-30 05:18:44 UTC
Hi Fani Orestiadou,

The script below demonstrates how it works. I have tested it in my CentOS 7 VM.

############################
#!/bin/bash

## Assuming eth2 is the preferred port and eth1 is the standby port.

# Team controller: ports are not auto-activated; the dispatcher script
# written below brings them up in the wanted order.
nmcli c add type team ifname team0 connection.id team0 \
    team.config '{ "runner" : {  "name" : "activebackup" }, "link_watch" : {  "name" : "ethtool" } }'  \
    ipv4.method disabled ipv6.method ignore \
    connection.autoconnect-slaves no connection.autoconnect yes

# Standby port: low priority, sticky.
nmcli c add type ethernet ifname eth1 connection.id eth1 \
    connection.autoconnect no \
    connection.master team0 connection.slave-type team
nmcli c modify eth1 team.config '{"prio" : -100, "sticky" : true }'

# Preferred port: high priority.
nmcli c add type ethernet ifname eth2 connection.id eth2 \
    connection.autoconnect no \
    connection.master team0 connection.slave-type team
nmcli c modify eth2 team.config '{"prio" : 100 }'

# Write the dispatcher script. The quoted here-document keeps $1 and $2
# literal and makes the shebang the first line of the generated file.
cat > /etc/NetworkManager/dispatcher.d/99-team0.sh <<'EOF'
#!/bin/bash

# NetworkManager passes the interface name as $1 and the action as $2.
IFACE_NAME=$1
ACTION=$2

if [ "$ACTION" = "up" ] && [ "$IFACE_NAME" = "team0" ]; then
    nmcli c up eth2
    nmcli c up eth1
fi
EOF

# Dispatcher scripts must be owned by root and not writable by others.
chmod 700  /etc/NetworkManager/dispatcher.d/99-team0.sh
chown root:root /etc/NetworkManager/dispatcher.d/99-team0.sh

nmcli c down team0
nmcli c up team0
sleep 5

teamdctl team0 config dump
teamdctl team0 state


############################


This dispatcher script hard-codes the eth2/eth1 order.
A better way would be to improve the dispatcher script to parse team.config and reorder the activation.
Of course, the ideal solution is for NetworkManager to order the auto-activation based on team.config.
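
One possible shape for such an improved dispatcher script is sketched below. It reads the prio value from each port's team configuration and activates the ports from highest to lowest prio. The function names are made up for this example, a missing prio is treated as 0, and the `nmcli -e no -g team-port.config` query is an assumption about how the port configuration can be retrieved; verify it against your nmcli version before relying on it.

```shell
#!/bin/bash
# Sketch: activate team ports in descending "prio" order.
# Function names are hypothetical; this is not part of NetworkManager.

# Print the "prio" value found in a team port JSON config ($1), or 0 if absent.
prio_of_config() {
    prio=$(printf '%s\n' "$1" |
        sed -n 's/.*"prio"[[:space:]]*:[[:space:]]*\(-\{0,1\}[0-9][0-9]*\).*/\1/p' |
        head -n 1)
    echo "${prio:-0}"
}

# stdin: "<profile> <prio>" per line; stdout: profiles, highest prio first.
# Profiles with equal prio are ordered by name.
order_ports() {
    LC_ALL=C sort -k2,2nr -k1,1 | awk '{print $1}'
}

# Activate the given port profiles in prio order. Needs NetworkManager;
# the nmcli query is an assumption and is not exercised in this sketch.
activate_ports_by_prio() {
    for profile in "$@"; do
        config=$(nmcli -e no -g team-port.config connection show "$profile")
        echo "$profile $(prio_of_config "$config")"
    done | order_ports | while read -r profile; do
        nmcli connection up "$profile"
    done
}

# In the dispatcher script, the hard-coded "nmcli c up" lines would become:
#   activate_ports_by_prio eth1 eth2
```

With the port configurations from the script above (eth1 with prio -100, eth2 with prio 100), order_ports lists eth2 before eth1, so the preferred port is attached first.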

Comment 37 Fani Orestiadou 2021-06-30 09:09:34 UTC
(In reply to Gris Ge from comment #34)
> Hi Fani Orestiadou,
> 
> The script below demonstrates how it works. I have tested it in my CentOS 7 VM.
> 
> ############################
> #!/bin/bash
> 
> ## Assuming eth2 is preferred port, eth1 is standby port.
> 
> nmcli c add type team ifname team0 connection.id team0 \
>     team.config '{ "runner" : {  "name" : "activebackup" }, "link_watch" : {
> "name" : "ethtool" } }'  \
>     ipv4.method disabled ipv6.method ignore \
>     connection.autoconnect-slaves no connection.autoconnect yes
> 
> nmcli c add type ethernet ifname eth1 connection.id eth1 \
>     connection.autoconnect no \
>     connection.master team0 connection.slave-type team
> 
> nmcli c modify eth1 team.config '{"prio" : -100, "sticky" : true }'
> 
> nmcli c add type ethernet ifname eth2 connection.id eth2 \
>     connection.autoconnect no \
>     connection.master team0 connection.slave-type team
> nmcli c modify eth2 team.config '{"prio" : 100 }'
> 
> echo '
> #!/bin/bash
> 
> IFACE_NAME=$1
> ACTION=$2
> 
> if [ "$ACTION" = "up" ] && [ "$IFACE_NAME" = "team0" ];then
>     nmcli c up eth2
>     nmcli c up eth1
> fi' > /etc/NetworkManager/dispatcher.d/99-team0.sh
> 
> chmod 700  /etc/NetworkManager/dispatcher.d/99-team0.sh
> chown root:root /etc/NetworkManager/dispatcher.d/99-team0.sh
> 
> nmcli c down team0
> nmcli c up team0
> sleep 5
> 
> teamdctl team0 config dump
> teamdctl team0 state
> 
> 
> ############################
> 
> 
> This dispatcher script hard-codes the eth2/eth1 order.
> A better way would be to improve the dispatcher script to parse team.config and
> reorder the activation.
> Of course, the ideal solution is for NetworkManager to order the auto-activation
> based on team.config.

Thank you, I just tested it on RHEL 7 and it works for me too.
I will contact the customer and report back as soon as I have their input.

Comment 42 Fani Orestiadou 2021-07-10 07:50:02 UTC
> Could you reach customer with this workaround(after tested in lab) and
> convince them to change this bug to RHEL 8?
> 
> In RHEL 8/9, for this feature, my summary on previous comments could be:
> 
>  * Set eth1/eth2 as `connection.autoconnect=no`
>  * Set eth1 as `connection.slave_activation_priority=100` as primary
> interface. (or maybe in more inclusive language like
> `port_activation_priority`).
>  * Set eth2 as `connection.slave_activation_priority=0` as backup interface.
>  * When bond0/team0 activating, NM will honor the
> `slave_activation_priority` to reorder the auto activation on eth1/eth2.

Hello Gris, 

I have agreed with the customer to move this feature request to RHEL 8!
Feel free to let me know if anything else is needed at the moment from our side. 

Thank you
Fani

Comment 45 Gris Ge 2021-07-29 04:27:44 UTC
Changing to RHEL 8 per customer's request.

Will review it during RHEL 8.6 planning.

