Bug 2188963

Summary: active-backup bond with 802.3ad bond as a slave and ONBOOT=no activated at boot time by NetworkManager
Product: Red Hat Enterprise Linux 9
Component: NetworkManager
Version: CentOS Stream
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Reporter: Andrew Schorr <ajschorr>
Assignee: NetworkManager Development Team <nm-team>
QA Contact: Desktop QE <desktop-qa-list>
CC: bgalvani, bstinson, jwboyer, lrintel, rkhan, sfaye, sukulkar, thaller, till
Last Closed: 2023-07-19 11:43:37 UTC
Type: Bug

Attachments:
 - NetworkManager journal messages
 - NetworkManager TRACE log
 - journalctl boot log

Description Andrew Schorr 2023-04-23 19:40:49 UTC
Description of problem:
I'm having some problems with a configuration where an active-backup bond has an 802.3ad bond as a slave. Both bonds often end up with a random MAC address after booting, due to what seems to be a race condition. See https://bugzilla.redhat.com/show_bug.cgi?id=2188100. So I thought I could work around the race condition by bringing up the 802.3ad bond first, and then waiting for it to stabilize before bringing up the active-backup bond layered on top. I edited /etc/sysconfig/network-scripts/ifcfg-bond1 to set ONBOOT=no for bond1 (the active-backup bond), and did the same in ifcfg-bond1.3. However, after rebooting, I see that NetworkManager started bond1 despite being told not to do so.

Version-Release number of selected component (if applicable):
NetworkManager-1.43.4-1.el9.x86_64


How reproducible:
always

Steps to Reproduce:
1. create 802.3ad bond0 and active-backup bond1 with bond1 ONBOOT=no
2. reboot

Actual results:
Notice that NetworkManager started bond1

Expected results:
It should not start bond1.

Additional info:
I noticed some similarly strange issues in the way NetworkManager is linking these 2 bonds together. When I configure ONBOOT=yes for both bonds, I usually get a random MAC address. I thought I could run 'ifdown bond1; ifdown bond0; ifup bond0' and that this would give bond0 a chance to reassign the MAC address properly from one of its slave ethernet devices. But this behaves strangely. When I run 'ifdown bond1', it also removes the bond0 interface. I don't know why it does this. And if I then run 'ifup bond0', it automatically activates bond1 as well. Why is that happening? I don't understand the linkage. I see a "connection.autoconnect-slaves" setting, which looks maybe relevant, but it's set to -1, which I think is the default, which should map to 0, shouldn't it? What's telling it to auto-disconnect the slaves when I ifdown bond1? And why would it auto-connect the master (implicitly ifup bond1) when I run ifup bond0? This seems flaky to me. Would I get better results if I migrated to the /etc/NetworkManager/system-connections format? I can't see why it should matter.

Comment 1 Thomas Haller 2023-04-24 08:37:43 UTC
What do `nmcli connection` and `nmcli device` show?

Please provide a logfile of the boot. Get it via `journalctl -u NetworkManager`.

Thank you.

Comment 2 Andrew Schorr 2023-04-24 13:42:25 UTC
Created attachment 1959545
NetworkManager journal messages

sh-5.1# nmcli connection
NAME          UUID                                  TYPE      DEVICE  
Vlan bond1.3  2682acff-f7af-8f05-0172-f959526caa05  vlan      bond1.3 
lo            baf144b5-9b68-4d65-a64c-4ccc4a11c81e  loopback  lo      
Bond bond1    92306dc1-4142-23de-097b-b1464cfab5ee  bond      bond1   
Bond bond0    ad33d8b0-1f7b-cab9-9447-ba07f855b143  bond      bond0   
System lan0   1969c5f0-fbcf-2bd8-5a7e-7a8a67908f8f  ethernet  lan0    
System lan2   38ac5fa0-df4c-251b-7afb-02fcb93ed035  ethernet  lan2    
System lan3   93ad060f-9feb-77e2-8938-1b17f04e706e  ethernet  lan3    

sh-5.1# nmcli device
DEVICE   TYPE      STATE      CONNECTION   
bond1.3  vlan      connected  Vlan bond1.3 
lo       loopback  connected  lo           
bond1    bond      connected  Bond bond1   
bond0    bond      connected  Bond bond0   
lan0     ethernet  connected  System lan0  
lan2     ethernet  connected  System lan2  
lan3     ethernet  connected  System lan3  
lan1     ethernet  unmanaged  --           
lan4     ethernet  unmanaged  --           
lan5     ethernet  unmanaged  --           

sh-5.1# journalctl -b -u NetworkManager > /tmp/nmboot.txt


I have attached nmboot.txt

In case it matters, I ran 'ifup bond1.3' after booting.

Comment 3 Andrew Schorr 2023-04-24 13:48:15 UTC
To be clear, in case it's confusing from the boot messages, I created a local systemd
service that runs 'ifup bond1.3' immediately after NetworkManager-wait-online.service
finishes and before network-online.target. NetworkManager is (correctly) not starting that
one because I have ONBOOT=no. But it is starting bond1 despite ONBOOT=no:

sh-5.1$ cd /etc/sysconfig/network-scripts/
sh-5.1$ egrep 'ONBOOT|NM_CONTROL' *
ifcfg-bond0:ONBOOT=yes
ifcfg-bond0:NM_CONTROLLED=yes
ifcfg-bond1:ONBOOT=no
ifcfg-bond1:NM_CONTROLLED=yes
ifcfg-bond1.3:ONBOOT=no
ifcfg-bond1.3:NM_CONTROLLED=yes
ifcfg-lan0:ONBOOT=yes
ifcfg-lan0:NM_CONTROLLED=yes
ifcfg-lan1:ONBOOT=no
ifcfg-lan1:NM_CONTROLLED=no
ifcfg-lan2:ONBOOT=yes
ifcfg-lan2:NM_CONTROLLED=yes
ifcfg-lan3:ONBOOT=yes
ifcfg-lan3:NM_CONTROLLED=yes
ifcfg-lan4:ONBOOT=no
ifcfg-lan4:NM_CONTROLLED=no
ifcfg-lan5:ONBOOT=no
ifcfg-lan5:NM_CONTROLLED=no

And here's my kludge to bring up bond1.3:
[Unit]
Description=Bring up stacked bonds after lower bonds have stabilized
After=NetworkManager-wait-online.service network.target
Before=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/ifup bond1.3

[Install]
WantedBy=multi-user.target

Somehow, this approach is fixing the problem whereby I was getting a random MAC address on the bonds.
Perhaps setting ONBOOT=no is at least delaying the startup of bond1 a bit. It's apparently enough to
give bond0 time to get a valid MAC address. Or so it seems from 5 or so reboots.
But I'd prefer it if NetworkManager didn't start bond1, so that I could have more control over this situation
and be confident that I can defeat the race condition.
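
A quick check that could be run after each reboot, as a rough sketch: addresses that the kernel generates at random have the locally administered bit (0x02 of the first octet) set, whereas a burned-in NIC address normally does not. The function name here is made up for illustration, and a set bit only says the address is locally administered, not why.

```shell
# is_locally_administered MAC
# Succeeds (exit 0) if the locally administered bit, 0x02 of the
# first octet, is set in the given colon-separated MAC address.
is_locally_administered() {
    first=${1%%:*}                      # first octet, e.g. "a6"
    [ $(( 0x$first & 2 )) -ne 0 ]
}
```

For example: is_locally_administered "$(cat /sys/class/net/bond0/address)" && echo "bond0 has a locally administered MAC"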

Comment 4 Thomas Haller 2023-04-24 14:40:29 UTC
> But it is starting bond1 despite ONBOOT=no:
>
> [1682283629.2125] policy: auto-activating connection 'System lan0' (1969c5f0-fbcf-2bd8-5a7e-7a8a67908f8f)
> ...
> [1682283629.2789] device (bond1): attached bond port lan0

bond1 is already activated before that, because you configured a port profile "System lan0" which is set to autoconnect. (Auto-)activating a port device/profile implies bringing up the controller bond interface.

If you don't want those profiles to auto-activate, then don't configure them to autoconnect.

On top of that, bond0 device has master=bond1:

> <info>  [1682283629.4195] device (bond1): attached bond port bond0

So for that reason also, activating bond0 will cause bond1 to be up.
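
To see which controllers a given profile pulls in, the MASTER= entries in the ifcfg files can be followed by hand. The sketch below does only that: it reads ifcfg files from a directory and prints each controller up the chain. It is not how NetworkManager itself resolves the chain, and the function name is hypothetical.

```shell
# controller_chain DIR DEVICE
# Follow MASTER= entries in DIR/ifcfg-* starting at DEVICE and print
# every controller found on the way up, one per line.
controller_chain() {
    dir=$1
    dev=$2
    seen=" $dev "
    while :; do
        file="$dir/ifcfg-$dev"
        [ -r "$file" ] || break
        # Use the last MASTER= line and drop any quoting around the value.
        master=$(sed -n 's/^MASTER=//p' "$file" | tail -n 1 | tr -d "\"'")
        [ -n "$master" ] || break
        # Stop if a controller repeats, so a misconfigured loop cannot hang us.
        case "$seen" in *" $master "*) break ;; esac
        printf '%s\n' "$master"
        seen="$seen$master "
        dev=$master
    done
}
```

With a lan2 profile whose MASTER is bond0 and a bond0 profile whose MASTER is bond1, `controller_chain /etc/sysconfig/network-scripts lan2` prints bond0 and then bond1, so an autoconnecting lan2 is enough to bring up both bonds.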


Adding a systemd service that calls ifup does not seem like a solution. If your problem is that an interface doesn't get the right MAC address, then let's not focus on how you'd like to manually connect interfaces (or how an interface is unexpectedly autoconnected), but rather on how you can configure your system so that it works.


Configure suitable profiles. See all profiles with `nmcli connection`. See the settings of a profile with `nmcli connection show "$PROFILE_NAME"` (or `nmcli -o connection show "$PROFILE_NAME"`). Modify the profile with `nmcli connection modify "$PROFILE_NAME" ...`.

Worst case, if you need a specific MAC address, you could configure it with `nmcli connection modify "$PROFILE_NAME" ethernet.cloned-mac-address "$MAC"`. Otherwise, the MAC address is chosen by the kernel, and it depends on the bonding mode ("active-backup"). Consider setting the "primary" bond option.

  nmcli connection modify 'Bond bond1' +bond.options 'primary=lan0'



If you don't get it to work, then share
 - nmcli -f all connection
 - nmcli device
 - ip -d link
 - grep -R ^ /etc/NetworkManager/system-connections /etc/sysconfig/network-scripts
 - collect a complete `level=TRACE` log from a current boot. See DEBUGGING in `man NetworkManager`. You'll find the log in `journalctl -b0` output.

BEWARE the hint in the manual page about sharing private data, before sharing.

Comment 5 Andrew Schorr 2023-04-24 15:19:12 UTC
OK, so you are saying that bringing up a slave interface will always implicitly bring up the master interface.
I guess that maybe makes sense, but it doesn't seem inevitable.

The root problem here is that I want bond1 to be an active-backup bond in which one of the slaves, bond0, is an 802.3ad bond, and NetworkManager has a race condition when bringing up these interfaces. I'm not certain,
but I believe what's going on is that the 802.3ad bond in bond0 takes a while to come up because it needs to
exchange LACP packets with the switch. Until its first slave becomes active, I guess bond0 has a random MAC address
assigned. But bond1 is coming up before bond0 has stabilized, and bond1 steals the MAC address from bond0, which is random at that moment.

I'm not aware of any options I could set anywhere to fix this race condition. The only solution I can see
is to delay starting bond1 until bond0 stabilizes, i.e. finishes negotiating LACP with the switch on at least
one link. And in fact, setting bond1 ONBOOT=no does seem to delay starting bond1 just enough to solve
my problem, although that may just be luck from a small sample size.

What possible setting do you have in mind that could convince NetworkManager not to start bond1 until
after bond0 finishes negotiating LACP and has a reasonable MAC address?

Thanks,
Andy

P.S. And yes -- I agree that my solution to add a systemd service to call ifup is a hack. But I need a
hack that works, because I don't see how to get NetworkManager to do the right thing. I think that
NetworkManager is fundamentally too stupid to grok this configuration correctly. But I'm certainly
no expert in NetworkManager, and I'd love to be convinced that I'm wrong. What hints could I give
to NetworkManager to tell it to delay starting bond1 until bond0 has stabilized with a good MAC address?
Alternatively, if I could convince bond1 to steal the MAC address from lan0 instead of bond0, that could
fix the issue. But I don't see how to do that. In practice, it seems to randomly select a MAC address
from either lan0 or bond0, and if bond0 hasn't stabilized in time, its MAC address will be random.
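
For the systemd workaround from comment 3, one way to wait for bond0 explicitly instead of relying on timing might look like the sketch below. It rests on two assumptions: that the bonding driver exposes the LACP partner address in /sys/class/net/bond0/bonding/ad_partner_mac, and that an all-zero value there means negotiation has not finished. The function name and the retry count are made up for illustration.

```shell
# wait_for_lacp_partner FILE [TRIES]
# Poll FILE once per second until it holds a non-zero MAC address.
# Prints the address and returns 0 on success; returns 1 after TRIES
# attempts (default 30).
wait_for_lacp_partner() {
    file=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        mac=$(cat "$file" 2>/dev/null || true)
        case "$mac" in
            ""|00:00:00:00:00:00) ;;    # missing or not negotiated yet
            *) printf '%s\n' "$mac"; return 0 ;;
        esac
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

A wrapper script for the unit could then run: wait_for_lacp_partner /sys/class/net/bond0/bonding/ad_partner_mac && /usr/sbin/ifup bond1.3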

P.P.S. I am setting the primary slave. I don't know why that would matter. The primary slave is bond0, since
that has the bandwidth. Here are my config files:

sh-5.1$ cd /etc/sysconfig/network-scripts/
sh-5.1$ head -20 ifcfg-*
==> ifcfg-bond0 <==
DEVICE=bond0
TYPE=Bond
BOOTPROTO=none
ONBOOT=yes
HOTPLUG=no
NOZEROCONF=yes
NM_CONTROLLED=yes
BONDING_MASTER=yes
BONDING_OPTS="miimon=100 ad_select=bandwidth mode=802.3ad xmit_hash_policy=layer2+3 arp_interval=0 lacp_rate=fast"
MASTER=bond1
SLAVE=yes

==> ifcfg-bond1 <==
DEVICE=bond1
TYPE=Bond
BOOTPROTO=static
ONBOOT=no
HOTPLUG=no
NOZEROCONF=yes
NM_CONTROLLED=yes
IPADDR=192.168.30.27
NETMASK=255.255.254.0
BONDING_MASTER=yes
BONDING_OPTS="miimon=0 mode=active-backup arp_all_targets=any primary=bond0 arp_ip_target=192.168.30.13,192.168.30.12,192.168.30.7,192.168.30.5 arp_interval=1000 fail_over_mac=follow arp_validate=none primary_reselect=always"

==> ifcfg-bond1.3 <==
DEVICE=bond1.3
BOOTPROTO=static
ONBOOT=no
HOTPLUG=no
NOZEROCONF=yes
NM_CONTROLLED=yes
VLAN=yes
IPADDR=192.168.33.27
NETMASK=255.255.255.0

==> ifcfg-lan0 <==
DEVICE=lan0
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
HOTPLUG=no
NOZEROCONF=yes
NM_CONTROLLED=yes
MASTER=bond1
SLAVE=yes

==> ifcfg-lan1 <==
DEVICE=lan1
ONBOOT=no
HOTPLUG=no
NOZEROCONF=yes
NM_CONTROLLED=no

==> ifcfg-lan2 <==
DEVICE=lan2
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
HOTPLUG=no
NOZEROCONF=yes
NM_CONTROLLED=yes
MASTER=bond0
SLAVE=yes

==> ifcfg-lan3 <==
DEVICE=lan3
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
HOTPLUG=no
NOZEROCONF=yes
NM_CONTROLLED=yes
MASTER=bond0
SLAVE=yes

==> ifcfg-lan4 <==
DEVICE=lan4
ONBOOT=no
HOTPLUG=no
NOZEROCONF=yes
NM_CONTROLLED=no

==> ifcfg-lan5 <==
DEVICE=lan5
ONBOOT=no
HOTPLUG=no
NOZEROCONF=yes
NM_CONTROLLED=no

Comment 6 Andrew Schorr 2023-04-26 01:00:58 UTC
Anyway, by setting ONBOOT=no for bond1 and bond1.3, it seems somehow to delay bringing up bond1 just enough
to fix my issue. And then my kludge systemd service starts bond1.3 to get things to the proper state
before network-online.target.

But here's the various other debugging info you requested:

sh-5.1$ nmcli -f all connection | cat -
NAME          UUID                                  TYPE      TIMESTAMP   TIMESTAMP-REAL            AUTOCONNECT  AUTOCONNECT-PRIORITY  READONLY  DBUS-PATH                                   ACTIVE  DEVICE   STATE      ACTIVE-PATH                                         SLAVE  FILENAME                                               
Vlan bond1.3  2682acff-f7af-8f05-0172-f959526caa05  vlan      1682470608  Tue Apr 25 20:56:48 2023  yes          0                     no        /org/freedesktop/NetworkManager/Settings/6  yes     bond1.3  activated  /org/freedesktop/NetworkManager/ActiveConnection/7  --     /etc/sysconfig/network-scripts/ifcfg-bond1.3           
lo            1bc71bbb-7bfc-458b-8987-dc989623ed73  loopback  1682470607  Tue Apr 25 20:56:47 2023  no           0                     no        /org/freedesktop/NetworkManager/Settings/7  yes     lo       activated  /org/freedesktop/NetworkManager/ActiveConnection/1  --     /run/NetworkManager/system-connections/lo.nmconnection 
Bond bond1    92306dc1-4142-23de-097b-b1464cfab5ee  bond      1682470608  Tue Apr 25 20:56:48 2023  yes          0                     no        /org/freedesktop/NetworkManager/Settings/5  yes     bond1    activated  /org/freedesktop/NetworkManager/ActiveConnection/5  --     /etc/sysconfig/network-scripts/ifcfg-bond1             
Bond bond0    ad33d8b0-1f7b-cab9-9447-ba07f855b143  bond      1682470607  Tue Apr 25 20:56:47 2023  yes          0                     no        /org/freedesktop/NetworkManager/Settings/1  yes     bond0    activated  /org/freedesktop/NetworkManager/ActiveConnection/6  bond   /etc/sysconfig/network-scripts/ifcfg-bond0             
System lan0   1969c5f0-fbcf-2bd8-5a7e-7a8a67908f8f  ethernet  1682470607  Tue Apr 25 20:56:47 2023  yes          0                     no        /org/freedesktop/NetworkManager/Settings/4  yes     lan0     activated  /org/freedesktop/NetworkManager/ActiveConnection/2  bond   /etc/sysconfig/network-scripts/ifcfg-lan0              
System lan2   38ac5fa0-df4c-251b-7afb-02fcb93ed035  ethernet  1682470607  Tue Apr 25 20:56:47 2023  yes          0                     no        /org/freedesktop/NetworkManager/Settings/2  yes     lan2     activated  /org/freedesktop/NetworkManager/ActiveConnection/3  bond   /etc/sysconfig/network-scripts/ifcfg-lan2              
System lan3   93ad060f-9feb-77e2-8938-1b17f04e706e  ethernet  1682470607  Tue Apr 25 20:56:47 2023  yes          0                     no        /org/freedesktop/NetworkManager/Settings/3  yes     lan3     activated  /org/freedesktop/NetworkManager/ActiveConnection/4  bond   /etc/sysconfig/network-scripts/ifcfg-lan3              


sh-5.1$ nmcli device
DEVICE   TYPE      STATE                   CONNECTION   
bond1.3  vlan      connected               Vlan bond1.3 
lo       loopback  connected (externally)  lo           
bond1    bond      connected               Bond bond1   
bond0    bond      connected               Bond bond0   
lan0     ethernet  connected               System lan0  
lan2     ethernet  connected               System lan2  
lan3     ethernet  connected               System lan3  
lan1     ethernet  unmanaged               --           
lan4     ethernet  unmanaged               --           
lan5     ethernet  unmanaged               --           


sh-5.1$ ip -d link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0 addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536 
2: lan0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP mode DEFAULT group default qlen 1000
    link/ether 40:a6:b7:b0:b7:c0 brd ff:ff:ff:ff:ff:ff permaddr a0:36:bc:c8:07:a9 promiscuity 0 minmtu 68 maxmtu 9216 
    bond_slave state BACKUP mii_status UP link_failure_count 0 perm_hwaddr a0:36:bc:c8:07:a9 queue_id 0 addrgenmode none numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 parentbus pci parentdev 0000:05:00.0 
    altname enp5s0
3: lan1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether a0:36:bc:c8:07:aa brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9216 addrgenmode eui64 numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 parentbus pci parentdev 0000:06:00.0 
    altname enp6s0
4: lan2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether a6:c7:fb:56:3d:3d brd ff:ff:ff:ff:ff:ff permaddr 40:a6:b7:b0:b7:c0 promiscuity 0 minmtu 68 maxmtu 9702 
    bond_slave state ACTIVE mii_status UP link_failure_count 1 perm_hwaddr 40:a6:b7:b0:b7:c0 queue_id 0 ad_aggregator_id 2 ad_actor_oper_port_state 63 ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync,collecting,distributing> ad_partner_oper_port_state 61 ad_partner_oper_port_state_str <active,aggregating,in_sync,collecting,distributing> addrgenmode none numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 portid 40a6b7b0b7c0 parentbus pci parentdev 0000:01:00.0 
5: lan3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether a6:c7:fb:56:3d:3d brd ff:ff:ff:ff:ff:ff permaddr 40:a6:b7:b0:b7:c1 promiscuity 0 minmtu 68 maxmtu 9702 
    bond_slave state ACTIVE mii_status UP link_failure_count 1 perm_hwaddr 40:a6:b7:b0:b7:c1 queue_id 0 ad_aggregator_id 2 ad_actor_oper_port_state 63 ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync,collecting,distributing> ad_partner_oper_port_state 61 ad_partner_oper_port_state_str <active,aggregating,in_sync,collecting,distributing> addrgenmode none numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 portid 40a6b7b0b7c1 parentbus pci parentdev 0000:01:00.1 
6: lan4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 40:a6:b7:b0:b7:c2 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9702 addrgenmode eui64 numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 portid 40a6b7b0b7c2 parentbus pci parentdev 0000:01:00.2 
7: lan5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 40:a6:b7:b0:b7:c3 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9702 addrgenmode eui64 numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 portid 40a6b7b0b7c3 parentbus pci parentdev 0000:01:00.3 
8: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether a6:c7:fb:56:3d:3d brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 
    bond mode active-backup active_slave bond0 miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 1000 arp_missed_max 2 arp_ip_target 192.168.30.13,192.168.30.12,192.168.30.7,192.168.30.5 arp_validate none arp_all_targets any primary bond0 primary_reselect always fail_over_mac follow xmit_hash_policy layer2 resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_active on lacp_rate slow ad_select stable tlb_dynamic_lb 1 addrgenmode eui64 numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 
9: bond1.3@bond1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether a6:c7:fb:56:3d:3d brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535 
    vlan protocol 802.1Q id 3 <REORDER_HDR> addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 
10: bond0: <BROADCAST,MULTICAST,MASTER,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond1 state UP mode DEFAULT group default qlen 1000
    link/ether a6:c7:fb:56:3d:3d brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 
    bond mode 802.3ad miimon 100 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 0 arp_missed_max 2 arp_validate none arp_all_targets any primary_reselect always fail_over_mac none xmit_hash_policy layer2+3 resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_active on lacp_rate fast ad_select bandwidth ad_aggregator 2 ad_num_ports 2 ad_actor_key 15 ad_partner_key 2 ad_partner_mac 10:c5:95:02:10:00 tlb_dynamic_lb 1 
    bond_slave state ACTIVE mii_status UP link_failure_count 0 perm_hwaddr a6:c7:fb:56:3d:3d queue_id 0 addrgenmode none numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 


sh-5.1$ grep -R ^ /etc/NetworkManager/system-connections /etc/sysconfig/network-scripts
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:NetworkManager stores new network profiles in keyfile format in the
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:/etc/NetworkManager/system-connections/ directory.
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:Previously, NetworkManager stored network profiles in ifcfg format
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:in this directory (/etc/sysconfig/network-scripts/). However, the ifcfg
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:format is deprecated. By default, NetworkManager no longer creates
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:new profiles in this format.
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:Connection profiles in keyfile format have many benefits. For example,
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:this format is INI file-based and can easily be parsed and generated.
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:Each section in NetworkManager keyfiles corresponds to a NetworkManager
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:setting name as described in the nm-settings(5) and nm-settings-keyfile(5)
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:man pages. Each key-value-pair in a section is one of the properties
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:listed in the settings specification of the man page.
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:If you still use network profiles in ifcfg format, consider migrating
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:them to keyfile format. To migrate all profiles at once, enter:
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:# nmcli connection migrate
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:This command migrates all profiles from ifcfg format to keyfile
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:format and stores them in /etc/NetworkManager/system-connections/.
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:Alternatively, to migrate only a specific profile, enter:
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:# nmcli connection migrate <profile_name|UUID|D-Bus_path>
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:For further details, see:
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:* nm-settings-keyfile(5)
/etc/sysconfig/network-scripts/readme-ifcfg-rh.txt:* nmcli(1)
/etc/sysconfig/network-scripts/route-bond1.3:default via 192.168.33.254 dev bond1.3 metric 4
/etc/sysconfig/network-scripts/ifcfg-bond0:DEVICE=bond0
/etc/sysconfig/network-scripts/ifcfg-bond0:TYPE=Bond
/etc/sysconfig/network-scripts/ifcfg-bond0:BOOTPROTO=none
/etc/sysconfig/network-scripts/ifcfg-bond0:ONBOOT=yes
/etc/sysconfig/network-scripts/ifcfg-bond0:HOTPLUG=no
/etc/sysconfig/network-scripts/ifcfg-bond0:NOZEROCONF=yes
/etc/sysconfig/network-scripts/ifcfg-bond0:NM_CONTROLLED=yes
/etc/sysconfig/network-scripts/ifcfg-bond0:BONDING_MASTER=yes
/etc/sysconfig/network-scripts/ifcfg-bond0:BONDING_OPTS="miimon=100 ad_select=bandwidth mode=802.3ad xmit_hash_policy=layer2+3 arp_interval=0 lacp_rate=fast"
/etc/sysconfig/network-scripts/ifcfg-bond0:MASTER=bond1
/etc/sysconfig/network-scripts/ifcfg-bond0:SLAVE=yes
/etc/sysconfig/network-scripts/ifcfg-lan2:DEVICE=lan2
/etc/sysconfig/network-scripts/ifcfg-lan2:TYPE=Ethernet
/etc/sysconfig/network-scripts/ifcfg-lan2:BOOTPROTO=none
/etc/sysconfig/network-scripts/ifcfg-lan2:ONBOOT=yes
/etc/sysconfig/network-scripts/ifcfg-lan2:HOTPLUG=no
/etc/sysconfig/network-scripts/ifcfg-lan2:NOZEROCONF=yes
/etc/sysconfig/network-scripts/ifcfg-lan2:NM_CONTROLLED=yes
/etc/sysconfig/network-scripts/ifcfg-lan2:MASTER=bond0
/etc/sysconfig/network-scripts/ifcfg-lan2:SLAVE=yes
/etc/sysconfig/network-scripts/ifcfg-lan3:DEVICE=lan3
/etc/sysconfig/network-scripts/ifcfg-lan3:TYPE=Ethernet
/etc/sysconfig/network-scripts/ifcfg-lan3:BOOTPROTO=none
/etc/sysconfig/network-scripts/ifcfg-lan3:ONBOOT=yes
/etc/sysconfig/network-scripts/ifcfg-lan3:HOTPLUG=no
/etc/sysconfig/network-scripts/ifcfg-lan3:NOZEROCONF=yes
/etc/sysconfig/network-scripts/ifcfg-lan3:NM_CONTROLLED=yes
/etc/sysconfig/network-scripts/ifcfg-lan3:MASTER=bond0
/etc/sysconfig/network-scripts/ifcfg-lan3:SLAVE=yes
/etc/sysconfig/network-scripts/ifcfg-lan0:DEVICE=lan0
/etc/sysconfig/network-scripts/ifcfg-lan0:TYPE=Ethernet
/etc/sysconfig/network-scripts/ifcfg-lan0:BOOTPROTO=none
/etc/sysconfig/network-scripts/ifcfg-lan0:ONBOOT=yes
/etc/sysconfig/network-scripts/ifcfg-lan0:HOTPLUG=no
/etc/sysconfig/network-scripts/ifcfg-lan0:NOZEROCONF=yes
/etc/sysconfig/network-scripts/ifcfg-lan0:NM_CONTROLLED=yes
/etc/sysconfig/network-scripts/ifcfg-lan0:MASTER=bond1
/etc/sysconfig/network-scripts/ifcfg-lan0:SLAVE=yes
/etc/sysconfig/network-scripts/ifcfg-lan1:DEVICE=lan1
/etc/sysconfig/network-scripts/ifcfg-lan1:ONBOOT=no
/etc/sysconfig/network-scripts/ifcfg-lan1:HOTPLUG=no
/etc/sysconfig/network-scripts/ifcfg-lan1:NOZEROCONF=yes
/etc/sysconfig/network-scripts/ifcfg-lan1:NM_CONTROLLED=no
/etc/sysconfig/network-scripts/ifcfg-lan4:DEVICE=lan4
/etc/sysconfig/network-scripts/ifcfg-lan4:ONBOOT=no
/etc/sysconfig/network-scripts/ifcfg-lan4:HOTPLUG=no
/etc/sysconfig/network-scripts/ifcfg-lan4:NOZEROCONF=yes
/etc/sysconfig/network-scripts/ifcfg-lan4:NM_CONTROLLED=no
/etc/sysconfig/network-scripts/ifcfg-lan5:DEVICE=lan5
/etc/sysconfig/network-scripts/ifcfg-lan5:ONBOOT=no
/etc/sysconfig/network-scripts/ifcfg-lan5:HOTPLUG=no
/etc/sysconfig/network-scripts/ifcfg-lan5:NOZEROCONF=yes
/etc/sysconfig/network-scripts/ifcfg-lan5:NM_CONTROLLED=no
/etc/sysconfig/network-scripts/ifcfg-bond1:DEVICE=bond1
/etc/sysconfig/network-scripts/ifcfg-bond1:TYPE=Bond
/etc/sysconfig/network-scripts/ifcfg-bond1:BOOTPROTO=static
/etc/sysconfig/network-scripts/ifcfg-bond1:ONBOOT=yes
/etc/sysconfig/network-scripts/ifcfg-bond1:HOTPLUG=no
/etc/sysconfig/network-scripts/ifcfg-bond1:NOZEROCONF=yes
/etc/sysconfig/network-scripts/ifcfg-bond1:NM_CONTROLLED=yes
/etc/sysconfig/network-scripts/ifcfg-bond1:IPADDR=192.168.30.27
/etc/sysconfig/network-scripts/ifcfg-bond1:NETMASK=255.255.254.0
/etc/sysconfig/network-scripts/ifcfg-bond1:BONDING_MASTER=yes
/etc/sysconfig/network-scripts/ifcfg-bond1:BONDING_OPTS="miimon=0 mode=active-backup arp_all_targets=any primary=bond0 arp_ip_target=192.168.30.13,192.168.30.12,192.168.30.7,192.168.30.5 arp_interval=1000 fail_over_mac=follow arp_validate=none primary_reselect=always"
/etc/sysconfig/network-scripts/ifcfg-bond1.3:DEVICE=bond1.3
/etc/sysconfig/network-scripts/ifcfg-bond1.3:BOOTPROTO=static
/etc/sysconfig/network-scripts/ifcfg-bond1.3:ONBOOT=yes
/etc/sysconfig/network-scripts/ifcfg-bond1.3:HOTPLUG=no
/etc/sysconfig/network-scripts/ifcfg-bond1.3:NOZEROCONF=yes
/etc/sysconfig/network-scripts/ifcfg-bond1.3:NM_CONTROLLED=yes
/etc/sysconfig/network-scripts/ifcfg-bond1.3:VLAN=yes
/etc/sysconfig/network-scripts/ifcfg-bond1.3:IPADDR=192.168.33.27
/etc/sysconfig/network-scripts/ifcfg-bond1.3:NETMASK=255.255.255.0


And I'm uploading the level=TRACE log as an attachment called NetworkManager_trace.txt

Comment 7 Andrew Schorr 2023-04-26 01:04:38 UTC
Created attachment 1959940
NetworkManager TRACE log

This is the output from:

journalctl -b0 -uNetworkManager

I previously set level=TRACE in /etc/NetworkManager/NetworkManager.conf,
and set the following in /etc/systemd/journald.conf:
RateLimitIntervalSec=0
RateLimitBurst=0
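
For anyone reproducing this, the same two settings can also be kept in drop-in files rather than edited into the main config files. A minimal sketch, assuming the standard conf.d drop-in directories are in use; the function name and the drop-in file names are made up for illustration, and the optional argument is a root prefix so that it can be tried in a scratch directory first.

```shell
# write_trace_config [ROOT]
# Write a NetworkManager drop-in that sets level=TRACE and a journald
# drop-in that disables rate limiting, under ROOT (default: /).
write_trace_config() {
    root=${1:-}
    nm_dir="$root/etc/NetworkManager/conf.d"
    jd_dir="$root/etc/systemd/journald.conf.d"
    mkdir -p "$nm_dir" "$jd_dir"
    printf '[logging]\nlevel=TRACE\n' > "$nm_dir/95-trace.conf"
    printf '[Journal]\nRateLimitIntervalSec=0\nRateLimitBurst=0\n' \
        > "$jd_dir/95-no-ratelimit.conf"
}
```

Run it as root with no argument to write under /etc, then reboot so that the next boot log is captured at TRACE level.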


For whatever it's worth, I believe that NetworkManager should not automatically
start a master interface if the master has ONBOOT=no. I don't see why starting
a slave means that one must implicitly start the master.

But in any case, the ONBOOT=no hack seems to be working for me for now.

Regards,
Andy

Comment 8 Andrew Schorr 2023-04-26 01:05:31 UTC
Created attachment 1959941
journalctl boot log

In case you need all of the boot messages, here's
the output from:

journalctl -b0

Regards,
Andy

Comment 9 sfaye 2023-07-19 11:43:37 UTC
Thank you for your detailed report and investigation. Because the issue involves a complex bonding configuration with a race condition at boot, and given our team's current capacity, we've decided to close this issue as WONTFIX.
It appears that your workaround is effective for the time being.