Bug 1945429 - child connection auto-activation will activate parents who have less priority than others
Summary: child connection auto-activation will activate parents who have less priority than others
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: beta
Target Release: ---
Assignee: NetworkManager Development Team
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks: 1943320
Reported: 2021-03-31 22:40 UTC by Tim Rozet
Modified: 2021-07-14 04:31 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-13 16:18:34 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Tim Rozet 2021-03-31 22:40:19 UTC
Description of problem:
We see that auto-activation of a bond child connection ends up preempting a higher-priority connection for the parent bond device. This shouldn't happen, because during auto-activation it steals the bond device from the higher-priority connection and gives it to the lower-priority connection.

Consider these connections:
[root@ocp2-worker-3 ~]# nmcli conn show
NAME            UUID                                  TYPE           DEVICE 
ovs-if-br-ex    298549e1-608f-4ce3-b315-00088f17f877  ovs-interface  br-ex  
br-ex           abc60ab9-bf66-416a-8009-d61d3af3e010  ovs-bridge     br-ex  
eno3            1841b9fb-1e74-47b2-8fb7-90221b7042c3  ethernet       eno3   
eno3d1          80126c8e-e242-4a8a-9346-7bae8204d559  ethernet       eno3d1 
ovs-if-phys0    3c4a4c30-8d91-41f2-8b88-90a9d5b7df6b  bond           bond0  
ovs-port-br-ex  0ad2e793-d0a7-4d1f-8966-1864031a8695  ovs-port       br-ex  
ovs-port-phys0  fc4f37af-44a9-4464-8396-55e18bc834f0  ovs-port       bond0  
bond0           faf9e63e-47de-4df9-97e0-6a7c18121331  bond           --     
eno1            70018bb3-1f08-4848-a5af-c4ee5fa13f18  ethernet       --     
eno2            09fc2377-dfca-4e34-ba8e-1c4fccf6fb17  ethernet       --

eno2 is a child of the bond0 connection. When NetworkManager starts, it brings up ovs-if-phys0 first (which also uses the bond0 device) and then activates eno2. Activating eno2 triggers activation of the bond0 connection, which takes the bond0 device away from ovs-if-phys0. That shouldn't happen:

[root@ocp2-worker-3 ~]# nmcli conn show
NAME            UUID                                  TYPE           DEVICE 
ovs-if-br-ex    298549e1-608f-4ce3-b315-00088f17f877  ovs-interface  br-ex  
br-ex           abc60ab9-bf66-416a-8009-d61d3af3e010  ovs-bridge     br-ex  
eno3            1841b9fb-1e74-47b2-8fb7-90221b7042c3  ethernet       eno3   
eno3d1          80126c8e-e242-4a8a-9346-7bae8204d559  ethernet       eno3d1 
ovs-if-phys0    3c4a4c30-8d91-41f2-8b88-90a9d5b7df6b  bond           bond0  
ovs-port-br-ex  0ad2e793-d0a7-4d1f-8966-1864031a8695  ovs-port       br-ex  
ovs-port-phys0  fc4f37af-44a9-4464-8396-55e18bc834f0  ovs-port       bond0  
bond0           faf9e63e-47de-4df9-97e0-6a7c18121331  bond           --     
eno1            70018bb3-1f08-4848-a5af-c4ee5fa13f18  ethernet       --     
eno2            09fc2377-dfca-4e34-ba8e-1c4fccf6fb17  ethernet       --     
[root@ocp2-worker-3 ~]# nmcli conn up eno2   
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/15)
[root@ocp2-worker-3 ~]# nmcli conn show
NAME            UUID                                  TYPE           DEVICE 
bond0           faf9e63e-47de-4df9-97e0-6a7c18121331  bond           bond0  
ovs-if-br-ex    298549e1-608f-4ce3-b315-00088f17f877  ovs-interface  br-ex  
br-ex           abc60ab9-bf66-416a-8009-d61d3af3e010  ovs-bridge     br-ex  
eno1            70018bb3-1f08-4848-a5af-c4ee5fa13f18  ethernet       eno1   
eno2            09fc2377-dfca-4e34-ba8e-1c4fccf6fb17  ethernet       eno2   
eno3            1841b9fb-1e74-47b2-8fb7-90221b7042c3  ethernet       eno3   
eno3d1          80126c8e-e242-4a8a-9346-7bae8204d559  ethernet       eno3d1 
ovs-port-br-ex  0ad2e793-d0a7-4d1f-8966-1864031a8695  ovs-port       br-ex  
ovs-port-phys0  fc4f37af-44a9-4464-8396-55e18bc834f0  ovs-port       bond0  
ovs-if-phys0    3c4a4c30-8d91-41f2-8b88-90a9d5b7df6b  bond           --  

[root@ocp2-worker-3 ~]# nmcli --get-values connection.autoconnect-priority conn show bond0
0

[root@ocp2-worker-3 ~]# nmcli --get-values connection.autoconnect-priority conn show ovs-if-phys0
100

I realize that in the above example I'm activating the child manually, but the same thing happens when auto-activation occurs at node boot.
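The root of the conflict is that two profiles claim the same interface: both bond0 and ovs-if-phys0 set connection.interface-name to bond0. One way to spot this up front is to list every profile whose interface name is shared with another profile. Below is a minimal POSIX sh sketch under stated assumptions. The function name shared_interface_profiles is hypothetical, and so is the NAME:INTERFACE:PRIORITY input format. The input has to be assembled separately, for example from each profile's connection.interface-name and connection.autoconnect-priority values. Names containing ':' are not handled.

```sh
#!/bin/sh
# List profiles that claim an interface name also claimed by another profile.
# Input (stdin, hypothetical format): one "NAME:INTERFACE:PRIORITY" line per profile.
# Output: "INTERFACE NAME PRIORITY" per conflicting profile, grouped by
# interface, highest autoconnect-priority first.
shared_interface_profiles() {
    awk -F: '
        { name[NR] = $1; ifc[NR] = $2; prio[NR] = $3; count[$2]++ }
        END {
            for (i = 1; i <= NR; i++)
                if (ifc[i] != "" && count[ifc[i]] > 1)
                    print ifc[i], name[i], prio[i]
        }' | sort -k1,1 -k3,3nr
}
```

Feeding it the three lines bond0:bond0:0, ovs-if-phys0:bond0:100 and eno2:eno2:0 prints ovs-if-phys0 (100) ahead of bond0 (0) for the bond0 interface. eno2 is omitted because no other profile claims its interface.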

Version-Release number of selected component (if applicable):
[root@ocp2-worker-3 ~]# rpm -qa | grep Network
NetworkManager-1.26.0-13.1.rhaos4.7.el8.x86_64
NetworkManager-ovs-1.26.0-13.1.rhaos4.7.el8.x86_64
NetworkManager-libnm-1.26.0-13.1.rhaos4.7.el8.x86_64
NetworkManager-team-1.26.0-13.1.rhaos4.7.el8.x86_64
NetworkManager-tui-1.26.0-13.1.rhaos4.7.el8.x86_64

Comment 1 Tim Rozet 2021-03-31 22:41:32 UTC
Beniamino, can you let me know if this is really a bug or if there is some config I'm missing here?

Comment 2 Beniamino Galvani 2021-04-01 07:41:59 UTC
How does the 'eno2' connection specify the master, by interface name
or by UUID (I suppose the former)? Please paste the output of:

  nmcli -o connection show eno2
  nmcli -o connection show ovs-if-phys0
  nmcli -o connection show bond0

Do you want to attach eno2 to both 'ovs-if-phys0' and 'bond0'
connections at different times, or always to 'ovs-if-phys0'?

Comment 3 Thomas Haller 2021-04-01 07:50:41 UTC
"connection.autoconnect-priority" only has meaning when autoactivating profiles. Even then, it only matters when multiple profiles could activate on the same device at a given time: the one with the higher priority is chosen.

After a profile activates, it doesn't matter whether it was done by the user manually or via "connection.autoconnect". In particular, the "connection.autoconnect-priority" no longer matters for an activated profile.
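The rule described above can be expressed as a toy model in POSIX sh. Given the candidate profiles for one device at autoconnect time, it picks the one with the highest connection.autoconnect-priority. The function name and the NAME:PRIORITY input format are hypothetical. Real NetworkManager applies additional tie-breakers that this model ignores; here, ties simply keep the first candidate. As this thread explains, the rule is not consulted at all when a port activation pulls up its master by UUID.

```sh
#!/bin/sh
# Toy model: among profiles that could autoconnect on the same device,
# choose the one with the highest connection.autoconnect-priority.
# Input (stdin, hypothetical format): "NAME:PRIORITY" per candidate.
# Ties keep the first candidate; NetworkManager's own tie-breaking is not modeled.
pick_autoconnect_profile() {
    awk -F: '
        NR == 1 || $2 + 0 > best { best = $2 + 0; name = $1 }
        END { if (NR > 0) print name }'
}
```

For example, the candidates bond0:0 and ovs-if-phys0:100 yield ovs-if-phys0.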


I suspect that eno2 has "connection.master=faf9e63e-47de-4df9-97e0-6a7c18121331". If you then manually activate the port, the bond must also come up. If you are happy with any bond device, set "connection.master=bond0".


I second Beniamino's request for more information.

Comment 4 Tim Rozet 2021-04-01 13:31:09 UTC
I think the issue may be that when the child autoactivates, it activates the parent, and NM doesn't check whether the parent conflicts with another connection that has a higher priority. ovs-if-phys0 has a priority of 100, which is higher than that of the other two connections. The desired behavior is that ovs-if-phys0 keeps the bond0 device and the other connection does not come up. See the outputs here:

[root@ocp2-worker-3 ~]# nmcli conn show eno2
connection.id:                          eno2
connection.uuid:                        09fc2377-dfca-4e34-ba8e-1c4fccf6fb17
connection.stable-id:                   --
connection.type:                        802-3-ethernet
connection.interface-name:              eno2
connection.autoconnect:                 yes
connection.autoconnect-priority:        0
connection.autoconnect-retries:         -1 (default)
connection.multi-connect:               1 (single)
connection.auth-retries:                -1
connection.timestamp:                   1617229069
connection.read-only:                   no
connection.permissions:                 --
connection.zone:                        --
connection.master:                      faf9e63e-47de-4df9-97e0-6a7c18121331
connection.slave-type:                  bond
connection.autoconnect-slaves:          -1 (default)
connection.secondaries:                 --
connection.gateway-ping-timeout:        0
connection.metered:                     unknown
connection.lldp:                        default
connection.mdns:                        -1 (default)
connection.llmnr:                       -1 (default)
connection.wait-device-timeout:         60000
802-3-ethernet.port:                    --
802-3-ethernet.speed:                   0
802-3-ethernet.duplex:                  --
802-3-ethernet.auto-negotiate:          no
802-3-ethernet.mac-address:             --
802-3-ethernet.cloned-mac-address:      --
802-3-ethernet.generate-mac-address-mask:--
802-3-ethernet.mac-address-blacklist:   --
802-3-ethernet.mtu:                     auto
802-3-ethernet.s390-subchannels:        --
802-3-ethernet.s390-nettype:            --
802-3-ethernet.s390-options:            --
802-3-ethernet.wake-on-lan:             default
802-3-ethernet.wake-on-lan-password:    --

[root@ocp2-worker-3 ~]# nmcli conn show bond0
connection.id:                          bond0
connection.uuid:                        faf9e63e-47de-4df9-97e0-6a7c18121331
connection.stable-id:                   --
connection.type:                        bond
connection.interface-name:              bond0
connection.autoconnect:                 yes
connection.autoconnect-priority:        0
connection.autoconnect-retries:         -1 (default)
connection.multi-connect:               1 (single)
connection.auth-retries:                -1
connection.timestamp:                   1617229069
connection.read-only:                   no
connection.permissions:                 --
connection.zone:                        --
connection.master:                      --
connection.slave-type:                  --
connection.autoconnect-slaves:          -1 (default)
connection.secondaries:                 --
connection.gateway-ping-timeout:        0
connection.metered:                     unknown
connection.lldp:                        default
connection.mdns:                        -1 (default)
connection.llmnr:                       -1 (default)
connection.wait-device-timeout:         -1
ipv4.method:                            manual
ipv4.dns:                               172.18.42.10,172.18.42.11
ipv4.dns-search:                        --
ipv4.dns-options:                       --
ipv4.dns-priority:                      0
ipv4.addresses:                         172.18.0.72/24
ipv4.gateway:                           172.18.0.1
ipv4.routes:                            --
ipv4.route-metric:                      -1
ipv4.route-table:                       0 (unspec)
ipv4.routing-rules:                     --
ipv4.ignore-auto-routes:                no
ipv4.ignore-auto-dns:                   no
ipv4.dhcp-client-id:                    --
ipv4.dhcp-iaid:                         --
ipv4.dhcp-timeout:                      0 (default)
ipv4.dhcp-send-hostname:                yes
ipv4.dhcp-hostname:                     ocp2-worker-3.lab.signal9.gg
ipv4.dhcp-fqdn:                         --
ipv4.dhcp-hostname-flags:               0x0 (none)
ipv4.never-default:                     no
ipv4.may-fail:                          no
ipv4.dad-timeout:                       -1 (default)
ipv4.dhcp-vendor-class-identifier:      --
ipv6.method:                            disabled
ipv6.dns:                               --
ipv6.dns-search:                        --
ipv6.dns-options:                       --
ipv6.dns-priority:                      0
ipv6.addresses:                         --
ipv6.gateway:                           --
ipv6.routes:                            --
ipv6.route-metric:                      -1
ipv6.route-table:                       0 (unspec)
ipv6.routing-rules:                     --
ipv6.ignore-auto-routes:                no
ipv6.ignore-auto-dns:                   no
ipv6.never-default:                     no
ipv6.may-fail:                          yes
ipv6.ip6-privacy:                       -1 (unknown)
ipv6.addr-gen-mode:                     eui64
ipv6.ra-timeout:                        0 (default)
ipv6.dhcp-duid:                         --
ipv6.dhcp-iaid:                         --
ipv6.dhcp-timeout:                      0 (default)
ipv6.dhcp-send-hostname:                yes
ipv6.dhcp-hostname:                     ocp2-worker-3.lab.signal9.gg
ipv6.dhcp-hostname-flags:               0x0 (none)
ipv6.token:                             --
bond.options:                           mode=active-backup
proxy.method:                           none
proxy.browser-only:                     no
proxy.pac-url:                          --
proxy.pac-script:                       --

[root@ocp2-worker-3 ~]# nmcli conn show ovs-if-phys0
connection.id:                          ovs-if-phys0
connection.uuid:                        3c4a4c30-8d91-41f2-8b88-90a9d5b7df6b
connection.stable-id:                   --
connection.type:                        bond
connection.interface-name:              bond0
connection.autoconnect:                 yes
connection.autoconnect-priority:        100
connection.autoconnect-retries:         -1 (default)
connection.multi-connect:               0 (default)
connection.auth-retries:                -1
connection.timestamp:                   1617283597
connection.read-only:                   no
connection.permissions:                 --
connection.zone:                        --
connection.master:                      fc4f37af-44a9-4464-8396-55e18bc834f0
connection.slave-type:                  ovs-port
connection.autoconnect-slaves:          -1 (default)
connection.secondaries:                 --
connection.gateway-ping-timeout:        0
connection.metered:                     unknown
connection.lldp:                        default
connection.mdns:                        -1 (default)
connection.llmnr:                       -1 (default)
connection.wait-device-timeout:         -1
802-3-ethernet.port:                    --
802-3-ethernet.speed:                   0
802-3-ethernet.duplex:                  --
802-3-ethernet.auto-negotiate:          no
802-3-ethernet.mac-address:             --
802-3-ethernet.cloned-mac-address:      --
802-3-ethernet.generate-mac-address-mask:--
802-3-ethernet.mac-address-blacklist:   --
802-3-ethernet.mtu:                     1500
802-3-ethernet.s390-subchannels:        --
802-3-ethernet.s390-nettype:            --
802-3-ethernet.s390-options:            --
802-3-ethernet.wake-on-lan:             default
802-3-ethernet.wake-on-lan-password:    --
bond.options:                           mode=active-backup
ovs-interface.type:                     system
GENERAL.NAME:                           ovs-if-phys0
GENERAL.UUID:                           3c4a4c30-8d91-41f2-8b88-90a9d5b7df6b
GENERAL.DEVICES:                        bond0
GENERAL.IP-IFACE:                       bond0
GENERAL.STATE:                          activated
GENERAL.DEFAULT:                        no
GENERAL.DEFAULT6:                       no
GENERAL.SPEC-OBJECT:                    --
GENERAL.VPN:                            no
GENERAL.DBUS-PATH:                      /org/freedesktop/NetworkManager/ActiveConnection/23
GENERAL.CON-PATH:                       /org/freedesktop/NetworkManager/Settings/9
GENERAL.ZONE:                           --
GENERAL.MASTER-PATH:                    /org/freedesktop/NetworkManager/Devices/8
IP4.GATEWAY:                            --
IP6.GATEWAY:                            --

Comment 5 Tim Rozet 2021-04-01 13:34:47 UTC
Some journal output here:
Mar 31 19:16:45 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218205.4206] device (br-ex): Activation: starting connection 'ovs-if-br-ex' (298549e1-608f-4ce3-b315-00088f17f877)
Mar 31 19:16:45 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218205.4209] device (br-ex): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:45 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218205.4217] manager: NetworkManager state is now CONNECTING
Mar 31 19:16:45 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218205.4224] device (br-ex): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:45 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218205.4232] device (br-ex): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:45 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218205.4246] device (br-ex): Activation: connection 'ovs-if-br-ex' enslaved, continuing activation
Mar 31 19:16:45 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218205.4359] device (br-ex): carrier: link connected
Mar 31 19:16:45 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218205.4412] device (br-ex): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:45 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218205.4450] device (br-ex): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:45 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218205.4459] device (br-ex): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:45 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218205.4471] manager: NetworkManager state is now CONNECTED_LOCAL
Mar 31 19:16:45 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218205.4501] manager: NetworkManager state is now CONNECTED_SITE
Mar 31 19:16:45 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218205.4506] policy: set 'ovs-if-br-ex' (br-ex) as default for IPv4 routing and DNS
Mar 31 19:16:45 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218205.4541] device (br-ex): Activation: successful, device activated.
Mar 31 19:16:45 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218205.4553] manager: NetworkManager state is now CONNECTED_GLOBAL
Mar 31 19:16:46 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218206.2860] device (eno2): state change: disconnected -> unavailable (reason 'carrier-changed', sys-iface-state: 'mana>
Mar 31 19:16:46 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218206.8471] device (eno1): state change: disconnected -> unavailable (reason 'carrier-changed', sys-iface-state: 'mana>
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.4766] device (eno2): carrier: link connected
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.4771] device (eno2): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'mana>
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.4790] policy: auto-activating connection 'eno2' (09fc2377-dfca-4e34-ba8e-1c4fccf6fb17)
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.4805] device (eno2): Activation: starting connection 'eno2' (09fc2377-dfca-4e34-ba8e-1c4fccf6fb17)
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.4812] device (bond0): disconnecting for new activation request.
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.4813] device (bond0): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'manage>
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.4825] device (bond0): releasing ovs interface bond0
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.4828] device (bond0): released from master device bond0
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.4845] device (eno2): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.4900] device (bond0): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'man>
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.5005] device (bond0): Activation: starting connection 'bond0' (faf9e63e-47de-4df9-97e0-6a7c18121331)
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.5024] device (bond0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.5145] device (bond0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.5163] device (eno2): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.5177] device (bond0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.5189] device (eno2): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.6270] device (bond0): enslaved bond slave eno2
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.6271] device (eno2): Activation: connection 'eno2' enslaved, continuing activation
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.6285] device (eno2): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.6317] device (bond0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.6396] device (eno2): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.6402] device (eno2): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.6425] manager: NetworkManager state is now CONNECTING
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.6429] policy: set 'bond0' (bond0) as default for IPv4 routing and DNS
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.6465] device (eno2): Activation: successful, device activated.
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.6474] device (bond0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.6480] device (bond0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.6488] manager: NetworkManager state is now CONNECTED_GLOBAL
Mar 31 19:16:48 ocp2-worker-3.lab.signal9.gg NetworkManager[1688]: <info>  [1617218208.6508] device (bond0): Activation: successful, device activated.

Comment 6 Tim Rozet 2021-04-01 15:17:36 UTC
After talking with Beniamino, the issue is that slave connections such as eno2 set their master to the UUID of the "bond0" connection instead of the bond0 device name. If we modify those connections to use the device name as the master, the issue is resolved. The question is: what set the UUID as the master for eno2 and eno1? Normally, in other OCP deployments, a user writes ifcfg files through Ignition to create the initial bonds, like:

[core@master-0-0 ~]$ cat /etc/sysconfig/network-scripts/ifcfg-bond0
BONDING_OPTS="downdelay=0 lacp_rate=fast miimon=100 mode=802.3ad updelay=0"
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=dhcp
NAME=bond0
DEVICE=bond0
ONBOOT=yes
[core@master-0-0 ~]$ cat /etc/sysconfig/network-scripts/ifcfg-enp4s0
DEVICE=enp4s0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes

This sets the master to the device name. Austin, can you provide more information on how you create these bonds? We want to figure out whether it is NM setting the UUID or your deployment. Thanks.

Comment 7 Andrew Austin 2021-04-01 16:33:08 UTC
The bonds were set up using kernel arguments at install time. The new installer creates nmconnection files and copies them into the installed system rather than the previous behavior of using network-scripts files. I can create network-scripts files by using ignition for network configuration but that is substantially more effort than setting kernel args.

The networking kernel args to build the node in question were: bootdev=bond0  bond=bond0:eno1,eno2:mode=active-backup ip=172.18.0.72::172.18.0.1:255.255.255.0:ocp2-worker-3.lab.signal9.gg:bond0:none:172.18.42.10 ip=eno3:none ip=eno3d1:none nameserver=172.18.42.10  nameserver=172.18.42.11
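nm-initrd-generator turns the bond= argument above into the 'bond0' bond profile plus one ethernet port profile per listed interface. Below is a minimal POSIX sh sketch for pulling the bond= entries out of a kernel command line and splitting one into its parts. The function names are hypothetical. The split on ':' into name, ports, options and MTU follows the shape of the argument shown here; see dracut.cmdline(7) for the authoritative syntax. Quoted values containing spaces are not handled.

```sh
#!/bin/sh
# Print each bond=... word found in a kernel command line (one per line).
# Simplified: splits on blanks and ignores quoting.
bond_args_from_cmdline() {
    printf '%s\n' "$1" | tr -s ' \t' '\n\n' | grep '^bond='
}

# Split one bond= argument on ':' into name, ports, options and MTU,
# following the shape of the argument in this report. Ports are printed
# space-separated; options are printed as given.
parse_bond_arg() {
    printf '%s\n' "${1#bond=}" | awk -F: '{
        ports = $2
        gsub(/,/, " ", ports)
        print "name=" $1
        print "slaves=" ports
        print "options=" $3
        if ($4 != "") print "mtu=" $4
    }'
}
```

Here is what parsing the argument from this comment yields. `parse_bond_arg 'bond=bond0:eno1,eno2:mode=active-backup'` prints name=bond0, slaves=eno1 eno2 and options=mode=active-backup, one per line.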

Comment 8 Tim Rozet 2021-04-01 21:11:10 UTC
Thanks, Andrew. Then the question now is for the NM team. Beniamino, is this a bug on the NM side? Regardless, in the meantime we can implement a workaround (even though it's not great) in our ovs-configuration service that modifies the user's previous connections to use the device name.

Comment 9 Beniamino Galvani 2021-04-02 10:14:55 UTC
> Thanks Andrew. Then the question now is for the NM team. Beniamino
> is this bug on the NM side? Regardless we can implement a work
> around (even though it's not great) to modify the users previous
> connections to use the device during our ovs-configuration service
> in the meantime.

I'm not sure this can be considered a bug. There is a kernel command line
containing:

  bond=bond0:eno1,eno2,... ip=172.18.0.72::...:bond0

and the initrd generator creates 3 connections: a bond connection
('bond0') and 2 ethernet connections referencing the UUID of the first
connection as master.

After boot, a new bond connection ('ovs-if-phys0') is added for bond0
with higher autoconnect priority. The expectation is that eno1 and
eno2 will automatically use it as master instead of the bond
connection created in the initrd; however this doesn't happen because
eno1 and eno2 reference the master connection by UUID.

A possible solution is of course to change the initrd generator to
specify the master by interface name. I suppose a reason in favor of
using the UUID is to avoid possible collisions with pre-existing
connections, in case some connection files are included in the
initrd. So it's not clear that using the interface name is what we
want.


Tim, I have a question about this setup. Why is there a need to
create a new bond0 connection? The existing bond from the command line
also has an IP address, while the new bond connection doesn't. Why not
reuse the existing bond connection, changing the master?

Or maybe, a better question is, how is this supposed to work? Are the
configuration scripts expecting that the user creates a 'bond0' with
slaves? And then it 'steals' them and assigns them to the bond under
the ovs-port?

Comment 10 Tim Rozet 2021-04-07 14:59:21 UTC
# Tim, I have a question about this setup. Why is there the need to
# create a new bond0 connection? The existing bond from command line
# also has an IP address, while the new bond connection doesn't. Why not
# reusing the existing bond connection, changing the master?

The main reason is that we do not want to mess with the user's previously existing connections. IIRC, I also initially ran into problems when trying to make the port a slave of OVS type: there were some quirks with the ordering of the commands when creating the OVS bridge, OVS port, and OVS interface.

# Or maybe, a better question is, how is this supposed to work? Are the
# configuration scripts expecting that the user creates a 'bond0' with
# slaves? And then it 'steals' them and assigns them to the bond under
# the ovs-port?

Yes, the script runs after NM during OCP bring-up, so the expectation is that NM has at least one connection with a default gateway route. We look for the device that has the default gateway, then expect an NM connection for it. At that point, we steal the device from that connection and set up new OVS-specific connections for our use case. The nice part about this is that someone can pass a "clean" argument to the script, which blows away all of the OVS connections and restores the original connections. This is useful when migrating from one SDN version for OCP to another.
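The first step described here is finding the device that carries the default route. It can be sketched in POSIX sh by parsing `ip route show default` output. This is a hypothetical helper for illustration, not the actual OCP ovs-configuration script, which may do this differently.

```sh
#!/bin/sh
# Hypothetical helper: read `ip route show default` output on stdin and
# print the device of the first default route, e.g. "bond0" from a line
# such as "default via 172.18.0.1 dev bond0 ...".
default_route_dev() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}
```

For example, `ip -4 route show default | default_route_dev` prints the device. `nmcli -g GENERAL.CONNECTION device show "$dev"` then names the profile active on it.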

We try not to modify any of the connections a user has created. The initial bond, for example, may be created with kernel args, with ifcfg scripts as I mentioned before, or with NM keyfiles directly. We try not to make assumptions about how the NM connection gets created, and we try not to touch the existing connections. However, since this is not considered an NM bug, it looks like we may have to break that rule: check whether slave connections exist and, if so, set their connection.master to the device name.
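The workaround described here repoints the slave profiles from the bond profile's UUID to the interface name. It could be previewed as a dry run before anything is changed. Below is a minimal POSIX sh sketch. It reads hypothetical NAME:MASTER lines, one per port profile, and prints the `nmcli connection modify ... connection.master ...` commands it would run, without running them. The function name and the input format are assumptions. Names containing ':' or single quotes are not handled.

```sh
#!/bin/sh
# Dry run: print (do not execute) the nmcli commands that would repoint port
# profiles from a bond profile's UUID to the bond's interface name.
# Arguments: BOND_UUID BOND_IFNAME
# Input (stdin, hypothetical format): "NAME:MASTER" per port profile.
print_master_fixups() {
    bond_uuid=$1
    bond_ifname=$2
    while IFS=: read -r name master; do
        if [ "$master" = "$bond_uuid" ]; then
            printf "nmcli connection modify '%s' connection.master '%s'\n" \
                "$name" "$bond_ifname"
        fi
    done
}
```

With eno1 and eno2 pointing at faf9e63e-47de-4df9-97e0-6a7c18121331 and bond0 given as the interface name, it prints one modify command for each of the two profiles.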

Since this doesn't seem like a bug in NM, shall we close this one?

Comment 11 Thomas Haller 2021-04-09 13:31:15 UTC
I agree with Beniamino that this seems to be the desired behavior in NetworkManager.


For "connection.master", a profile can reference either the UUID of another profile or an interface name. In general, when a slave profile activates, it requires its master to activate too. If the master is referenced by UUID, then that exact profile is activated. It does not matter whether the slave profile autoactivates or is activated explicitly by somebody. Also, autoconnect-priority does not matter, because it only affects which profile is selected to autoconnect (and that only matters if there are multiple candidates for the same interface).
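The difference between the two kinds of reference can be captured in a toy POSIX sh model. This is not NetworkManager's code. It assumes a hypothetical NAME:UUID:INTERFACE:ACTIVE line per bond profile. A UUID reference resolves to exactly that profile, regardless of priority or of what is active. An interface-name reference resolves to whichever profile is already active on that interface, which matches the outcome reported in comment 6. When no profile is active, the model prints nothing, whereas NetworkManager would choose a profile itself.

```sh
#!/bin/sh
# Toy model (not NetworkManager code) of which bond profile a port
# activation brings up, depending on how connection.master refers to it.
# Argument: the port's connection.master value.
# Input (stdin, hypothetical format): "NAME:UUID:INTERFACE:ACTIVE" per
# bond profile, where ACTIVE is "yes" or "no".
resolve_master_profile() {
    ref=$1
    if printf '%s\n' "$ref" |
        grep -Eqi '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'; then
        # By UUID: exactly that profile, whatever its priority or state.
        awk -F: -v u="$ref" 'tolower($2) == tolower(u) { print $1; exit }'
    else
        # By interface name: the profile already active on that interface.
        awk -F: -v i="$ref" '$3 == i && $4 == "yes" { print $1; exit }'
    fi
}
```

The model was checked against the two bond profiles from this report, with ovs-if-phys0 active. The UUID faf9e63e-47de-4df9-97e0-6a7c18121331 resolves to bond0, while the interface name bond0 resolves to ovs-if-phys0.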


The first suggestion is: don't configure NetworkManager this way if you don't want this behavior.


But that is not so simple here, because this configuration was created by nm-initrd-generator in response to the kernel command line. I still think that the current behavior of nm-initrd-generator is correct. When nm-initrd-generator creates a set of profiles that refer to each other, it really means *that* profile (by UUID), not any profile that matches by interface name.



That makes it hard for OCP, which is a tool that needs to understand what is currently configured and adjust the configuration to OCP's liking.


If the problem is really limited to autoconnect, then maybe OCP could simply block those profiles from autoconnecting. It's a bit cumbersome, but something like:

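  # Placeholder: the master reference to look for; here, the UUID of the
  # initrd-generated "bond0" profile from this report.
  WRONG_MASTER=faf9e63e-47de-4df9-97e0-6a7c18121331

  for P in $(nmcli -g TYPE,DBUS-PATH connection | sed -n 's/^802-3-ethernet://p'); do

      # Skip profiles whose master is not the unwanted one (just an
      # example, not a robust check).
      nmcli -g connection.master connection show path "$P" | \
          grep -q "$WRONG_MASTER" || \
          continue

      # Call Update2() with an empty settings dict (keep the current
      # settings), flags 0x20 (block autoconnect) and no extra arguments.
      busctl \
          call \
          org.freedesktop.NetworkManager \
          "$P" \
          org.freedesktop.NetworkManager.Settings.Connection \
          Update2 \
          'a{sa{sv}}ua{sv}' 0 0x20 0
  done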
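The 0x20 passed as the flags argument of Update2 is what blocks autoconnect in the script above. A small POSIX sh decoder can make such masks readable. The flag names and values are NM_SETTINGS_UPDATE2_FLAG_* as I recall them from NetworkManager's D-Bus interface definitions. Treat them as an assumption and check them against the NetworkManager version you run.

```sh
#!/bin/sh
# Decode an Update2() flags value into NM_SETTINGS_UPDATE2_FLAG_* names.
# Bit values as recalled from NetworkManager's D-Bus API (verify locally).
decode_update2_flags() {
    flags=$(( $1 ))
    out=
    for pair in 1:TO_DISK 2:IN_MEMORY 4:IN_MEMORY_DETACHED 8:IN_MEMORY_ONLY \
        16:VOLATILE 32:BLOCK_AUTOCONNECT 64:NO_REAPPLY; do
        bit=${pair%%:*}
        if [ $(( flags & bit )) -ne 0 ]; then
            out="${out:+$out|}NM_SETTINGS_UPDATE2_FLAG_${pair#*:}"
        fi
    done
    printf '%s\n' "${out:-NM_SETTINGS_UPDATE2_FLAG_NONE}"
}
```

For example, `decode_update2_flags 0x20` prints NM_SETTINGS_UPDATE2_FLAG_BLOCK_AUTOCONNECT.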

Comment 12 Till Maas 2021-04-13 16:18:34 UTC
(In reply to Tim Rozet from comment #10)

> Since this doesn't seem like a bug in NM, shall we close this one?

yes, let's close it. Please reach out if you need further assistance.

