Bug 2130287

Summary: ports can be left attached when controller dependency fails early
Product: Red Hat Enterprise Linux 9 Reporter: Thomas Haller <thaller>
Component: NetworkManager Assignee: Thomas Haller <thaller>
Status: CLOSED ERRATA QA Contact: Matej Berezny <mberezny>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 9.1 CC: bgalvani, lrintel, rkhan, sfaye, sukulkar, till, vbenes
Target Milestone: rc Keywords: Triaged
Target Release: --- Flags: pm-rhel: mirror+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: NetworkManager-1.41.3-1.el9 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-05-09 08:17:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logfile showing the issue none

Description Thomas Haller 2022-09-27 17:46:54 UTC
Created attachment 1914638
logfile showing the issue

See attached logfile.


This was on the th/mlag-bonding-slb branch (on top of current `main`, 3871c670ab9417fc54d3c0450e91e08ced4a98b4).


First, we create a bond profile and 5 port profiles. Then the bond gets activated with autoconnect-slaves on. During the "ip-config" state something happens, and

            _LOGD(LOGD_BOND, "balance-slb: failed");
            nm_device_state_changed(NM_DEVICE(self),
                                    NM_DEVICE_STATE_FAILED,
                                    NM_DEVICE_STATE_REASON_CONFIG_FAILED);

gets called:

<info>  [1664299540.9661] device (bond0): state change: secondaries -> failed (reason 'config-failed', sys-iface-state: 'managed')

The first time, we are already in state "secondaries". Consequently we see 

<trace> [1664299540.9667] device[6b76ac7314eb0b53] (bond0): master: release one slave a9f10ea824bb1725/eth1 (enslaved) (configure)

and all the port profiles get correctly deactivated.


Later, we try the same again. This time:

<info>  [1664299566.1065] device (bond0): state change: ip-config -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<trace> [1664299566.1073] device[6b76ac7314eb0b53] (bond0): master: release one slave a9f10ea824bb1725/eth1 (not enslaved) (configure)


The result is that the port devices linger indefinitely in the "ip-config" state and never get torn down, although the controller is gone.
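
The difference between the two runs can be illustrated with a toy model. This is only a sketch: the types and function names below (Port, PortState, release_port_buggy, release_port_checked, controller_failed) are hypothetical and are not NetworkManager code, and the "checked" variant is not the actual fix. It only shows why a release step that acts solely on already-attached ports leaves ports stuck in "ip-config" when the controller fails early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not NetworkManager's data structures. */
typedef enum {
    PORT_DISCONNECTED,
    PORT_IP_CONFIG, /* activating, not yet attached ("not enslaved" in the log) */
    PORT_ATTACHED,  /* attached to the controller ("enslaved" in the log) */
    PORT_FAILED,    /* activation torn down */
} PortState;

typedef struct {
    const char *ifname;
    PortState   state;
} Port;

typedef void (*ReleaseFn)(Port *port);

/* Models the observed behaviour: only attached ports are torn down. */
static void release_port_buggy(Port *port)
{
    if (port->state == PORT_ATTACHED)
        port->state = PORT_FAILED;
    /* A port still in PORT_IP_CONFIG is left untouched and lingers. */
}

/* Models the expected behaviour: a port that is still activating is
 * torn down as well when its controller goes away. */
static void release_port_checked(Port *port)
{
    if (port->state == PORT_ATTACHED || port->state == PORT_IP_CONFIG)
        port->state = PORT_FAILED;
}

/* Controller enters "failed": release every port and return how many
 * ports are still stuck in the activating state afterwards. */
static size_t controller_failed(Port *ports, size_t n_ports, ReleaseFn release)
{
    size_t lingering = 0;

    assert(ports != NULL || n_ports == 0);
    for (size_t i = 0; i < n_ports; i++) {
        release(&ports[i]);
        if (ports[i].state == PORT_IP_CONFIG)
            lingering++;
    }
    return lingering;
}
```

In this model, calling controller_failed() with release_port_buggy on ports that are all PORT_ATTACHED leaves nothing lingering (the first run above), while the same call on ports still in PORT_IP_CONFIG reports every one of them as lingering (the second run).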

Comment 2 Thomas Haller 2022-09-29 13:05:28 UTC
(In reply to Thomas Haller from comment #1)
> patch on review at
> https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1385

Now proposed fix on MR https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1406

It seems this was a regression introduced in 1.40 by commit 1fe8166fc9fb93dc64992325e31e7611725aaeb2.

Comment 7 errata-xmlrpc 2023-05-09 08:17:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2485