Bug 1667874

Summary: Bond activation fails when autoconnect is set to true (using libnm)
Product: Red Hat Enterprise Linux 8 Reporter: Edward Haas <edwardh>
Component: NetworkManager Assignee: Beniamino Galvani <bgalvani>
Status: CLOSED ERRATA QA Contact: Desktop QE <desktop-qa-list>
Severity: high Docs Contact:
Priority: high    
Version: 8.1 CC: atragler, bgalvani, fge, fgiudici, fpokryvk, lrintel, rkhan, sukulkar, thaller, till, vbenes
Target Milestone: rc   
Target Release: 8.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-05 22:29:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1689408, 1701002    
Attachments:
Description                                   Flags
Journal output of bond99 failure to activate  none
Python reproducer                             none

Description Edward Haas 2019-01-21 10:30:30 UTC
Created attachment 1522090 [details]
Journal output of bond99 failure to activate

Description of problem:
nmstate uses libnm to configure a bond profile and then activate it.
When the profile has autoconnect=True, activation of the bond fails with the following error:

error=nm-manager-error-quark: Connection 'bond99' is not available on the device bond99 at this time. (2)

Version-Release number of selected component (if applicable):
Tested on CentOS container with NM 1.12.0-8.el7_6

How reproducible:
Run the nmstate integration tests on https://github.com/nmstate/nmstate/pull/239/commits/8eec4329a8a469b682cc7b4cca9c86392e8752aa

Steps to Reproduce:
1. 
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Beniamino Galvani 2019-04-17 08:47:28 UTC
Hi,

Here the new connection gets added:

  <trace> [1548051268.2500] ifcfg-rh: write: write connection bond99 (e320e547-051a-4a87-92b6-4925f29aac7c) to file "/etc/sysconfig/network-scripts/ifcfg-bond99"
  <debug> [1548051268.2503] ifcfg-rh: loading from file "/etc/sysconfig/network-scripts/ifcfg-bond99"...

Since it has autoconnect=yes, the device gets realized and a new
bond99 link is created:

  <debug> [1548051268.2519] device[0x559e5da0ae80] (bond99): unmanaged: flags set to [platform-init,!sleeping=0x10/0x11/unmanaged/unrealized], set-managed [sleeping=0x1])
  <trace> [1548051268.2519] dbus-object[0x559e5da0ae80]: export: "/org/freedesktop/NM/Devices/5"
  <info>  [1548051268.2522] manager: (bond99): new Bond device (/org/freedesktop/NM/Devices/5)
  <debug> [1548051268.2523] device[0x559e5da0ae80] (bond99): create (is nm-owned)
  <debug> [1548051268.2523] platform: link: adding link 'bond99' of type 'bond' (196609)

At this time the device is unmanaged due to "platform-init,user-conf":

  <debug> [1548051268.2580] device[0x559e5da0ae80] (bond99): unmanaged: flags set to [platform-init,user-conf,!sleeping,!loopback=0x210/0x219/unmanaged/unrealized], set-unmanaged [user-conf=0x200])
  <info>  [1548051268.2587] audit: op="connection-add" uuid="e320e547-051a-4a87-92b6-4925f29aac7c" name="bond99" pid=4316 uid=0 result="success"

and so when a user activation request arrives, it fails because the
device is unmanaged due to platform-init (which can't be overridden by
the user):

  <debug> [1548051268.2637] active-connection[0x559e5d983930]: Failed to activate 'bond99': Connection 'bond99' is not available on the device bond99 at this time.
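
As a reading aid for the "unmanaged: flags set to [...]" lines above, here is a minimal Python sketch that decodes the flags/mask pair. The bit values are inferred only from the log output quoted in this comment (for example "set-managed [sleeping=0x1]" and "set-unmanaged [user-conf=0x200]"); they are not taken from the NetworkManager headers. The reading that the mask marks bits with a known state and the flags mark bits that are set is also an assumption. parse_unmanaged and set_flags are hypothetical helper names.

```python
import re

# Bit values inferred from the log lines quoted in this bug, not copied
# from the NetworkManager sources (assumption).
UNMANAGED_BITS = {
    "sleeping": 0x1,
    "loopback": 0x8,
    "platform-init": 0x10,
    "user-conf": 0x200,
}

# Matches e.g. "flags set to [platform-init,!sleeping=0x10/0x11/unmanaged/..."
FLAGS_RE = re.compile(
    r"flags set to \[([^=\]]+)=(0x[0-9a-fA-F]+)/(0x[0-9a-fA-F]+)/"
)


def parse_unmanaged(line):
    """Return (flags, mask, names) from an 'unmanaged: flags set to' line."""
    match = FLAGS_RE.search(line)
    if match is None:
        raise ValueError("not an unmanaged-flags log line")
    names = match.group(1).split(",")
    return int(match.group(2), 16), int(match.group(3), 16), names


def set_flags(flags, mask):
    """Names of the bits that are present in the mask and set in the flags."""
    return sorted(
        name for name, bit in UNMANAGED_BITS.items() if flags & mask & bit
    )
```

Feeding the second log line above (0x210/0x219) through these helpers gives platform-init and user-conf as the set bits, which matches the description of why the activation is rejected.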

There are two problems here. First, NM should not create a link for a
software device that is declared as unmanaged in configuration. This
is the same issue as bug 1679230 and is already solved on master.

The second problem is that, even with the fix above, this race
condition can still hit when a connection for a not-unmanaged software
device is added and then activated shortly afterwards. This problem can
also be seen in bug 1700528. I'm attaching a Python reproducer for this.
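
For clients that hit this race before a fixed NetworkManager is available, one possible client-side mitigation is to retry the activation a few times. The following is only a sketch under stated assumptions: it is not the attached reproducer and not the fix. NotAvailableError and activate_with_retry are hypothetical names, and the activate callable stands in for whatever the caller uses to request activation (it is not a libnm function).

```python
import time


class NotAvailableError(Exception):
    """Hypothetical stand-in for the 'is not available on the device' error."""


def activate_with_retry(activate, attempts=5, delay=0.2, sleep=time.sleep):
    """Call activate() until it succeeds or the attempts are exhausted.

    `activate` is a caller-supplied callable that raises NotAvailableError
    while the device is not ready yet. The last failure is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return activate()
        except NotAvailableError:
            if attempt == attempts:
                raise
            # Give the daemon time to finish initializing the new link.
            sleep(delay)
```

The sleep function is injectable so the helper can be exercised without real delays. Retrying only hides the symptom; it does not help when the device is unmanaged by configuration, as in the first problem.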

Comment 3 Beniamino Galvani 2019-04-17 08:48:52 UTC
Created attachment 1555788 [details]
Python reproducer

Comment 4 Beniamino Galvani 2019-05-13 15:23:25 UTC
Fix under review:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/merge_requests/144

Comment 5 Vladimir Benes 2019-05-14 11:56:36 UTC
The issue is gone after building NM with the code here.

Comment 7 Gris Ge 2019-05-29 15:22:42 UTC
Hi Beniamino Galvani,

Can we expect this bug to be fixed in any version of Fedora?

Thank you.

Comment 8 Beniamino Galvani 2019-05-29 16:03:37 UTC
I can include the fix in the next F30 update, would that be ok?

Comment 9 Gris Ge 2019-05-31 15:17:48 UTC
(In reply to Beniamino Galvani from comment #8)
> I can include the fix in the next F30 update, would that be ok?

Yes. Thanks.

Comment 12 errata-xmlrpc 2019-11-05 22:29:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3623