On start/restart, NetworkManager "assumes" a connection on the device. That is just wrong and causes many issues. This should be fixed, but it is a large effort -- and it changes previous behavior. For details, see upstream bug https://bugzilla.gnome.org/show_bug.cgi?id=746440
*** Bug 1400411 has been marked as a duplicate of this bug. ***
merged https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=2d1b85f8d7f1e53b581e56f0f542b63e8a80da98 upstream. This may not yet be the full solution; I think we should separate externally managed devices better and make assuming connections more flexible. But it's another step, and it's all that can be done within the rhel-7.4 time frame. And it *does* "improve handling of unmanaged/assumed devices" already.
The change touches areas that were not well defined in NetworkManager, or where it did not behave optimally. A lot of that was not properly covered by tests, so as a base requirement I would already be happy if all old tests succeed (or get adjusted to what we identify as the new, desired, improved behavior).

What matters is start (the first time) and restart of NetworkManager. Previously, when NM found an already configured interface, it would try to "assume" a connection. It did so both for external devices (virbr0) and for devices that are taken over after a restart. Now there is a clear distinction between "external", "assumed" and "managed":

"external" (like virbr0): NM now always creates a new in-memory connection and pretends that it is active. It's important that in this mode NM does not touch the interface at all.

"assumed": this means to gracefully take over an already configured interface. Currently, that only works after a restart (not on first start), where NM has written to /var/run/NetworkManager/devices/<IFINDEX> which connection to assume. It then tries to assume that connection, or falls back to "external". After the "assumed" activation reaches ACTIVATED state, it becomes identical to "managed". The "assumed" distinction only matters initially, during activation.

"managed": if NM can neither assume the device nor treat it as external, it manages it. Meaning: it will try to autoconnect an existing connection. This especially happens when the device has no IP configuration yet, so that "external" doesn't apply.

So, it's all about starting/restarting NM. A first start differs from a restart in that there is no state in the /var/run/NetworkManager directory. Simulate a first start by removing that directory before starting NM.

Interesting tests are:

- have an external interface and start NM. See that the device is in "external" mode and not touched by NM. We already have tests for that. E.g. no DHCP for this interface, addresses/routes are preserved.

- have a device managed by NM and restart NM. See that the connection gets assumed (non-destructively) and is afterwards fully managed by NM (e.g. DHCP leases get extended).

- it gets more interesting when starting with nested slave/master hierarchies (bond/vlan/bridge/team). When all interfaces are "external", we would expect that NM activates external connections on all of them and does not touch the interfaces at all. If they were all activated in a previous run, we would expect that NM assumes them all and then manages them all fully. It gets more complicated when the external/assumed/managed decision differs between master and slaves. I suspect there are bugs in this regard that we have to figure out.

- what happens when setting an interface with `nmcli device set $IF managed yes|no`? Does that work as one would expect? Also, we now persist the managed state in /var/run/NetworkManager. That means the managed state is preserved after a restart of NM (but not across a reboot). Especially interesting: what happens if you set a device as unmanaged that is "external"? It is unclear what is even desired. See bug 1371433. How does it work when setting a master/slave as unmanaged? What happens when you have a set of unmanaged master/slave devices and then manage one of them?

- activate another connection on an "external" managed device.

- modify the generated, in-memory connection of an external device. That causes the connection to be persisted. It's unclear what should happen with the device. Probably we should now assume it (gracefully).

Please open new bugs for each defect you find, and let's keep this as a tracker bug for the individual issues.
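For anyone writing test expectations, the start/restart decision described above could be modeled roughly as below. This is only a sketch of the prose in this comment, not NetworkManager's actual code. The names `Mode`, `initial_mode`, `saved_connection` and `can_assume` are made up for illustration, and how NM really decides whether a recorded connection can be assumed is not modeled here.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    EXTERNAL = "external"
    ASSUMED = "assumed"
    MANAGED = "managed"


def initial_mode(has_ip_config: bool,
                 saved_connection: Optional[str],
                 can_assume: bool = True) -> Mode:
    """Hypothetical model of how a device is treated when NM starts.

    has_ip_config:    the interface is already configured when NM starts.
    saved_connection: connection recorded by a previous run under
                      /var/run/NetworkManager/devices/<IFINDEX>, or None
                      on a first start (no state directory).
    can_assume:       assumption: whether taking over that connection
                      gracefully succeeds.
    """
    # After a restart, NM first tries to assume the recorded connection.
    if saved_connection is not None and can_assume:
        return Mode.ASSUMED
    # Otherwise an already configured interface is left alone as "external".
    if has_ip_config:
        return Mode.EXTERNAL
    # Neither applies: NM manages the device and may autoconnect a profile.
    return Mode.MANAGED


if __name__ == "__main__":
    # First start with a pre-configured interface such as virbr0.
    print(initial_mode(has_ip_config=True, saved_connection=None))
    # Restart with state from the previous run.
    print(initial_mode(has_ip_config=True, saved_connection="eth0-profile"))
    # Device without any IP configuration and no recorded state.
    print(initial_mode(has_ip_config=False, saved_connection=None))
```

The profile name "eth0-profile" is a placeholder. Removing /var/run/NetworkManager before starting NM corresponds to passing `saved_connection=None` in this model.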
I think we have a lot of scenarios covered and also fixed, so let's wait for real-world usage failures, as I don't see any improvements to be done now.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2299