Bug 1394579 - improve handling of unmanaged/assumed devices
Summary: improve handling of unmanaged/assumed devices
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: NetworkManager
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Thomas Haller
QA Contact: Desktop QE
Docs Contact: Ioanna Gkioka
URL:
Whiteboard:
Duplicates: 1400411
Depends On:
Blocks: 1393481 1428406
 
Reported: 2016-11-13 22:19 UTC by Thomas Haller
Modified: 2017-08-01 09:19 UTC (History)

Fixed In Version: NetworkManager-1.8.0-0.4.rc1.el7
Doc Type: Enhancement
Doc Text:
*NetworkManager* now handles device state better
With this update, *NetworkManager* maintains the state of devices across a service restart and takes over interfaces that are set to managed mode during the restart. In addition, *NetworkManager* can handle devices that are not explicitly set as unmanaged but are controlled manually by the user or by another network service.
Clone Of:
Clones: 1428406
Environment:
Last Closed: 2017-08-01 09:19:37 UTC



Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:2299 normal SHIPPED_LIVE Moderate: NetworkManager and libnl3 security, bug fix and enhancement update 2017-08-01 12:40:28 UTC
GNOME Bugzilla 746440 None None None 2019-06-24 16:12:23 UTC
Red Hat Bugzilla 1439220 None CLOSED nm.nm_get_all_settings() rises unhandled exception Glib.Error: g-dbus-error-quark: GDBus.Error:org.freedesktop.DBus.Erro... 2019-09-13 08:22:51 UTC
Red Hat Bugzilla 1443878 None CLOSED changes in NM assuming of devices causing regressions in Anaconda 2019-09-13 08:22:51 UTC

Internal Links: 1439220 1443878

Description Thomas Haller 2016-11-13 22:19:34 UTC
On start/restart, NetworkManager "assumes" a connection on the device.

That is just wrong and causes many issues.

This should be fixed, but it is a large effort and it changes previous behavior.

For details, see upstream bug https://bugzilla.gnome.org/show_bug.cgi?id=746440

Comment 1 Thomas Haller 2016-12-01 10:35:08 UTC
*** Bug 1400411 has been marked as a duplicate of this bug. ***

Comment 2 Thomas Haller 2017-03-16 17:39:56 UTC
merged https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=2d1b85f8d7f1e53b581e56f0f542b63e8a80da98 upstream.

This may not yet be the full solution. I think we should separate externally managed devices better and make assuming connections more flexible. But it's another step, and it's all that can be done within the rhel-7.4 time frame.

And it *does* "improve handling of unmanaged/assumed devices" already.

Comment 4 Thomas Haller 2017-03-17 14:40:16 UTC
The change touches areas that were not well defined in NetworkManager or where it would not behave optimally. A lot of that was not properly covered by tests, so as a base requirement, I would already be happy if all old tests succeed (or get adjusted to what we identify as the new, desired, improved behavior).


What matters is start (the first time) and restart of NetworkManager.

Previously, when NM found an already configured interface, it would try to "assume" a connection. It did so for external devices (virbr0) and for devices that are taken over after a restart.
Now, there is a clear distinction between
  "external" (like virbr0): NM now always creates a new in-memory
    connection and pretends that it is active. It's important that in that mode,
    NM does not touch the interface at all.
  "assumed": this means gracefully taking over an already configured interface.
    Currently, that only works after a restart (not on a first start), where NM
    has recorded in /var/run/NetworkManager/devices/<IFINDEX> which connection to
    assume. It then tries to assume that connection, or falls back to
    "external". After the "assumed" activation reaches ACTIVATED state, it
    becomes identical to "managed". The "assumed" distinction only matters
    initially, during activation.
  "managed": if NM can neither assume the device nor treat it as external, it
    manages it. Meaning: it will try to autoconnect an existing connection. This
    especially happens when the device has no IP configuration yet, so that
    "external" doesn't apply.
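
The three-way distinction above can be summarized as a small decision function. This is only a toy model of the description in this comment, not NetworkManager's actual code; the names `Mode`, `classify_device`, `recorded_connection` and `can_assume` are hypothetical and exist only for illustration.

```python
from enum import Enum


class Mode(Enum):
    EXTERNAL = "external"
    ASSUMED = "assumed"
    MANAGED = "managed"


def classify_device(has_ip_config, recorded_connection=None, can_assume=False):
    """Toy model of the start/restart decision described in this comment.

    has_ip_config: the interface is already configured when NM starts.
    recorded_connection: connection named in the per-device state file of a
        previous run, or None on a first start (hypothetical parameter).
    can_assume: whether taking over with that connection would succeed
        (hypothetical input; the real check is internal to NM).
    """
    if not has_ip_config:
        # Nothing to preserve: "external" does not apply, NM manages the device.
        return Mode.MANAGED
    if recorded_connection is not None and can_assume:
        # Restart case: gracefully take over the configured interface.
        return Mode.ASSUMED
    # First start, or assuming failed: leave the interface untouched.
    return Mode.EXTERNAL
```

For example, `classify_device(True)` models a first start with something like virbr0 already configured, and `classify_device(True, "some-connection", True)` models a restart where the previous run recorded a connection for the device.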

So, it's all about starting/restarting NM. A first start is different from a restart in that there is no state in the /var/run/NetworkManager directory. Simulate a first start by removing that directory before starting NM.
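
A minimal sketch of the "simulate a first start" step for a test harness, assuming NetworkManager is already stopped and the script runs with sufficient privileges. The helper name `clear_runtime_state` is hypothetical; only the directory path comes from the comment above.

```python
import shutil
from pathlib import Path

# Path taken from the comment above; pass another path when experimenting.
DEFAULT_STATE_DIR = Path("/var/run/NetworkManager")


def clear_runtime_state(state_dir=DEFAULT_STATE_DIR):
    """Remove the runtime state directory so the next start looks like a first start.

    Returns True if a directory was removed, False if there was none.
    Call this only while NetworkManager is stopped.
    """
    state_dir = Path(state_dir)
    if not state_dir.is_dir():
        return False
    shutil.rmtree(state_dir)
    return True
```

Taking the directory as a parameter makes it possible to try the helper against a scratch directory before pointing it at the real one.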


Interesting tests are:

 - have an external interface and start NM. See that the device is in "external" 
   mode and not touched by NM. We already have tests for that. E.g. no DHCP for
   this interface, addresses/routes are preserved.
 - have a device managed by NM and restart NM. See that the connection gets
   assumed (non-destructively) and is afterwards fully managed by NM (e.g. DHCP 
   leases get extended).
 - it gets more interesting when starting with nested slave/master hierarchies 
   (bond/vlan/bridge/team). When all interfaces are "external", we would expect
   that NM activates external connections on all of them and does not touch the
   interfaces at all. If they were all activated in a previous run, we would
   expect that NM assumes them all and manages them all fully.
   It gets more complicated when the external/assumed/managed decision differs
   between master and slaves. I suspect there are bugs in this regard that we
   have to figure out.
 - what happens when setting an interface's managed state with
   `nmcli device set $IF managed yes|no`? Does that work as one would expect?
   Also, we now persist the managed state in /var/run/NetworkManager. That 
   means the managed state is preserved after a restart of NM (but not across 
   a reboot).
   Especially interesting: what happens if you set a device as unmanaged that
   is "external"? Unclear what is even desired. See bug 1371433.
   How does it work when setting a master/slave as unmanaged?
   What happens when having a set of unmanaged master/slave devices, and then
   managing one of them?
 - activate another connection on an "external" managed device.
 - modify the generated, in-memory connection of an external device. That causes
   the connection to be persisted. It's unclear what should happen with the 
   device. Probably we should now assume it (gracefully).
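
Two small helpers that tests along these lines might use, offered as a sketch rather than as part of any existing test suite. The function names are hypothetical. The only command taken from this comment is `nmcli device set $IF managed yes|no`; the snapshot comparison assumes the caller captured text such as the output of `ip -o addr show dev IF` before and after the restart. It suits static addresses best, since the remaining lifetimes printed for dynamic addresses change between snapshots.

```python
import subprocess


def set_managed_argv(ifname, managed):
    """Build the argv for `nmcli device set IFNAME managed yes|no`."""
    return ["nmcli", "device", "set", ifname, "managed", "yes" if managed else "no"]


def set_managed(ifname, managed):
    """Run the command; raises CalledProcessError if nmcli reports failure."""
    subprocess.run(set_managed_argv(ifname, managed), check=True)


def lost_entries(before, after):
    """Return the lines of `before` that no longer appear in `after`.

    Both arguments are multi-line text snapshots taken by the caller. An empty
    result means nothing that was configured before the restart went missing.
    """
    after_lines = set(after.splitlines())
    return [line for line in before.splitlines() if line not in after_lines]
```

For the "external" scenario, an empty `lost_entries(...)` result for both addresses and routes would be consistent with NM not having touched the interface.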



Please open a new bug for each defect you find, and let's keep this as a tracker bug for the individual issues.

Comment 6 Vladimir Benes 2017-06-08 12:06:39 UTC
I think we have a lot of scenarios covered and also fixed, so let's wait for real-world usage failures, as I don't see any improvements to be done now.

Comment 7 errata-xmlrpc 2017-08-01 09:19:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2299

