1394579 – improve handling of unmanaged/assumed devices

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	NetworkManager
Sub Component:
Version:	7.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Thomas Haller
QA Contact:	Desktop QE
Docs Contact:	Ioanna Gkioka
URL:
Whiteboard:
Duplicates (1):	1400411
Depends On:
Blocks:	1393481 1428406

Reported:	2016-11-13 22:19 UTC by Thomas Haller
Modified:	2017-08-01 09:19 UTC
CC List:	13 users
Fixed In Version:	NetworkManager-1.8.0-0.4.rc1.el7
Doc Type:	Enhancement
Doc Text:	NetworkManager now better handles device state. With this update, NetworkManager maintains the state of devices after a service restart and takes over interfaces that are set to managed mode during the restart. In addition, NetworkManager can handle devices that are not explicitly set as unmanaged but are controlled manually by the user or by another network service.
Clone Of:
Clones:	1428406
Environment:
Last Closed:	2017-08-01 09:19:37 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
GNOME Bugzilla	746440	Normal	NEW	improve behavior for assumed and unmanaged devices, do better at seamless take over, and don't touch devices	2020-07-14 11:53:15 UTC
Red Hat Bugzilla	1439220	unspecified	CLOSED	nm.nm_get_all_settings() rises unhandled exception Glib.Error: g-dbus-error-quark: GDBus.Error:org.freedesktop.DBus.Erro...	2021-02-22 00:41:40 UTC
Red Hat Bugzilla	1443878	unspecified	CLOSED	changes in NM assuming of devices causing regressions in Anaconda	2021-03-11 15:09:25 UTC
Red Hat Product Errata	RHSA-2017:2299	normal	SHIPPED_LIVE	Moderate: NetworkManager and libnl3 security, bug fix and enhancement update	2017-08-01 12:40:28 UTC

Internal Links: 1439220 1443878

Description Thomas Haller 2016-11-13 22:19:34 UTC

On start/restart, NetworkManager "assumes" a connection on the device.

That is just wrong and causes many issues.

This should be fixed, but it is a large effort -- and it changes previous behavior.

For details, see upstream bug https://bugzilla.gnome.org/show_bug.cgi?id=746440

Comment 1 Thomas Haller 2016-12-01 10:35:08 UTC

*** Bug 1400411 has been marked as a duplicate of this bug. ***

Comment 2 Thomas Haller 2017-03-16 17:39:56 UTC

merged https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=2d1b85f8d7f1e53b581e56f0f542b63e8a80da98 upstream.

This may not yet be the full solution; I think we should separate externally managed devices better and make assuming connections more flexible. But it's another step, and it's all that can be done within the rhel-7.4 time frame.

And it *does* "improve handling of unmanaged/assumed devices" already.

Comment 4 Thomas Haller 2017-03-17 14:40:16 UTC

The change touches areas that were not well defined in NetworkManager or where it would not behave optimally. A lot of that was not properly covered by tests, so as a base requirement, I would already be happy if all old tests succeed (or get adjusted to what we identify as the new, desired, improved behavior).

What matters is start (the first time) and restart of NetworkManager.

Previously, when NM found an already configured interface, it would try to "assume" a connection. It did so both for external devices (virbr0) and for devices that are taken over after a restart.
Now, there is a clear distinction between:

- "external" (like virbr0): NM now always creates a new in-memory
connection and pretends that it is active. It's important that in that mode,
NM does not touch the interface at all.
- "assumed": this means gracefully taking over an already configured interface.
Currently, that only works after a restart (not on first start), where NM
has written to /var/run/NetworkManager/devices/<IFINDEX> which connection to
assume. It then tries to assume that connection, or falls back to
"external". After the "assumed" activation reaches ACTIVATED state, it
becomes identical to "managed". The "assumed" distinction only matters
initially during activation.
- "managed": if NM can neither assume the device nor treat it as external, it
manages it. Meaning: it will try to autoconnect an existing connection. This
especially happens when the device has no IP configuration yet, so that
"external" doesn't apply.
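
The three-way distinction above can be modelled as a small decision function. This is only a sketch of the behavior described in this comment, not NetworkManager's actual code: `DeviceSnapshot`, its fields, and `startup_mode` are hypothetical names, and the ordering of the checks (in particular what happens when a recorded connection cannot be assumed on an unconfigured device) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceSnapshot:
    """Hypothetical view of one interface at NM start-up (illustration only)."""
    # True if the interface already carries IP configuration when NM starts.
    has_ip_config: bool
    # Connection recorded by a previous NM run for this ifindex, or None on
    # a first start (no state under /var/run/NetworkManager).
    recorded_connection: Optional[str] = None
    # Assumption: whether the recorded connection can still be taken over.
    can_assume: bool = True


def startup_mode(dev: DeviceSnapshot) -> str:
    """Return "assumed", "external" or "managed" for a device at start-up."""
    # After a restart, try to gracefully take over the recorded connection.
    if dev.recorded_connection is not None and dev.can_assume:
        return "assumed"
    # An already configured interface that cannot be assumed is left
    # untouched and represented by an in-memory connection.
    if dev.has_ip_config:
        return "external"
    # No IP configuration yet: "external" does not apply, so NM manages the
    # device and may autoconnect an existing connection.
    return "managed"
```

For example, an interface like virbr0 that is configured before NM's first start would be classified as "external", while the same interface with a usable recorded connection after a restart would be "assumed".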

So, it's all about starting/restarting NM. A first start is different from a restart in that there is no state in the /var/run/NetworkManager directory. Simulate a first start by removing that directory before starting NM.
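
A test harness could wrap the "remove the state directory" step roughly as follows. This is a minimal sketch: the helper names `is_first_start` and `simulate_first_start` are hypothetical, only the directory path comes from the comment above, and stopping and starting the NetworkManager service around the call is left to the caller.

```python
import shutil
from pathlib import Path

# Path taken from the comment above; pass another directory to experiment safely.
DEFAULT_STATE_DIR = Path("/var/run/NetworkManager")


def is_first_start(state_dir=DEFAULT_STATE_DIR) -> bool:
    """True if no run-time state directory exists, as on a first start."""
    return not Path(state_dir).exists()


def simulate_first_start(state_dir=DEFAULT_STATE_DIR) -> None:
    """Remove the state directory so the next NM start looks like a first start.

    Assumption: the caller has stopped NetworkManager beforehand and has the
    privileges needed to delete the directory.
    """
    state_dir = Path(state_dir)
    if state_dir.exists():
        shutil.rmtree(state_dir)
```

Parameterizing the directory keeps the helper usable against a scratch directory, so the logic can be exercised without touching a live system.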

Interesting tests are:

- have an external interface and start NM. See that the device is in "external"
mode and not touched by NM. We already have tests for that. E.g. no DHCP for
this interface, addresses/routes are preserved.
- have a device managed by NM and restart NM. See that the connection gets
assumed (non-destructively) and is afterwards fully managed by NM (e.g. DHCP
leases get extended).
- it gets more interesting when starting with nested slave/master hierarchies
(bond/vlan/bridge/team). When all interfaces are "external", we would expect
that NM activates external connections on all of them and does not touch the
interfaces at all. If they were all activated in a previous run, we would
expect that NM assumes them all and manages them all fully.
It gets more complicated when the external/assumed/managed decision differs
between master and slaves. I suspect there are bugs in this regard that we
have to figure out.
- what happens when setting an interface's managed state with
`nmcli device set $IF managed yes|no`? Does that work as one would expect?
Also, we now persist the managed state in /var/run/NetworkManager. That
means, the managed state is preserved after restart of NM (but not across
reboot).
Especially interesting, what happens if you set a device as unmanaged that
is "external"? Unclear what is even desired. See bug 1371433.
How does it work when setting a master/slave as unmanaged?
What happens when having a set of unmanaged master/slaves devices, and then
managing one of them?
- activate another connection on an "external" managed device.
- modify the generated, in-memory connection of an external device. That causes
the connection to be persisted. It's unclear what should happen with the
device. Probably we should now assume it (gracefully).
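
For the master/slave scenarios in the list above, the cases where the decision differs between devices can be enumerated mechanically. The following is a sketch only: `MODES` and `mixed_hierarchy_cases` are hypothetical names, and not every generated combination is necessarily reachable in practice.

```python
from itertools import product

# The three start-up decisions discussed in this comment.
MODES = ("external", "assumed", "managed")


def mixed_hierarchy_cases(n_slaves=1):
    """Yield (master_mode, slave_modes) tuples where the modes are not all equal."""
    for combo in product(MODES, repeat=n_slaves + 1):
        # Skip the homogeneous cases (all external, all assumed, all managed).
        if len(set(combo)) > 1:
            yield combo[0], combo[1:]


if __name__ == "__main__":
    for master, slaves in mixed_hierarchy_cases():
        print(f"master={master} slaves={','.join(slaves)}")
```

Each yielded case could then be turned into one test scenario for a bond, bridge, team, or VLAN setup.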

Please open a new bug for each defect you find, and let's keep this as a tracker bug for the individual issues.

Comment 6 Vladimir Benes 2017-06-08 12:06:39 UTC

I think we have a lot of scenarios covered and also fixed, so let's wait for real-world usage failures, as I don't see any improvements to be done now.

Comment 7 errata-xmlrpc 2017-08-01 09:19:37 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2299
