Bug 1439220

Summary: nm.nm_get_all_settings() rises unhandled exception Glib.Error: g-dbus-error-quark: GDBus.Error:org.freedesktop.DBus.Error.UnknownMethod: No such interface 'org.freedesktop.NetworkManager.Settings.Connection' on object at path /org /freedesktop/NetworkMana
Product: Red Hat Enterprise Linux 7 Reporter: Radek Vykydal <rvykydal>
Component: anacondaAssignee: Radek Vykydal <rvykydal>
Status: CLOSED DUPLICATE QA Contact: Release Test Team <release-test-team-automation>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.3CC: anaconda-maint-list, bgalvani, bugproxy, extras-qa, g.kaviyarasu, hannsj_uhl, jonathan, jstodola, mkolman, normand, rvykydal, sbueno, thaller, vanmeeuwen+fedora, vponcova
Target Milestone: rcKeywords: Reopened
Target Release: 7.4   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1439051
: 1443878 (view as bug list) Environment:
Last Closed: 2017-06-17 19:49:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1299988, 1443878    
Attachments:
Description Flags
logs with new NetworkManager (version 1.8.0-0.4.rc1.el7)
none
logs with new NetworkManager (version 1.8.0-0.2.git20170215.1d40c5f4.el7) none

Description Radek Vykydal 2017-04-05 12:55:04 UTC
+++ This bug was initially created as a clone of Bug #1439051 +++

Anaconda hang with GDBus.Error ...UnknownMethod for PowerPC F26 ks install

for QA ks tests (like QA:Testcase_kickstart_firewall_configured or QA:Testcase_kickstart_firewall_disabled ...)


The problem started with f26 compose 20170403 (no failure with 20170331)
The same problem is still alive with 20170404

===
anaconda 26.21.1-1 for Fedora 26 (pre-release) started.
 * installation log files are stored in /tmp during the installation
 * shell is available on TTY2
 * when reporting a bug add logs from /tmp as separarate Text/plain attachments
Traceback (most recent call last):
  File "/sbin/anaconda", line 582, in <module>
    networkInitialize(ksdata)
  File "/usr/lib64/python3.6/site-packages/pyanaconda/network.py", line 1353, in networkInitialize
    logIfcfgFiles(msg)
  File "/usr/lib64/python3.6/site-packages/pyanaconda/network.py", line 239, in logIfcfgFiles
    ifcfglog.debug("all settings: %s", nm.nm_get_all_settings())
  File "/usr/lib64/python3.6/site-packages/pyanaconda/nm.py", line 760, in nm_get_all_settings
    settings = proxy.GetSettings()
  File "/usr/lib64/python3.6/site-packages/gi/overrides/Gio.py", line 172, in __call__
    None)
Glib.Error: g-dbus-error-quark: GDBus.Error:org.freedesktop.DBus.Error.UnknownMethod: No such interface 'org.freedesktop.NetworkManager.Settings.Connection' on object at path /org
/freedesktop/NetworkManager/Settings/2 (19)
===

--- Additional comment from Michel Normand on 2017-04-05 03:02:49 EDT ---

seems to be a timing problem, because if test triggered again the day after then there is no more such hang situation.

--- Additional comment from Michel Normand on 2017-04-05 03:12 EDT ---



--- Additional comment from Michel Normand on 2017-04-05 03:14 EDT ---



--- Additional comment from Michel Normand on 2017-04-05 03:15 EDT ---



--- Additional comment from Michel Normand on 2017-04-05 03:18 EDT ---



--- Additional comment from Michel Normand on 2017-04-05 03:20 EDT ---



--- Additional comment from Radek Vykydal on 2017-04-05 03:26:25 EDT ---

We are seeing the traceback also in current rhel7-branch. It seems we need to handle connections which become invalid (by some NM event) between obtaining the connection path and trying to read the settings of the connection for the path.

Comment 2 Radek Vykydal 2017-04-05 13:06:18 UTC
I can hit the bug also with rhel 7.4 server nightly compose RHEL-7.4-20170403.n.0 with patch (preventing deadlocks from bug 1436605)
https://github.com/rhinstaller/anaconda/commit/df62c0643a9d0526681db0de4c8dd97abdb0a277
applied locally (the patch should be in the next nightly compose in anaconda-21.48.22.107-1).

It seems to affect only kickstart installation with network configured in %pre section.

Comment 3 Radek Vykydal 2017-04-05 14:04:39 UTC
Seems to be caused by this change: bug 1394579 (NM 1:1.8.0-0.4.rc1)

Although the unhandled exception should be fixed in anaconda, I think the change could affect quite a few installation network configuration use cases regarding taking over connections created in intiramfs and matching them with connections/ifcfgs created in initramfs.
I have four kickstart tests failing after the change which I am going to look into.

Comment 4 Radek Vykydal 2017-04-06 11:39:14 UTC
Created attachment 1269354 [details]
logs with new NetworkManager (version 1.8.0-0.4.rc1.el7)

Merged and filtered anaconda.log, syslog and ifcfg.log from the failing case.

Comment 5 Radek Vykydal 2017-04-06 11:41:03 UTC
Created attachment 1269355 [details]
logs with new NetworkManager (version 1.8.0-0.2.git20170215.1d40c5f4.el7)

Merged and filtered anaconda.log, syslog and ifcfg.log, working case before NM update.

Comment 6 Radek Vykydal 2017-04-06 11:52:28 UTC
(In reply to Radek Vykydal from comment #3)
> Seems to be caused by this change: bug 1394579 (NM 1:1.8.0-0.4.rc1)
> 
> Although the unhandled exception should be fixed in anaconda, I think the
> change could affect quite a few installation network configuration use cases
> regarding taking over connections created in intiramfs and matching them
> with connections/ifcfgs created in initramfs.
> I have four kickstart tests failing after the change which I am going to
> look into.

- Before the change in bug 1394579, NM tried to match the configuration activated in initramfs with the ifcfg file created in initramfs and use its connection (from comment #4 log)

08:01:57,546      INFO NetworkManager:<info>  [1490860917.3279] manager: (ens3): found matching connection 'ens3'

and activated the device with the connection.

- After the change NM just generates new connection for ens3 (from comment #5 log)

06:34:19,015      DEBUG NetworkManager:<debug> [1491374058.8261] manager: (ens3): generated connection 'ens3'

and activates ens3 with this connection.

It breaks our assumption in Anaconda of having just single connection for an ethernet device.

Thomas, is it working as expected? Was the matching of existing connections dropped, or can we still have it working in installer?

Comment 7 Radek Vykydal 2017-04-10 10:34:41 UTC
Thomas, I am considering patching anaconda so that if it sees multiple connections for a (non-slave) device it will activate the ifcfg one (after NM activates the generated one). Perhaps getting rid of matching in installer would be better as it has been quite fragile (lately bug 1433891).

Or do you think trying to preconfigure NetworkManager.state somewhere between initramfs and NM starting after switchroot in installer, so NM tries to match the connections from initramfs (or have a config way to tell NM to do so) would be better?

Comment 8 Thomas Haller 2017-04-13 10:49:43 UTC
Yes, this was (a mostly) intentional change in behavior.


DETAILS:
========

Previously, when NM starts and it sees that an interface is already up (e.g. has an IP configuration), it would:
  - generate a connection according to what is configured on the interface
  - then perform some fuzzy matching of the generated connections against the
    connections that is already has (from disk).
  - if it found a matching candidate, it would drop the generated connection and 
    proceed with the candidate. Otherwise, it would add the generated 
    connection as in-memory connection.
  - then, it would proceed to "assume" the connection, which meant both of
     a) not touching the device -- as it is managed by somebody else (e.g.
        virbr0 or docker's veth).
     b) gracefully taking over the device and managing it. For example, if you 
        had NM running and managing eth0, then `systemctl restart 
        NetworkManager` would leave eth0 up. After restart, NM should take
        over eth0 again -- gracefully.

a) and b) are conflicting aims, and the code didn't strictly separate which path to follow.
But the real issue was, that it chose a) or b) based on some fuzzy matching. bug 1394579 is a first step to improve here.


Now, when NM exits, it stores which interfaces were managed (and which connection was active on the interface) in /var/run/NetworkManager/devices/.
When NM starts and it encounters an interface that is already configured, it looks into /var/run whether the interface was managed previously. This way, it decided between a) and b) -- and lean towards the latter.
Also, for behavior b), we now never pretend that an existing connection is active. Instead, it always uses the generated in-memory connection.
For a), instead we would (only) choose a previously existing connection.

Note that /var/run is only populate after a restart of NetworkManager, not during the first start on boot.

<<<<<


Clearly this is a change in behavior, which is painful. But I think the new behavior makes more sense and solves real problems.


You could fix the situation by explicitly activating the connection from disk. That will go through a down-up cycle and the generated connection will be deleted.


A follow up feature might be for the user to be able to pre-tell NM which connection to assume. E.g. the interface could have a udev value
  NM_ASSUME_CONNECTION=cd3844b2-8ca6-4fff-96a2-1cb09c357537
which would tell NM to ignore the state file and try to assume said connection.
It's however unclear, whether that would be a convenient solution for anaconda.


Alternatively, could anaconda copy over the statefiles from /var/run/NetworkManager? It seems, if you restart NM, you don't want to loose the state.
The state file is private API, but you don't have to care about it's contents.

Comment 9 Radek Vykydal 2017-04-18 14:08:56 UTC
Thank you for the explanation.

I think we will go with explicitly activating the connection from disk during networking initialization in anaconda (happens early after switch root, but after NM is started and generated connections are created and activated [1]).

One thing to be careful about here is not corrupting nfs mounted resources (eg installer image) from initramfs. Is reactivating the device with connection from disk after it has been activated with generated connection by NM more dangerous in this respect? I wouldn't think so and according to my testing nfs mounted installer image survives both activation of generated connection and subsequent activation of from-disk connection.

[1] although we don't have any real guard waiting for NM having its job done which is prone to race conditions but unrelated to this issue.

Comment 10 Thomas Haller 2017-04-18 14:41:46 UTC
Hm, rethinking again.

an explicit activation will deactivate the generated connection, delete the generated in-memory connection, and activate the new connection. That indeed for a short time removes the IP address, and afterwards doing DHCP again... it's not clear that the same IP address gets obtained -- which would be fatal for nfs.

Maybe the proper solution is really to copy the entire state /var/run/NetworkManager before restarting NetworkManager. Is that possible?

Comment 11 Radek Vykydal 2017-04-19 09:27:04 UTC
(In reply to Thomas Haller from comment #10)

> Maybe the proper solution is really to copy the entire state
> /var/run/NetworkManager before restarting NetworkManager. Is that possible?

Well the thing is that we are not restarting NetworkManager, during installation NM just starts for the first time after switch root with existing connections created in dracut/initramfs, so there is no previous state to copy.

Comment 12 Thomas Haller 2017-04-19 13:33:01 UTC
I see.

I think we should fix NM so that the heuristic of choosing between a) and b) behavior is closer to the previous behavior. Then NM should detect that the existing file is a candidate to assume.

Nothing to do for anaconda then.

Radek, please reassign this bug to NetworkManager or clone it (if you need this bug for your own purpose).

Comment 13 Beniamino Galvani 2017-04-20 07:16:41 UTC
> config: add first_start paramter to NMConfig to detect restart

+ * based on the existance of NM_CONFIG_DEVICE_STATE_DIR directory. */

existance -> existence


> device: assume matching connections during first start

+ g_qsort_with_data (connections, len, sizeof (connections[0]), nm_settings_connection_cmp_timestamp_p_with_data, NULL);

I'd prefer if we split long lines.

Otherwise, looks good.

Comment 14 Radek Vykydal 2017-04-20 08:22:57 UTC
(In reply to Thomas Haller from comment #12)
> I see.
> 
> I think we should fix NM so that the heuristic of choosing between a) and b)
> behavior is closer to the previous behavior. Then NM should detect that the
> existing file is a candidate to assume.
> 
> Nothing to do for anaconda then.
> 
> Radek, please reassign this bug to NetworkManager or clone it (if you need
> this bug for your own purpose).

Cloned bug 1443878. I am keeping this anaconda BZ because working on it I discovered some issues that would be good to handle in anaconda anyway:

- failing matching (bug 1433891)
- using correct connenction in UI (connection taken over from initramfs vs ifcfg connection configured, but not activated, from kickstart)

Comment 15 Radek Vykydal 2017-04-24 13:54:56 UTC
(In reply to Radek Vykydal from comment #14)

> Cloned bug 1443878. I am keeping this anaconda BZ because working on it I
> discovered some issues that would be good to handle in anaconda anyway:
> 
> - failing matching (bug 1433891)

> - using correct connenction in UI (connection taken over from initramfs vs
> ifcfg connection configured, but not activated, from kickstart)
filed bug 1444887 for this issue

Having these tracked in separate BZs I think we can close this one as a duplicate of the NetworkManager bug.

Comment 16 Radek Vykydal 2017-04-25 11:45:11 UTC

*** This bug has been marked as a duplicate of bug 1443878 ***

Comment 17 Fedora Update System 2017-06-13 16:25:12 UTC
anaconda-26.21.8-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-f8d6a1f05a

Comment 18 Fedora Update System 2017-06-15 13:57:47 UTC
anaconda-26.21.8-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-f8d6a1f05a

Comment 19 Jan Stodola 2017-06-15 14:36:37 UTC

*** This bug has been marked as a duplicate of bug 1443878 ***

Comment 20 Fedora Update System 2017-06-16 12:17:04 UTC
anaconda-26.21.9-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-f8d6a1f05a

Comment 21 Fedora Update System 2017-06-16 22:22:57 UTC
anaconda-26.21.9-1.fc26, pykickstart-2.35-1.fc26, python-blivet-2.1.9-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-f8d6a1f05a

Comment 22 Jan Stodola 2017-06-17 19:49:28 UTC

*** This bug has been marked as a duplicate of bug 1443878 ***