Bug 1439220
Summary: | nm.nm_get_all_settings() rises unhandled exception Glib.Error: g-dbus-error-quark: GDBus.Error:org.freedesktop.DBus.Error.UnknownMethod: No such interface 'org.freedesktop.NetworkManager.Settings.Connection' on object at path /org /freedesktop/NetworkMana | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Radek Vykydal <rvykydal> | ||||||
Component: | anaconda | Assignee: | Radek Vykydal <rvykydal> | ||||||
Status: | CLOSED DUPLICATE | QA Contact: | Release Test Team <release-test-team-automation> | ||||||
Severity: | medium | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | 7.3 | CC: | anaconda-maint-list, bgalvani, bugproxy, extras-qa, g.kaviyarasu, hannsj_uhl, jonathan, jstodola, mkolman, normand, rvykydal, sbueno, thaller, vanmeeuwen+fedora, vponcova | ||||||
Target Milestone: | rc | Keywords: | Reopened | ||||||
Target Release: | 7.4 | ||||||||
Hardware: | x86_64 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | 1439051 | ||||||||
: | 1443878 (view as bug list) | Environment: | |||||||
Last Closed: | 2017-06-17 19:49:28 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 1299988, 1443878 | ||||||||
Attachments: |
|
Description
Radek Vykydal
2017-04-05 12:55:04 UTC
I can hit the bug also with rhel 7.4 server nightly compose RHEL-7.4-20170403.n.0 with patch (preventing deadlocks from bug 1436605) https://github.com/rhinstaller/anaconda/commit/df62c0643a9d0526681db0de4c8dd97abdb0a277 applied locally (the patch should be in the next nightly compose in anaconda-21.48.22.107-1). It seems to affect only kickstart installation with network configured in %pre section. Seems to be caused by this change: bug 1394579 (NM 1:1.8.0-0.4.rc1) Although the unhandled exception should be fixed in anaconda, I think the change could affect quite a few installation network configuration use cases regarding taking over connections created in intiramfs and matching them with connections/ifcfgs created in initramfs. I have four kickstart tests failing after the change which I am going to look into. Created attachment 1269354 [details]
logs with new NetworkManager (version 1.8.0-0.4.rc1.el7)
Merged and filtered anaconda.log, syslog and ifcfg.log from the failing case.
Created attachment 1269355 [details]
logs with new NetworkManager (version 1.8.0-0.2.git20170215.1d40c5f4.el7)
Merged and filtered anaconda.log, syslog and ifcfg.log, working case before NM update.
(In reply to Radek Vykydal from comment #3) > Seems to be caused by this change: bug 1394579 (NM 1:1.8.0-0.4.rc1) > > Although the unhandled exception should be fixed in anaconda, I think the > change could affect quite a few installation network configuration use cases > regarding taking over connections created in intiramfs and matching them > with connections/ifcfgs created in initramfs. > I have four kickstart tests failing after the change which I am going to > look into. - Before the change in bug 1394579, NM tried to match the configuration activated in initramfs with the ifcfg file created in initramfs and use its connection (from comment #4 log) 08:01:57,546 INFO NetworkManager:<info> [1490860917.3279] manager: (ens3): found matching connection 'ens3' and activated the device with the connection. - After the change NM just generates new connection for ens3 (from comment #5 log) 06:34:19,015 DEBUG NetworkManager:<debug> [1491374058.8261] manager: (ens3): generated connection 'ens3' and activates ens3 with this connection. It breaks our assumption in Anaconda of having just single connection for an ethernet device. Thomas, is it working as expected? Was the matching of existing connections dropped, or can we still have it working in installer? Thomas, I am considering patching anaconda so that if it sees multiple connections for a (non-slave) device it will activate the ifcfg one (after NM activates the generated one). Perhaps getting rid of matching in installer would be better as it has been quite fragile (lately bug 1433891). Or do you think trying to preconfigure NetworkManager.state somewhere between initramfs and NM starting after switchroot in installer, so NM tries to match the connections from initramfs (or have a config way to tell NM to do so) would be better? Yes, this was (a mostly) intentional change in behavior. DETAILS: ======== Previously, when NM starts and it sees that an interface is already up (e.g. has an IP configuration), it would: - generate a connection according to what is configured on the interface - then perform some fuzzy matching of the generated connections against the connections that is already has (from disk). - if it found a matching candidate, it would drop the generated connection and proceed with the candidate. Otherwise, it would add the generated connection as in-memory connection. - then, it would proceed to "assume" the connection, which meant both of a) not touching the device -- as it is managed by somebody else (e.g. virbr0 or docker's veth). b) gracefully taking over the device and managing it. For example, if you had NM running and managing eth0, then `systemctl restart NetworkManager` would leave eth0 up. After restart, NM should take over eth0 again -- gracefully. a) and b) are conflicting aims, and the code didn't strictly separate which path to follow. But the real issue was, that it chose a) or b) based on some fuzzy matching. bug 1394579 is a first step to improve here. Now, when NM exits, it stores which interfaces were managed (and which connection was active on the interface) in /var/run/NetworkManager/devices/. When NM starts and it encounters an interface that is already configured, it looks into /var/run whether the interface was managed previously. This way, it decided between a) and b) -- and lean towards the latter. Also, for behavior b), we now never pretend that an existing connection is active. Instead, it always uses the generated in-memory connection. For a), instead we would (only) choose a previously existing connection. Note that /var/run is only populate after a restart of NetworkManager, not during the first start on boot. <<<<< Clearly this is a change in behavior, which is painful. But I think the new behavior makes more sense and solves real problems. You could fix the situation by explicitly activating the connection from disk. That will go through a down-up cycle and the generated connection will be deleted. A follow up feature might be for the user to be able to pre-tell NM which connection to assume. E.g. the interface could have a udev value NM_ASSUME_CONNECTION=cd3844b2-8ca6-4fff-96a2-1cb09c357537 which would tell NM to ignore the state file and try to assume said connection. It's however unclear, whether that would be a convenient solution for anaconda. Alternatively, could anaconda copy over the statefiles from /var/run/NetworkManager? It seems, if you restart NM, you don't want to loose the state. The state file is private API, but you don't have to care about it's contents. Thank you for the explanation. I think we will go with explicitly activating the connection from disk during networking initialization in anaconda (happens early after switch root, but after NM is started and generated connections are created and activated [1]). One thing to be careful about here is not corrupting nfs mounted resources (eg installer image) from initramfs. Is reactivating the device with connection from disk after it has been activated with generated connection by NM more dangerous in this respect? I wouldn't think so and according to my testing nfs mounted installer image survives both activation of generated connection and subsequent activation of from-disk connection. [1] although we don't have any real guard waiting for NM having its job done which is prone to race conditions but unrelated to this issue. Hm, rethinking again. an explicit activation will deactivate the generated connection, delete the generated in-memory connection, and activate the new connection. That indeed for a short time removes the IP address, and afterwards doing DHCP again... it's not clear that the same IP address gets obtained -- which would be fatal for nfs. Maybe the proper solution is really to copy the entire state /var/run/NetworkManager before restarting NetworkManager. Is that possible? (In reply to Thomas Haller from comment #10) > Maybe the proper solution is really to copy the entire state > /var/run/NetworkManager before restarting NetworkManager. Is that possible? Well the thing is that we are not restarting NetworkManager, during installation NM just starts for the first time after switch root with existing connections created in dracut/initramfs, so there is no previous state to copy. I see. I think we should fix NM so that the heuristic of choosing between a) and b) behavior is closer to the previous behavior. Then NM should detect that the existing file is a candidate to assume. Nothing to do for anaconda then. Radek, please reassign this bug to NetworkManager or clone it (if you need this bug for your own purpose). > config: add first_start paramter to NMConfig to detect restart + * based on the existance of NM_CONFIG_DEVICE_STATE_DIR directory. */ existance -> existence > device: assume matching connections during first start + g_qsort_with_data (connections, len, sizeof (connections[0]), nm_settings_connection_cmp_timestamp_p_with_data, NULL); I'd prefer if we split long lines. Otherwise, looks good. (In reply to Thomas Haller from comment #12) > I see. > > I think we should fix NM so that the heuristic of choosing between a) and b) > behavior is closer to the previous behavior. Then NM should detect that the > existing file is a candidate to assume. > > Nothing to do for anaconda then. > > Radek, please reassign this bug to NetworkManager or clone it (if you need > this bug for your own purpose). Cloned bug 1443878. I am keeping this anaconda BZ because working on it I discovered some issues that would be good to handle in anaconda anyway: - failing matching (bug 1433891) - using correct connenction in UI (connection taken over from initramfs vs ifcfg connection configured, but not activated, from kickstart) (In reply to Radek Vykydal from comment #14) > Cloned bug 1443878. I am keeping this anaconda BZ because working on it I > discovered some issues that would be good to handle in anaconda anyway: > > - failing matching (bug 1433891) > - using correct connenction in UI (connection taken over from initramfs vs > ifcfg connection configured, but not activated, from kickstart) filed bug 1444887 for this issue Having these tracked in separate BZs I think we can close this one as a duplicate of the NetworkManager bug. *** This bug has been marked as a duplicate of bug 1443878 *** anaconda-26.21.8-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-f8d6a1f05a anaconda-26.21.8-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-f8d6a1f05a *** This bug has been marked as a duplicate of bug 1443878 *** anaconda-26.21.9-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-f8d6a1f05a anaconda-26.21.9-1.fc26, pykickstart-2.35-1.fc26, python-blivet-2.1.9-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-f8d6a1f05a *** This bug has been marked as a duplicate of bug 1443878 *** |