Bug 1542366
Summary: | [NMCI][abrt] [faf] NetworkManager: raise(): /usr/sbin/NetworkManager killed by 6 | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Vladimir Benes <vbenes> |
Component: | NetworkManager | Assignee: | Beniamino Galvani <bgalvani> |
Status: | CLOSED ERRATA | QA Contact: | Desktop QE <desktop-qa-list> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 7.5 | CC: | atragler, bgalvani, fgiudici, lrintel, rkhan, sukulkar, thaller |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
URL: | http://faf.lab.eng.brq.redhat.com/faf/reports/bthash/f7313d168af869b8a30e52942b94550b9182964a/ | ||
Whiteboard: | |||
Fixed In Version: | NetworkManager-1.16.2-0.1.20190328git8ec0954570.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-08-06 13:16:20 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Vladimir Benes
2018-02-06 08:11:42 UTC
(note to self) The problem is that platform_link_cb() sometimes receives the link add/remove event out of order, i.e.: ip l add dummy0 type dummy ip l del dummy0 ip l add dummy0 type dummy > link added : 1 > link added : 2 > link removed: 1 Therefore, we update the ifindex of the existing device instead of creating a new one; and since the DNS manager tracks configurations by ifindex, we fail to remove the old configuration, leaving a stale entry in the DNS manager. I think there's a bug in the platform code, investigating... This is reproducible by running the repro_1512316.sh script from CI tests and then adding a new connection with a DNS server set. Pushed branch bg/rh1542366 for review. (In reply to Beniamino Galvani from comment #3) > Pushed branch bg/rh1542366 for review. platform: fix typo ^^ "platform/trivial:" > manager: don't update ifindex of existing devices + if (nm_device_get_ifindex (candidate) <= 0) nm_device_update_from_platform_link (candidate, plink); why does that happen? That seems wrong in the first place. the only valid way to associate an NMDevice with a platform link, is via the ifindex. Meaning: a NMDevice refers to a particular platform link iff the ifindex matches. Why would we have realized devices that have no ifindex (aside purely virtual interfaces, like NMDeviceOvsBridge). Why does if (nm_manager_get_device_by_ifindex (self, ifindex)) return; above not already hit? I question in general, how unrealized devices are oddly tied to an ifname. that is, we create unrealized devices per ifname, and later, when realizing them, we match the NMDevice with the platform link based on the ifname. I find that in many ways wrong. It is wrong, because ifname can be changed. (In reply to Thomas Haller from comment #4) > (In reply to Beniamino Galvani from comment #3) > > Pushed branch bg/rh1542366 for review. > > platform: fix typo > ^^ "platform/trivial:" > > > > manager: don't update ifindex of existing devices > > + if (nm_device_get_ifindex (candidate) <= 0) > nm_device_update_from_platform_link (candidate, plink); > > why does that happen? That seems wrong in the first place. > > the only valid way to associate an NMDevice with a platform link, is via the > ifindex. Meaning: a NMDevice refers to a particular platform link iff the > ifindex matches. Why would we have realized devices that have no ifindex > (aside purely virtual interfaces, like NMDeviceOvsBridge). When we reach that code: 1) plink->ifindex is not associated to any device, because this didn't hit: if (nm_manager_get_device_by_ifindex (self, ifindex)) return; 2) and the link name matches the device name. Previously, we only did: if (nm_device_is_real (candidate)) { /* Ignore the link added event since there's already a realized * device with the link's name. */ nm_device_update_from_platform_link (candidate, plink); return; and we updated the ifindex of the interface. This was added by commit https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=d7f7725ae8c965756902bd492bf2cf0834319548 and is needed because ovs devices are realized without an initial ifindex (also ppp devices do the same). I think this is correct in principle, but it doesn't check whether the device's ifindex is already != 0. This happens for example when we have an eth0 device with ifindex 1, and then a new link named eth0 with ifindex 2 appears. This might look impossible, but it actually happens when we have to rebuild the platform cache after a -ENOBUFS error. > Why does > > if (nm_manager_get_device_by_ifindex (self, ifindex)) > return; > > above not already hit? See above, the new link has a different ifindex. > I question in general, how unrealized devices are oddly tied to an ifname. > that is, we create unrealized devices per ifname, and later, when realizing > them, we match the NMDevice with the platform link based on the ifname. I > find that in many ways wrong. It is wrong, because ifname can be changed. Yes, this should be improved. branch LGTM Branch merged upstream: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=c7c989804fc478e724c43ca956c158079ee79c25 last occurrence was 2019-02-16 *** Bug 1567842 has been marked as a duplicate of this bug. *** Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2302 |