Bug 1542366

Summary: [NMCI][abrt] [faf] NetworkManager: raise(): /usr/sbin/NetworkManager killed by 6
Product: Red Hat Enterprise Linux 7 Reporter: Vladimir Benes <vbenes>
Component: NetworkManagerAssignee: Beniamino Galvani <bgalvani>
Status: CLOSED ERRATA QA Contact: Desktop QE <desktop-qa-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.5CC: atragler, bgalvani, fgiudici, lrintel, rkhan, sukulkar, thaller
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
URL: http://faf.lab.eng.brq.redhat.com/faf/reports/bthash/f7313d168af869b8a30e52942b94550b9182964a/
Whiteboard:
Fixed In Version: NetworkManager-1.16.2-0.1.20190328git8ec0954570.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-08-06 13:16:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Vladimir Benes 2018-02-06 08:11:42 UTC
This bug has been created based on an anonymous crash report requested by the package maintainer.

Report URL: http://faf.lab.eng.brq.redhat.com/faf/reports/bthash/f7313d168af869b8a30e52942b94550b9182964a/

Comment 2 Beniamino Galvani 2018-02-06 14:37:18 UTC
(note to self)

The problem is that platform_link_cb() sometimes receives the link
add/remove event out of order, i.e.:

 ip l add dummy0 type dummy
 ip l del dummy0
 ip l add dummy0 type dummy

 > link added  : 1
 > link added  : 2
 > link removed: 1

Therefore, we update the ifindex of the existing device instead of
creating a new one; and since the DNS manager tracks configurations by
ifindex, we fail to remove the old configuration, leaving a stale
entry in the DNS manager.

I think there's a bug in the platform code, investigating...

This is reproducible by running the repro_1512316.sh script from CI
tests and then adding a new connection with a DNS server set.

Comment 3 Beniamino Galvani 2018-08-03 15:29:42 UTC
Pushed branch bg/rh1542366 for review.

Comment 4 Thomas Haller 2018-08-06 07:27:34 UTC
(In reply to Beniamino Galvani from comment #3)
> Pushed branch bg/rh1542366 for review.

platform: fix typo
        ^^ "platform/trivial:"


> manager: don't update ifindex of existing devices

+              if (nm_device_get_ifindex (candidate) <= 0)
                    nm_device_update_from_platform_link (candidate, plink);

why does that happen? That seems wrong in the first place.

the only valid way to associate an NMDevice with a platform link, is via the ifindex. Meaning: a NMDevice refers to a particular platform link iff the ifindex matches. Why would we have realized devices that have no ifindex (aside purely virtual interfaces, like NMDeviceOvsBridge).

Why does

     if (nm_manager_get_device_by_ifindex (self, ifindex))
          return;

above not already hit?

I question in general, how unrealized devices are oddly tied to an ifname. that is, we create unrealized devices per ifname, and later, when realizing them, we match the NMDevice with the platform link based on the ifname. I find that in many ways wrong. It is wrong, because ifname can be changed.

Comment 5 Beniamino Galvani 2018-08-21 07:49:41 UTC
(In reply to Thomas Haller from comment #4)
> (In reply to Beniamino Galvani from comment #3)
> > Pushed branch bg/rh1542366 for review.
>
> platform: fix typo
>         ^^ "platform/trivial:"
>
>
> > manager: don't update ifindex of existing devices
>
> +              if (nm_device_get_ifindex (candidate) <= 0)
>                     nm_device_update_from_platform_link (candidate, plink);
>
> why does that happen? That seems wrong in the first place.
>
> the only valid way to associate an NMDevice with a platform link, is via the
> ifindex. Meaning: a NMDevice refers to a particular platform link iff the
> ifindex matches. Why would we have realized devices that have no ifindex
> (aside purely virtual interfaces, like NMDeviceOvsBridge).

When we reach that code:

1) plink->ifindex is not associated to any device, because this didn't hit:

        if (nm_manager_get_device_by_ifindex (self, ifindex))
		return;

2) and the link name matches the device name.

Previously, we only did:

      if (nm_device_is_real (candidate)) {
            /* Ignore the link added event since there's already a realized
             * device with the link's name.
             */
            nm_device_update_from_platform_link (candidate, plink);
            return;

and we updated the ifindex of the interface. This was added by commit

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=d7f7725ae8c965756902bd492bf2cf0834319548

and is needed because ovs devices are realized without an initial
ifindex (also ppp devices do the same). I think this is correct in
principle, but it doesn't check whether the device's ifindex is
already != 0.

This happens for example when we have an eth0 device with ifindex 1,
and then a new link named eth0 with ifindex 2 appears. This might look
impossible, but it actually happens when we have to rebuild the
platform cache after a -ENOBUFS error.

> Why does
>
>      if (nm_manager_get_device_by_ifindex (self, ifindex))
>           return;
>
> above not already hit?

See above, the new link has a different ifindex.

> I question in general, how unrealized devices are oddly tied to an ifname.
> that is, we create unrealized devices per ifname, and later, when realizing
> them, we match the NMDevice with the platform link based on the ifname. I
> find that in many ways wrong. It is wrong, because ifname can be changed.

Yes, this should be improved.

Comment 6 Thomas Haller 2018-09-05 13:35:58 UTC
branch LGTM

Comment 10 Vladimir Benes 2019-04-05 08:25:57 UTC
last occurrence was  2019-02-16

Comment 11 Beniamino Galvani 2019-06-20 08:56:06 UTC
*** Bug 1567842 has been marked as a duplicate of this bug. ***

Comment 13 errata-xmlrpc 2019-08-06 13:16:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2302