Bug 1515829

Summary: [NMCI][abrt] [faf] NetworkManager: raise(): /usr/sbin/NetworkManager killed by 6
Product: Red Hat Enterprise Linux 7 Reporter: Vladimir Benes <vbenes>
Component: NetworkManager Assignee: Beniamino Galvani <bgalvani>
Status: CLOSED ERRATA QA Contact: Desktop QE <desktop-qa-list>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.4 CC: atragler, bgalvani, fgiudici, jreznik, lmiksik, lrintel, rkhan, sukulkar, thaller
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
URL: http://faf.lab.eng.brq.redhat.com/faf/reports/bthash/2310a373c53fac3af9ddd9824b2ea41ac5bd8587/
Whiteboard:
Fixed In Version: NetworkManager-1.10.2-12.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-10 13:34:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments (Description, Flags):
[PATCH 1/2] ppp: introduce SetInterface pppd plugin D-Bus method, Flags: none
[PATCH 2/2] ppp/trivial: rename field, Flags: none
[PATCH nm-1-10] ppp: don't start IPv6 configuration on the device, Flags: none

Description Vladimir Benes 2017-11-21 12:55:22 UTC
This bug has been created based on an anonymous crash report requested by the package maintainer.

Report URL: http://faf.lab.eng.brq.redhat.com/faf/reports/bthash/2310a373c53fac3af9ddd9824b2ea41ac5bd8587/

Comment 1 Beniamino Galvani 2017-11-21 13:12:37 UTC
Detected by CI test @pppoe_over_vlan

"assertion failed: (ifindex > 0)"

#4  0x000055df21c924b6 in nm_ip6_config_capture
#5  0x000055df21d023f3 in act_stage3_ip6_config_start
#6  0x000055df21d15578 in nm_device_activate_stage3_ip6_start
#7  0x000055df21d18c69 in activate_stage3_ip_config_start
...

Comment 2 Beniamino Galvani 2017-11-27 10:18:22 UTC
Please review branch bg/ppp-rh1515829.

Comment 3 Thomas Haller 2017-11-27 10:39:58 UTC
    if (priv->ifindex <= 0) {
        priv->ifindex = nm_platform_link_get_ifindex (NM_PLATFORM_GET, priv->ip_iface);
        if (priv->ifindex <= 0) {

there is a race here. Could you at least retry once with nm_platform_process_events()? Maybe the netlink message about the interface is already pending.
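
The retry suggested above can be sketched roughly as follows. This is only an illustration, not NetworkManager code: `struct fake_platform`, `fake_lookup()`, `fake_process_events()` and `lookup_ifindex_retry_once()` are hypothetical stand-ins for the platform cache and for nm_platform_process_events(), whose real signatures are not shown in this report.

```c
#include <stddef.h>

/* Hypothetical stand-in for a platform link cache; NOT NetworkManager's API. */
struct fake_platform {
    int cached_ifindex;  /* what the cache currently knows (0 = unknown) */
    int pending_ifindex; /* a queued, not yet processed, link notification */
};

/* Look up the ifindex in the cache only. */
static int fake_lookup(const struct fake_platform *p)
{
    return p->cached_ifindex;
}

/* Drain the pending notification into the cache, if there is one. */
static void fake_process_events(struct fake_platform *p)
{
    if (p->pending_ifindex > 0) {
        p->cached_ifindex = p->pending_ifindex;
        p->pending_ifindex = 0;
    }
}

/* Try the cache; on a miss, process pending events and retry exactly once. */
static int lookup_ifindex_retry_once(struct fake_platform *p)
{
    int ifindex = fake_lookup(p);

    if (ifindex <= 0) {
        fake_process_events(p);
        ifindex = fake_lookup(p);
    }
    return ifindex; /* still <= 0 means the caller has to handle the failure */
}
```

A result that is still not positive after the single retry would have to be treated as an error by the caller rather than asserted against.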


>> device: ppp: rename fields

device/trivial: ...


+    if (priv->iface)
+         g_assert_cmpstr (priv->iface, ==, iface);

iface comes from an untrusted source, so you cannot assert against it.
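
One way to read this remark: externally supplied input should be validated and rejected gracefully, because an assertion on it lets the sender abort the daemon. A minimal sketch, assuming a hypothetical helper `iface_name_acceptable()` (not part of the patch under review, and not necessarily how the issue was eventually solved):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: decide whether an interface name reported by an
 * external process is acceptable, instead of asserting on it.
 * 'expected' may be NULL when no name has been recorded yet.
 */
static bool iface_name_acceptable(const char *expected, const char *reported)
{
    if (reported == NULL || reported[0] == '\0')
        return false; /* malformed input: reject, do not crash */
    if (expected != NULL && strcmp(expected, reported) != 0)
        return false; /* mismatch: reject, do not crash */
    return true;
}
```

The caller can then log the problem and fail the request when the helper returns false.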



+         if (renamed)
+              nm_manager_remove_device (nm_manager_get (), iface, NM_DEVICE_TYPE_PPP);
+
+         /* Once the device gets an ifindex, start with IP configuration */
+         nm_device_activate_schedule_stage3_ip_config_start (device);


if it got renamed, you shouldn't start stage3, should you?

Comment 4 Beniamino Galvani 2017-12-19 13:46:32 UTC
Created attachment 1370038 [details]
[PATCH 1/2] ppp: introduce SetInterface pppd plugin D-Bus method

Comment 5 Beniamino Galvani 2017-12-19 13:47:07 UTC
Created attachment 1370040 [details]
[PATCH 2/2] ppp/trivial: rename field

Comment 6 Beniamino Galvani 2017-12-19 13:48:00 UTC
(In reply to Thomas Haller from comment #3)
>     if (priv->ifindex <= 0) {
>         priv->ifindex = nm_platform_link_get_ifindex (NM_PLATFORM_GET, priv->ip_iface);
>         if (priv->ifindex <= 0) {
> 
> there is a race here. Could you at least retry once with
> nm_platform_process_events()? Maybe the netlink message about the interface
> is already pending.

Fixed.

> iface comes from untrusted, you cannot assert against it.

> if it got renamed, you shouldn't start stage3, should you?

Solved in different ways.

Comment 7 Thomas Haller 2017-12-20 18:46:13 UTC
SetInterface uses the interface name. Could we send the ifindex instead?

The ID of a link is the ifindex (ignoring the fact that an ifindex might be reused). Unlike an ifname, the ifindex cannot change.

Yes, inside the ppp plugin only the ifname is available, which I would consider a bug of its own. You can immediately resolve the ifindex with if_nametoindex().
Of course, that is still racy, with the possibility that the interface might already have been renamed or have disappeared. But it minimizes the time window for the race by resolving the name as early as possible.

If you are unable to resolve the ifname (e.g. because the interface is already gone), then send ifindex "0" to NM to indicate that something went horribly wrong, so that NMDevice fails activation.
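
The early resolution described above might look like the following sketch. if_nametoindex() is the standard call from <net/if.h> and returns 0 when the name cannot be resolved; the wrapper name `resolve_ifindex_or_zero()` is hypothetical, and how the result is then sent to NM is not shown here.

```c
#include <net/if.h>
#include <stddef.h>

/*
 * Hypothetical wrapper: resolve an interface name to its ifindex as early
 * as possible. Returns 0 if the name is missing or cannot be resolved
 * (e.g. the interface is already gone), which the receiver can treat as
 * "activation must fail".
 */
static unsigned int resolve_ifindex_or_zero(const char *ifname)
{
    if (ifname == NULL || ifname[0] == '\0')
        return 0;
    return if_nametoindex(ifname); /* 0 on failure, errno set by libc */
}
```

Using 0 as the failure value works because a valid ifindex is always positive.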


That also means you need something like nm_device_set_ip_ifindex(), which does not exist yet. But I think that would be the right API, not set_ip_ifname(). In nm_device_set_ip_ifindex() you would instead need to look up the ifname (e.g. by consulting the platform cache, possibly in conjunction with process_events). That might fail too, but you can handle that the same way as receiving ifindex 0 and fail activation too.
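
The reverse lookup can be illustrated with the standard if_indextoname() call. Note the substitution: the comment above proposes consulting NetworkManager's platform cache, whereas this sketch asks the kernel directly through libc. The helper name `ifname_from_ifindex()` is hypothetical.

```c
#include <net/if.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: look up the interface name for an ifindex.
 * An ifindex of 0 and a failed lookup are handled the same way: the
 * function returns false and the caller is expected to fail activation.
 * 'name' must have room for IF_NAMESIZE bytes.
 */
static bool ifname_from_ifindex(unsigned int ifindex, char name[IF_NAMESIZE])
{
    if (ifindex == 0)
        return false; /* sender could not resolve the interface */
    return if_indextoname(ifindex, name) != NULL;
}
```

Both failure paths collapse into a single false result, which matches the suggestion to treat a failed name lookup like a received ifindex of 0.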

Comment 8 Beniamino Galvani 2018-01-09 09:24:02 UTC
Pushed branch bg/ppp-set-ifindex-bgo1515829.

Comment 9 Thomas Haller 2018-01-09 13:22:53 UTC
(In reply to Beniamino Galvani from comment #8)
> Pushed branch bg/ppp-set-ifindex-bgo1515829.

looks mostly good. How about the two fixups?

Comment 10 Beniamino Galvani 2018-01-10 14:41:38 UTC
(In reply to Thomas Haller from comment #9)
> (In reply to Beniamino Galvani from comment #8)
> > Pushed branch bg/ppp-set-ifindex-bgo1515829.
> 
> looks mostly good. How about the two fixups?

Squashed, thanks.

Branch merged to master:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=3d4652fc6e21028068f3b59b8e95b8d03da3105e

Comment 11 Beniamino Galvani 2018-02-06 09:15:02 UTC
Created attachment 1391902 [details]
[PATCH nm-1-10] ppp: don't start IPv6 configuration on the device

Patch for nm-1-10 branch.

Comment 12 Thomas Haller 2018-02-06 10:00:20 UTC
(In reply to Beniamino Galvani from comment #11)
> Created attachment 1391902 [details]
> [PATCH nm-1-10] ppp: don't start IPv6 configuration on the device
> 
> Patch for nm-1-10 branch.

lgtm

Comment 17 errata-xmlrpc 2018-04-10 13:34:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0778