1376199 – stalled eth1.80 vlan after restart and connection delete

Bug 1376199 - stalled eth1.80 vlan after restart and connection delete

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	NetworkManager
Sub Component:
Version:	7.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Beniamino Galvani
QA Contact:	Desktop QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-09-14 20:30 UTC by Vladimir Benes
Modified:	2017-08-01 09:17 UTC
CC List:	9 users
Fixed In Version:	NetworkManager-1.8.0-7.el7
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-08-01 09:17:07 UTC
Target Upstream Version:
Embargoed:

Attachments
logz (188.96 KB, text/plain) 2016-09-14 20:31 UTC, Vladimir Benes

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:2299	0	normal	SHIPPED_LIVE	Moderate: NetworkManager and libnl3 security, bug fix and enhancement update	2017-08-01 12:40:28 UTC

Description Vladimir Benes 2016-09-14 20:30:04 UTC

Description of problem:
stalled vlan after this test scenario:

nmcli connection add type vlan con-name vlan dev eth1 id 80
nmcli connection modify vlan eth.mtu 1450 ipv4.method manual ipv4.addresses 1.2.3.4/24
nmcli connection up id testeth1
nmcli con up id vlan
systemctl restart NetworkManager
nmcli connection
nmcli con del vlan
nmcli device

result:
eth1.80  vlan      unmanaged     --    
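
A minimal sketch of how the check in the last step could be automated, assuming the plain table layout that `nmcli device` prints in this report (DEVICE, TYPE, STATE, CONNECTION) and a single-word STATE column. The function name `find_leftover_vlans` is hypothetical, and the `eth1` row in the sample is illustrative rather than taken from the report.

```python
def find_leftover_vlans(nmcli_device_output):
    """Return names of vlan devices that have no active connection."""
    leftovers = []
    for line in nmcli_device_output.splitlines():
        # Split into at most four columns; the last one keeps any spaces.
        fields = line.split(None, 3)
        if len(fields) < 4 or fields[0] == "DEVICE":
            continue  # skip blank lines and the header row
        device, dev_type, _state, connection = fields
        if dev_type == "vlan" and connection.strip() == "--":
            leftovers.append(device)
    return leftovers


# Header and eth1.80 row as shown in this report; the eth1 row is illustrative.
sample = """\
DEVICE   TYPE      STATE         CONNECTION
eth1     ethernet  connected     testeth1
eth1.80  vlan      unmanaged     --
"""

print(find_leftover_vlans(sample))
```

An empty list after `nmcli con del vlan` would correspond to the expected result (the device was deleted).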

Version-Release number of selected component (if applicable):
NetworkManager-1.4.0-6.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. see above

Actual results:
stalled device

Expected results:
device should be deleted

Additional info:
log attached

Comment 1 Vladimir Benes 2016-09-14 20:31:28 UTC

Created attachment 1200967
logz

Comment 2 Aniss Loughlam 2016-09-15 15:36:40 UTC

Confirmed:

Red Hat Enterprise Linux Server release 7.3 Beta (Maipo)
NetworkManager 1.4.0-0.5.beta1.el7

after running
# nmcli con del vlan
# nmcli device
results:
DEVICE   TYPE      STATE         CONNECTION
p6p1.80  vlan      disconnected  --  
-------------
with
Red Hat Enterprise Linux Server release 7.2 
NetworkManager 1.0.6-27.el7

the device was deleted

Comment 3 Aniss Loughlam 2016-09-15 15:42:05 UTC

(In reply to Aniss from comment #2)

> -------------
> with
> Red Hat Enterprise Linux Server release 7.2 
> NetworkManager 1.0.6-27.el7
> 
> the device was deleted
Not really, I missed a step here (systemctl restart NetworkManager). I tried it again with the restart and I got:
DEVICE   TYPE      STATE         CONNECTION
em1.80  vlan      disconnected  --

Comment 4 Beniamino Galvani 2016-09-21 14:26:04 UTC

Hi, in this scenario NM tries to fulfill two goals:

 - keep the connection up when NM is stopped, to avoid breaking
   connectivity

 - don't destroy software devices that already existed when NM started

When both are satisfied, the result is exactly what you see: after a
restart, NM finds a pre-existing vlan device and does not delete it
upon disconnect.

We plan to rework how connection assumption works, and that change
will probably improve this scenario; see bug [1] for more details.

For now I propose to close this, as NM is behaving as expected.

[1] https://bugzilla.gnome.org/show_bug.cgi?id=746440
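
The interaction of the two goals can be illustrated with a toy model. This is only a sketch of the behavior described above, not NetworkManager code: the `Manager` and `Device` classes and their methods are hypothetical, and ownership is assumed to be remembered only in process memory.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    is_software: bool
    nm_owned: bool = False  # True only if this daemon instance created the link


class Manager:
    """Toy daemon: ownership is known only for links it created itself."""

    def __init__(self, existing_links):
        # Links already present at startup are treated as external.
        self.devices = {n: Device(n, is_software=True) for n in existing_links}

    def activate_vlan(self, name):
        if name not in self.devices:
            self.devices[name] = Device(name, is_software=True, nm_owned=True)

    def disconnect(self, name):
        dev = self.devices[name]
        if dev.is_software and dev.nm_owned:
            del self.devices[name]  # the daemon created it, so it removes it
        # otherwise the link is left in place


# Without a restart the vlan is removed on disconnect.
nm = Manager(existing_links=[])
nm.activate_vlan("eth1.80")
nm.disconnect("eth1.80")
print("eth1.80" in nm.devices)

# With a restart the link stays up, is rediscovered as pre-existing,
# and therefore survives the disconnect.
first = Manager(existing_links=[])
first.activate_vlan("eth1.80")
second = Manager(existing_links=list(first.devices))  # "restart"
second.disconnect("eth1.80")
print("eth1.80" in second.devices)
```

In this model the second case reproduces the leftover device from the description, because the restarted instance has no record that it created the link.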

Comment 5 Beniamino Galvani 2017-05-30 16:46:37 UTC

Now that we have a state file to persist the device state on daemon restart, we could save there whether the device was created by NM or not, and do the right thing after restart. Implementation in branch:

 bg/nm-owned-persist-rh1376199

Please review.
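
A rough sketch of the idea, assuming a keyfile-style state file per device. The directory layout, the `[device]` section, the `nm-owned` key name, and both function names are assumptions made for illustration; the actual format is whatever the branch implements.

```python
import configparser
import os
import tempfile


def save_state(state_dir, ifindex, nm_owned):
    """Write the ownership flag for one device before the daemon stops."""
    cfg = configparser.ConfigParser()
    cfg["device"] = {"nm-owned": "true" if nm_owned else "false"}
    with open(os.path.join(state_dir, str(ifindex)), "w") as f:
        cfg.write(f)


def load_nm_owned(state_dir, ifindex):
    """Read the flag back after restart; a missing file means 'not owned'."""
    cfg = configparser.ConfigParser()
    cfg.read(os.path.join(state_dir, str(ifindex)))  # skips missing files
    return cfg.getboolean("device", "nm-owned", fallback=False)


with tempfile.TemporaryDirectory() as state_dir:
    save_state(state_dir, 7, nm_owned=True)
    print(load_nm_owned(state_dir, 7))  # flag written before the "restart"
    print(load_nm_owned(state_dir, 8))  # device with no saved state
```

Defaulting to "not owned" when no state exists keeps the earlier guarantee that devices created outside NM are never destroyed.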

Comment 6 Thomas Haller 2017-05-31 09:19:20 UTC

I dislike a bit that there is nm_device_set_nm_owned(), so the device gets fully realized, and only then we set the flag.

How about nm_device_realize_start() and nm_device_create_and_realize() having an argument "nm_owned", and the caller (NMManager) determines for the device whether it is nm-owned -- and it should do so very early when realizing the device.

Comment 7 Beniamino Galvani 2017-06-05 07:29:38 UTC

(In reply to Thomas Haller from comment #6)
> How about nm_device_realize_start() and nm_device_create_and_realize()
> having an argument "nm_owned", and the caller (NMManager) determines for the
> device whether it is nm-owned -- and it should do so very early when
> realizing the device.

These functions are called from 6 different places and I prefer not to
patch all those to load the state. Instead, how about setting nm-owned
in realize_start_setup(), which is called by both functions? Repushed
branch bg/nm-owned-persist-rh1376199.

Comment 8 Thomas Haller 2017-06-06 17:38:00 UTC

yeah, the place is good.

could we move it up a bit? I think it should be set as early as possible.
Maybe immediately after _add_capabilities() (because we check for NM_DEVICE_CAP_IS_SOFTWARE).

Otherwise lgtm. This might fix CI failure bug 1452062. Will test tomorrow.
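
The ordering under discussion can be sketched as follows. This is a hypothetical Python model, not the C implementation: only the names `realize_start_setup`, `_add_capabilities` and `NM_DEVICE_CAP_IS_SOFTWARE` come from this thread, while the constant's value, the `saved_nm_owned` parameter and the `steps` list are invented for illustration.

```python
CAP_IS_SOFTWARE = 0x1  # stand-in for NM_DEVICE_CAP_IS_SOFTWARE, value assumed


class Device:
    def __init__(self, name, software):
        self.name = name
        self._software = software
        self.capabilities = 0
        self.nm_owned = False
        self.steps = []  # records the order of setup steps

    def _add_capabilities(self):
        if self._software:
            self.capabilities |= CAP_IS_SOFTWARE
        self.steps.append("add_capabilities")

    def realize_start_setup(self, saved_nm_owned):
        # Capabilities come first because the ownership check depends on them.
        self._add_capabilities()
        # Restore ownership immediately afterwards, before any later step runs.
        if (self.capabilities & CAP_IS_SOFTWARE) and saved_nm_owned:
            self.nm_owned = True
        self.steps.append("restore_nm_owned")
        self.steps.append("remaining_setup")


vlan = Device("eth1.80", software=True)
vlan.realize_start_setup(saved_nm_owned=True)
print(vlan.nm_owned, vlan.steps)
```

In this model a hardware device never becomes nm-owned, even if a stale state entry claims it was.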

Comment 9 Beniamino Galvani 2017-06-07 06:33:03 UTC

(In reply to Thomas Haller from comment #8)
> yeah, the place is good.
> 
> could we move it up a bit? I think it should be set as early as possible.
> Maybe immediately after _add_capabilities() (because we check for
> NM_DEVICE_CAP_IS_SOFTWARE).

Branch bg/nm-owned-persist-rh1376199 updated.

Comment 10 Thomas Haller 2017-06-07 07:53:10 UTC

lgtm

Comment 11 Beniamino Galvani 2017-06-07 08:31:56 UTC

Merged to master:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=3223d92eeaf704f0bed774610f5935b8fcfb1adb

Comment 12 Thomas Haller 2017-06-08 20:06:06 UTC

(In reply to Beniamino Galvani from comment #11)
> Merged to master:
> 
> https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/
> ?id=3223d92eeaf704f0bed774610f5935b8fcfb1adb

I did a related follow-up patch (merged to master as https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=d83848be9dfd0edb5f318b81854b371133d84f6e )

I also backported bg/nm-owned-persist-rh1376199 branch + the follow-up patch to nm-1-8, as:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=de1c460e586be65f5549c5d705a10888d5f1baae
https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=8e25de8ab360fc973d7222685f107b81dd872dc1

Comment 13 Thomas Haller 2017-06-09 08:02:28 UTC

please see https://bugzilla.redhat.com/show_bug.cgi?id=1452062#c7 for rhel-7.4 backport.

Comment 15 errata-xmlrpc 2017-08-01 09:17:07 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2299
