Bug 1376199

Summary: stalled eth1.80 vlan after restart and connection delete
Product: Red Hat Enterprise Linux 7
Component: NetworkManager
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: ---
Reporter: Vladimir Benes <vbenes>
Assignee: Beniamino Galvani <bgalvani>
QA Contact: Desktop QE <desktop-qa-list>
CC: aloughla, atragler, bgalvani, lmiksik, lrintel, mleitner, rkhan, sukulkar, thaller
Fixed In Version: NetworkManager-1.8.0-7.el7
Last Closed: 2017-08-01 09:17:07 UTC
Type: Bug
Attachments: logz (flags: none)

Description Vladimir Benes 2016-09-14 20:30:04 UTC
Description of problem:
The vlan device is left stalled after this test scenario:

nmcli connection add type vlan con-name vlan dev eth1 id 80
nmcli connection modify vlan eth.mtu 1450 ipv4.method manual ipv4.addresses 1.2.3.4/24
nmcli connection up id testeth1
nmcli con up id vlan
systemctl restart NetworkManager
nmcli connection
nmcli con del vlan
nmcli device

result:
eth1.80  vlan      unmanaged     --    
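
To check the result from a script rather than by eye, the `nmcli device` listing can be filtered for the vlan device. The following is only a sketch: the helper names device_state and device_is_gone are hypothetical, and it assumes the default tabular output shown above (DEVICE in the first column, STATE in the third).

```shell
#!/bin/sh
# Hypothetical helper: read `nmcli device` output on stdin and print the
# STATE column of the named device. Prints nothing if the device is absent.
device_state() {
    awk -v dev="$1" '$1 == dev { print $3 }'
}

# Hypothetical check: succeed only if the named device does not appear
# in the listing supplied on stdin.
device_is_gone() {
    [ -z "$(device_state "$1")" ]
}
```

After `nmcli con del vlan`, one could then run `nmcli device | device_is_gone eth1.80 || echo "eth1.80 is still present"`.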

Version-Release number of selected component (if applicable):
NetworkManager-1.4.0-6.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. See above

Actual results:
stalled device

Expected results:
device should be deleted

Additional info:
log attached

Comment 1 Vladimir Benes 2016-09-14 20:31:28 UTC
Created attachment 1200967 [details]
logz

Comment 2 Aniss Loughlam 2016-09-15 15:36:40 UTC
Confirmed:

Red Hat Enterprise Linux Server release 7.3 Beta (Maipo)
NetworkManager 1.4.0-0.5.beta1.el7

after running
# nmcli con del vlan
# nmcli device
results:
DEVICE   TYPE      STATE         CONNECTION
p6p1.80  vlan      disconnected  --  
-------------
with
Red Hat Enterprise Linux Server release 7.2 
NetworkManager 1.0.6-27.el7

the device was deleted

Comment 3 Aniss Loughlam 2016-09-15 15:42:05 UTC
(In reply to Aniss from comment #2)

> -------------
> with
> Red Hat Enterprise Linux Server release 7.2 
> NetworkManager 1.0.6-27.el7
> 
> the device was deleted
Not really; I missed a step here (systemctl restart NetworkManager). I tried it again and got:
DEVICE   TYPE      STATE         CONNECTION
em1.80  vlan      disconnected  --

Comment 4 Beniamino Galvani 2016-09-21 14:26:04 UTC
Hi, in this scenario NM tries to fulfill these two goals:

 - keeping the connection up when NM is stopped, to avoid breaking
   connectivity

 - not destroying software devices that already existed when NM started

When both are satisfied, the result is exactly what you see: after a
restart, NM finds a pre-existing vlan device and does not delete it
upon disconnect.
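
The rule implied by the two goals above can be written as a small predicate. This is only a model of the behavior described in this comment, not NetworkManager code; the function name and its yes/no arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical model of the behavior described above (not NetworkManager code).
#   $1: "yes" if the device is a software device (e.g. a vlan), else "no"
#   $2: "yes" if the device already existed when the daemon started, else "no"
# Succeeds (exit 0) when the device would be deleted on disconnect.
delete_on_disconnect() {
    [ "$1" = "yes" ] && [ "$2" = "no" ]
}
```

In the reported scenario the vlan device survives the restart, so the second argument is "yes" and the predicate fails, which matches the device being left behind.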

We plan to rework how connection assumption works, and that change
will probably improve this scenario; see bug [1] for more details.

For now I propose to close this, as NM is behaving as expected.

[1] https://bugzilla.gnome.org/show_bug.cgi?id=746440

Comment 5 Beniamino Galvani 2017-05-30 16:46:37 UTC
Now that we have a state file to persist the device state across daemon restarts, we could record there whether the device was created by NM, and do the right thing after restart. Implementation in branch:

 bg/nm-owned-persist-rh1376199

Please review.
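
As a rough illustration of the idea only (not the code in the branch), persisting the flag could look like the sketch below. The directory, the per-ifindex file name, the [device] section and the nm-owned key are all assumptions made for this example, as are the function names.

```shell
#!/bin/sh
# Hypothetical sketch: file layout and key name are assumptions for
# illustration, not the format used by the actual implementation.
STATE_DIR="${STATE_DIR:-$(mktemp -d)}"

# Before the daemon stops: record whether it created the device itself.
#   $1: ifindex, $2: true|false
save_nm_owned() {
    printf '[device]\nnm-owned=%s\n' "$2" > "$STATE_DIR/$1"
}

# After restart: succeed only if the saved flag says the device was
# created by the daemon. A missing file or key counts as "not owned".
#   $1: ifindex
load_nm_owned() {
    value=$(sed -n 's/^nm-owned=//p' "$STATE_DIR/$1" 2>/dev/null)
    [ "$value" = "true" ]
}
```

Treating a missing entry as "not owned" keeps the conservative behavior from comment 4 for devices the daemon has no record of.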

Comment 6 Thomas Haller 2017-05-31 09:19:20 UTC
I slightly dislike having nm_device_set_nm_owned(): the device gets fully realized, and only then do we set the flag.

How about nm_device_realize_start() and nm_device_create_and_realize() having an argument "nm_owned", and the caller (NMManager) determines for the device whether it is nm-owned -- and it should do so very early when realizing the device.

Comment 7 Beniamino Galvani 2017-06-05 07:29:38 UTC
(In reply to Thomas Haller from comment #6)
> How about nm_device_realize_start() and nm_device_create_and_realize()
> having an argument "nm_owned", and the caller (NMManager) determines for the
> device whether it is nm-owned -- and it should do so very early when
> realizing the device.

These functions are called from 6 different places and I prefer not to
patch all those to load the state. Instead, how about setting nm-owned
in realize_start_setup(), which is called by both functions? Repushed
branch bg/nm-owned-persist-rh1376199.

Comment 8 Thomas Haller 2017-06-06 17:38:00 UTC
yeah, the place is good.

could we move it a bit up, I think it should be set as early as possible.
Maybe immediately after _add_capabilities() (because we check for NM_DEVICE_CAP_IS_SOFTWARE).

Otherwise lgtm. This might fix CI failure bug 1452062. Will test tomorrow.

Comment 9 Beniamino Galvani 2017-06-07 06:33:03 UTC
(In reply to Thomas Haller from comment #8)
> yeah, the place is good.
> 
> could we move it a bit up, I think it should be set as early as possible.
> Maybe immediately after _add_capabilities() (because we check for
> NM_DEVICE_CAP_IS_SOFTWARE).

Branch bg/nm-owned-persist-rh1376199 updated.

Comment 10 Thomas Haller 2017-06-07 07:53:10 UTC
lgtm

Comment 12 Thomas Haller 2017-06-08 20:06:06 UTC
(In reply to Beniamino Galvani from comment #11)
> Merged to master:
> 
> https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=3223d92eeaf704f0bed774610f5935b8fcfb1adb

I did a related follow-up patch (merged to master as https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=d83848be9dfd0edb5f318b81854b371133d84f6e).

I also backported bg/nm-owned-persist-rh1376199 branch + the follow-up patch to nm-1-8, as:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=de1c460e586be65f5549c5d705a10888d5f1baae
https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=8e25de8ab360fc973d7222685f107b81dd872dc1

Comment 13 Thomas Haller 2017-06-09 08:02:28 UTC
Please see https://bugzilla.redhat.com/show_bug.cgi?id=1452062#c7 for the rhel-7.4 backport.

Comment 15 errata-xmlrpc 2017-08-01 09:17:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2299