Bug 1378418

Summary: vlan device is down and lost ip once stopping NetworkManager
Product: Red Hat Enterprise Linux 7 Reporter: Michael Burman <mburman>
Component: NetworkManager Assignee: Rashid Khan <rkhan>
Status: CLOSED ERRATA QA Contact: Desktop QE <desktop-qa-list>
Severity: high Docs Contact:
Priority: high    
Version: 7.3 CC: aloughla, atragler, bgalvani, danken, lrintel, mburman, myakove, rkhan, thaller, vbenes
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: NetworkManager-1.4.0-13.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-01 09:17:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                                    Flags
joutnalctl NM                                  none
journalctl in trace log                        none
screenshot_vlan_device_down_after_NM_Stopped   none
journalctl in trace log_new                    none

Description Michael Burman 2016-09-22 11:23:14 UTC
Description of problem:
The VLAN device goes down and loses its IP address once NetworkManager is stopped.

VLAN devices (on top of a bond or a NIC) go down and lose their IP addresses once we stop NM.
The devices were configured via nmcli.

Version-Release number of selected component (if applicable):
NetworkManager-1.4.0-8.el7.x86_64
rhel7.3 beta

How reproducible:
100%

Steps to Reproduce:
1. Create vlan device using nmcli

[root@orchid-vds2 ~]# nmcli connection show
NAME           UUID                                  TYPE            DEVICE
System enp4s0  c3b03416-e559-4d71-9b58-ac3af842edea  802-3-ethernet  enp4s0
virbr0         01d9fce0-014f-4479-99e6-08034fe971b8  bridge          virbr0
enp6s0         f128342a-ad47-42d5-9e32-076aef50358f  802-3-ethernet  --    
ens1f0         4a235aea-9fc7-42db-b8dc-fd6276a1003e  802-3-ethernet  --    
ens1f1         499f1721-47c0-4740-b6a7-9324d9081285  802-3-ethernet  --    

[root@orchid-vds2 ~]# rpm -qa | grep NetworkMa
NetworkManager-team-1.4.0-8.el7.x86_64
NetworkManager-config-server-1.4.0-8.el7.x86_64
NetworkManager-1.4.0-8.el7.x86_64
NetworkManager-tui-1.4.0-8.el7.x86_64
NetworkManager-libnm-1.4.0-8.el7.x86_64

[root@orchid-vds2 ~]#
[root@orchid-vds2 ~]#
[root@orchid-vds2 ~]# nmcli connection add type vlan con-name enp4s0.162 dev enp4s0 id 162; \
> nmcli con mod uuid c3b03416-e559-4d71-9b58-ac3af842edea ipv4.method disabled ipv6.method ignore; \
> nmcli con mod id enp4s0.162 ipv4.method auto; \
> nmcli con down uuid c3b03416-e559-4d71-9b58-ac3af842edea; \
>  nmcli con up uuid c3b03416-e559-4d71-9b58-ac3af842edea; \
> nmcli con up id enp4s0.162
Connection 'enp4s0.162' (a43de140-2f7c-40d8-8cd0-f4da9d299797) successfully added.

2. Now the host is reachable via its VLAN IP, 10.35.129.15.
Once NM is stopped, the connection is lost because the VLAN NIC goes down.

enp4s0.162@enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 00:1a:64:7a:94:62 brd ff:ff:ff:ff:ff:ff
    inet 10.35.129.15/24 brd 10.35.129.255 scope global dynamic enp4s0.162
       valid_lft 36750sec preferred_lft 36750sec
    inet6 2620:52:0:2381:b514:b1d9:ea51:2a44/64 scope global noprefixroute dynamic 
       valid_lft 2591939sec preferred_lft 604739sec
    inet6 fe80::4591:3f02:a094:fc0b/64 scope link 
       valid_lft forever preferred_lft forever

Actual results:
The VLAN device is down and has lost its IP address; the host cannot be reached.

Expected results:
The VLAN device should stay up, with its IP address, when NM is stopped.

Additional info:
This affects the RHV-M product, because we stop NM when adding a new host to RHV-M.

- This applies to VLAN-over-bond devices as well.

I enabled logging level=TRACE and will provide logs once I am able to re-establish a connection to the server.

Comment 2 Michael Burman 2016-09-22 12:32:23 UTC
Created attachment 1203718 [details]
joutnalctl NM

Comment 3 Michael Burman 2016-09-25 07:36:22 UTC
Created attachment 1204485 [details]
journalctl in trace log

Comment 7 Beniamino Galvani 2016-09-27 14:13:58 UTC
I can't reproduce the issue using commands in comment 6. Since none of
the files in comments 2 and 3 contain logs of the shutdown at trace
level (which is the part we need to understand what's happening), can
you please execute the following and attach the output as well as the
NM journal log:

  nmcli general logging level TRACE
  nmcli connection add type vlan con-name enp4s0.162 dev enp4s0 id 162
  nmcli con mod uuid c3b03416-e559-4d71-9b58-ac3af842edea ipv4.method disabled ipv6.method ignore
  nmcli con mod id enp4s0.162 ipv4.method auto
  nmcli con down uuid c3b03416-e559-4d71-9b58-ac3af842edea
  nmcli con up uuid c3b03416-e559-4d71-9b58-ac3af842edea
  nmcli con up id enp4s0.162
  ip addr
  sleep 5
  systemctl stop NetworkManager
  sleep 5
  ip addr

Thanks.
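
For comparing the two `ip addr` outputs requested above, a small parser can reduce each one to a single line per device. This is only a sketch: the function name `vlan_status` is hypothetical, and it assumes the iproute2 `ip addr` layout shown elsewhere in this report (a numbered header line per interface, followed by indented address lines).

```shell
# vlan_status IFACE
# Hypothetical helper: reads `ip addr` output on stdin and prints
# "<state> <first-ipv4-or-none>" for IFACE. Returns 1 if IFACE is absent.
vlan_status() {
    awk -v dev="$1" '
        # Interface header, e.g. "8: enp4s0.162@enp4s0: <...> ... state UP qlen 1000"
        /^[0-9]+: / {
            name = $2
            sub(/:$/, "", name)      # drop the trailing colon
            sub(/@.*$/, "", name)    # drop the "@parent" suffix shown for VLANs
            in_dev = (name == dev)
            if (in_dev) {
                found = 1
                for (i = 1; i < NF; i++)
                    if ($i == "state") state = $(i + 1)
            }
            next
        }
        # First IPv4 address line belonging to the selected interface
        in_dev && $1 == "inet" && addr == "" { addr = $2 }
        END {
            if (!found) { print "missing none"; exit 1 }
            print state, (addr == "" ? "none" : addr)
        }
    '
}
```

Running `ip addr | vlan_status enp4s0.162` before and after `systemctl stop NetworkManager` would then show directly whether the state or the address changed.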

Comment 8 Michael Burman 2016-09-27 14:48:47 UTC
(In reply to Beniamino Galvani from comment #7)

[root@camel-vdsa ~]# nmcli connection show 
NAME           UUID                                  TYPE            DEVICE 
System enp4s0  c6db3d74-edab-4247-b66a-dd80c29a3d51  802-3-ethernet  enp4s0

[root@camel-vdsa ~]# nmcli general logging level TRACE
[root@camel-vdsa ~]# nmcli connection add type vlan con-name enp4s0.162 dev enp4s0 id 162; \
> nmcli con mod uuid c6db3d74-edab-4247-b66a-dd80c29a3d51 ipv4.method disabled ipv6.method ignore; \
> nmcli con mod id enp4s0.162 ipv4.method auto; \
> nmcli con down uuid c6db3d74-edab-4247-b66a-dd80c29a3d51; \
> nmcli con up uuid c6db3d74-edab-4247-b66a-dd80c29a3d51; \
> nmcli con up id enp4s0.162
Connection 'enp4s0.162' (44d4df63-7fc0-4241-b641-72abcade42a8) successfully added.

[root@camel-vdsa ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:21:5e:3f:de:b0 brd ff:ff:ff:ff:ff:ff
    inet6 2620:52:0:2380:221:5eff:fe3f:deb0/64 scope global mngtmpaddr dynamic 
       valid_lft 2591784sec preferred_lft 604584sec
    inet6 fe80::221:5eff:fe3f:deb0/64 scope link 
       valid_lft forever preferred_lft forever
3: enp6s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:21:5e:3f:de:b2 brd ff:ff:ff:ff:ff:ff
4: ens2f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 00:10:18:24:47:f2 brd ff:ff:ff:ff:ff:ff
5: ens2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 00:10:18:24:47:f3 brd ff:ff:ff:ff:ff:ff
6: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 90:e2:ba:18:de:7c brd ff:ff:ff:ff:ff:ff
7: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 90:e2:ba:18:de:7d brd ff:ff:ff:ff:ff:ff
8: enp4s0.162@enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 00:21:5e:3f:de:b0 brd ff:ff:ff:ff:ff:ff
    inet 10.35.129.18/24 brd 10.35.129.255 scope global dynamic enp4s0.162
       valid_lft 42982sec preferred_lft 42982sec
    inet6 2620:52:0:2381:f4bd:279d:57b6:28d7/64 scope global noprefixroute dynamic 
       valid_lft 2591785sec preferred_lft 604585sec
    inet6 fe80::689e:e6ee:9f40:997f/64 scope link 
       valid_lft forever preferred_lft forever

[root@camel-vdsa ~]# sleep 5

[root@camel-vdsa ~]# systemctl stop NetworkManager

* At this point I lost connectivity! This is the bug.
So now I need to connect to the management console to provide you with the output of ip addr, and in order to provide the journalctl output I will need to run dhclient enp4s0 to get an IP address back on the enp4s0 device.
Before that, I will show you in a screenshot that enp4s0.162 is down and has lost its IP address.
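
Because the SSH session dies together with the VLAN device, one way to keep the evidence is to write everything to local files instead of the terminal. The following is a minimal sketch, not something that was run for this report: the function name `capture_nm_stop` and the output file names are made up for illustration, and it assumes `ip`, `systemctl` and `journalctl` are available on the host.

```shell
# capture_nm_stop OUT_DIR
# Hypothetical helper: records `ip addr` before and after stopping
# NetworkManager, plus the NM journal for the current boot, into OUT_DIR.
capture_nm_stop() {
    out_dir=$1
    mkdir -p "$out_dir" || return 1

    ip addr > "$out_dir/ip-before.txt" 2>&1
    systemctl stop NetworkManager
    sleep 5                                   # same delay as in comment 7
    ip addr > "$out_dir/ip-after.txt" 2>&1

    # NM journal for this boot, including the shutdown messages
    journalctl -u NetworkManager -b --no-pager > "$out_dir/nm-journal.txt" 2>&1
}
```

Started from the management console, or detached from the SSH session (for example under `nohup` in the background), the files should still be there to attach once connectivity is restored.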

Comment 9 Michael Burman 2016-09-27 14:50:30 UTC
Created attachment 1205256 [details]
screenshot_vlan_device_down_after_NM_Stopped

Comment 10 Michael Burman 2016-09-27 14:51:39 UTC
Created attachment 1205258 [details]
journalctl in trace log_new

Comment 11 Thomas Haller 2016-09-27 16:42:01 UTC
the logfile is helpful. Thanks.

unmanaged_on_quit() returns TRUE, because

  - nm_platform_link_can_assume() is FALSE due to the ipv6 address 
     2620:52:0:2380:221:5eff:fe3f:deb0/64
  - nm_device_can_assume_active_connection() returns FALSE too.

I guess, upstream patch https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=553717bb1c9ed31be0fab85bc37f6823dc8ab480 would avoid this scenario. As such, this is a duplicate of bug 1371126.

Comment 12 Dan Kenigsberg 2016-09-30 19:07:34 UTC
Thanks, Thomas. I'd ask to keep the bug open until it is verified by Burman, to be 100% sure that the RHEV use case is covered.

Comment 13 Michael Burman 2016-12-04 09:12:54 UTC
Hi Thomas

I have tested this bug with success using 
NetworkManager-1.4.0-13.el7.x86_64
NetworkManager-team-1.4.0-13.el7.x86_64
NetworkManager-config-server-1.4.0-13.el7.x86_64
NetworkManager-libnm-1.4.0-13.el7.x86_64
NetworkManager-tui-1.4.0-13.el7.x86_64

RHV use case is covered and working. vlan device is alive after NM is stopped.
This report can be considered VERIFIED.
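
To check whether a given host already carries the build listed under "Fixed In Version" (NetworkManager-1.4.0-13.el7), a rough comparison along these lines may help. It is a sketch under assumptions: the function name `nm_has_fix` is hypothetical, and it relies on GNU `sort -V`, which only approximates RPM's own version ordering.

```shell
# nm_has_fix VERSION-RELEASE
# Hypothetical helper: succeeds if VERSION-RELEASE sorts at or after
# 1.4.0-13.el7, the build in which this bug is reported as fixed.
nm_has_fix() {
    fixed="1.4.0-13.el7"
    # With version sort, the older of the two strings comes out first.
    lowest=$(printf '%s\n%s\n' "$fixed" "$1" | sort -V | head -n 1)
    [ "$lowest" = "$fixed" ]
}

# Example call on a host (not executed here):
#   nm_has_fix "$(rpm -q --qf '%{VERSION}-%{RELEASE}' NetworkManager)" && echo "fix present"
```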

Comment 14 Meni Yakove 2017-02-02 14:29:58 UTC
Shouldn't this be ON_QA?

Comment 17 errata-xmlrpc 2017-08-01 09:17:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2299