Bug 1900038

Summary: [RFE] don't take down vlan if parent interface doesn't get configured
Product: Red Hat Enterprise Linux 8
Reporter: Dusty Mabe <dustymabe>
Component: NetworkManager
Assignee: NetworkManager Development Team <nm-team>
Status: CLOSED WONTFIX
QA Contact: Desktop QE <desktop-qa-list>
Severity: unspecified
Priority: medium
Version: 8.2
CC: acardace, bgalvani, derekh, djuran, ferferna, fge, lrintel, rkhan, sukulkar, thaller, till
Target Milestone: rc
Keywords: FutureFeature, Triaged
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Enhancement
Doc Text:
Feature: don't take down vlan if parent interface doesn't get configured
Reason: I have a situation where my vlan device gets taken down because a device it's built on top of doesn't get DHCP.
Last Closed: 2022-05-20 07:27:20 UTC
Type: Bug

Description Dusty Mabe 2020-11-20 16:29:56 UTC
Description of problem:

I have a situation where my vlan device gets taken down because
a device it's built on top of doesn't get DHCP.

NOTE: this probably applies to more cases, but vlan on top of a bond is
      an easy way to reproduce the issue.

This can happen easily if you configure a vlan on top of a bond but
forget to disable both IPv4 and IPv6 DHCP on the bond itself. So we
have something like `bond0` and `bond0.100`, where `bond0` won't get
DHCP from anywhere (either IPv4 or IPv6), but only one of the two
address families is disabled in configuration.

In that case, after the IPv6 configuration times out, the entire setup
will be taken down (including the vlan), even though the vlan was up
and working fine.
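For reference, a minimal sketch of creating a setup like this with nmcli (device names `ens2`/`ens3` and profile names are taken from the report; exact commands are assumptions, not what the reporter ran):

```shell
# Bond with IPv4 disabled -- but ipv6.method deliberately left at its
# default (auto), which is the misconfiguration that triggers the bug.
nmcli connection add type bond con-name bond0 ifname bond0 \
    bond.options "mode=active-backup,miimon=100" ipv4.method disabled

# Two ethernet ports enslaved to the bond.
nmcli connection add type ethernet con-name bond0-slave-ens2 ifname ens2 \
    master bond0 slave-type bond
nmcli connection add type ethernet con-name bond0-slave-ens3 ifname ens3 \
    master bond0 slave-type bond

# VLAN 100 on top of the bond; may-fail=no means IPv4 must succeed
# for the vlan profile to stay active.
nmcli connection add type vlan con-name bond0.100 ifname bond0.100 \
    vlan.parent bond0 vlan.id 100 ipv4.method auto ipv4.may-fail no
```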


```
[core@dhcpvlanbond ~]$ nmcli c show
NAME              UUID                                  TYPE      DEVICE    
bond0             75ac1a13-dbce-36e4-8ecb-c6ed6fce5322  bond      bond0     
bond0.100         bc927f10-6620-3b6c-9946-9186cc4df6aa  vlan      bond0.100 
bond0-slave-ens2  4fb61355-f5fd-3ade-940e-5fbd7d6d3f63  ethernet  ens2      
bond0-slave-ens3  7a194b59-78a7-3447-83fb-f5336d848e19  ethernet  ens3      
[core@dhcpvlanbond ~]$ [   39.004144] bond0: (slave ens2): Releasing backup interface
[   39.005104] bond0: (slave ens2): the permanent HWaddr of slave - 52:54:00:ea:6b:17 - is still in use by bond - set the HWaddr of slave to a different address to avoid conflicts
[   39.007009] bond0: (slave ens3): making interface the new active one
[   39.053150] IPv6: ADDRCONF(NETDEV_UP): ens2: link is not ready
[   39.053827] 8021q: adding VLAN 0 to HW filter on device ens2
[   39.055303] bond0: (slave ens3): Releasing backup interface
[   39.099861] IPv6: ADDRCONF(NETDEV_UP): ens3: link is not ready
[   39.099862] 8021q: adding VLAN 0 to HW filter on device ens3
[   39.101850] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[   39.106281] bond0 (unregistering): Released all slaves
[   39.126264] IPv6: ADDRCONF(NETDEV_UP): ens2: link is not ready
[   39.128190] IPv6: ADDRCONF(NETDEV_UP): ens3: link is not ready
[   41.055559] e1000: ens2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   41.057642] IPv6: ADDRCONF(NETDEV_CHANGE): ens2: link becomes ready
[   41.119575] e1000: ens3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   41.121723] IPv6: ADDRCONF(NETDEV_CHANGE): ens3: link becomes ready

[core@dhcpvlanbond ~]$ nmcli c show
NAME              UUID                                  TYPE      DEVICE 
bond0             75ac1a13-dbce-36e4-8ecb-c6ed6fce5322  bond      --     
bond0-slave-ens2  4fb61355-f5fd-3ade-940e-5fbd7d6d3f63  ethernet  --     
bond0-slave-ens3  7a194b59-78a7-3447-83fb-f5336d848e19  ethernet  --     
bond0.100         bc927f10-6620-3b6c-9946-9186cc4df6aa  vlan      --
```


We should probably be able to leave bond0 up in this case, even
though it's slightly misconfigured, because there is a device on top
of it that needs it to stay up.


Version-Release number of selected component (if applicable):

NetworkManager-1.22.8-6.el8_2.x86_64


How reproducible:

Always


Steps to Reproduce:

Set up a vlan on top of a bond. Something like:

```
[core@dhcpvlanbond ~]$ sudo tail -n 100 /etc/NetworkManager/system-connections/*
==> /etc/NetworkManager/system-connections/bond0-slave-ens2.nmconnection <==
[connection]
id=bond0-slave-ens2
type=ethernet
interface-name=ens2
master=bond0
slave-type=bond

==> /etc/NetworkManager/system-connections/bond0-slave-ens3.nmconnection <==
[connection]
id=bond0-slave-ens3
type=ethernet
interface-name=ens3
master=bond0
slave-type=bond

==> /etc/NetworkManager/system-connections/bond0.100.nmconnection <==
[connection]
id=bond0.100
type=vlan
interface-name=bond0.100
[vlan]
egress-priority-map=
flags=1
id=100
ingress-priority-map=
parent=bond0
[ipv4]
dns-search=
may-fail=false
method=auto

==> /etc/NetworkManager/system-connections/bond0.nmconnection <==
[connection]
id=bond0
type=bond
interface-name=bond0
[bond]
miimon=100
mode=active-backup
[ipv4]
method=disabled
```

Note that there is no `ipv6.method=disabled` in the `bond0.nmconnection` file.
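For completeness, one way to correct the misconfiguration itself (this is a sketch of fixing the config error, not the RFE behavior being requested; the connection name `bond0` is from the report and the re-activation step is an assumption):

```shell
# Disable IPv6 on the bond profile too, so neither address family is
# left waiting on DHCP/autoconf that will never succeed.
nmcli connection modify bond0 ipv6.method disabled

# Re-activate so the change takes effect.
nmcli connection up bond0
```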


Actual results:

bond0 gets taken down when IPv6 configuration never succeeds, which takes down bond0.100 as well.


Expected results:

The bond0.100 stays up.


Additional info:

Comment 1 Gris Ge 2021-03-03 05:31:17 UTC
Hi Dusty,

Can you try the following?

nmcli connection modify <parent_iface> ipv4.dhcp-timeout infinity ipv6.dhcp-timeout infinity

It should not bring down the interface due to DHCP failures.

Comment 2 Dusty Mabe 2021-04-01 14:38:04 UTC
Hey Gris,

Sorry for the late reply, I've been away. 

I added the timeout (on a Fedora 33 system, NetworkManager-1.26.6-1.fc33.x86_64), but the bond and vlan still get taken down eventually. This is admittedly a misconfiguration, but I think we can do better. Once DHCP fails for the bond, we should be able to simply check whether the bond is used by any other devices that are successfully up before we take it down.

We can leave the bond in a degraded state (yellow in `nmcli c show` view) but not take it down because of the higher level devices using it.

Comment 3 Gris Ge 2021-04-02 06:36:27 UTC
Hi Dusty,

Might be related to `ipv6.ra-timeout`. Let me experiment.

Comment 4 Gris Ge 2021-08-03 04:07:08 UTC
Hi Dusty,

I checked; you need to use this command to set the DHCP and IPv6 autoconf timeouts to infinity:

nmcli connection modify <connection_id> ipv4.dhcp-timeout infinity ipv6.dhcp-timeout infinity ipv6.ra-timeout infinity

We are planning this use case at https://docs.google.com/document/d/17LIu6xml9OrJHghS6t3RVVceN1fWtknqH73WwIlWubo/edit targeting 8.6/9.1.

Comment 5 Gris Ge 2021-11-03 13:31:06 UTC
*** Bug 1908302 has been marked as a duplicate of this bug. ***

Comment 6 Gris Ge 2021-11-23 08:01:37 UTC
Workaround exists:


nmcli connection modify <connection_id> ipv4.dhcp-timeout infinity ipv6.dhcp-timeout infinity ipv6.ra-timeout infinity
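Concretely, applying this workaround to the bond profile from the report might look like the following (the connection name `bond0` is taken from the report; the re-activation step is an assumption):

```shell
# Never give up waiting for DHCP or router advertisements, so a
# failing address family cannot tear the profile (and its vlan) down.
nmcli connection modify bond0 \
    ipv4.dhcp-timeout infinity \
    ipv6.dhcp-timeout infinity \
    ipv6.ra-timeout infinity

# Re-activate the profile so the new timeouts take effect.
nmcli connection up bond0
```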


Acceptance criteria: NetworkManager should not remove a virtual interface on DHCP timeout when that interface is being used as a VLAN parent or as a bridge/bond/etc. controller.


Hence setting to medium priority. We are out of capacity for 8.6; postponing to future planning.

Comment 9 RHEL Program Management 2022-05-20 07:27:20 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.