1900038 – [RFE] don't take down vlan if parent interface doesn't get configured

Summary: [RFE] don't take down vlan if parent interface doesn't get configured

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	NetworkManager
Sub Component:
Version:	8.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	unspecified
Target Milestone:	rc
Target Release:	8.0
Assignee:	NetworkManager Development Team
QA Contact:	Desktop QE
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1908302
Depends On:
Blocks:

Reported:	2020-11-20 16:29 UTC by Dusty Mabe
Modified:	2022-05-20 07:27 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	Enhancement
Doc Text:	Feature: Don't take down a VLAN if its parent interface doesn't get configured. Reason: I have a situation where my VLAN device gets taken down because a device it's built on top of doesn't get DHCP. Result:
Clone Of:
Environment:
Last Closed:	2022-05-20 07:27:20 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Description Dusty Mabe 2020-11-20 16:29:56 UTC

Description of problem:

I have a situation where my vlan device gets taken down because
a device it's built on top of doesn't get DHCP.

NOTE: this probably applies to more cases, but vlan on top of a bond is
      an easy way to reproduce the issue.

This can happen easily if you configure a vlan on top of a bond, but
forget to disable both ipv4 and ipv6 DHCP on the bond itself. So we
have something like `bond0` and `bond0.100` where `bond0` won't get
DHCP from anywhere (either ipv4 or ipv6), but only one of them is
disabled in configuration.

In that case, after the IPv6 configuration times out, the entire setup
is taken down (including the VLAN), even though the VLAN was up and
working fine.


```
[core@dhcpvlanbond ~]$ nmcli c show
NAME              UUID                                  TYPE      DEVICE    
bond0             75ac1a13-dbce-36e4-8ecb-c6ed6fce5322  bond      bond0     
bond0.100         bc927f10-6620-3b6c-9946-9186cc4df6aa  vlan      bond0.100 
bond0-slave-ens2  4fb61355-f5fd-3ade-940e-5fbd7d6d3f63  ethernet  ens2      
bond0-slave-ens3  7a194b59-78a7-3447-83fb-f5336d848e19  ethernet  ens3      
[core@dhcpvlanbond ~]$ [   39.004144] bond0: (slave ens2): Releasing backup interface
[   39.005104] bond0: (slave ens2): the permanent HWaddr of slave - 52:54:00:ea:6b:17 - is still in use by bond - set the HWaddr of slave to a different address to avoid conflicts
[   39.007009] bond0: (slave ens3): making interface the new active one
[   39.053150] IPv6: ADDRCONF(NETDEV_UP): ens2: link is not ready
[   39.053827] 8021q: adding VLAN 0 to HW filter on device ens2
[   39.055303] bond0: (slave ens3): Releasing backup interface
[   39.099861] IPv6: ADDRCONF(NETDEV_UP): ens3: link is not ready
[   39.099862] 8021q: adding VLAN 0 to HW filter on device ens3
[   39.101850] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[   39.106281] bond0 (unregistering): Released all slaves
[   39.126264] IPv6: ADDRCONF(NETDEV_UP): ens2: link is not ready
[   39.128190] IPv6: ADDRCONF(NETDEV_UP): ens3: link is not ready
[   41.055559] e1000: ens2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   41.057642] IPv6: ADDRCONF(NETDEV_CHANGE): ens2: link becomes ready
[   41.119575] e1000: ens3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   41.121723] IPv6: ADDRCONF(NETDEV_CHANGE): ens3: link becomes ready

[core@dhcpvlanbond ~]$ nmcli c show
NAME              UUID                                  TYPE      DEVICE 
bond0             75ac1a13-dbce-36e4-8ecb-c6ed6fce5322  bond      --     
bond0-slave-ens2  4fb61355-f5fd-3ade-940e-5fbd7d6d3f63  ethernet  --     
bond0-slave-ens3  7a194b59-78a7-3447-83fb-f5336d848e19  ethernet  --     
bond0.100         bc927f10-6620-3b6c-9946-9186cc4df6aa  vlan      --
```


We should probably be able to leave bond0 up in this case, even
though it's slightly misconfigured, because there is a device on top of
it that needs it to stay up.


Version-Release number of selected component (if applicable):

NetworkManager-1.22.8-6.el8_2.x86_64


How reproducible:

Always


Steps to Reproduce:

Set up a VLAN on top of a bond. Something like:

```
[core@dhcpvlanbond ~]$ sudo tail -n 100 /etc/NetworkManager/system-connections/*
==> /etc/NetworkManager/system-connections/bond0-slave-ens2.nmconnection <==
[connection]
id=bond0-slave-ens2
type=ethernet
interface-name=ens2
master=bond0
slave-type=bond

==> /etc/NetworkManager/system-connections/bond0-slave-ens3.nmconnection <==
[connection]
id=bond0-slave-ens3
type=ethernet
interface-name=ens3
master=bond0
slave-type=bond

==> /etc/NetworkManager/system-connections/bond0.100.nmconnection <==
[connection]
id=bond0.100
type=vlan
interface-name=bond0.100
[vlan]
egress-priority-map=
flags=1
id=100
ingress-priority-map=
parent=bond0
[ipv4]
dns-search=
may-fail=false
method=auto

==> /etc/NetworkManager/system-connections/bond0.nmconnection <==
[connection]
id=bond0
type=bond
interface-name=bond0
[bond]
miimon=100
mode=active-backup
[ipv4]
method=disabled
```

Note that there is no `ipv6.method=disabled` in the `bond0.nmconnection` file.
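This misconfiguration can be caught before deployment by checking each VLAN parent's keyfile for an IP family that waits on the network (DHCP or router advertisements) without an infinite timeout. Below is a minimal Python sketch of such a check. The function name `risky_families` is hypothetical, and the sketch assumes that a missing `[ipv4]`/`[ipv6]` section or `method` key means the default method `auto` (consistent with what happened to `bond0` here) and that an infinite timeout is written as `infinity` or `2147483647` in the keyfile.

```python
import configparser
from typing import List

# Methods that wait for something on the network: DHCPv4 for ipv4,
# router advertisements and/or DHCPv6 for ipv6.
WAITING_METHODS = {"ipv4": {"auto"}, "ipv6": {"auto", "dhcp"}}

# Assumption: keyfile spellings of an infinite timeout.
INFINITE = {"infinity", "2147483647"}


def risky_families(keyfile_text: str) -> List[str]:
    """Return the IP families of a keyfile profile that can time out
    and make the activation (and everything stacked on it) fail."""
    # Keyfiles use "key=value"; ':' must not act as a delimiter (MAC addresses).
    cp = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    cp.read_string(keyfile_text)
    risky = []
    for family, waiting in WAITING_METHODS.items():
        # Assumption: a missing section or key means the default method "auto".
        method = cp.get(family, "method", fallback="auto")
        if method not in waiting:
            continue  # disabled, manual, link-local, ...: nothing to wait for
        timeouts = [cp.get(family, "dhcp-timeout", fallback="0")]
        if family == "ipv6":
            # Conservative: also require an infinite router-advertisement timeout.
            timeouts.append(cp.get(family, "ra-timeout", fallback="0"))
        if not all(t in INFINITE for t in timeouts):
            risky.append(family)
    return risky
```

For the `bond0` profile above it reports `["ipv6"]`; adding an `[ipv6]` section with `method=disabled` clears the warning. The timeout check mirrors the workaround discussed in the comments, which sets `dhcp-timeout` and `ra-timeout` to infinity.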


Actual results:

bond0 is taken down when IPv6 configuration never succeeds, and bond0.100 is taken down with it.


Expected results:

The bond0.100 stays up.


Additional info:

Comment 1 Gris Ge 2021-03-03 05:31:17 UTC

Hi Dusty,

Can you try?

nmcli connection modify <parent_iface> ipv4.dhcp-timeout infinity ipv6.dhcp-timeout infinity

It should not bring down the interface due to DHCP failures.

Comment 2 Dusty Mabe 2021-04-01 14:38:04 UTC

Hey Gris,

Sorry for the late reply, I've been away. 

I added the timeouts (on a Fedora 33 system with NetworkManager-1.26.6-1.fc33.x86_64), but the bond and VLAN still get taken down eventually. This is admittedly a misconfiguration, but I think we can do better. Once DHCP fails for the bond, we should be able to check whether the bond is used by any other devices that are successfully up before we take it down.

We can leave the bond in a degraded state (yellow in the `nmcli c show` view) instead of taking it down, because higher-level devices are using it.
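The check proposed here can be sketched as a small policy function. This is a hypothetical Python model, not NetworkManager code: the `Device` type, the `degraded` state, and `on_ip_timeout` are illustrative names. Only "stacked on top of" relationships such as a VLAN and its parent are modeled; bond ports are left out for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Device:
    name: str
    parent: Optional[str] = None  # lower device, e.g. "bond0" for "bond0.100"
    state: str = "activated"      # hypothetical: "activated", "degraded", "deactivated"


def deactivate(devices: Dict[str, Device], name: str) -> None:
    """Take a device down together with everything stacked on top of it."""
    devices[name].state = "deactivated"
    for dev in devices.values():
        if dev.parent == name and dev.state != "deactivated":
            deactivate(devices, dev.name)


def on_ip_timeout(devices: Dict[str, Device], name: str,
                  keep_if_used: bool = True) -> None:
    """Handle IP configuration timing out on `name`.

    keep_if_used=False mimics the behavior reported in this bug.
    keep_if_used=True is the proposed policy: degrade instead of
    deactivating while an activated device still uses `name` as its parent.
    """
    in_use = any(dev.parent == name and dev.state == "activated"
                 for dev in devices.values())
    if keep_if_used and in_use:
        devices[name].state = "degraded"
    else:
        deactivate(devices, name)
```

With `keep_if_used=False`, a timeout on `bond0` also takes down `bond0.100`, as in the report; with the default `keep_if_used=True`, `bond0` is only marked degraded while `bond0.100` stays activated. A device with nothing on top of it is still deactivated either way.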

Comment 3 Gris Ge 2021-04-02 06:36:27 UTC

Hi Dusty,

This might be related to `ipv6.ra-timeout`. Let me try it out.

Comment 4 Gris Ge 2021-08-03 04:07:08 UTC

Hi Dusty,

I checked: you need to use this command to set the DHCP and IPv6 autoconfiguration (router advertisement) timeouts to infinity:

nmcli connection modify <connection_id> ipv4.dhcp-timeout infinity ipv6.dhcp-timeout infinity ipv6.ra-timeout infinity

We are planning this use case in https://docs.google.com/document/d/17LIu6xml9OrJHghS6t3RVVceN1fWtknqH73WwIlWubo/edit, targeting RHEL 8.6/9.1.

Comment 5 Gris Ge 2021-11-03 13:31:06 UTC

*** Bug 1908302 has been marked as a duplicate of this bug. ***

Comment 6 Gris Ge 2021-11-23 08:01:37 UTC

A workaround exists:


nmcli connection modify <connection_id> ipv4.dhcp-timeout infinity ipv6.dhcp-timeout infinity ipv6.ra-timeout infinity
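When hosts are provisioned by scripts, this workaround can be applied programmatically. Below is a minimal Python sketch, assuming `nmcli` is on `PATH`; the helper names `workaround_commands` and `apply_workaround` are hypothetical. It also re-activates the profile, since `nmcli connection modify` changes the stored profile and the new values generally take effect on an already active connection only after re-activation (re-activating the bond may briefly interrupt traffic on the VLAN).

```python
import subprocess
from typing import Callable, List

# Properties and values from the workaround above.
WORKAROUND_PROPS = [
    ("ipv4.dhcp-timeout", "infinity"),
    ("ipv6.dhcp-timeout", "infinity"),
    ("ipv6.ra-timeout", "infinity"),
]


def workaround_commands(connection_id: str) -> List[List[str]]:
    """Build the nmcli invocations: modify the profile, then re-activate it."""
    modify = ["nmcli", "connection", "modify", connection_id]
    for prop, value in WORKAROUND_PROPS:
        modify += [prop, value]
    up = ["nmcli", "connection", "up", connection_id]
    return [modify, up]


def apply_workaround(connection_id: str,
                     run: Callable[..., object] = subprocess.run) -> None:
    """Run the commands in order; `run` is injectable for testing."""
    for cmd in workaround_commands(connection_id):
        # With the default runner, check=True raises CalledProcessError
        # if nmcli exits with a non-zero status.
        run(cmd, check=True)
```

For this report the call would be `apply_workaround("bond0")`, i.e. on the VLAN parent rather than on the VLAN itself.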


Acceptance criteria: NetworkManager should not remove a virtual interface on DHCP timeout when that interface is being used as a VLAN parent or as a bridge/bond/etc. controller.


Hence, setting this to medium priority. We are out of capacity for 8.6, so this is postponed to future planning.

Comment 9 RHEL Program Management 2022-05-20 07:27:20 UTC

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
