Bug 1818697 - VLAN over bond is not active after first boot
Summary: VLAN over bond is not active after first boot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.2
Assignee: Beniamino Galvani
QA Contact: Vladimir Benes
URL:
Whiteboard:
Duplicates: 1879003
Depends On: 1783891
Blocks:
 
Reported: 2020-03-30 06:34 UTC by Qin Yuan
Modified: 2021-05-18 13:31 UTC
CC List: 27 users

Fixed In Version: NetworkManager-1.28.0-0.1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-18 13:29:37 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
logs (1.61 MB, application/gzip), 2020-03-30 06:34 UTC, Qin Yuan
level=TRACE log (927.48 KB, text/plain), 2020-05-15 06:49 UTC, Qin Yuan
Reproducer (621 bytes, application/x-shellscript), 2020-09-18 10:05 UTC, Beniamino Galvani


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1783891 0 medium CLOSED Virtual interfaces will disappear periodically when DHCP timeout 2023-06-16 09:45:03 UTC
Red Hat Bugzilla 1791372 0 medium CLOSED [RFE] Allow automatic configuration to timeout on IPv4 AND IPv6 2021-04-30 16:47:27 UTC
Red Hat Bugzilla 1791624 0 medium CLOSED [RFE] support preserving the interface when DHCP timeout 2022-02-01 07:43:51 UTC
Red Hat Knowledge Base (Solution) 5425801 0 None None None 2020-09-22 22:33:39 UTC

Internal Links: 1791372 1791624

Description Qin Yuan 2020-03-30 06:34:56 UTC
Created attachment 1674634 [details]
logs

Description of problem:
A VLAN over a bond device is configured in the Anaconda GUI and gets an IP address during installation, but after the first reboot the VLAN device is not up.

Version-Release number of selected component (if applicable):
RHVH-4.4-20200325.0-RHVH-x86_64-dvd1.iso

How reproducible:
100%

Steps to Reproduce:
1. Configure VLAN over bond in the Anaconda GUI:
   1) Bond:
      slaves: 2 NICs
      mode: active-backup
      ipv4: disabled
      ipv6: ignore
   2) VLAN:
      parent interface: bond0
      vlan id: 50
     
2. Continue to finish other required settings, and begin installation
3. Reboot and enter system, check vlan over bond

Actual results:
1. The VLAN over bond device is not up.

Expected results:
1. The VLAN over bond device should be up after the first boot.

Additional info:
1. The VLAN over bond device can be activated manually with nmcli (see the sketch below).
2. The VLAN over bond device is also activated automatically when another NIC is activated with nmcli.
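
For reference, a minimal sketch of the manual activation mentioned in point 1 (the connection name comes from the ifcfg file quoted in comment 5; adjust if yours differs):

# bring the VLAN-over-bond profile up manually after boot
nmcli connection up "VLAN connection 1"
# confirm the device is active and has an address
nmcli -f GENERAL.STATE,IP4.ADDRESS device show bond0.50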

Comment 1 Sandro Bonazzola 2020-04-14 07:59:59 UTC
Dominik, can you please have a look at this one?

Comment 2 Dominik Holler 2020-04-14 08:19:49 UTC
Qin, is the bond configured to use dhcp? Is the dhcp successful?

Comment 3 Qin Yuan 2020-04-14 08:32:40 UTC
For the bond, ipv4 is disabled and ipv6 is ignored. Isn't this the right way to configure VLAN over bond?

In /var/log/messages, I saw:
Mar 30 02:06:53 localhost NetworkManager[1417]: <warn>  [1585534013.0331] dhcp4 (bond0.50): request timed out
Mar 30 02:06:53 localhost NetworkManager[1417]: <info>  [1585534013.0333] dhcp4 (bond0.50): state changed unknown -> timeout
Mar 30 02:06:53 localhost NetworkManager[1417]: <info>  [1585534013.0334] device (bond0.50): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')

Comment 4 Dominik Holler 2020-04-14 08:34:29 UTC
(In reply to Qin Yuan from comment #3)
> For the bond, ipv4 is disabled and ipv6 is ignored. Isn't this the right
> way to configure VLAN over bond?
> 
> In /var/log/messages, I saw:
> Mar 30 02:06:53 localhost NetworkManager[1417]: <warn>  [1585534013.0331]
> dhcp4 (bond0.50): request timed out
> Mar 30 02:06:53 localhost NetworkManager[1417]: <info>  [1585534013.0333]
> dhcp4 (bond0.50): state changed unknown -> timeout
> Mar 30 02:06:53 localhost NetworkManager[1417]: <info>  [1585534013.0334]
> device (bond0.50): state change: ip-config -> failed (reason
> 'ip-config-unavailable', sys-iface-state: 'managed')

Thanks. Is DHCP enabled for the VLAN on the bond?

Comment 5 Qin Yuan 2020-04-14 08:41:30 UTC
Yes. 

# cat /etc/sysconfig/network-scripts/ifcfg-VLAN_connection_1
VLAN=yes
TYPE=Vlan
PHYSDEV=d5008bac-31f3-4b75-803e-dbca0ee4e871
VLAN_ID=50
REORDER_HDR=yes
GVRP=no
MVRP=no
HWADDR=
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_PRIVACY=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME="VLAN connection 1"
UUID=009ca293-b03d-422f-a35f-b1cc36e8c2f5
ONBOOT=yes

# nmcli c show "VLAN connection 1"
ipv4.method:                            auto
ipv6.method:                            auto
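
Beyond the two fields shown above, the properties that decide whether a DHCP failure takes the device down can be inspected in one query (a sketch; these are standard NetworkManager property names):

nmcli -f connection.autoconnect,ipv4.method,ipv4.may-fail,ipv4.dhcp-timeout,ipv6.method,ipv6.may-fail \
      connection show "VLAN connection 1"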

Comment 6 Dominik Holler 2020-04-14 08:43:44 UTC
Thomas, is this the intended behavior of NetworkManager?

Comment 7 Sandro Bonazzola 2020-04-28 07:45:59 UTC
This should be fixed by nmstate-0.2.5-1.el8, which is included in RHEL 8.2.

Comment 8 Dominik Holler 2020-04-28 07:55:40 UTC
(In reply to Sandro Bonazzola from comment #7)
> This should be fixed by nmstate-0.2.5-1.el8, which is included in RHEL 8.2.

Unfortunately not: Anaconda uses NetworkManager directly, without nmstate.
This means that if we want to change the behavior, Anaconda has to create a NetworkManager config similar to the one nmstate creates.

Comment 10 Thomas Haller 2020-05-06 19:15:22 UTC
(In reply to Dominik Holler from comment #6)
> Thomas, is this the intended behavior of NetworkManager?

I don't fully understand the question, but yes, it seems intended: if you enable DHCP on a device and DHCP fails, the device goes down.

As we discussed several weeks ago, that depends on circumstances such as the ipv4.may-fail/ipv6.may-fail settings (and ipv4.dhcp-timeout and ipv6.ra-timeout)... but yes, it seems intended.

If something is unclear, please provide full level=TRACE logs.
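
For reference, TRACE logs can be enabled at runtime or persistently (a sketch using standard NetworkManager mechanisms; the drop-in file name is arbitrary):

# enable TRACE logging at runtime
nmcli general logging level TRACE domains ALL
# or persistently, then restart the service
printf '[logging]\nlevel=TRACE\n' > /etc/NetworkManager/conf.d/95-trace.conf
systemctl restart NetworkManager
# collect the messages for attaching to the bug
journalctl -u NetworkManager -b > nm-trace.log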

Comment 12 Radek Vykydal 2020-05-13 10:07:12 UTC
Anaconda just copies to the installed system the ifcfg files that were created by the NetworkManager Connection Editor during configuration in the Anaconda GUI, so Anaconda does not create the ifcfg files in this case.

The only thing that comes to mind that could possibly be caused or fixed by Anaconda is interference with other ifcfg files created during installation (such as default ifcfg files for devices), but judging from the logs attached to the Description, that does not seem to be the case.

I think we can learn more only from the logs requested in comment #10.

Comment 13 Qin Yuan 2020-05-15 06:49:25 UTC
Created attachment 1688793 [details]
level=TRACE log

Attached level=TRACE log

Comment 14 Sandro Bonazzola 2020-05-26 07:44:15 UTC
Radek, can you please have a look at provided logs?

Comment 15 Radek Vykydal 2020-05-26 08:22:40 UTC
I think we need NM eyes here.

Comment 16 Sandro Bonazzola 2020-06-09 07:38:34 UTC
thaller can you please have a look?

Comment 17 Thomas Haller 2020-06-09 11:16:30 UTC
from the log in comment 13:


<debug> [1589550235.7710] ++ connection.id             = 'VLAN connection 1'
<debug> [1589550235.7710] ++ connection.permissions    = []
<debug> [1589550235.7710] ++ connection.type           = 'vlan'
<debug> [1589550235.7710] ++ connection.uuid           = 'b6f95590-fa05-4b89-bcc9-64ee2c2ced6f'
...
<debug> [1589550235.7711] ++ vlan.id                   = 50
<debug> [1589550235.7711] ++ vlan.ingress-priority-map = []
<debug> [1589550235.7711] ++ vlan.parent               = '7a9a8c92-ec6d-420f-96a8-cff4086b3534'
<debug> [1589550235.7711] ++ ipv4                      [ 0x55d137d0a1e0 ]
<debug> [1589550235.7711] ++ ipv4.addresses            = ((GPtrArray*) 0x55d137d5c060)
<debug> [1589550235.7711] ++ ipv4.dns                  = []
<debug> [1589550235.7711] ++ ipv4.dns-search           = []
<debug> [1589550235.7712] ++ ipv4.method               = 'auto'
<debug> [1589550235.7713] ++ ipv4.routes               = ((GPtrArray*) 0x55d137d62880)
<debug> [1589550235.7713] ++ ipv4.routing-rules        = <unknown>
<debug> [1589550235.7713] ++ ipv6                      [ 0x55d137cca530 ]
<debug> [1589550235.7713] ++ ipv6.addresses            = ((GPtrArray*) 0x55d137d61b20)
<debug> [1589550235.7713] ++ ipv6.dns                  = []
<debug> [1589550235.7713] ++ ipv6.dns-search           = []
<debug> [1589550235.7713] ++ ipv6.ip6-privacy          = ((NMSettingIP6ConfigPrivacy) NM_SETTING_IP6_CONFIG_PRIVACY_DISABLED)
<debug> [1589550235.7714] ++ ipv6.method               = 'auto'
<debug> [1589550235.7714] ++ ipv6.routes               = ((GPtrArray*) 0x55d137d5c0e0)
<debug> [1589550235.7714] ++ ipv6.routing-rules        = <unknown>
...
<info>  [1589550239.5732] policy: auto-activating connection 'VLAN connection 1' (b6f95590-fa05-4b89-bcc9-64ee2c2ced6f)
...
<info>  [1589550239.5872] device (bond0.50): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
...
<warn>  [1589550284.8424] dhcp4 (bond0.50): request timed out
<info>  [1589550284.8428] dhcp4 (bond0.50): state changed unknown -> timeout
...
<info>  [1589550284.8430] device (bond0.50): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
...
<warn>  [1589550284.8449] device (bond0.50): Activation: failed for connection 'VLAN connection 1'


So far, so expected: you configured a VLAN profile that should do DHCP; it timed out and failed. The solution is simply not to configure the profile this way, if that is not the right configuration for your setup.



However, we would then expect the profile to keep trying to autoconnect indefinitely. That doesn't seem to happen:

<info>  [1589550284.8644] policy: auto-activating connection 'VLAN connection 1' (b6f95590-fa05-4b89-bcc9-64ee2c2ced6f)
...
<debug> [1589550284.8869] device[5b394a254974fbfe] (bond0.50): parent: clear
<debug> [1589550284.8875] device[5b394a254974fbfe] (bond0.50): unmanaged: flags set to [platform-init,!sleeping,!parent,!by-type,!user-explicit,!user-settings=0x10/0x7d/unmanaged/unrealize>
<debug> [1589550284.8876] device[5b394a254974fbfe] (bond0.50): unmanaged: flags set to [platform-init,!sleeping,!user-settings=0x10/0x51/unmanaged/unrealized], forget [parent,by-type,user->
<info>  [1589550284.8876] device (bond0.50): state change: disconnected -> unmanaged (reason 'user-requested', sys-iface-state: 'managed')


That's odd.
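
Following the point above about profile configuration, a profile that should survive a DHCP outage would relax the timeout instead (a sketch of the knobs named in comment 10; "infinity" is accepted by recent nmcli, otherwise use 2147483647):

# keep retrying DHCP indefinitely instead of failing the activation
nmcli connection modify "VLAN connection 1" ipv4.dhcp-timeout infinity
nmcli connection up "VLAN connection 1"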

Comment 18 Thomas Haller 2020-06-09 11:33:13 UTC
Do you need this bug report to investigate why the profile was configured the way it is (when it possibly shouldn't be)?

Depending on that, I will either clone or reassign the bug, to check why the autoconnect doesn't work.

Comment 19 Sandro Bonazzola 2020-06-09 14:45:17 UTC
(In reply to Thomas Haller from comment #18)
> Do you need this bug report to investigate why the profile was configured
> the way it is (when it possibly shouldn't be)?
> 
> Depending on that, I will either clone or reassign the bug, to check why
> the autoconnect doesn't work.

Please clone; we still need to investigate how the profile ends up configured this way.
As far as I understand, Anaconda allowed the profile to be configured like this, and if this is not supposed to be a valid configuration, we may need to work with the Anaconda team to prevent it from being selected.

Comment 20 Sandro Bonazzola 2020-06-16 07:40:29 UTC
Moving to Anaconda to check the profile generation here. In RHEL 7 this worked fine.

Comment 21 Radek Vykydal 2020-06-16 07:58:51 UTC
Based on comment #12 and comment #18, I am reassigning to NetworkManager / Thomas to check why the autoconnection does not work.

@Sandro: I think there was no change on the Anaconda side between RHEL 7 and RHEL 8 regarding this kind of configuration, and the profile is generated by nm-c-e/NetworkManager. As I understand what Thomas said, the configuration passed to the installed system by Anaconda seems OK/expected. I think we need to find out why the configuration works in the Anaconda environment but does not work on the installed system.

Comment 26 Beniamino Galvani 2020-08-26 14:47:19 UTC
Hi Patrick,

nice investigation!

As you have found, the problem is that we don't schedule an
activation-check after the device is deleted (in fact we schedule it
when the device is still being deleted, which is the wrong time).

I bisected the regression to commit
d35d3c468a304c3e0e78b4b068d105b1d753876c, which is a large rework; it's
not yet clear to me what part of that commit caused the regression.
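
For context, such a bisection typically follows the standard git workflow (an assumed sketch; the actual endpoints and test script used here are not recorded in this bug):

git bisect start
git bisect bad                # current HEAD shows the broken behavior
git bisect good 1.18.0        # assumed known-good tag; the behavior worked in RHEL 7's NM
# at each step: build NetworkManager, run the reproducer, then mark
git bisect good               # or: git bisect bad
git bisect reset              # return to the original HEAD when done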

Comment 27 Beniamino Galvani 2020-08-27 14:20:45 UTC
I have opened a merge request with a possible fix at:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/613/commits

    device: fix autoactivating virtual devices after a failure

    When a virtual device fails, its state goes to FAIL and then
    DISCONNECTED. In DISCONNECTED we call schedule_activate_check() to
    schedule an auto-activation if needed. We also schedule the deletion
    of the link through delete_on_deactivate_check_and_schedule(). The
    auto-activation attempt fails because the link deletion unmanages the
    device; as a result, the device doesn't try to auto-activate again.

    To fix this:

     - don't allow the device to auto-activate if the device deletion is
       pending;

     - check again if the device can be auto-activated after its deletion.
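
Per the "Fixed In Version" field above, a quick check for whether an installed system already carries the fix:

# the fix shipped in NetworkManager-1.28.0-0.1.el8 and later builds
rpm -q NetworkManager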

Comment 28 Beniamino Galvani 2020-09-18 10:05:29 UTC
Created attachment 1715342 [details]
Reproducer
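
The attachment's contents are not reproduced here; based on the steps in the Description and the analysis in comments 17 and 27, a reproducer along these lines would be expected (a hypothetical sketch; the port names eth1/eth2 and the lack of a DHCP server on VLAN 50 are assumptions):

# bond with two ports, no IP configuration on the bond itself
nmcli connection add type bond con-name bond0 ifname bond0 \
      mode active-backup ipv4.method disabled ipv6.method ignore
nmcli connection add type ethernet con-name bond0-p1 ifname eth1 master bond0
nmcli connection add type ethernet con-name bond0-p2 ifname eth2 master bond0
# VLAN 50 on top of the bond, DHCP on a segment without a DHCP server
nmcli connection add type vlan con-name vlan50 ifname bond0.50 dev bond0 id 50 \
      ipv4.method auto
# DHCP times out after ipv4.dhcp-timeout (45 s by default); with the bug,
# bond0.50 is deleted and is never auto-activated again:
sleep 60
nmcli device status | grep bond0.50 || echo "bond0.50 gone and not retried"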

Comment 34 Antonio Cardace 2020-10-08 12:10:56 UTC
*** Bug 1879003 has been marked as a duplicate of this bug. ***

Comment 39 errata-xmlrpc 2021-05-18 13:29:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: NetworkManager and libnma security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1574
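
To pull in the fix from that advisory on an affected, entitled RHEL 8 system (a sketch; dnf's --advisory filter selects just this erratum):

dnf upgrade --advisory=RHSA-2021:1574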

