Description of problem:
Linux bridge profile is not activated and is stuck in the connecting state after reboot.
After rebooting a host with the latest NM 1.20.0-3.el8.x86_64, the Linux bridge that was active before the reboot stays in the connecting state, which affects RHV hosts.
NAME       UUID                                  TYPE      DEVICE
ovirtmgmt  23aeb48d-c4f6-4cdc-ae2c-c268c2fb2159  bridge    ovirtmgmt
ens4f0     e47db561-27c5-4399-a553-39d694a6b932  ethernet  ens4f0
ovirtmgmt bridge connected ovirtmgmt
ovirtmgmt bridge connecting (getting IP configuration) ovirtmgmt
This causes the network configuration on the host to break after reboot.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Add a RHEL 8.1 host with NM 1.20.0-3.el8.x86_64 to RHV Manager
2. Reboot host
Actual results:
Linux bridge (ovirtmgmt) is stuck in the connecting state
Expected results:
Linux bridge (ovirtmgmt) should be connected
For one, in the log you see multiple connection profiles. For example, connection b967a965-4bd2-4fd8-99b2-b6d81d27cc7a from /etc/sysconfig/network-scripts/ifcfg-ens4f0. That is a regular ethernet profile, not a slave profile for a bridge (it has no "connection.master" property set). Overall, there is no suitable slave profile available for device "ens4f0". I don't know what the state was before reboot, but obviously, if you don't persist suitable profiles before rebooting, it's not going to work afterwards.
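To tell a bridge slave ifcfg file from a plain ethernet one, it is usually enough to look for a BRIDGE= (or BRIDGE_UUID=) key, which is how the ifcfg-rh format expresses "connection.master" for a bridge. A minimal sketch; the function name is_bridge_slave_ifcfg is made up for illustration:

```shell
# Hypothetical helper: succeeds if the given ifcfg file names a bridge master.
# In the ifcfg-rh format, a bridge slave profile carries a BRIDGE=<name>
# (or BRIDGE_UUID=<uuid>) line; a plain ethernet profile does not.
is_bridge_slave_ifcfg() {
    grep -Eq '^[[:space:]]*BRIDGE(_UUID)?=' "$1"
}

# Example usage on the host from this report:
#   if is_bridge_slave_ifcfg /etc/sysconfig/network-scripts/ifcfg-ens4f0; then
#       echo "bridge slave profile"
#   else
#       echo "plain profile, not enslaved to a bridge"
#   fi
```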
I would guess that e47db561-27c5-4399-a553-39d694a6b932 was in-memory only. Check the path with `nmcli -f all connection`; if it's under /run/NetworkManager/system-connections, then the profile is in-memory and will be lost after reboot.
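As a rough sketch of that check, the path reported for a profile (the FILENAME column of the nmcli output) can be classified by its prefix. The function name profile_persistence is hypothetical, and the classification only covers the two locations discussed here:

```shell
# Hypothetical helper: classify a profile by the file path nmcli reports for it.
# Only /run (lost on reboot) and /etc (kept across reboots) are distinguished;
# anything else is reported as "unknown".
profile_persistence() {
    case "$1" in
        /run/*) echo "in-memory" ;;
        /etc/*) echo "persistent" ;;
        *)      echo "unknown" ;;
    esac
}

# Example usage:
#   profile_persistence /run/NetworkManager/system-connections/ens4f0.nmconnection
#   profile_persistence /etc/sysconfig/network-scripts/ifcfg-ens4f0
```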
A second problem is
Oct 09 23:35:07 localhost.localdomain dhclient: DHCPDISCOVER on ens4f0 to 255.255.255.255 port 67 interval 7 (xid=0xbca830c)
This is dracut/initrd, which configures the interface by running dhclient on it.
Later, when NM starts, it sees that the interface "ens4f0" is already pre-configured by something else. This results in
<info> [1570693052.3064] manager: (ens4f0): assume: will attempt to assume matching connection 'ens4f0' (b967a965-4bd2-4fd8-99b2-b6d81d27cc7a) (guessed)
This means that NetworkManager will try to gracefully take over the pre-configured device with the plain ethernet connection. That will not work very well, though, and I don't think it is what is intended. This behaviour, where initrd preconfigures the device with dhclient and hands it over to the later boot stage (NetworkManager), has many issues. In rhel-8.2, those will be solved by also running NetworkManager in initrd.
Turned out, that dracut was overwriting /etc/sysconfig/network-scripts/ifcfg-ens4f0 file during boot.
I don't think this is a bug in NetworkManager.
What do you think? Can we close or reassign this?
(In reply to Thomas Haller from comment #3)
> Turned out, that dracut was overwriting
> /etc/sysconfig/network-scripts/ifcfg-ens4f0 file during boot.
Thomas, are you saying that we had a proper ifcfg-ens4f0 as a bridge slave, but dracut unilaterally overwrote it with something else?
Would you elaborate so this bug can be moved to the offending component?
> Thomas, are you saying that we had a proper ifcfg-ens4f0 as a bridge slave, but dracut unilaterally overwrote it with something else?
That's what I am saying.
> Would you elaborate so this bug can be moved to the offending component?
The system is configured to do rd.neednet=1. Dracut does what it is requested to do.
I don't know the offending component. When installing the image, look at the resulting system for what is installed and configured. Then see where that configuration comes from.
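One place to look is the kernel command line. A minimal sketch of such a check, assuming rd.neednet=1 is passed there (it could also come from configuration inside the initrd); the function name cmdline_has_neednet is made up for illustration:

```shell
# Hypothetical helper: succeeds if the given kernel command line string
# contains the exact argument rd.neednet=1.
cmdline_has_neednet() {
    # Pad with spaces so that only a whole argument matches.
    case " $1 " in
        *" rd.neednet=1 "*) return 0 ;;
    esac
    return 1
}

# Example usage on a running system:
#   cmdline_has_neednet "$(cat /proc/cmdline)" && echo "rd.neednet=1 is set"
```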
Created attachment 1629177
reproduction and workaround on plain RHEL 8.1 without RHV
Red Hat Knowledge Base (Solution) 3017441 describes the behavior,
the suggested workaround
echo 'omit_dracutmodules+="ifcfg"' >> /etc/dracut.conf.d/99-disable_ifcfg.conf
works for me.
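Note that files under /etc/dracut.conf.d are only read when the initramfs is built, so the workaround takes effect once the initramfs has been regenerated. A small sketch for checking that the omit line is in place before doing so; the function name dracut_omits_ifcfg is hypothetical:

```shell
# Hypothetical helper: succeeds if any *.conf file in the given directory
# has an uncommented omit_dracutmodules line that mentions "ifcfg".
dracut_omits_ifcfg() {
    grep -Eqs '^[[:space:]]*omit_dracutmodules\+?=.*ifcfg' "$1"/*.conf
}

# Example usage (dracut -f rebuilds the initramfs for the running kernel):
#   dracut_omits_ifcfg /etc/dracut.conf.d && dracut -f
```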
The ifcfg files are protected by the fix, but I am interested to know if there are flows which result in an unexpected runtime state.
vdsm-4.40.0-141.gitb9d2120.el8ev.x86_64, which was shipped with 4.4.0-5, doesn't include the desired fix.
moving back to MODIFIED
Verified on - vdsm-4.40.0-164.git38a19bb.el8ev.x86_64 with rhvm-4.4.0-0.9.master.el7.noarch
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020.
Since the problem described in this bug report should be
resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.
If the solution does not work for you, please open a new bug report.