Description of problem:
Linux bridge profile is not activated and is stuck in the connecting state after reboot.

After rebooting a host with the latest NM (1.20.0-3.el8.x86_64), the Linux bridge that was active before the reboot stays in the connecting state afterwards. This affects RHV hosts.

Before reboot:

NAME       UUID                                  TYPE      DEVICE
ovirtmgmt  23aeb48d-c4f6-4cdc-ae2c-c268c2fb2159  bridge    ovirtmgmt
ens4f0     e47db561-27c5-4399-a553-39d694a6b932  ethernet  ens4f0

ovirtmgmt  bridge  connected  ovirtmgmt

After reboot:

ovirtmgmt  bridge  connecting (getting IP configuration)  ovirtmgmt

This causes the network configuration on the host to break after reboot.

Version-Release number of selected component (if applicable):
NetworkManager-1.20.0-3.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Add a RHEL 8.1 host with NM 1.20.0-3.el8.x86_64 to RHV Manager
2. Reboot the host

Actual results:
Linux bridge (ovirtmgmt) is stuck in the connecting state

Expected results:
Linux bridge (ovirtmgmt) should be connected
Two issues:

For one, in the log you see multiple connection profiles. For example, connection b967a965-4bd2-4fd8-99b2-b6d81d27cc7a from /etc/sysconfig/network-scripts/ifcfg-ens4f0. That is a regular ethernet profile, not a slave profile for a bridge (it has no "connection.master" property set). Overall, there is no profile available for device "ens4f0" that would be a suitable slave profile.

I don't know what the state was before the reboot, but obviously, if you don't persist suitable profiles before rebooting, it's not going to work afterwards. I would guess that e47db561-27c5-4399-a553-39d694a6b932 was in-memory only. Check the path with `nmcli -f all connection`; if it's under /run/NetworkManager/system-connections, then the profile is in-memory and will be lost after reboot.

A second problem is

  Oct 09 23:35:07 localhost.localdomain dhclient[1434]: DHCPDISCOVER on ens4f0 to 255.255.255.255 port 67 interval 7 (xid=0xbca830c)

This is dracut/initrd, which configures the interface by running dhclient on it. Later, when NM starts, it sees that the interface "ens4f0" is already pre-configured by something else. This results in

  <info> [1570693052.3064] manager: (ens4f0): assume: will attempt to assume matching connection 'ens4f0' (b967a965-4bd2-4fd8-99b2-b6d81d27cc7a) (guessed)

This means that NetworkManager will try to gracefully take over the pre-configured device with the plain ethernet connection. That will not work very well, though, and I don't think that is what is intended.

This behaviour, where initrd pre-configures the device with dhclient and passes it on to the later boot stage (NetworkManager), has many issues. In RHEL 8.2, those will be solved by also running NetworkManager in initrd.
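A rough sketch of the in-memory check described above, not taken from the thread itself. The helper name `profile_storage` is hypothetical. The /run path is the one named in the comment; treating /etc/NetworkManager/system-connections and /etc/sysconfig/network-scripts as the persistent locations is an assumption about a default RHEL 8 layout.

```shell
#!/bin/sh
# Classify a NetworkManager profile by where its backing file lives.
# profile_storage is a hypothetical helper name, not an nmcli feature.
profile_storage() {
    case "$1" in
        /run/NetworkManager/system-connections/*)
            echo "in-memory" ;;
        /etc/NetworkManager/system-connections/*|/etc/sysconfig/network-scripts/*)
            # Assumed persistent locations on a default RHEL 8 install.
            echo "persistent" ;;
        *)
            echo "unknown" ;;
    esac
}

# Print UUID, storage class and file for every profile.
# Skipped silently on machines that do not have nmcli installed.
if command -v nmcli >/dev/null 2>&1; then
    nmcli -t -f UUID,FILENAME connection show |
    while IFS=: read -r uuid filename; do
        echo "$uuid $(profile_storage "$filename") $filename"
    done
fi
```

Any profile reported as "in-memory" here would have to be saved persistently before a reboot if it is meant to survive one.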
It turned out that dracut was overwriting the /etc/sysconfig/network-scripts/ifcfg-ens4f0 file during boot.
I don't think this is a bug in NetworkManager. What do you think? Can we close or reassign this?
(In reply to Thomas Haller from comment #3)
> Turned out, that dracut was overwriting
> /etc/sysconfig/network-scripts/ifcfg-ens4f0 file during boot.

Thomas, are you saying that we had a proper ifcfg-ens4f0 as a bridge slave, but dracut unilaterally overwrote it with something else?

Would you elaborate so this bug can be moved to the offending component?
> Thomas, are you saying that we had a proper ifcfg-ens4f0 as a bridge slave, but dracut unilaterally overwrote it with something else?

That's what I am saying.

> Would you elaborate so this bug can be moved to the offending component?

The system is configured with rd.neednet=1, and dracut does what it is requested to do. I don't know the offending component. After installing the image, look at the resulting system to see what is installed and configured, then find out where that configuration comes from.
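To see whether a host boots with the option mentioned above, the kernel command line can be inspected. This is only a sketch: `has_neednet` is a hypothetical helper name, and reading /proc/cmdline assumes a running Linux system.

```shell
#!/bin/sh
# Succeed if the given kernel command line contains the exact
# argument rd.neednet=1 (has_neednet is a hypothetical helper).
has_neednet() {
    case " $1 " in
        *" rd.neednet=1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the command line of the running kernel, if it is readable.
if [ -r /proc/cmdline ] && has_neednet "$(cat /proc/cmdline)"; then
    echo "rd.neednet=1 is set: the initrd brings up networking"
fi
```

If the option is present, the next step is to find which package or installer step added it, as suggested in the comment.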
Created attachment 1629177
reproduction and workaround on plain RHEL 8.1 without RHV

Red Hat Knowledge Base (Solution) 3017441 describes the behavior. The suggested workaround

  echo 'omit_dracutmodules+="ifcfg"' >> /etc/dracut.conf.d/99-disable_ifcfg.conf

works for me.
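Running the echo from the workaround more than once appends the same line again. A minimal sketch of an idempotent variant follows; `omit_ifcfg_module` is a hypothetical helper, and its optional directory argument exists only so it can be tried against a scratch directory first.

```shell
#!/bin/sh
# Append the omit line from the workaround to a dracut drop-in file,
# unless the file already contains exactly that line.
# omit_ifcfg_module is a hypothetical helper; $1 defaults to the real
# drop-in directory and can point at a scratch directory for a dry run.
omit_ifcfg_module() {
    conf_dir="${1:-/etc/dracut.conf.d}"
    conf_file="$conf_dir/99-disable_ifcfg.conf"
    line='omit_dracutmodules+="ifcfg"'

    mkdir -p "$conf_dir" || return 1
    # -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF "$line" "$conf_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf_file"
    fi
}
```

On a real host this would be called as root with no argument. The thread does not say so, but a dracut configuration change normally takes effect only once the initramfs has been regenerated (for example with `dracut -f`), so treat that step as an assumption to verify.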
The ifcfg files are protected by the fix, but I am interested to know whether there are flows that result in an unexpected runtime state.
vdsm-4.40.0-141.gitb9d2120.el8ev.x86_64, which was shipped with 4.4.0-5, doesn't include the desired fix. Moving back to MODIFIED.
Verified on - vdsm-4.40.0-164.git38a19bb.el8ev.x86_64 with rhvm-4.4.0-0.9.master.el7.noarch
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.