Bug 1760262

Summary: Bridge linux profile is not activated and stuck in connecting state after reboot
Product: [oVirt] vdsm Reporter: Michael Burman <mburman>
Component: General Assignee: Dominik Holler <dholler>
Status: CLOSED CURRENTRELEASE QA Contact: Michael Burman <mburman>
Severity: high Docs Contact:
Priority: high    
Version: 4.40.0 CC: aloughla, atragler, bgalvani, bugs, danken, dholler, fgiudici, jcall, lrintel, mperina, rkhan, sukulkar, thaller
Target Milestone: ovirt-4.4.0 Flags: mperina: ovirt-4.4?
Target Release: 4.40.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-20 20:01:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Network RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1741792, 1756944, 1762028    
Bug Blocks:    
Attachments:
Description Flags
reproduction and workaround on plain RHEL 8.1 without RHV none

Description Michael Burman 2019-10-10 09:38:23 UTC
Description of problem:
Bridge linux profile is not activated and stuck in connecting state after reboot

After rebooting a host with the latest NM 1.20.0-3.el8.x86_64, the Linux bridge that was active before the reboot stays in the connecting state, which affects RHV hosts.

before reboot:
NAME       UUID                                  TYPE      DEVICE    
ovirtmgmt  23aeb48d-c4f6-4cdc-ae2c-c268c2fb2159  bridge    ovirtmgmt 
ens4f0     e47db561-27c5-4399-a553-39d694a6b932  ethernet  ens4f0 

ovirtmgmt    bridge    connected  ovirtmgmt

after reboot:
ovirtmgmt    bridge    connecting (getting IP configuration)  ovirtmgmt

This causes the network configuration on the host to break after reboot.
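
A minimal sketch for checking the bridge state after reboot from a script. The helper name `device_state` is hypothetical, and it assumes that terse `nmcli` output carries the same state text as the listing above.

```shell
#!/bin/sh
# Print the STATE of one device, reading "DEVICE:STATE" lines on stdin
# (the shape assumed for `nmcli -t -f DEVICE,STATE device status`).
device_state() {
    while IFS=: read -r dev state; do
        if [ "$dev" = "$1" ]; then
            printf '%s\n' "$state"
            return 0
        fi
    done
    return 1
}

# Query the live system only when nmcli is installed.
if command -v nmcli >/dev/null 2>&1; then
    nmcli -t -f DEVICE,STATE device status | device_state ovirtmgmt \
        || echo "ovirtmgmt: device not listed"
fi
```

A result of "connecting (getting IP configuration)" for ovirtmgmt would match the failure reported here.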

Version-Release number of selected component (if applicable):
NetworkManager-1.20.0-3.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Add rhel8.1 host with NM 1.20.0-3.el8.x86_64 to RHV manager
2. Reboot host

Actual results:
linux bridge(ovirtmgmt) stuck in connecting state

Expected results:
linux bridge(ovirtmgmt) should be connected

Comment 2 Thomas Haller 2019-10-10 10:06:23 UTC
Two issues:


For one, in the log you see multiple connection profiles. For example, connection b967a965-4bd2-4fd8-99b2-b6d81d27cc7a from /etc/sysconfig/network-scripts/ifcfg-ens4f0. That is a regular ethernet profile, not a slave profile for a bridge (it has no "connection.master" property set). Overall, there is no suitable slave profile available for device "ens4f0". I don't know what the state was before reboot, but obviously, if you don't persist suitable profiles before rebooting, it's not gonna work afterwards.

I would guess that e47db561-27c5-4399-a553-39d694a6b932 was in-memory only. Check the path with `nmcli -f all connection`; if it is under /run/NetworkManager/system-connections, then the profile is in-memory and will be lost after reboot.
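
A rough sketch of that check, assuming the installed nmcli exposes a FILENAME column for `connection show`. The helper name `profile_persistence` is hypothetical and only classifies the path string.

```shell
#!/bin/sh
# Classify a profile by the file path NetworkManager reports for it.
# Profiles stored under /run are volatile and do not survive a reboot.
profile_persistence() {
    case "$1" in
        /run/NetworkManager/system-connections/*) echo "in-memory" ;;
        "") echo "unknown" ;;
        *) echo "persistent" ;;
    esac
}

# List every profile with its storage class (only if nmcli is installed).
if command -v nmcli >/dev/null 2>&1; then
    nmcli -t -f UUID,FILENAME connection show |
    while IFS=: read -r uuid path; do
        printf '%s %s %s\n' "$uuid" "$(profile_persistence "$path")" "$path"
    done
fi
```

If e47db561-27c5-4399-a553-39d694a6b932 is reported as "in-memory" before the reboot, that would be consistent with the guess above.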




A second problem is 

  Oct 09 23:35:07 localhost.localdomain dhclient[1434]: DHCPDISCOVER on ens4f0 to 255.255.255.255 port 67 interval 7 (xid=0xbca830c)

This is dracut/initrd, which configures the interface by running dhclient on it.

Later, when NM starts, it sees that the interface "ens4f0" is already pre-configured by something else. This results in

  <info>  [1570693052.3064] manager: (ens4f0): assume: will attempt to assume matching connection 'ens4f0' (b967a965-4bd2-4fd8-99b2-b6d81d27cc7a) (guessed)

This means that NetworkManager will try to gracefully take over the pre-configured device with the plain ethernet connection. That will not work very well, though, and I don't think that is what is intended. This behaviour, where initrd preconfigures the device with dhclient and passes it on to later boot (NetworkManager), has many issues. In rhel-8.2, those will be solved by also running NetworkManager in initrd.

Comment 3 Thomas Haller 2019-10-10 14:26:59 UTC
Turned out, that dracut was overwriting /etc/sysconfig/network-scripts/ifcfg-ens4f0 file during boot.

Comment 4 Thomas Haller 2019-10-14 08:46:08 UTC
I don't think this is a bug in NetworkManager.

What do you think? Can we close or reassign this?

Comment 7 Dan Kenigsberg 2019-10-15 18:31:03 UTC
(In reply to Thomas Haller from comment #3)
> Turned out, that dracut was overwriting
> /etc/sysconfig/network-scripts/ifcfg-ens4f0 file during boot.

Thomas, are you saying that we had a proper ifcfg-ens4f0 as a bridge slave, but dracut unilaterally overwrote it with something else?

Would you elaborate so this bug can be moved to the offending component?

Comment 8 Thomas Haller 2019-10-15 19:42:21 UTC
> Thomas, are you saying that we had a proper ifcfg-ens4f0 as a bridge slave, but dracut unilaterally overwrote it with something else?

That's what I am saying.

> Would you elaborate so this bug can be moved to the offending component?

The system is configured to do rd.neednet=1. Dracut does what it is requested to do.

I don't know the offending component. When installing the image, look at the resulting system for what is installed and configured. Then see where that configuration comes from.
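
One way to confirm the rd.neednet=1 setting on a running host is to inspect the kernel command line. This is a sketch only, and the helper name `has_neednet` is hypothetical.

```shell
#!/bin/sh
# Return success if the given kernel command line contains rd.neednet=1
# as a whole, space-separated argument.
has_neednet() {
    case " $1 " in
        *" rd.neednet=1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Inspect the command line of the running kernel, if readable.
if [ -r /proc/cmdline ] && has_neednet "$(cat /proc/cmdline)"; then
    echo "rd.neednet=1 is set: the initrd will bring up networking"
fi
```

Where the argument is set (boot loader configuration, kickstart, image build) still has to be traced separately.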

Comment 9 Dominik Holler 2019-10-25 13:11:54 UTC
Created attachment 1629177 [details]
reproduction and workaround on plain RHEL 8.1 without RHV

Red Hat Knowledge Base (Solution) 3017441 describes the behavior.
The suggested workaround
echo 'omit_dracutmodules+="ifcfg"' >>  /etc/dracut.conf.d/99-disable_ifcfg.conf
works for me.
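
Because the command above appends on every run, a repeatable variant might look like the sketch below. The helper name `ensure_line` is hypothetical. The `dracut -f` step is an assumption that is not stated in this comment: dracut reads its configuration when the initramfs is built, so the image presumably has to be regenerated before the change takes effect.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
ensure_line() {
    file=$1
    line=$2
    mkdir -p "$(dirname "$file")"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (as root), followed by an initramfs rebuild for the running kernel:
#   ensure_line /etc/dracut.conf.d/99-disable_ifcfg.conf 'omit_dracutmodules+="ifcfg"'
#   dracut -f
```

Running the helper twice leaves a single line in the drop-in file.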

Comment 10 Dominik Holler 2019-11-04 09:16:04 UTC
The ifcfg files are protected by the fix, but I am interested to know if there are flows which result in an unexpected runtime state.

Comment 11 Michael Burman 2019-11-20 09:26:06 UTC
vdsm-4.40.0-141.gitb9d2120.el8ev.x86_64, which was shipped with 4.4.0-5, doesn't include the desired fix.
Moving back to MODIFIED.

Comment 12 Michael Burman 2019-12-15 16:17:21 UTC
Verified on - vdsm-4.40.0-164.git38a19bb.el8ev.x86_64 with rhvm-4.4.0-0.9.master.el7.noarch

Comment 13 Sandro Bonazzola 2020-05-20 20:01:50 UTC
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be
resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.