Bug 1952229

Summary: After reboot, OvS interfaces become unmanaged in NetworkManager
Product: Red Hat Enterprise Linux 8
Component: NetworkManager
Version: 8.3
Status: CLOSED NOTABUG
Severity: unspecified
Priority: unspecified
Target Milestone: beta
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Reporter: David Critch <dcritch>
Assignee: NetworkManager Development Team <nm-team>
QA Contact: Desktop QE <desktop-qa-list>
CC: acardace, atragler, bgalvani, danw, fge, lrintel, phoracek, rkhan, sukulkar, till
Flags: pm-rhel: mirror+
Last Closed: 2021-04-27 01:58:08 UTC
Type: Bug
Bug Blocks: 1899057

Description David Critch 2021-04-21 19:51:04 UTC
Description of problem:
For an OpenShift customer, we created a script to create an OvS bridge on top of an OvS bond to satisfy their networking requirements.

This script ran fine on OpenShift versions up to and including 4.7.0. However, a more recent OpenShift update (the breakage appears at least as early as 4.7.5) has caused the script to break.


Version-Release number of selected component (if applicable):
$ rpm -qa | grep NetworkManager
NetworkManager-1.26.0-14.1.rhaos4.7.el8.x86_64
NetworkManager-libnm-1.26.0-14.1.rhaos4.7.el8.x86_64
NetworkManager-ovs-1.26.0-14.1.rhaos4.7.el8.x86_64
NetworkManager-team-1.26.0-14.1.rhaos4.7.el8.x86_64
NetworkManager-tui-1.26.0-14.1.rhaos4.7.el8.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
openvswitch2.13-2.13.0-79.6.el8fdp.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Deploy an OpenShift 4.7.6 cluster
2. Run ovs-setup.sh script on worker node
3. Reboot node

Actual results:
The bridge/bond is fine after running the script, but upon a reboot, the interfaces do not come up. NetworkManager lists them as unmanaged:
DEVICE           TYPE           STATE                                  CONNECTION       
enx1c4024995c56  ethernet       connecting (getting IP configuration)  Wired Connection 
enx1c4024995c53  ethernet       disconnected                           --               
enx1c4024995c55  ethernet       disconnected                           --               
enx1c4024995c57  ethernet       disconnected                           --               
enx1c4024995c58  ethernet       disconnected                           --               
enx1c4024995d23  ethernet       disconnected                           --               
enx1c4024995d25  ethernet       disconnected                           --               
vxlan_sys_4789   vxlan          disconnected                           --               
enx1c4024995c51  ethernet       unavailable                            --               
lo               loopback       unmanaged                              --               
br0              ovs-bridge     unmanaged                              --               
brcnv            ovs-bridge     unmanaged                              --               
br0              ovs-interface  unmanaged                              --               
tun0             ovs-interface  unmanaged                              --               
bond0            ovs-port       unmanaged                              --               
br0              ovs-port       unmanaged                              --               
tun0             ovs-port       unmanaged                              --               
veth25e678c9     ovs-port       unmanaged                              --               
veth5ff54d07     ovs-port       unmanaged                              --               
veth84d00b62     ovs-port       unmanaged                              --               
veth95d9553b     ovs-port       unmanaged                              --               
vxlan0           ovs-port       unmanaged                              --            

Expected results:
After reboot, the bridge and bond persist:
DEVICE           TYPE           STATE         CONNECTION                
brcnv            ovs-interface  connected     brcnv-iface               
enx1c4024995c56  ethernet       connected     ovs-slave-enx1c4024995c56 
enx1c4024995c58  ethernet       connected     ovs-slave-enx1c4024995c58 
brcnv            ovs-bridge     connected     ovs-bridge-brcnv          
bond0            ovs-port       connected     ovs-slave-bond0           
brcnv-port       ovs-port       connected     ovs-slave-brcnv-port      
enx1c4024995c53  ethernet       disconnected  --                        
enx1c4024995c55  ethernet       disconnected  --                        
enx1c4024995c57  ethernet       disconnected  --                        
enx1c4024995d23  ethernet       disconnected  --                        
enx1c4024995d25  ethernet       disconnected  --                        
vxlan_sys_4789   vxlan          disconnected  --                        
enx1c4024995c51  ethernet       unavailable   --                        
veth25e678c9     ethernet       unmanaged     --                        
veth5ff54d07     ethernet       unmanaged     --                        
veth84d00b62     ethernet       unmanaged     --                        
veth95d9553b     ethernet       unmanaged     --                        
lo               loopback       unmanaged     --                        
br0              ovs-bridge     unmanaged     --                        
br0              ovs-interface  unmanaged     --                        
tun0             ovs-interface  unmanaged     --                        
br0              ovs-port       unmanaged     --                        
tun0             ovs-port       unmanaged     --                        
veth25e678c9     ovs-port       unmanaged     --                        
veth5ff54d07     ovs-port       unmanaged     --                        
veth84d00b62     ovs-port       unmanaged     --                        
veth95d9553b     ovs-port       unmanaged     --                        
vxlan0           ovs-port       unmanaged     --                   

Additional info:
networkType: OpenShiftSDN

Comment 3 David Critch 2021-04-21 20:09:37 UTC
Notable timestamps:
initial creation @ Apr 21 19:11:52
reboot @ Apr 21 19:15:11

and a snippet from the log after reboot:
Apr 21 19:15:12 localhost NetworkManager[1965]: <trace> [1619032512.9305] ovsdb: added a bridge: brcnv, 6a343abd-06c6-4074-aacc-96b70815c37c
Apr 21 19:15:12 localhost NetworkManager[1965]: <debug> [1619032512.9306] device[9cfd1f45bee63753] (brcnv): constructed (NMDeviceOvsBridge)
Apr 21 19:15:12 localhost NetworkManager[1965]: <debug> [1619032512.9306] device[9cfd1f45bee63753] (brcnv): start setup of NMDeviceOvsBridge, kernel ifindex 0
Apr 21 19:15:12 localhost NetworkManager[1965]: <debug> [1619032512.9307] device[9cfd1f45bee63753] (brcnv): unmanaged: flags set to [platform-init,external-down=0x810/0x810/unmanaged/unrealized], set-unmanaged [external-down=0x800])
Apr 21 19:15:12 localhost NetworkManager[1965]: <debug> [1619032512.9307] device[9cfd1f45bee63753] (brcnv): unmanaged: flags set to [platform-init,external-down,!by-type=0x810/0x818/unmanaged/unrealized], set-managed [by-type=0x8])
Apr 21 19:15:12 localhost NetworkManager[1965]: <debug> [1619032512.9307] device[9cfd1f45bee63753] (brcnv): unmanaged: flags set to [platform-init,external-down,!by-type,!user-settings=0x810/0x858/unmanaged/unrealized], set-managed [user-settings=0x40])
Apr 21 19:15:12 localhost NetworkManager[1965]: <debug> [1619032512.9308] device[9cfd1f45bee63753] (brcnv): unmanaged: flags set to [external-down,!by-type,!platform-init,!user-settings=0x800/0x858/manageable/unrealized], set-managed [platform-init=0x10])
Apr 21 19:15:12 localhost NetworkManager[1965]: <debug> [1619032512.9308] device[9cfd1f45bee63753] (brcnv): unmanaged: flags set to [external-down,!sleeping,!by-type,!platform-init,!user-settings=0x800/0x859/manageable/unrealized], set-managed [sleeping=0x1])
Apr 21 19:15:12 localhost NetworkManager[1965]: <trace> [1619032512.9308] dbus-object[9cfd1f45bee63753]: export: "/org/freedesktop/NetworkManager/Devices/21"
Apr 21 19:15:12 localhost NetworkManager[1965]: <info>  [1619032512.9312] manager: (brcnv): new Open vSwitch Bridge device (/org/freedesktop/NetworkManager/Devices/21)
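
For readers following the `unmanaged: flags set to [...]` lines: each one ends in a `flags/mask` pair of hex values, and the bit values printed in the same excerpt (sleeping=0x1, by-type=0x8, platform-init=0x10, user-settings=0x40, external-down=0x800) are enough to decode them. The following is a minimal sketch that covers only those five bits. The function name `decode_unmanaged` is hypothetical and not part of NetworkManager, and the reading of the mask (a bit in the mask but not in the flags is printed with a leading `!`) is inferred from the log lines above.

```shell
#!/bin/sh
# Decode a "flags/mask" pair from NetworkManager's "unmanaged: flags set to"
# debug lines. Bit values are taken only from the log excerpt above; any other
# bits NetworkManager may define are ignored by this sketch.
# Note: plain POSIX sh, so the helper variables are global.
decode_unmanaged() {
    flags=$(( $1 )); mask=$(( $2 )); out=""
    for pair in sleeping=0x1 by-type=0x8 platform-init=0x10 \
                user-settings=0x40 external-down=0x800; do
        name=${pair%%=*}
        bit=$(( ${pair#*=} ))
        # Bits outside the mask have not been decided yet: skip them.
        [ $(( mask & bit )) -ne 0 ] || continue
        # In the mask but not in the flags: explicitly cleared, shown as "!name".
        [ $(( flags & bit )) -ne 0 ] || name="!$name"
        out="${out:+$out,}$name"
    done
    printf '%s\n' "$out"
}
```

`decode_unmanaged 0x810 0x810` prints `platform-init,external-down`, and `decode_unmanaged 0x800 0x859` prints `!sleeping,!by-type,!platform-init,!user-settings,external-down`, so external-down is the only bit still set at the end of the excerpt.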

Comment 4 Beniamino Galvani 2021-04-22 12:36:52 UTC
It looks like after the reboot at 19:15:11 all connections have disappeared except this one:

/etc/NetworkManager/system-connections-merged/default_connection.nmconnection

David, do you know why?

Comment 5 David Critch 2021-04-22 12:50:54 UTC
No idea.

I just tested on a fresh node, and after running the script but before rebooting, that is still the only connection in that directory.

Comment 6 Beniamino Galvani 2021-04-22 19:59:08 UTC
That seems to be a problem in the underlying storage (which is, if I understand correctly, an overlay filesystem).

Comment 7 David Critch 2021-04-23 19:08:55 UTC
Interesting...

I went back to 4.7.0 (NM versions below, though they might be irrelevant):
NetworkManager-1.26.0-12.1.rhaos4.7.el8.x86_64
NetworkManager-ovs-1.26.0-12.1.rhaos4.7.el8.x86_64
NetworkManager-tui-1.26.0-12.1.rhaos4.7.el8.x86_64
NetworkManager-libnm-1.26.0-12.1.rhaos4.7.el8.x86_64
NetworkManager-team-1.26.0-12.1.rhaos4.7.el8.x86_64


All the files are there under /etc/NetworkManager/system-connections upon initial creation and after a reboot, and there is no `/etc/NetworkManager/system-connections-merged` at all.

Comment 8 Petr Horáček 2021-04-26 16:56:51 UTC
I believe there was a change in RHCOS that made network profiles non-persistent, so all of the day-1 configuration must be delivered through Ignition. Would it be possible to change the configuration script to run on every boot?

Comment 9 David Critch 2021-04-26 21:11:16 UTC
As Ben said, that directory is mounted via overlayfs:
overlay on /etc/NetworkManager/system-connections-merged type overlay (rw,relatime,seclabel,lowerdir=/etc/NetworkManager/system-connections,upperdir=/run/nm-system-connections,workdir=/run/nm-system-connections-work)

After running the script, if I copy all the files from the upper directory to the lower one, it all persists after reboot, but I'm worried that this isn't the right approach, or that it could get overwritten in an update.
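
To make the mount line above easier to read, here is a small sketch that pulls `lowerdir` and `upperdir` out of the option string and checks where the upper layer lives. The helper name `mount_opt` is hypothetical. The option string is copied from the mount output above, and the conclusion assumes that /run is a tmpfs, as it normally is on systemd-based systems.

```shell
#!/bin/sh
# Extract one key=value option from a comma-separated mount option string.
# $1 = option string, $2 = key (hypothetical helper, for illustration only)
mount_opt() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/^$2=//p"
}

# Option string copied from the overlay mount shown in this comment.
opts='rw,relatime,seclabel,lowerdir=/etc/NetworkManager/system-connections,upperdir=/run/nm-system-connections,workdir=/run/nm-system-connections-work'

upper=$(mount_opt "$opts" upperdir)
lower=$(mount_opt "$opts" lowerdir)

# Writes to the merged directory land in the upper layer.
case $upper in
    /run/*) echo "upperdir $upper is under /run: new profiles are lost at reboot" ;;
    *)      echo "upperdir $upper is outside /run" ;;
esac
echo "lowerdir $lower holds the profiles that persist"
```

On a live node the option string could be read with something like `findmnt -no OPTIONS /etc/NetworkManager/system-connections-merged` instead of being hard-coded.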

I can test running the script on every boot, but the issue is that even though NetworkManager forgets about them, they are still in the ovsdb. I suspect OVS will bark about it. I can modify the script to delete everything in OVS first.

I'll report back tomorrow.

Comment 10 David Critch 2021-04-26 23:49:49 UTC
Okay, I tweaked the script slightly (just added an `ovs-vsctl --if-exists del-br brcnv` near the top) and that seems to work fine.
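
For anyone hitting the same problem, the top of a re-runnable script might look roughly like the sketch below. The actual ovs-setup.sh is not attached to this report, so only the `ovs-vsctl --if-exists del-br brcnv` command comes from this thread. The function name `cleanup_stale_bridge` and the `OVS_VSCTL` and `BRIDGE` variables are hypothetical, added so the step can be dry-run on a machine without OVS.

```shell
#!/bin/sh
# Sketch of an idempotent preamble for a setup script that runs on every boot.
# Only the del-br command itself is taken from this bug report; the variable
# and function names are assumptions made for illustration.
OVS_VSCTL=${OVS_VSCTL:-ovs-vsctl}   # override (e.g. with "echo") for a dry run
BRIDGE=${BRIDGE:-brcnv}

cleanup_stale_bridge() {
    # With --if-exists, deleting a bridge that is not in the ovsdb has no
    # effect, so the same script works on first boot and on later boots
    # where the bridge survived in the ovsdb but its NetworkManager
    # profiles did not.
    $OVS_VSCTL --if-exists del-br "$BRIDGE"
}
```

The script would call `cleanup_stale_bridge` once, before it recreates the bridge, bond, and interface profiles.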

I guess we can close this as NOTABUG, since I assume they had good reasons to make the change on the OpenShift/CoreOS side?

Comment 11 Gris Ge 2021-04-27 01:58:08 UTC
Closing as per the previous comment.

Please feel free to reopen if you think further work is required.

Comment 12 Gris Ge 2021-05-06 06:59:35 UTC
*** Bug 1899745 has been marked as a duplicate of this bug. ***