Bug 1970021

Summary: nmstate does not persist its configuration due to overlay system-connections-merged mount
Product: OpenShift Container Platform
Reporter: Pablo Alonso Rodriguez <palonsor>
Component: Networking
Assignee: Yossi Boaron <yboaron>
Networking sub component: kubernetes-nmstate-operator
QA Contact: Aleksandra Malykhin <amalykhi>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: aos-bugs, bbennett, bnemec, gvillani, krzysztof.cieplucha, mhatem, obockows, palonsor, sdodson, vcojot, vvoronko, wrussell, yboaron, yyacoub
Version: 4.7
Keywords: Triaged
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: No Doc Update
Last Closed: 2022-03-10 16:03:59 UTC
Type: Bug
Bug Depends On: 2013034    

Description Pablo Alonso Rodriguez 2021-06-09 16:12:00 UTC
Description of problem:

RHCOS mounts an overlay on top of NetworkManager's connection profile directory, so changes made to connection profiles at the NetworkManager level are never persisted across reboots.

The overlay mount looks like this:

$ mount | grep NetworkManager
overlay on /etc/NetworkManager/system-connections-merged type overlay (rw,relatime,seclabel,lowerdir=/etc/NetworkManager/system-connections,upperdir=/run/nm-system-connections,workdir=/run/nm-system-connections-work)

NetworkManager is configured to store its connection profiles in this directory through the following config drop-in:

$ cat /etc/NetworkManager/conf.d/99-keyfiles.conf 
[keyfile]
path=/etc/NetworkManager/system-connections-merged
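Because the overlay's upperdir is under /run, which is a tmpfs, every profile NetworkManager writes through the merged directory actually lands in /run/nm-system-connections and disappears at reboot. A minimal POSIX sh sketch, assuming the lowerdir/upperdir paths from the mount output above, that lists the profiles present in the upper layer but missing from, or different in, the persistent lowerdir, i.e. what a reboot would drop. The function name list_unpersisted is hypothetical, and overlayfs whiteout entries (which record deletions) are deliberately ignored.

```shell
# Hypothetical helper: print the names of connection profiles that exist only
# in the overlay's upper layer (tmpfs under /run) or whose content differs from
# the copy in the persistent lower directory. These would be lost on reboot.
# Assumption: only regular files are checked; whiteouts (char devices) are skipped.
list_unpersisted() {
    lower=${1:-/etc/NetworkManager/system-connections}
    upper=${2:-/run/nm-system-connections}
    for f in "$upper"/*; do
        # Skips the unexpanded glob, subdirectories and overlayfs whiteouts.
        [ -f "$f" ] || continue
        name=$(basename "$f")
        if [ ! -f "$lower/$name" ] || ! cmp -s "$f" "$lower/$name"; then
            echo "$name"
        fi
    done
}
```

On a node this would be run as root with no arguments; an empty result means every profile visible in the merged directory is also on persistent storage.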

The problem is that networking configuration applied by nmstate is not persisted. If a node needs that configuration at boot time to reach the kube-apiserver, it cannot reach it after a reboot, so the nmstate handler pod cannot start on that node to reapply the configuration and the node's networking is never configured.

Version-Release number of selected component (if applicable):

4.7

How reproducible:

Always, if the node requires the networking configuration provided via nmstate to reach the kube-apiserver.

Steps to Reproduce:
1. Reboot a node that requires the networking configuration provided via nmstate to reach the kube-apiserver.

Actual results:

No network connectivity, node not ready

Expected results:

Network connectivity, node ready

Additional info:

According to bug 1916363 (https://bugzilla.redhat.com/show_bug.cgi?id=1916363), other components such as OVN have run into similar situations in which profiles had to be copied from /etc/NetworkManager/system-connections-merged back to /etc/NetworkManager/system-connections.

Comment 10 Ben Nemec 2021-10-05 21:53:46 UTC
*** Bug 2005792 has been marked as a duplicate of this bug. ***

Comment 11 Derrick Ornelas 2021-10-28 14:45:10 UTC
*** Bug 2017304 has been marked as a duplicate of this bug. ***

Comment 13 Ben Nemec 2022-01-07 15:53:04 UTC
*** Bug 2037098 has been marked as a duplicate of this bug. ***

Comment 14 Krzysztof Cieplucha 2022-01-10 11:42:14 UTC
Not sure if this is the proper way to do it, but the following workaround works for me on OpenShift 4.7.38 on bare metal:

After you finish configuring the network via nmcli, copy the files containing your connection profiles from /etc/NetworkManager/system-connections-merged/ back to /etc/NetworkManager/system-connections/.
The coreos-installer tool with -n option does the same thing (this is mentioned in official documentation: https://docs.openshift.com/container-platform/4.7/installing/installing_bare_metal/installing-bare-metal-network-customizations.html)
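A minimal POSIX sh sketch of that manual copy step, assuming the two directories named above. The function name persist_profiles and its argument order are hypothetical, and the 0600 mode reflects the assumption that NetworkManager expects keyfiles to be private to root. The exact profile file names depend on how the connections were created.

```shell
# Hypothetical helper for the manual workaround: copy selected connection
# profiles from the overlay's merged view back to the persistent directory
# so they are still present after a reboot. Run as root after using nmcli.
# Usage: persist_profiles MERGED_DIR PERSISTENT_DIR PROFILE_FILE...
persist_profiles() {
    merged=$1
    persistent=$2
    shift 2
    for name in "$@"; do
        src="$merged/$name"
        if [ ! -f "$src" ]; then
            echo "persist_profiles: $src not found" >&2
            return 1
        fi
        # -p keeps mode and timestamps (and ownership when run as root).
        cp -p "$src" "$persistent/$name" || return 1
        # Keyfiles are expected to be private, so force mode 0600.
        chmod 600 "$persistent/$name" || return 1
    done
}
```

For example, persist_profiles /etc/NetworkManager/system-connections-merged /etc/NetworkManager/system-connections bond0.nmconnection, where bond0.nmconnection is a placeholder name. Failing on a missing file rather than skipping it keeps a typo from silently leaving a profile unpersisted.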

Comment 15 Ben Nemec 2022-01-17 20:44:54 UTC
That is also an option (in fact the configure-ovs script does exactly that), but it's a little more error-prone since it requires a manual step after doing the network configuration.

Comment 18 errata-xmlrpc 2022-03-10 16:03:59 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056