Description of problem:
On a Linux bridge defined by nmstate, we sometimes see traces of what looks like NetworkManager applying a setting that triggers a complete rebuild of that bridge. This is extremely problematic because the bridge also has pod interfaces attached by Multus, and Multus currently cannot heal this (the Multus team is looking into whether they can add such a feature, but today it is missing). We need to understand why the bridge is rebuilt and whether that indicates a problem in nmstate or elsewhere. As far as we are aware, no changes were made that could trigger such a rebuild. Inspecting the audit logs did not reveal any nmstate-related resource being updated around the time of the rebuilds.
Version-Release number of selected component (if applicable):
4.8
How reproducible:
Sometimes
Steps to Reproduce:
1. Wait for it to happen without making changes
Actual results:
Bridge rebuilt
Expected results:
Bridge not rebuilt, or some way to avoid it.
Additional info:
I'll be making several internal comments with all the concrete details. Please bear with me while I do so.
Dropping the priority since kubernetes-nmstate is Tech Preview (TP) in 4.8. We will look at this, but it can't be urgent.
Was this issue encountered during upgrade of CNV from 2.6 to 4.8 or much later?
Never mind, I was too quick to comment.
Following is a general summary of the issue, using br1 as the name of the bridge.
1. The bridge is created with an NNCP. nmstate creates the bridge, and later a script uses iptools to set vlan-filtering, so the NetworkManager profile has vlan-filtering=no while the kernel has vlan-filtering=yes:
   $ nmcli c show br1 | grep vlan
   bridge.vlan-filtering:    no
   $ bridge vlan
   port    vlan-id
   br1     1 PVID Egress Untagged
2. Then the veths are attached. Nothing changes on the bridge apart from the new ports.
3. Later, a reconcile cycle re-applies the NNCP, the kernel vlan-filtering setting is picked up, and the bridge is reconstructed:
   $ nmcli c show br1 | grep vlan
   bridge.vlan-filtering:    yes
Again, all of this is fixed by the patches on the v0.47.11 branch.
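The mismatch described in this comment (NetworkManager profile says "no", kernel says "yes") can be checked for on a node before a reconcile hits it. The following is only a minimal sketch, not part of kubernetes-nmstate: the function names are hypothetical, and it assumes the kernel state is readable from the bridge's sysfs attribute /sys/class/net/<bridge>/bridge/vlan_filtering and that the profile value is obtained with nmcli's -g (--get-values) option.

```python
import subprocess
from pathlib import Path
from typing import Tuple


def profile_vlan_filtering(nmcli_value: str) -> bool:
    """Interpret what nmcli prints for bridge.vlan-filtering ('yes' or 'no')."""
    return nmcli_value.strip().lower() == "yes"


def kernel_vlan_filtering(sysfs_value: str) -> bool:
    """Interpret the bridge's sysfs vlan_filtering attribute ('0' or '1')."""
    return sysfs_value.strip() == "1"


def has_drift(nmcli_value: str, sysfs_value: str) -> bool:
    """True when the NetworkManager profile and the kernel disagree."""
    return profile_vlan_filtering(nmcli_value) != kernel_vlan_filtering(sysfs_value)


def read_live_values(bridge: str) -> Tuple[str, str]:
    """Read both values on a real node (hypothetical helper; needs nmcli)."""
    nmcli_value = subprocess.run(
        ["nmcli", "-g", "bridge.vlan-filtering", "connection", "show", bridge],
        check=True, capture_output=True, text=True,
    ).stdout
    # Assumption: the kernel exposes the flag at this sysfs path.
    sysfs_path = Path("/sys/class/net") / bridge / "bridge" / "vlan_filtering"
    return nmcli_value, sysfs_path.read_text()
```

On an affected node, has_drift(*read_live_values("br1")) would be expected to return True in the state shown in step 1, which is the state in which a later reconcile led to the bridge being reconstructed.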
Removing the dependency on the nmstate bug because it makes bz bot unhappy. I don't believe we're actually dependent on that being fixed anyway.
Cluster version is 4.8.0-0.nightly-2022-06-17-175848
kubernetes-nmstate-operator.4.8.0-202206151838
1. Create the bridge by applying the NNCP.
2. Relabel a node so the NNCP gets reconciled again.
Result: the bridge is not recreated; no issues or errors were found.
The problem was that kubernetes-nmstate initially set the vlan-filtering flag over netlink (using iptools) after the bridge was created, so NetworkManager did not see it. After a later NNCP reconcile by the kubernetes-nmstate handler (forced here by labeling a node), NetworkManager saw the previously set vlan-filtering flag and re-created the bridge, removing the veths.
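One way to make the "bridge is not recreated" check in this verification less manual is to snapshot the bridge before and after forcing the reconcile. This is a rough sketch under stated assumptions, not the procedure actually used here: it assumes a recreated bridge gets a new interface index in /sys/class/net/<bridge>/ifindex and that attached ports are listed under /sys/class/net/<bridge>/brif. BridgeSnapshot, take_snapshot, and describe_change are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import FrozenSet, List


@dataclass(frozen=True)
class BridgeSnapshot:
    ifindex: int            # kernel interface index of the bridge
    ports: FrozenSet[str]   # names of the interfaces attached to it


def take_snapshot(bridge: str, sysfs_root: str = "/sys/class/net") -> BridgeSnapshot:
    """Capture the bridge's ifindex and attached ports from sysfs."""
    base = Path(sysfs_root) / bridge
    ifindex = int((base / "ifindex").read_text().strip())
    ports = frozenset(entry.name for entry in (base / "brif").iterdir())
    return BridgeSnapshot(ifindex, ports)


def describe_change(before: BridgeSnapshot, after: BridgeSnapshot) -> List[str]:
    """List the signs of a rebuild between two snapshots (empty if none)."""
    findings = []
    if before.ifindex != after.ifindex:
        # Assumption: a deleted and re-created bridge receives a new ifindex.
        findings.append("bridge recreated (ifindex changed)")
    lost = before.ports - after.ports
    if lost:
        findings.append("ports detached: " + ", ".join(sorted(lost)))
    return findings
```

Taking one snapshot after step 1, another after step 2, and passing both to describe_change should give an empty list on a fixed build; on an affected build one would expect it to report the changed ifindex and the missing veths.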
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.8.45 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:5167