Bug 2092355 - Linux bridge managed by nmstate is rebuilt for an unknown reason
Summary: Linux bridge managed by nmstate is rebuilt for an unknown reason
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.z
Assignee: Ben Nemec
QA Contact: Aleksandra Malykhin
URL:
Whiteboard:
Depends On: 2097396
Blocks:
Reported: 2022-06-01 10:29 UTC by Pablo Alonso Rodriguez
Modified: 2022-08-08 12:43 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2097396 (view as bug list)
Environment:
Last Closed: 2022-06-30 16:35:30 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift kubernetes-nmstate pull 278 | 0 | None | open | [release-4.8] Bug 2092355: Rebase to 0.47.11 | 2022-06-15 15:33:24 UTC
Red Hat Bugzilla | 2092762 | 1 | urgent | CLOSED | Changing vlan-filtering, stp or mtu on bridge detachs unmanaged ports | 2023-07-18 12:58:17 UTC

Internal Links: 2092762 2093941

Description Pablo Alonso Rodriguez 2022-06-01 10:29:03 UTC
Description of problem:

On a Linux bridge defined by nmstate, we sometimes see traces of what looks like NetworkManager trying to apply a setting that triggers a complete rebuild of that bridge.

This is extremely problematic because the bridge also has pod interfaces attached by Multus. Multus currently cannot heal this (the Multus team is looking into whether such a feature can be added, but it does not exist today).

So basically we need to understand why the bridge is rebuilt and whether that indicates a problem in nmstate (or elsewhere).

No changes were made that could trigger such a rebuild (as far as we are aware). Even the audit logs did not show any nmstate-related resource being updated around the time of the rebuilds.

Version-Release number of selected component (if applicable):

4.8

How reproducible:

Sometimes

Steps to Reproduce:
1. Wait for it to happen without making changes
2.
3.

Actual results:

Bridge rebuilt

Expected results:

Bridge is not rebuilt, or there is some way to avoid the rebuild.

Additional info:

I'll be making several internal comments with all the concrete details. Please bear with me while I do so.

Comment 4 Ben Nemec 2022-06-01 14:40:43 UTC
Dropping priority since kubernetes-nmstate is TP in 4.8. We will look at this, but it can't be urgent priority.

Comment 13 Petr Horáček 2022-06-03 07:14:58 UTC
Was this issue encountered during upgrade of CNV from 2.6 to 4.8 or much later?

Comment 14 Petr Horáček 2022-06-03 07:18:15 UTC
Never mind, I was too quick to comment.

Comment 47 Quique Llorente 2022-06-09 09:14:44 UTC
The following is a general summary of the issue, using br1 as the name of the bridge.

1. The bridge is created by an NNCP (nmstate creates the bridge). Later, a script uses iptools to set vlan-filtering, so NetworkManager reports vlan-filtering=no while the kernel has vlan-filtering=yes.

$ nmcli c show  br1 |grep vlan
bridge.vlan-filtering:                  no

$ bridge vlan
port              vlan-id  
br1            1 PVID Egress Untagged


2. Then veths are attached; nothing changes on the bridge apart from the new ports.

3. Later, a reconcile cycle re-applies the NNCP, the kernel vlan-filtering setting is picked up, and the bridge is reconstructed.

$ nmcli c show  br1 |grep vlan
bridge.vlan-filtering:                  yes

Again, all of this is fixed by the patches on the v0.47.11 branch.
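
The mismatch between the NetworkManager profile and the kernel described in the steps above can be checked for directly on a node. What follows is only a minimal sketch, not part of the fix: it assumes Python 3 is available, that `nmcli c show br1` prints the `bridge.vlan-filtering:` line in the format shown above, and that the kernel value is read from the sysfs file /sys/class/net/br1/bridge/vlan_filtering (0 or 1). All function names are hypothetical.

```python
from pathlib import Path
import subprocess


def nm_vlan_filtering(nmcli_output: str) -> bool:
    """Extract bridge.vlan-filtering from `nmcli c show <bridge>` output."""
    for line in nmcli_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "bridge.vlan-filtering":
            return value.strip() == "yes"
    raise ValueError("bridge.vlan-filtering not found in nmcli output")


def kernel_vlan_filtering(sysfs_value: str) -> bool:
    """Interpret the content of the bridge's vlan_filtering sysfs file."""
    return sysfs_value.strip() == "1"


def has_drift(nmcli_output: str, sysfs_value: str) -> bool:
    """True when NetworkManager and the kernel disagree on vlan-filtering."""
    return nm_vlan_filtering(nmcli_output) != kernel_vlan_filtering(sysfs_value)


def check_bridge(name: str) -> bool:
    """Assumption: runs on the node itself, with nmcli and sysfs available."""
    nm_out = subprocess.run(
        ["nmcli", "c", "show", name],
        capture_output=True, text=True, check=True,
    ).stdout
    sysfs = Path(f"/sys/class/net/{name}/bridge/vlan_filtering").read_text()
    return has_drift(nm_out, sysfs)
```

In the state shown after step 1 (profile says no, kernel has filtering enabled), `has_drift` would return True, which is the condition that precedes the rebuild on the next reconcile.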

Comment 49 Ben Nemec 2022-06-15 15:32:51 UTC
Removing the dependency on the nmstate bug because it makes bz bot unhappy. I don't believe we're actually dependent on that being fixed anyway.

Comment 51 Aleksandra Malykhin 2022-06-20 12:20:06 UTC
Cluster version is 4.8.0-0.nightly-2022-06-17-175848
kubernetes-nmstate-operator.4.8.0-202206151838

1. Create the bridge by applying the nncp 
2. Relabel a node so the NNCP gets reconciled again


Result:
The bridge is not recreated; no issues or errors found.

The problem was that, with knmstate, the vlan-filtering flag is first set via netlink (using iptools) after the bridge is created, so NetworkManager does not see it. Later, when the kubernetes-nmstate handler reconciles the NNCP (forced by labeling a node), NetworkManager sees the previously set vlan-filtering flag and re-creates the bridge, removing the veths.
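
A verification like the one above can also be scripted by snapshotting the bridge before and after the forced reconcile. This is a rough sketch under stated assumptions, not the procedure QA used: it reads the bridge's ifindex and its port list from sysfs (/sys/class/net/br1/ifindex and /sys/class/net/br1/brif/), and it assumes that a rebuilt bridge gets a new ifindex, which may not hold for every kind of rebuild. The lost-port check is the one that matches the reported symptom. `BridgeSnapshot`, `take_snapshot` and `compare` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class BridgeSnapshot:
    ifindex: int
    ports: frozenset


def take_snapshot(name: str, sysfs_root: str = "/sys/class/net") -> BridgeSnapshot:
    """Read the bridge ifindex and the names of its attached ports from sysfs."""
    base = Path(sysfs_root) / name
    ifindex = int((base / "ifindex").read_text())
    ports = frozenset(p.name for p in (base / "brif").iterdir())
    return BridgeSnapshot(ifindex, ports)


def compare(before: BridgeSnapshot, after: BridgeSnapshot) -> list:
    """Return human-readable problems; an empty list means nothing changed."""
    problems = []
    if before.ifindex != after.ifindex:
        problems.append("bridge was recreated (ifindex changed)")
    lost = before.ports - after.ports
    if lost:
        problems.append("ports detached: " + ", ".join(sorted(lost)))
    return problems
```

Taking one snapshot before relabeling the node and one after the reconcile finishes, an empty result from `compare` corresponds to the "bridge is not recreated" outcome reported here.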

Comment 56 errata-xmlrpc 2022-06-30 16:35:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.45 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5167

