Bug 2049103 - Add option for OVS ofport_request to NM | NetworkManager restart replugs all OVS interfaces and leads to OVS port ID change and broken OpenFlows
Summary: Add option for OVS ofport_request to NM | NetworkManager restart replugs all ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: NetworkManager
Version: 9.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: NetworkManager Development Team
QA Contact: Vladimir Benes
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-02-01 14:46 UTC by Andreas Karis
Modified: 2023-05-09 10:22 UTC
CC List: 9 users

Fixed In Version: NetworkManager-1.41.3-1.el9
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-09 08:17:27 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
dispatcher pre-up script + analysis (437.37 KB, text/plain)
2022-02-03 19:56 UTC, Andreas Karis


Links
Red Hat Issue Tracker NMT-234 (last updated 2023-01-31 23:58:21 UTC)
Red Hat Issue Tracker RHELPLAN-110541 (last updated 2022-02-01 14:55:47 UTC)
Red Hat Product Errata RHBA-2023:2485 (last updated 2023-05-09 08:17:51 UTC)
freedesktop.org Gitlab NetworkManager/NetworkManager-ci merge request 1213, merged: "ovs: add reboot to nmcli_add_openvswitch_ofport_request" (last updated 2022-10-11 18:11:08 UTC)
freedesktop.org Gitlab NetworkManager/NetworkManager merge request 1322, merged: "ovs: add ofport_request option to ovs interface" (last updated 2022-10-18 08:01:51 UTC)

Description Andreas Karis 2022-02-01 14:46:30 UTC
When NetworkManager is restarted, it tears down and brings up all associated connections. For OVS, this means every interface is removed from its OVS bridge and added back. In practice, an existing OVS bridge ends up with completely new port IDs, which breaks already programmed OpenFlow rules that targeted the ports that were torn down and recreated.

Let's imagine an OVS bridge br-ex with an attached port ens5 and a flow `in_port=LOCAL, actions=output:ens5`.
If the OVS port ID of ens5 is 2, the flow is stored as `in_port=LOCAL, actions=output:2`.

When NetworkManager restarts, it removes ens5 from br-ex and adds it again, and this time ens5 may receive a different ID, e.g. 3. However, the flows programmed on br-ex are unchanged and still point to the now non-existent port 2, which breaks all flows that originate from port LOCAL.

OVS flows are programmed by port ID, with the exception of the LOCAL match (the bridge's internal port). Flows that were programmed earlier to steer packets from in_port=LOCAL to out_port=x therefore break after a NetworkManager restart, because ens5 is no longer at OVS port ID x but at a new ID y.
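The renumbering described above can be reproduced with a small Python toy model. It is a sketch under stated assumptions, not OVS's real allocator: it assumes a freed port number is never handed out again and that a new port otherwise gets the next number above the highest one used, which happens to match the 6 to 7 change shown in Comment 2. `ToyBridge`, `demo`, and the optional `ofport_request` argument (named after the OVS Interface column that Comment 1 points to) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

LOCAL = "LOCAL"  # the bridge's internal port; flows match it by name, not number


@dataclass
class ToyBridge:
    """Hypothetical stand-in for an OVS bridge, not the real ovs-vswitchd logic."""

    ports: Dict[str, int] = field(default_factory=dict)  # interface name -> port number
    highest_used: int = 0  # assumption: freed numbers are never recycled
    flows: List[Tuple[str, int]] = field(default_factory=list)  # (in_port, output port number)

    def add_port(self, name: str, ofport_request: Optional[int] = None) -> int:
        """Attach a port, honoring ofport_request when that number is free."""
        in_use = set(self.ports.values())
        if ofport_request is not None and ofport_request not in in_use:
            number = ofport_request
        else:
            number = self.highest_used + 1
        self.highest_used = max(self.highest_used, number)
        self.ports[name] = number
        return number

    def del_port(self, name: str) -> None:
        del self.ports[name]

    def add_flow(self, in_port: str, output_name: str) -> None:
        # The output port name is resolved to a number once, at install time,
        # so the stored flow keeps pointing at that number afterwards.
        self.flows.append((in_port, self.ports[output_name]))

    def broken_flows(self) -> List[Tuple[str, int]]:
        """Flows whose output port number no longer belongs to any port."""
        live = set(self.ports.values())
        return [flow for flow in self.flows if flow[1] not in live]


def demo(ofport_request: Optional[int]) -> List[Tuple[str, int]]:
    br = ToyBridge()
    br.add_port("patch-br-ex_ip-")       # port 1 in this model
    br.add_port("ens5", ofport_request)  # port 2 unless a different number is requested
    br.add_flow(LOCAL, "ens5")           # stored as in_port=LOCAL, output:2
    # Simulated NetworkManager restart: ens5 is removed from br-ex and re-added.
    br.del_port("ens5")
    br.add_port("ens5", ofport_request)
    return br.broken_flows()


print(demo(None))  # [('LOCAL', 2)]: ens5 moved to port 3, the flow dangles
print(demo(2))     # []: the requested number is reused, the flow still works
```

Without a requested number, `demo(None)` reports the flow that still outputs to port 2 even though ens5 now sits at port 3. With `demo(2)` nothing is broken, which is the behavior an option to pre-assign the port number is meant to provide.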

Upon a restart, NetworkManager should not delete all interface <-> OVS connections; these ports should be stable. Otherwise, OpenFlow rules that target specific ports (source or destination matches by port) can break in confusing ways. Instead of ripping the interfaces out and plugging them back in completely, NetworkManager should shut down the interfaces that are connected to the OVS bridges. What reason is there for deleting the ports completely and recreating them? The impact is way too big, instead a smart solution would be to compare the desired state to the current state and only make those changes that are really necessary. At the very least, there should be some option for administrators to keep port IDs, or NetworkManager should cooperate with OVS (perhaps this would require changes on both sides) to keep the same port IDs, so that OpenFlow rules do not break.

For an example of the issues that the current behavior can create, please see: https://bugzilla.redhat.com/show_bug.cgi?id=2048352

Comment 1 Andreas Karis 2022-02-01 15:20:31 UTC
NM should ideally use ofport_request to make sure that port IDs remain stable, see: https://docs.openvswitch.org/en/latest/tutorials/ovs-advanced/

Comment 2 Andreas Karis 2022-02-01 15:47:42 UTC
Example of how the port id changes from `6(ens5): addr:0e:5e:00:0a:14:47` to `7(ens5): addr:0e:5e:00:0a:14:47`:
~~~
[root@ip-10-0-155-59 tmp]# ovs-ofctl show br-ex
OFPT_FEATURES_REPLY (xid=0x2): dpid:00000e5e000a1447
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 3(patch-br-ex_ip-): addr:5e:f1:2f:35:7d:ef
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 6(ens5): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ex): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
[root@ip-10-0-155-59 tmp]# systemctl stop NetworkManager
[root@ip-10-0-155-59 tmp]# ovs-ofctl show br-ex
OFPT_FEATURES_REPLY (xid=0x2): dpid:00000e5e000a1447
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 3(patch-br-ex_ip-): addr:5e:f1:2f:35:7d:ef
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 6(ens5): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
[root@ip-10-0-155-59 tmp]# ip link ls dev ens5
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 0e:5e:00:0a:14:47 brd ff:ff:ff:ff:ff:ff
[root@ip-10-0-155-59 tmp]# systemctl start NetworkManager
[root@ip-10-0-155-59 tmp]# 
[root@ip-10-0-155-59 tmp]# ovs-ofctl show br-ex
OFPT_FEATURES_REPLY (xid=0x2): dpid:00000e5e000a1447
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 3(patch-br-ex_ip-): addr:5e:f1:2f:35:7d:ef
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 7(ens5): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ex): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
~~~
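To spot this kind of renumbering without reading the output by eye, a short Python helper can extract the port-name-to-number mapping from `ovs-ofctl show` and compare two snapshots. This is a sketch that assumes the port line format shown above (`N(name): addr:...`); other OVS versions may print additional or differently formatted lines. `port_numbers` and `renumbered` are hypothetical helper names.

```python
import re
from typing import Dict, Tuple

# Matches port lines such as " 6(ens5): addr:0e:5e:00:0a:14:47" or
# " LOCAL(br-ex): addr:...", as printed in the transcript above.
PORT_LINE = re.compile(r"^\s*(\w+)\(([^)]+)\):\s+addr:")


def port_numbers(show_output: str) -> Dict[str, str]:
    """Map interface name -> port number (kept as text because of LOCAL)."""
    ports: Dict[str, str] = {}
    for line in show_output.splitlines():
        match = PORT_LINE.match(line)
        if match:
            number, name = match.groups()
            ports[name] = number
    return ports


def renumbered(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Interfaces present in both snapshots whose port number changed."""
    return {
        name: (before[name], after[name])
        for name in sorted(before.keys() & after.keys())
        if before[name] != after[name]
    }


# Excerpts of the output above, before and after the NetworkManager restart.
BEFORE = """\
OFPT_FEATURES_REPLY (xid=0x2): dpid:00000e5e000a1447
 3(patch-br-ex_ip-): addr:5e:f1:2f:35:7d:ef
 6(ens5): addr:0e:5e:00:0a:14:47
 LOCAL(br-ex): addr:0e:5e:00:0a:14:47
"""
AFTER = """\
 3(patch-br-ex_ip-): addr:5e:f1:2f:35:7d:ef
 7(ens5): addr:0e:5e:00:0a:14:47
 LOCAL(br-ex): addr:0e:5e:00:0a:14:47
"""

print(renumbered(port_numbers(BEFORE), port_numbers(AFTER)))  # {'ens5': ('6', '7')}
```

On a live system the input would come from something like `subprocess.run(["ovs-ofctl", "show", "br-ex"], capture_output=True, text=True).stdout`, captured before and after restarting NetworkManager.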

Comment 3 Andreas Karis 2022-02-03 19:56:51 UTC
Created attachment 1858962 [details]
dispatcher pre-up script + analysis

I created a dispatcher pre-up script which sets a fixed OVS port ID. Ideally, NM would have an option so that we can pre-assign an ovs port id. As you can see from the logs, there still is a 1 second ga
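The attachment itself is not reproduced here. As a hypothetical illustration of the same idea, a NetworkManager dispatcher script could run `ovs-vsctl set Interface <name> ofport_request=<n>` when an interface is about to come up. The sketch assumes the script is installed as an executable, root-owned file under `/etc/NetworkManager/dispatcher.d/pre-up.d/` (NetworkManager passes the interface name and the action as the first two arguments), that the OVS Interface record already exists at pre-up time, and a site-specific `PINNED_OFPORTS` mapping that is purely illustrative.

```python
#!/usr/bin/env python3
"""Hypothetical dispatcher pre-up hook that requests fixed OVS port numbers."""
import subprocess
import sys
from typing import Dict, List, Optional

# Illustrative, site-specific mapping: interface name -> requested OpenFlow port.
PINNED_OFPORTS: Dict[str, int] = {"ens5": 6}


def pin_command(iface: str, action: str, pinned: Dict[str, int]) -> Optional[List[str]]:
    """Return the ovs-vsctl argv for a pre-up event on a pinned interface, else None."""
    if action != "pre-up" or iface not in pinned:
        return None
    return ["ovs-vsctl", "set", "Interface", iface, f"ofport_request={pinned[iface]}"]


def main(argv: List[str]) -> int:
    # NetworkManager passes the interface name and the action as argv[1] and argv[2].
    cmd = pin_command(argv[1], argv[2], PINNED_OFPORTS)
    if cmd is None:
        return 0
    # Non-zero if, e.g., the Interface record does not exist in ovsdb yet.
    return subprocess.run(cmd, check=False).returncode


# Only act when invoked by the dispatcher with both arguments.
if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(main(sys.argv))
```

Keeping the decision in `pin_command` separate from the `subprocess.run` side effect lets the mapping be checked without touching ovsdb. Like the reporter's workaround, this runs only after NetworkManager has already re-added the port, so a short disruption can remain.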

Comment 5 Beniamino Galvani 2022-03-15 10:36:35 UTC
> Ideally, NM would have an option so that we can pre-assign an ovs port id

I think that's a reasonable requirement.

> What reason is there for deleting the ports completely and recreating them? 

The reason is that there is a duplication of persistent state when an OVS connection is created by NM. NM maintains its own state based on active connections and at the same time it needs to write that state to the ovsdb. Upon shutdown, NM needs to clean up ovsdb so that those interfaces are not created automatically by ovs at the next boot, because NM should be in control of which connections are activated.

> The impact is way too big, instead a smart solution would be to compare the desired state to the current state and only make those changes that are really necessary.

I agree that this would be the optimal solution, but it seems hard to implement correctly.

Comment 6 Andreas Karis 2022-03-16 10:04:48 UTC
Awesome, thank you so much for the answer. Let's go with that option then, which will allow us to pre-assign an OVS port ID. Meanwhile, we are going to work around this in our case with a dispatcher pre-up script.

Thanks!

Comment 7 Vojtěch Bůbela 2022-07-22 12:04:46 UTC
I am now working on this.

Comment 8 Thomas Haller 2022-09-08 13:19:09 UTC
fixed upstream by https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/db88bc50905f7f4ad61c81a4b7541279e7a92cca
in NetworkManager 1.41.2+
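With the upstream fix, the requested port number can be stored in the connection profile instead of being set by a hook. A minimal Python sketch that builds the corresponding nmcli call follows. It assumes that NetworkManager 1.41.2 or later (1.41.3-1.el9 on RHEL 9) exposes the option as the `ovs-interface.ofport-request` property, and `ovs-if-ens5` is a hypothetical name for the profile that attaches ens5 to its OVS port; check `nmcli connection show` for the real one. The 1 to 65279 range check mirrors the range OVS documents for `ofport_request`.

```python
import shlex
from typing import List

# Range OVS documents for the Interface ofport_request column.
MIN_OFPORT, MAX_OFPORT = 1, 65279


def ofport_request_argv(profile: str, ofport: int) -> List[str]:
    """Build an nmcli call that stores a requested OVS port number in a profile.

    Assumes the property name ovs-interface.ofport-request (NetworkManager 1.41.2+).
    """
    if not MIN_OFPORT <= ofport <= MAX_OFPORT:
        raise ValueError(f"ofport {ofport} outside {MIN_OFPORT}..{MAX_OFPORT}")
    return ["nmcli", "connection", "modify", profile,
            "ovs-interface.ofport-request", str(ofport)]


# "ovs-if-ens5" is a hypothetical profile name.
print(shlex.join(ofport_request_argv("ovs-if-ens5", 6)))
# nmcli connection modify ovs-if-ens5 ovs-interface.ofport-request 6
```

As with other `nmcli connection modify` changes, the profile has to be reactivated (for example with `nmcli connection up ovs-if-ens5`) before the new setting takes effect.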

Comment 11 Beniamino Galvani 2022-10-17 14:33:01 UTC
(changing status to ASSIGNED because the patch is not yet backported to nm-1-40)

Comment 16 Vladimir Benes 2022-10-21 13:16:25 UTC
We have the nmcli_add_openvswitch_ofport_request test running in both CentOS and RHEL automation, and it is passing.

Comment 19 errata-xmlrpc 2023-05-09 08:17:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2485

