Bug 1807670 - Minimize disruption of new and existing connections while OVS is being upgraded
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.3.z
Assignee: Aniket Bhat
QA Contact: zhaozhanqi
URL:
Whiteboard: SDN-CI-IMPACT,SDN-BP
Depends On: 1807648
Blocks:
Reported: 2020-02-26 21:19 UTC by Clayton Coleman
Modified: 2020-06-17 20:28 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1807648
Environment:
Last Closed: 2020-06-17 20:27:11 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-network-operator pull 626 | 0 | None | closed | Bug 1807670: Fixes to reliably save/restore flows | 2021-01-21 00:32:49 UTC
Red Hat Product Errata | RHBA-2020:2436 | 0 | None | None | None | 2020-06-17 20:28:20 UTC

Description Clayton Coleman 2020-02-26 21:19:12 UTC
+++ This bug was initially created as a clone of Bug #1807648 +++

+++ This bug was initially created as a clone of Bug #1807638 +++

During OVS shutdown and startup (for example, during an upgrade), both new and reused connections are disrupted.

We should attempt to preserve all existing flows in the kernel while the OVS daemon is offline, and during startup avoid clearing those flows.

While this does not completely mitigate dropped or failed connections while OVS is upgrading, it dramatically reduces the amount of time user applications are impacted, both by upgrades and by unexpected disruption (an OOM kill or OVS crash). Future changes will build on this to try to make OVS upgrades completely transparent to end-user applications.
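
For illustration, the save/hold/restore idea described above might be sketched as the shell functions below. This is a sketch under assumptions, not the fix that shipped: the function names, the run wrapper, and the DRY_RUN switch are hypothetical, and the step that sets other_config:flow-restore-wait=true before the daemon restarts is assumed from the fact that the verification in Comment 5 removes that key. The ovs-save, add-br, and remove commands are copied from that verification.

```shell
#!/bin/sh
# Sketch only: function names, run(), and DRY_RUN are hypothetical helpers.

OVS_RUNDIR=${OVS_RUNDIR:-/var/run/openvswitch}
FLOWS_SH="$OVS_RUNDIR/flows.sh"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Before stopping OVS: dump groups and flows of every real bridge into a
# replayable script (same ovs-save call as in Comment 5).
save_flows() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ ovs-save save-flows \$bridges > $FLOWS_SH"
        return 0
    fi
    bridges=$(ovs-vsctl -- --real list-br)
    # $bridges is intentionally unquoted: one argument per bridge.
    TMPDIR="$OVS_RUNDIR" /usr/share/openvswitch/scripts/ovs-save \
        save-flows $bridges > "$FLOWS_SH"
}

# Assumed step: ask ovs-vswitchd not to touch kernel flows on startup
# until the saved flows have been replayed.
hold_kernel_flows() {
    run ovs-vsctl --no-wait set Open_vSwitch . other_config:flow-restore-wait=true
}

# After OVS is back: recreate the bridge, replay the saved flows, then
# release the hold (commands as in Comment 5).
restore_flows() {
    run ovs-vsctl --may-exist add-br br0 -- set Bridge br0 fail_mode=secure protocols=OpenFlow13
    run sh -x "$FLOWS_SH"
    run ovs-vsctl --no-wait --if-exists remove Open_vSwitch . other_config flow-restore-wait=true
}
```

With DRY_RUN=1, calling hold_kernel_flows and restore_flows only prints the commands, which makes the sequence reviewable on a machine without OVS installed.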

Known gaps:

ARP flows are potentially still being lost for new connections (workarounds being investigated with OVS team).

After testing in 4.5, we will consider backporting this to all active releases.

Comment 5 zhaozhanqi 2020-06-01 05:21:11 UTC
Verified this bug on 4.3.0-0.nightly-2020-05-30-012900

oc rsh -n openshift-sdn ovs-h67r5
sh-4.2# bridges=$(ovs-vsctl -- --real list-br)
sh-4.2# TMPDIR=/var/run/openvswitch /usr/share/openvswitch/scripts/ovs-save save-flows $bridges > /var/run/openvswitch/flows.sh
2020-06-01T05:17:45Z|00001|vconn|WARN|unix:/var/run/openvswitch/br0.mgmt: version negotiation failed (we support version 0x01, peer supports version 0x04)
ovs-ofctl: br0: failed to connect to socket (Broken pipe)
sh-4.2# cat /var/run/openvswitch/flows.sh 
ovs-ofctl add-tlv-map br0 ''
ovs-ofctl -O OpenFlow13 add-groups br0               "/var/run/openvswitch/ovs-save.z9ibdquDTe/br0.groups.dump" 
ovs-ofctl -O OpenFlow13 replace-flows br0               "/var/run/openvswitch/ovs-save.z9ibdquDTe/br0.flows.dump" 
rm -rf "/var/run/openvswitch/ovs-save.z9ibdquDTe"
sh-4.2# /usr/bin/ovs-vsctl --may-exist add-br br0 -- set Bridge br0 fail_mode=secure protocols=OpenFlow13
sh-4.2# sh -x /var/run/openvswitch/flows.sh
+ ovs-ofctl add-tlv-map br0 ''
2020-06-01T05:19:17Z|00001|vconn|WARN|unix:/var/run/openvswitch/br0.mgmt: version negotiation failed (we support version 0x01, peer supports version 0x04)
ovs-ofctl: br0: failed to connect to socket (Broken pipe)
+ ovs-ofctl -O OpenFlow13 add-groups br0 /var/run/openvswitch/ovs-save.z9ibdquDTe/br0.groups.dump
+ ovs-ofctl -O OpenFlow13 replace-flows br0 /var/run/openvswitch/ovs-save.z9ibdquDTe/br0.flows.dump
+ rm -rf /var/run/openvswitch/ovs-save.z9ibdquDTe
sh-4.2# ovs-vsctl --no-wait --if-exists remove Open_vSwitch . other_config flow-restore-wait=true
sh-4.2# exit
exit
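
In the transcript above, the only ovs-ofctl call that logs a version negotiation failure is add-tlv-map, which is also the only one issued without -O OpenFlow13 against a bridge set to protocols=OpenFlow13. The bug does not say whether that warning affected verification. As a hedged aid for reviewing a generated flows.sh, a small helper along these lines could list such calls; the function name is hypothetical and it relies only on grep.

```shell
#!/bin/sh
# Hypothetical helper: print every ovs-ofctl line in a saved restore
# script that does not pass an explicit -O <protocol> option.
ofctl_lines_without_version() {
    grep '^ovs-ofctl ' "$1" | grep -v -- ' -O '
}
```

For example, running ofctl_lines_without_version /var/run/openvswitch/flows.sh inside the OVS pod would print only the lines that fall back to the default OpenFlow version.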

Comment 9 errata-xmlrpc 2020-06-17 20:27:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2436

