Bug 1934645 - [4.7z] Need BFD failover capability on ECMP routes
Summary: [4.7z] Need BFD failover capability on ECMP routes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.7.z
Assignee: Federico Paolinelli
QA Contact: Ross Brattain
URL:
Whiteboard:
Depends On: 1934643
Blocks:
Reported: 2021-03-03 15:44 UTC by Tim Rozet
Modified: 2021-03-30 04:46 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1934643
Environment:
Last Closed: 2021-03-30 04:46:29 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ovn-kubernetes pull 462 | 0 | None | open | Bug 1934645: Backport enable support for BFD on external gateway routes | 2021-03-18 11:04:53 UTC
Red Hat Product Errata | RHSA-2021:0957 | 0 | None | None | None | 2021-03-30 04:46:44 UTC

Description Tim Rozet 2021-03-03 15:44:42 UTC
+++ This bug was initially created as a clone of Bug #1934643 +++

Description of problem:
With multiple external gateways, we use ECMP to load-balance egress cluster traffic across them. However, if one of the gateways goes down, traffic sent to that gateway is forwarded into a black hole. To address this, most external routers support Bidirectional Forwarding Detection (BFD), and OVN now supports it as well. We can configure BFD on our ECMP routes; as long as the gateway also runs BFD, gateway failures can be detected quickly and the affected routes removed in OVN.
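
The failover behaviour requested here can be illustrated with a small model. This is only a sketch of the idea, not OVN's implementation: the `Gateway` class, the `pick_next_hop` function, the CRC32-based hashing, and the second gateway address `10.0.0.164` are all hypothetical and chosen for illustration.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Gateway:
    ip: str
    bfd_up: bool = True  # state reported by the BFD session for this next hop


def pick_next_hop(gateways, flow_key):
    """Hash a flow onto one of the gateways whose BFD session is up.

    Returns None when no gateway is reachable. The hash used here is an
    assumption for the sketch; real ECMP hashing is done in the datapath.
    """
    live = [g for g in gateways if g.bfd_up]
    if not live:
        return None
    return live[zlib.crc32(flow_key.encode()) % len(live)].ip


# Two ECMP next hops; 10.0.0.164 is a made-up second gateway.
gateways = [Gateway("10.0.0.163"), Gateway("10.0.0.164")]

# Without BFD, a dead gateway would stay in the ECMP set and black-hole
# its share of flows. With BFD, it is excluded once the session goes down.
gateways[1].bfd_up = False
survivor = pick_next_hop(gateways, "10.131.0.32->198.51.100.7")
```

With the second gateway marked down, every flow key resolves to the remaining gateway, which is the outcome this bug asks for.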

Comment 3 Ross Brattain 2021-03-22 23:39:15 UTC
Verified on 4.7.0-0.nightly-2021-03-21-181832


sh-4.4# ovn-nbctl find bfd
_uuid               : 702f7fe9-7174-41a3-b6a0-a942ef8dcb6d
detect_mult         : []
dst_ip              : "10.0.0.163"
external_ids        : {}
logical_port        : rtoe-GR_ip-10-0-151-254.compute.internal
min_rx              : []
min_tx              : []
options             : {}
status              : up

sh-4.4# ovn-nbctl find Logical_Router_Static_Route bfd!=[]
_uuid               : b8e24084-22ba-437f-b9dd-def41948da1f
bfd                 : 702f7fe9-7174-41a3-b6a0-a942ef8dcb6d
external_ids        : {}
ip_prefix           : "10.131.0.32"
nexthop             : "10.0.0.163"
options             : {ecmp_symmetric_reply="true"}
output_port         : rtoe-GR_ip-10-0-151-254.compute.internal
policy              : src-ip
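
The cross-check between the two outputs above (each static route's `bfd` column should reference a BFD row whose `status` is `up`) could be scripted roughly as follows. This is a hedged sketch: `parse_records` and `routes_without_live_bfd` are hypothetical helper names, the sample strings are excerpts of the output pasted above, and the parser assumes the `key : value` layout with blank lines between records seen there.

```python
def parse_records(text):
    """Parse `ovn-nbctl find` style output into a list of dicts.

    Assumes one `key : value` pair per line and a blank line between records.
    """
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip().strip('"')
    if current:
        records.append(current)
    return records


def routes_without_live_bfd(bfd_records, route_records):
    """Return routes whose bfd reference is not a session with status 'up'."""
    up = {r["_uuid"] for r in bfd_records if r.get("status") == "up"}
    return [r for r in route_records if r.get("bfd") not in up]


BFD_OUTPUT = """\
_uuid               : 702f7fe9-7174-41a3-b6a0-a942ef8dcb6d
dst_ip              : "10.0.0.163"
logical_port        : rtoe-GR_ip-10-0-151-254.compute.internal
status              : up
"""

ROUTE_OUTPUT = """\
_uuid               : b8e24084-22ba-437f-b9dd-def41948da1f
bfd                 : 702f7fe9-7174-41a3-b6a0-a942ef8dcb6d
ip_prefix           : "10.131.0.32"
nexthop             : "10.0.0.163"
"""

bfd_rows = parse_records(BFD_OUTPUT)
route_rows = parse_records(ROUTE_OUTPUT)
stale = routes_without_live_bfd(bfd_rows, route_rows)
```

For the output in this comment, `stale` is empty: the only BFD-enabled route points at the session that is up.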


[root@ip-10-0-0-163 ec2-user]# vtysh -c "show bfd peers"
BFD Peers:
        peer 10.0.151.254
                ID: 1
                Remote ID: 828088889
                Status: up
                Uptime: 1 minute(s), 14 second(s)
                Diagnostics: ok
                Remote diagnostics: ok
                Local timers:
                        Receive interval: 300ms
                        Transmission interval: 300ms
                        Echo transmission interval: disabled
                Remote timers:
                        Receive interval: 1000ms
                        Transmission interval: 1000ms
                        Echo transmission interval: 0ms
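
The timers above can be turned into effective intervals using the asynchronous-mode rules from RFC 5880: each side transmits at the greater of its own desired TX interval and the peer's required RX interval, and the detection time is the peer's detect multiplier times the greater of the local required RX interval and the peer's desired TX interval. The sketch below applies those rules; the function names are hypothetical, and the detect multiplier of 3 is an assumed placeholder because neither output in this comment shows the value in use.

```python
def tx_interval_ms(local_desired_tx, remote_required_rx):
    """Effective transmit interval: the slower of what we want to send
    and what the peer is willing to receive (RFC 5880, section 6.8.7)."""
    return max(local_desired_tx, remote_required_rx)


def detection_time_ms(remote_detect_mult, local_required_rx, remote_desired_tx):
    """Asynchronous-mode detection time on the local system
    (RFC 5880, section 6.8.4)."""
    return remote_detect_mult * max(local_required_rx, remote_desired_tx)


# Values from the vtysh output on the gateway (10.0.0.163):
GW_RX, GW_TX = 300, 300        # "Local timers"
OVN_RX, OVN_TX = 1000, 1000    # "Remote timers"

gw_tx = tx_interval_ms(GW_TX, OVN_RX)    # gateway -> OVN packet spacing
ovn_tx = tx_interval_ms(OVN_TX, GW_RX)   # OVN -> gateway packet spacing

ASSUMED_DETECT_MULT = 3  # placeholder; not shown in this bug
ovn_detect = detection_time_ms(ASSUMED_DETECT_MULT, OVN_RX, GW_TX)
```

Both directions work out to 1000 ms before jitter. RFC 5880 has senders shorten each interval by a random 0 to 25 percent, which is consistent with the sub-second packet spacing visible in the tcpdump captures in this comment.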


dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
23:34:39.683048 IP ip-10-0-0-163.compute.internal.49152 > ip-10-0-151-254.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
23:34:40.330692 IP ip-10-0-151-254.compute.internal.49152 > ip-10-0-0-163.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
23:34:40.613264 IP ip-10-0-0-163.compute.internal.49152 > ip-10-0-151-254.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
23:34:41.276886 IP ip-10-0-151-254.compute.internal.49152 > ip-10-0-0-163.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
23:34:41.403328 IP ip-10-0-0-163.compute.internal.49152 > ip-10-0-151-254.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
23:34:42.162088 IP ip-10-0-151-254.compute.internal.49152 > ip-10-0-0-163.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
23:34:42.163370 IP ip-10-0-0-163.compute.internal.49152 > ip-10-0-151-254.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24


sh-4.4# tcpdump -i ens5 port '(3784 or 3785)'
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
23:36:23.697664 IP ip-10-0-0-163.compute.internal.49152 > ip-10-0-151-254.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
23:36:23.836563 IP ip-10-0-151-254.compute.internal.49152 > ip-10-0-0-163.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
23:36:24.457697 IP ip-10-0-0-163.compute.internal.49152 > ip-10-0-151-254.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
23:36:24.794813 IP ip-10-0-151-254.compute.internal.49152 > ip-10-0-0-163.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
23:36:25.327817 IP ip-10-0-0-163.compute.internal.49152 > ip-10-0-151-254.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
23:36:25.674966 IP ip-10-0-151-254.compute.internal.49152 > ip-10-0-0-163.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
23:36:26.238108 IP ip-10-0-0-163.compute.internal.49152 > ip-10-0-151-254.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24


sh-4.4# tcpdump -i br-ex port '(3784 or 3785)'
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-ex, link-type EN10MB (Ethernet), capture size 262144 bytes
23:36:52.261362 IP ip-10-0-0-163.compute.internal.49152 > ip-10-0-151-254.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
23:36:53.171569 IP ip-10-0-0-163.compute.internal.49152 > ip-10-0-151-254.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
23:36:53.931627 IP ip-10-0-0-163.compute.internal.49152 > ip-10-0-151-254.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
23:36:54.891804 IP ip-10-0-0-163.compute.internal.49152 > ip-10-0-151-254.ca-central-1.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24

Comment 5 errata-xmlrpc 2021-03-30 04:46:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.4 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0957

