Bug 1934643 - Need BFD failover capability on ECMP routes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.0
Assignee: Federico Paolinelli
QA Contact: Ross Brattain
URL:
Whiteboard:
Depends On:
Blocks: 1934645
Reported: 2021-03-03 15:41 UTC by Tim Rozet
Modified: 2021-07-27 22:51 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1934645
Environment:
Last Closed: 2021-07-27 22:49:27 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ovn-kubernetes pull 458 | 0 | None | Merged | Bug 1934643: Downstream merge 3-10-21 | 2022-07-18 22:31:59 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:51:00 UTC

Description Tim Rozet 2021-03-03 15:41:22 UTC
Description of problem:
With multiple external gateways, we use ECMP to load-balance egress cluster traffic across them. However, if one of the gateways goes down, the traffic sent to it is forwarded into a black hole. Most external routers support Bidirectional Forwarding Detection (BFD) to address this, and OVN now supports it as well. We can configure BFD on our ECMP routes; as long as the gateway also runs BFD, we can detect gateway routing failures quickly and remove the affected routes in OVN.

Comment 3 Ross Brattain 2021-03-17 02:29:58 UTC
I see that the annotations cause the BFD entries to be created, but I don't have external BFD routers to test with.

4.8.0-0.nightly-2021-03-16-073618

oc annotate ns t1 k8s.ovn.org/routing-external-gws=10.242.0.1,10.242.0.2

oc annotate ns t1 k8s.ovn.org/bfd-enabled=""

GR_rbrattai-o48v24-2pkgr-worker-ghvzc
IPv4 Routes
              10.128.2.10                10.242.0.1 src-ip ecmp ecmp-symmetric-reply
              10.128.2.10                10.242.0.2 src-ip ecmp ecmp-symmetric-reply
            10.128.0.0/14                100.64.0.1 dst-ip
                0.0.0.0/0              172.31.248.1 dst-ip rtoe-GR_rbrattai-o48v24-2pkgr-worker-ghvzc


sh-4.4# ovn-nbctl --format=table       find    BFD
_uuid                                detect_mult dst_ip       external_ids logical_port                               min_rx min_tx options status
------------------------------------ ----------- ------------ ------------ ------------------------------------------ ------ ------ ------- ------
94add884-a7b0-4f80-ae31-b36a369e3cc8 []          "10.242.0.1" {}           rtoe-GR_rbrattai-o48v24-2pkgr-worker-ghvzc []     []     {}      down
eb70052f-4ae0-4fa5-9bb5-9f7eb5605895 []          "10.242.0.2" {}           rtoe-GR_rbrattai-o48v24-2pkgr-worker-ghvzc []     []     {}      down
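
A small helper can flag sessions that are not up. This is only a sketch, not part of the fix: the function name bfd_down_peers is hypothetical, and it assumes the rows arrive as dst_ip,logical_port,status CSV, for example from ovn-nbctl --format=csv --data=bare --no-headings --columns=dst_ip,logical_port,status find BFD.

```shell
# Hypothetical helper: reads "dst_ip,logical_port,status" CSV rows on stdin,
# prints every session whose status is not "up", and returns 1 if any exist.
bfd_down_peers() {
    awk -F',' '
        NF >= 3 && $3 != "up" { print $1 " (" $2 "): " $3; bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Sample rows taken from the table above (both peers are down here).
printf '%s\n' \
    '10.242.0.1,rtoe-GR_rbrattai-o48v24-2pkgr-worker-ghvzc,down' \
    '10.242.0.2,rtoe-GR_rbrattai-o48v24-2pkgr-worker-ghvzc,down' |
    bfd_down_peers || echo "at least one BFD session is not up"
```

The non-zero return code makes the helper usable as a simple health check in a script.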

sh-4.4# ovn-nbctl --format=table       find    Logical_Router_Static_Route
_uuid                                bfd                                  external_ids ip_prefix       nexthop        options                       output_port                                policy
------------------------------------ ------------------------------------ ------------ --------------- -------------- ----------------------------- ------------------------------------------ ------
89e72ba6-61f2-441a-a1a1-c3ec2bdbe185 []                                   {}           "0.0.0.0/0"     "172.31.248.1" {}                            rtoe-GR_rbrattai-o48v24-2pkgr-worker-c5qrb []
118c64c8-9966-49f9-81a9-1956dd031f37 94add884-a7b0-4f80-ae31-b36a369e3cc8 {}           "10.128.2.10"   "10.242.0.1"   {ecmp_symmetric_reply="true"} rtoe-GR_rbrattai-o48v24-2pkgr-worker-ghvzc src-ip
87a76309-8ea6-46af-a300-e217b5e117e8 []                                   {}           "0.0.0.0/0"     "172.31.248.1" {}                            rtoe-GR_rbrattai-o48v24-2pkgr-master-2     []
a9faaa8a-0ee0-4671-bf8b-330b05b4eab3 []                                   {}           "0.0.0.0/0"     "172.31.248.1" {}                            rtoe-GR_rbrattai-o48v24-2pkgr-worker-ghvzc []
d517ca12-c978-4bf1-9896-f8dbc1686bfe []                                   {}           "0.0.0.0/0"     "172.31.248.1" {}                            rtoe-GR_rbrattai-o48v24-2pkgr-master-1     []
cbf0a1b4-6be7-43bf-94aa-50267735bb85 eb70052f-4ae0-4fa5-9bb5-9f7eb5605895 {}           "10.128.2.10"   "10.242.0.2"   {ecmp_symmetric_reply="true"} rtoe-GR_rbrattai-o48v24-2pkgr-worker-ghvzc src-ip



sh-4.4# ovs-ofctl -O OpenFlow13 dump-flows br-ex | grep 3784
cookie=0xdeff105, duration=34429.320s, table=1, n_packets=0, n_bytes=0, priority=13,udp,in_port=1,tp_dst=3784 actions=output:2,LOCAL



sh-4.4# ovs-ofctl -O OpenFlow13 dump-flows br-int  | grep tp_dst=3784
cookie=0xb1b9218c, duration=1797.314s, table=11, n_packets=0, n_bytes=0, priority=110,udp6,metadata=0x11,ipv6_dst=fe80::250:56ff:feac:61dc,tp_dst=3784 actions=controller(userdata=00.00.00.17.00.00.00.00)
cookie=0x8403fc30, duration=1797.314s, table=11, n_packets=0, n_bytes=0, priority=110,udp,metadata=0x11,nw_dst=172.31.249.221,tp_dst=3784 actions=controller(userdata=00.00.00.17.00.00.00.00)
cookie=0x92c08538, duration=1797.314s, table=11, n_packets=0, n_bytes=0, priority=110,udp6,metadata=0x11,ipv6_src=fe80::250:56ff:feac:61dc,tp_dst=3784 actions=resubmit(,12)
cookie=0xb450fcd2, duration=1797.314s, table=11, n_packets=4105, n_bytes=270930, priority=110,udp,metadata=0x11,nw_src=172.31.249.221,tp_dst=3784 actions=resubmit(,12)
cookie=0x30f403ae, duration=1797.316s, table=18, n_packets=0, n_bytes=0, priority=130,udp6,reg14=0x2,metadata=0x11,ipv6_dst=fe80::/64,tp_dst=3784 actions=load:0->OXM_OF_PKT_REG4[32..47],move:NXM_NX_IPV6_DST[]->NXM_NX_XXREG0[],set_field:0xfe80000000000000025056fffeac61dc->xxreg1,set_field:00:50:56:ac:61:dc->eth_src,set_field:0x2->reg15,load:0x1->NXM_NX_REG10[0],resubmit(,19)
cookie=0xc7f30157, duration=1797.316s, table=18, n_packets=0, n_bytes=0, priority=48,udp,metadata=0x11,nw_dst=172.31.248.0/23,tp_dst=3784 actions=load:0->OXM_OF_PKT_REG4[32..47],move:NXM_OF_IP_DST[]->NXM_NX_XXREG0[96..127],load:0xac1ff9dd->NXM_NX_XXREG0[64..95],set_field:00:50:56:ac:61:dc->eth_src,set_field:0x2->reg15,load:0x1->NXM_NX_REG10[0],resubmit(,19)
cookie=0x42621563, duration=1797.316s, table=18, n_packets=4105, n_bytes=270930, priority=2,udp,metadata=0x11,tp_dst=3784 actions=load:0->OXM_OF_PKT_REG4[32..47],load:0xac1ff801->NXM_NX_XXREG0[96..127],load:0xac1ff9dd->NXM_NX_XXREG0[64..95],set_field:00:50:56:ac:61:dc->eth_src,set_field:0x2->reg15,load:0x1->NXM_NX_REG10[0],resubmit(,19)

Comment 4 zhaozhanqi 2021-03-17 03:46:39 UTC
@trozet Could you help check whether comment 3 is enough to verify this bug? Thanks.

Comment 5 Federico Paolinelli 2021-03-17 08:19:09 UTC
In case it's not enough:

I used FRR to test this and to cover it in CI; maybe it helps.

http://docs.frrouting.org/en/latest/bfd.html

You need to run

sed -i 's/^bfdd=no/bfdd=yes/g' /etc/frr/daemons

to enable bfdd, and

cat << EOF >> /etc/frr/frr.conf

bfd
 peer 172.18.0.4
   no shutdown
 !
!
EOF

to add a peer.

vtysh -c "show bfd peers" would show the coupled peers

ovn-nbctl find bfd shows the peer status from the OVN point of view.
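
With several cluster nodes, the bfd stanza needs one peer entry per node. A minimal sketch, assuming each node address should become a peer (in the later test run the FRR peer was the node 10.0.142.146); the function name frr_bfd_stanza is hypothetical and the layout simply mirrors the snippet above:

```shell
# Hypothetical helper: emit an FRR "bfd" stanza with one peer entry
# per IP address passed as an argument.
frr_bfd_stanza() {
    printf 'bfd\n'
    for peer in "$@"; do
        printf ' peer %s\n   no shutdown\n !\n' "$peer"
    done
    printf '!\n'
}

# Print a stanza for the single peer used in the snippet above.
frr_bfd_stanza 172.18.0.4
```

The output could be appended to /etc/frr/frr.conf in the same way as the heredoc, after checking it by eye.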

Comment 6 Tim Rozet 2021-03-17 14:15:17 UTC
If you are unable to run a BFD client to verify, then you can at minimum verify that you see BFD control packets coming from the node. That at least provides some indication that BFD is functioning on the OVN side.
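
One rough way to check this without a packet capture is to look at the OpenFlow counters for the BFD control port. The sketch below is an assumption-laden helper, not an official procedure: the name bfd_flow_packets is hypothetical, and it expects ovs-ofctl dump-flows output on stdin. It sums n_packets over flows matching tp_dst=3784; because one packet can hit flows in several tables, a non-zero total only indicates that BFD traffic was matched and is not an exact packet count.

```shell
# Hypothetical helper: sum n_packets over all flows that match tp_dst=3784.
bfd_flow_packets() {
    awk '
        /tp_dst=3784/ {
            if (match($0, /n_packets=[0-9]+/)) {
                # Skip the 10 characters of "n_packets=" to get the number.
                total += substr($0, RSTART + 10, RLENGTH - 10)
            }
        }
        END { print total + 0 }
    '
}

# Two br-int flows copied from comment 3; prints 4105 for this sample.
bfd_flow_packets <<'EOF'
cookie=0x8403fc30, duration=1797.314s, table=11, n_packets=0, n_bytes=0, priority=110,udp,metadata=0x11,nw_dst=172.31.249.221,tp_dst=3784 actions=controller(userdata=00.00.00.17.00.00.00.00)
cookie=0xb450fcd2, duration=1797.314s, table=11, n_packets=4105, n_bytes=270930, priority=110,udp,metadata=0x11,nw_src=172.31.249.221,tp_dst=3784 actions=resubmit(,12)
EOF
```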

Comment 7 Ross Brattain 2021-03-18 02:20:17 UTC
FRR was easy enough to set up.



sh-4.4# ovn-nbctl find bfd
_uuid               : 82f83d39-5769-463e-88de-fae888bc480b
detect_mult         : []
dst_ip              : "10.0.24.40"
external_ids        : {}
logical_port        : rtoe-GR_ip-10-0-142-146.compute.internal
min_rx              : []
min_tx              : []
options             : {}
status              : up


sh-4.4# ovn-nbctl       find    Logical_Router_Static_Route bfd!=[]
_uuid               : 7b542cfc-3853-4da5-b1ab-c45ecdcf5f51
bfd                 : 82f83d39-5769-463e-88de-fae888bc480b
external_ids        : {}
ip_prefix           : "10.129.3.121"
nexthop             : "10.0.24.40"
options             : {ecmp_symmetric_reply="true"}
output_port         : rtoe-GR_ip-10-0-142-146.compute.internal
policy              : src-ip

[root@ip-10-0-24-40 frr]# tcpdump -i eth0 port '(3784 or 3785)'
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
01:56:46.476071 IP ip-10-0-142-146.compute.internal.49152 > ip-10-0-24-40.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
01:56:46.649478 IP ip-10-0-24-40.compute.internal.49152 > ip-10-0-142-146.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24



[root@ip-10-0-24-40 frr]# vtysh -c "show bfd peers"
BFD Peers:
        peer 10.0.142.146
                ID: 1
                Remote ID: 3771289794
                Status: up
                Uptime: 2 minute(s), 3 second(s)
                Diagnostics: ok
                Remote diagnostics: ok
                Local timers:
                        Receive interval: 300ms
                        Transmission interval: 300ms
                        Echo transmission interval: disabled
                Remote timers:
                        Receive interval: 1000ms
                        Transmission interval: 1000ms
                        Echo transmission interval: 0ms


Internet Protocol Version 4, Src: 10.0.142.146, Dst: 10.0.24.40
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        0000 00.. = Differentiated Services Codepoint: Default (0)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 52
    Identification: 0x0000 (0)
    Flags: 0x40, Don't fragment
        0... .... = Reserved bit: Not set
        .1.. .... = Don't fragment: Set
        ..0. .... = More fragments: Not set
    Fragment Offset: 0
    Time to Live: 255
    Protocol: UDP (17)
    Header Checksum: 0xc0fe [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 10.0.142.146
    Destination Address: 10.0.24.40
User Datagram Protocol, Src Port: 49152, Dst Port: 3784
    Source Port: 49152
    Destination Port: 3784
    Length: 32
    [Checksum: [missing]]
    [Checksum Status: Not present]
    [Stream index: 0]
    [Timestamps]
        [Time since first frame: 0.000000000 seconds]
        [Time since previous frame: 0.000000000 seconds]
    UDP payload (24 bytes)
BFD Control message
    001. .... = Protocol Version: 1
    ...0 0000 = Diagnostic Code: No Diagnostic (0x00)
    11.. .... = Session State: Up (0x3)
    Message Flags: 0xc0
        0... .. = Poll: Not set
        .0.. .. = Final: Not set
        ..0. .. = Control Plane Independent: Not set
        ...0 .. = Authentication Present: Not set
        .... 0. = Demand: Not set
        .... .0 = Multipoint: Not set
    Detect Time Multiplier: 5 (= 5000 ms Detection time)
    Message Length: 24 bytes
    My Discriminator: 0xe0c950c2
    Your Discriminator: 0x00000001
    Desired Min TX Interval: 1000 ms (1000000 us)
    Required Min RX Interval: 1000 ms (1000000 us)
    Required Min Echo Interval:    0 ms (0 us)
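
The 5000 ms detection time in the dissection follows from the asynchronous-mode rule in RFC 5880: the receiver multiplies the sender's Detect Mult by the larger of its own Required Min RX interval and the sender's Desired Min TX interval. A small sketch of that arithmetic from the FRR side, using the values shown above (the function name bfd_detection_time_ms is hypothetical):

```shell
# Hypothetical helper: detection time in ms as seen by the receiving system.
#   $1 = Detect Mult received from the remote system
#   $2 = local Required Min RX interval (ms)
#   $3 = remote Desired Min TX interval (ms)
bfd_detection_time_ms() {
    remote_mult=$1
    local_min_rx_ms=$2
    remote_min_tx_ms=$3
    # The agreed transmit interval is the larger of the two intervals.
    if [ "$local_min_rx_ms" -gt "$remote_min_tx_ms" ]; then
        interval=$local_min_rx_ms
    else
        interval=$remote_min_tx_ms
    fi
    echo $(( remote_mult * interval ))
}

# OVN sends Detect Mult 5 and Desired Min TX 1000 ms; FRR's receive
# interval is 300 ms, so this prints 5000.
bfd_detection_time_ms 5 300 1000
```

The detection time in the other direction depends on FRR's own multiplier, which does not appear in the output above.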

Comment 10 errata-xmlrpc 2021-07-27 22:49:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

