Bug 1934643
Summary:                   Need BFD failover capability on ECMP routes
Product:                   OpenShift Container Platform
Component:                 Networking
Networking sub component:  ovn-kubernetes
Reporter:                  Tim Rozet <trozet>
Assignee:                  Federico Paolinelli <fpaoline>
QA Contact:                Ross Brattain <rbrattai>
CC:                        aconstan, pibanezr, rbrattai, zzhao
Status:                    CLOSED ERRATA
Severity:                  urgent
Priority:                  urgent
Version:                   4.7
Target Release:            4.8.0
Hardware:                  Unspecified
OS:                        Unspecified
Doc Type:                  No Doc Update
Type:                      Bug
Last Closed:               2021-07-27 22:49:27 UTC
Bug Blocks:                1934645 (view as bug list)
Description    Tim Rozet    2021-03-03 15:41:22 UTC
I see the annotations causing the BFD sessions to be created, but I don't have external BFD routers to test with.

4.8.0-0.nightly-2021-03-16-073618

oc annotate ns t1 k8s.ovn.org/routing-external-gws=10.242.0.1,10.242.0.2
oc annotate ns t1 k8s.ovn.org/bfd-enabled=""

GR_rbrattai-o48v24-2pkgr-worker-ghvzc IPv4 Routes
    10.128.2.10    10.242.0.1  src-ip  ecmp ecmp-symmetric-reply
    10.128.2.10    10.242.0.2  src-ip  ecmp ecmp-symmetric-reply
  10.128.0.0/14    100.64.0.1  dst-ip
      0.0.0.0/0  172.31.248.1  dst-ip  rtoe-GR_rbrattai-o48v24-2pkgr-worker-ghvzc

sh-4.4# ovn-nbctl --format=table find BFD
_uuid                                detect_mult dst_ip       external_ids logical_port                               min_rx min_tx options status
------------------------------------ ----------- ------------ ------------ ------------------------------------------ ------ ------ ------- ------
94add884-a7b0-4f80-ae31-b36a369e3cc8 []          "10.242.0.1" {}           rtoe-GR_rbrattai-o48v24-2pkgr-worker-ghvzc []     []     {}      down
eb70052f-4ae0-4fa5-9bb5-9f7eb5605895 []          "10.242.0.2" {}           rtoe-GR_rbrattai-o48v24-2pkgr-worker-ghvzc []     []     {}      down

sh-4.4# ovn-nbctl --format=table find Logical_Router_Static_Route
_uuid                                bfd                                  external_ids ip_prefix     nexthop        options                       output_port                                policy
------------------------------------ ------------------------------------ ------------ ------------- -------------- ----------------------------- ------------------------------------------ ------
89e72ba6-61f2-441a-a1a1-c3ec2bdbe185 []                                   {}           "0.0.0.0/0"   "172.31.248.1" {}                            rtoe-GR_rbrattai-o48v24-2pkgr-worker-c5qrb []
118c64c8-9966-49f9-81a9-1956dd031f37 94add884-a7b0-4f80-ae31-b36a369e3cc8 {}           "10.128.2.10" "10.242.0.1"   {ecmp_symmetric_reply="true"} rtoe-GR_rbrattai-o48v24-2pkgr-worker-ghvzc src-ip
87a76309-8ea6-46af-a300-e217b5e117e8 []                                   {}           "0.0.0.0/0"   "172.31.248.1" {}                            rtoe-GR_rbrattai-o48v24-2pkgr-master-2     []
a9faaa8a-0ee0-4671-bf8b-330b05b4eab3 []                                   {}           "0.0.0.0/0"   "172.31.248.1" {}                            rtoe-GR_rbrattai-o48v24-2pkgr-worker-ghvzc []
d517ca12-c978-4bf1-9896-f8dbc1686bfe []                                   {}           "0.0.0.0/0"   "172.31.248.1" {}                            rtoe-GR_rbrattai-o48v24-2pkgr-master-1     []
cbf0a1b4-6be7-43bf-94aa-50267735bb85 eb70052f-4ae0-4fa5-9bb5-9f7eb5605895 {}           "10.128.2.10" "10.242.0.2"   {ecmp_symmetric_reply="true"} rtoe-GR_rbrattai-o48v24-2pkgr-worker-ghvzc src-ip

sh-4.4# ovs-ofctl -O OpenFlow13 dump-flows br-ex | grep 3784
cookie=0xdeff105, duration=34429.320s, table=1, n_packets=0, n_bytes=0, priority=13,udp,in_port=1,tp_dst=3784 actions=output:2,LOCAL

sh-4.4# ovs-ofctl -O OpenFlow13 dump-flows br-int | grep tp_dst=3784
cookie=0xb1b9218c, duration=1797.314s, table=11, n_packets=0, n_bytes=0, priority=110,udp6,metadata=0x11,ipv6_dst=fe80::250:56ff:feac:61dc,tp_dst=3784 actions=controller(userdata=00.00.00.17.00.00.00.00)
cookie=0x8403fc30, duration=1797.314s, table=11, n_packets=0, n_bytes=0, priority=110,udp,metadata=0x11,nw_dst=172.31.249.221,tp_dst=3784 actions=controller(userdata=00.00.00.17.00.00.00.00)
cookie=0x92c08538, duration=1797.314s, table=11, n_packets=0, n_bytes=0, priority=110,udp6,metadata=0x11,ipv6_src=fe80::250:56ff:feac:61dc,tp_dst=3784 actions=resubmit(,12)
cookie=0xb450fcd2, duration=1797.314s, table=11, n_packets=4105, n_bytes=270930, priority=110,udp,metadata=0x11,nw_src=172.31.249.221,tp_dst=3784 actions=resubmit(,12)
cookie=0x30f403ae, duration=1797.316s, table=18, n_packets=0, n_bytes=0, priority=130,udp6,reg14=0x2,metadata=0x11,ipv6_dst=fe80::/64,tp_dst=3784 actions=load:0->OXM_OF_PKT_REG4[32..47],move:NXM_NX_IPV6_DST[]->NXM_NX_XXREG0[],set_field:0xfe80000000000000025056fffeac61dc->xxreg1,set_field:00:50:56:ac:61:dc->eth_src,set_field:0x2->reg15,load:0x1->NXM_NX_REG10[0],resubmit(,19)
cookie=0xc7f30157, duration=1797.316s, table=18, n_packets=0, n_bytes=0, priority=48,udp,metadata=0x11,nw_dst=172.31.248.0/23,tp_dst=3784 actions=load:0->OXM_OF_PKT_REG4[32..47],move:NXM_OF_IP_DST[]->NXM_NX_XXREG0[96..127],load:0xac1ff9dd->NXM_NX_XXREG0[64..95],set_field:00:50:56:ac:61:dc->eth_src,set_field:0x2->reg15,load:0x1->NXM_NX_REG10[0],resubmit(,19)
cookie=0x42621563, duration=1797.316s, table=18, n_packets=4105, n_bytes=270930, priority=2,udp,metadata=0x11,tp_dst=3784 actions=load:0->OXM_OF_PKT_REG4[32..47],load:0xac1ff801->NXM_NX_XXREG0[96..127],load:0xac1ff9dd->NXM_NX_XXREG0[64..95],set_field:00:50:56:ac:61:dc->eth_src,set_field:0x2->reg15,load:0x1->NXM_NX_REG10[0],resubmit(,19)

@trozet Could you help check if comment 3 is enough to verify this bug? Thanks.

In case it's not enough: I used frr to test this and to CI it; maybe it helps.
http://docs.frrouting.org/en/latest/bfd.html

You need to run

sed -i 's/^bfdd=no/bfdd=yes/g' /etc/frr/daemons

to enable bfdd, and

cat << EOF >> /etc/frr/frr.conf
bfd
 peer 172.18.0.4
  no shutdown
 !
!
EOF

to add a peer. vtysh -c "show bfd peers" shows the coupled peers, and ovn-nbctl find bfd shows the peer status from the OVN point of view.

If you are unable to run a BFD client to verify, then you can at minimum check that you see BFD control packets coming from the node. At least that provides some indication that BFD is functioning on the OVN side.

frr was easy enough:

sh-4.4# ovn-nbctl find bfd
_uuid               : 82f83d39-5769-463e-88de-fae888bc480b
detect_mult         : []
dst_ip              : "10.0.24.40"
external_ids        : {}
logical_port        : rtoe-GR_ip-10-0-142-146.compute.internal
min_rx              : []
min_tx              : []
options             : {}
status              : up

sh-4.4# ovn-nbctl find Logical_Router_Static_Route bfd!=[]
_uuid               : 7b542cfc-3853-4da5-b1ab-c45ecdcf5f51
bfd                 : 82f83d39-5769-463e-88de-fae888bc480b
external_ids        : {}
ip_prefix           : "10.129.3.121"
nexthop             : "10.0.24.40"
options             : {ecmp_symmetric_reply="true"}
output_port         : rtoe-GR_ip-10-0-142-146.compute.internal
policy              : src-ip

[root@ip-10-0-24-40 frr]# tcpdump -i eth0 port '(3784 or 3785)'
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
01:56:46.476071 IP ip-10-0-142-146.compute.internal.49152 > ip-10-0-24-40.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24
01:56:46.649478 IP ip-10-0-24-40.compute.internal.49152 > ip-10-0-142-146.compute.internal.bfd-control: BFDv1, Control, State Up, Flags: [none], length: 24

[root@ip-10-0-24-40 frr]# vtysh -c "show bfd peers"
BFD Peers:
        peer 10.0.142.146
                ID: 1
                Remote ID: 3771289794
                Status: up
                Uptime: 2 minute(s), 3 second(s)
                Diagnostics: ok
                Remote diagnostics: ok
                Local timers:
                        Receive interval: 300ms
                        Transmission interval: 300ms
                        Echo transmission interval: disabled
                Remote timers:
                        Receive interval: 1000ms
                        Transmission interval: 1000ms
                        Echo transmission interval: 0ms

Dissection of one captured BFD control packet:

Internet Protocol Version 4, Src: 10.0.142.146, Dst: 10.0.24.40
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        0000 00.. = Differentiated Services Codepoint: Default (0)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 52
    Identification: 0x0000 (0)
    Flags: 0x40, Don't fragment
        0... .... = Reserved bit: Not set
        .1.. .... = Don't fragment: Set
        ..0. .... = More fragments: Not set
    Fragment Offset: 0
    Time to Live: 255
    Protocol: UDP (17)
    Header Checksum: 0xc0fe [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 10.0.142.146
    Destination Address: 10.0.24.40
User Datagram Protocol, Src Port: 49152, Dst Port: 3784
    Source Port: 49152
    Destination Port: 3784
    Length: 32
    [Checksum: [missing]]
    [Checksum Status: Not present]
    [Stream index: 0]
    [Timestamps]
        [Time since first frame: 0.000000000 seconds]
        [Time since previous frame: 0.000000000 seconds]
    UDP payload (24 bytes)
BFD Control message
    001. .... = Protocol Version: 1
    ...0 0000 = Diagnostic Code: No Diagnostic (0x00)
    11.. .... = Session State: Up (0x3)
    Message Flags: 0xc0
        0... .. = Poll: Not set
        .0.. .. = Final: Not set
        ..0. .. = Control Plane Independent: Not set
        ...0 .. = Authentication Present: Not set
        .... 0. = Demand: Not set
        .... .0 = Multipoint: Not set
    Detect Time Multiplier: 5 (= 5000 ms Detection time)
    Message Length: 24 bytes
    My Discriminator: 0xe0c950c2
    Your Discriminator: 0x00000001
    Desired Min TX Interval: 1000 ms (1000000 us)
    Required Min RX Interval: 1000 ms (1000000 us)
    Required Min Echo Interval: 0 ms (0 us)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
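For scripted verification, the table output above can be reduced to (dst_ip, status) pairs. A minimal sketch, assuming the default column order of `ovn-nbctl --format=table find BFD` shown in this bug; the here-doc stands in for a live ovn-nbctl call:

```shell
#!/bin/sh
# Reduce `ovn-nbctl --format=table find BFD` output to "dst_ip status" pairs.
# Assumes the default column layout shown above (dst_ip is field 3, status is
# the last field). Skips the header and the dashed separator row.
extract_bfd_status() {
  awk 'NR > 2 { gsub(/"/, "", $3); print $3, $NF }'
}

# Canned sample in place of a live call:
extract_bfd_status <<'EOF'
_uuid detect_mult dst_ip external_ids logical_port min_rx min_tx options status
----- ----------- ------ ------------ ------------ ------ ------ ------- ------
94add884-a7b0-4f80-ae31-b36a369e3cc8 [] "10.242.0.1" {} rtoe-GR_worker [] [] {} down
eb70052f-4ae0-4fa5-9bb5-9f7eb5605895 [] "10.242.0.2" {} rtoe-GR_worker [] [] {} down
EOF
# prints:
# 10.242.0.1 down
# 10.242.0.2 down
```

Piped after the real command (ovn-nbctl --format=table find BFD | extract_bfd_status) this gives a one-line-per-session summary to watch while taking a gateway down.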
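During a failover test it is also handy to block until every BFD session reports up. A small sketch; the ovn-nbctl invocation named in the comment is an assumption (it relies on the standard --bare/--columns database-command options, not on anything shown in this bug), so the demo uses a stub command instead of a live call:

```shell
#!/bin/sh
# Poll a status source until every line it prints is "up".
# "$@" is any command printing one BFD session status per line; on a node
# with NB DB access this would plausibly be:
#   ovn-nbctl --bare --columns=status find bfd   (assumed invocation)
# Empty output (no BFD rows) also counts as success. Gives up after 30 tries.
wait_bfd_up() {
  tries=30
  while "$@" | grep -qv '^up$'; do
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] || return 1   # timed out: some session still not up
    sleep 1
  done
}

# Demo with a stub standing in for the live command:
wait_bfd_up sh -c 'printf "up\nup\n"' && echo "all BFD sessions up"
# prints: all BFD sessions up
```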