Created attachment 1745602 [details]
Setup information from ovn-control-plane and ovn-worker

Description of problem:

RFE https://bugzilla.redhat.com/show_bug.cgi?id=1881826 implemented ECMP for logical router policies. That is: traffic is load balanced between the nexthops, which are programmed by the CMS in the logical router policy column "nexthops".

The next step for a complete feature is supporting a BFD-like liveness check on these nexthops, so as to be able to make accurate routing decisions based on whether the route to the nexthop is healthy or not.

OVN already allows the CMS to enable BFD sessions between gateway routers (http://www.openvswitch.org/support/dist-docs/ovn-nb.5.html) by specifying multiple gateway_chassis on the distributed gateway port of a logical router. I have used this feature, in combination with manual modification of the OVS flows, to implement the wanted behavior in a POC. You can find all information attached to this BZ.

Essentially what I've done is create a gateway_chassis for each node in my cluster:

$ ovn-nbctl list gateway_chassis
_uuid               : bd11e875-178f-4405-908f-c27e8b21c717
chassis_name        : "c0cfd470-ff02-409d-ac28-c0959af86e95"
external_ids        : {dgp_name=rtos-node_local_switch}
name                : rtos-node_local_switch_c0cfd470-ff02-409d-ac28-c0959af86e95
options             : {}
priority            : 100

_uuid               : 7ff87c9e-8520-4f6a-9fc8-097343c03340
chassis_name        : "bf5d9f87-3a42-4573-b6f8-52e4349bece8"
external_ids        : {dgp_name=rtos-node_local_switch}
name                : rtos-node_local_switch_bf5d9f87-3a42-4573-b6f8-52e4349bece8
options             : {}
priority            : 100

_uuid               : 20ad871b-f7c2-4cef-ab51-08ddc0531a5f
chassis_name        : "99e049b5-a881-4a51-a24d-e01d7cee446a"
external_ids        : {dgp_name=rtos-node_local_switch}
name                : rtos-node_local_switch_99e049b5-a881-4a51-a24d-e01d7cee446a
options             : {}
priority            : 100

These gateway_chassis are then attached to the logical router port of the distributed gateway router:

$ ovn-nbctl list logical_router_port
_uuid               : 80f9f38a-a58e-401d-aca6-7a33c7be86b4
enabled             : []
external_ids        : {}
gateway_chassis     : [20ad871b-f7c2-4cef-ab51-08ddc0531a5f, 7ff87c9e-8520-4f6a-9fc8-097343c03340, bd11e875-178f-4405-908f-c27e8b21c717]
ha_chassis_group    : []
ipv6_prefix         : []
ipv6_ra_configs     : {}
mac                 : "0a:58:a9:fe:00:02"
name                : rtos-node_local_switch
networks            : ["169.254.0.2/20"]
options             : {}
peer                : []

$ ovn-nbctl list logical_router
_uuid               : cb712e31-0250-4e73-9090-0d6a37df0996
enabled             : []
external_ids        : {k8s-cluster-router=yes, k8s-ovn-topo-version="1"}
load_balancer       : []
name                : ovn_cluster_router
nat                 : [35e33d26-be8a-4680-9c4e-caf28762bc24, c6853103-3e93-4d5b-908f-a0b47cf51178, c73262f2-1708-4dae-847f-61e570550812]
options             : {}
policies            : [12a0e838-d5f8-4a19-9c82-3c9178ecc318, 1343ae93-7011-4bc7-ab5f-01235db02159, 30716998-4391-4801-8f04-d4fc15cf5c15, 30ad8058-bcc4-47f3-99db-592063ecf002, 380a22c5-f5c9-4771-bb80-ee1179fed569, 46a61e99-b3df-4768-9db6-3954a4535254, 55d06d99-5bf8-4c18-9016-b24a0937cd5b, 5eb37b4c-14bb-47c4-8899-d629b3c22601, 6777dae3-5380-4229-b136-17ecbe09e454, 6a128443-9ac8-4559-af2f-a769ab24a7d8, 7acdd6e6-8ef3-4a5d-b88b-a2c6ba3f676b, 9ac20de8-37dd-47eb-84b2-1e41d64d7801, b541bf15-fd7d-468b-bee7-93d0845cc263, e05c9c57-558e-4286-935e-e1174ca1a7ee]
ports               : [10ec8a5b-554e-40e2-999f-32e753e37ead, 3fe07cc8-db7e-47c2-86ca-cb4be5ddc3de, 490ca095-755a-457b-ac53-8eb24faa0ddb, 56f003c7-61a6-4a55-8928-ed29e0d747f5, 80f9f38a-a58e-401d-aca6-7a33c7be86b4]
static_routes       : [01d0e96c-45b8-43b4-9c37-fe3f3f213786, 0a113866-465b-4d19-a2d1-2937eb22e223, 2878ca40-05d5-4cfd-8417-e5b107d1d31c, 5d315d42-2954-44c4-99c4-b85a95671a72, 6635fb84-4ee8-4eab-b15d-6fdb6ab377f3, 7e7ea132-5973-431b-9605-c5a5ab5d4247, a1aae7b6-7f6c-47fa-a378-11f39822e107, a79841e5-a701-43d0-97e1-b8683a1283bf, c312443d-784d-4215-81a2-bbbe7591a2f0]
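
For reference, gateway_chassis rows like the above can be created and attached to the port in one step with ovn-nbctl's lrp-set-gateway-chassis. This is only a sketch (the port name, chassis names and priority are taken from the listings above; the POC may have written the rows differently):

# Create a Gateway_Chassis row per node and attach it to the distributed
# gateway port, with priority 100 as in the listing above.
$ ovn-nbctl lrp-set-gateway-chassis rtos-node_local_switch c0cfd470-ff02-409d-ac28-c0959af86e95 100
$ ovn-nbctl lrp-set-gateway-chassis rtos-node_local_switch bf5d9f87-3a42-4573-b6f8-52e4349bece8 100
$ ovn-nbctl lrp-set-gateway-chassis rtos-node_local_switch 99e049b5-a881-4a51-a24d-e01d7cee446a 100

Each invocation creates a Gateway_Chassis row named <port>_<chassis> and appends it to the port's gateway_chassis column, which is consistent with the names seen in the listing above.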

Doing this makes ovn-controller enable BFD sessions between all nodes, as seen in its logs:

oc get pod -A -owide
NAMESPACE            NAME                                        READY   STATUS    RESTARTS   AGE    IP           NODE                NOMINATED NODE   READINESS GATES
default              netserver-0                                 1/1     Running   0          56m    10.244.2.4   ovn-worker          <none>           <none>
kube-system          coredns-f9fd979d6-dd479                     1/1     Running   0          119m   10.244.0.3   ovn-worker2         <none>           <none>
kube-system          coredns-f9fd979d6-jwpwd                     1/1     Running   0          119m   10.244.2.3   ovn-worker          <none>           <none>
kube-system          etcd-ovn-control-plane                      1/1     Running   0          119m   172.18.0.4   ovn-control-plane   <none>           <none>
kube-system          kube-apiserver-ovn-control-plane            1/1     Running   0          119m   172.18.0.4   ovn-control-plane   <none>           <none>
kube-system          kube-controller-manager-ovn-control-plane   1/1     Running   0          119m   172.18.0.4   ovn-control-plane   <none>           <none>
kube-system          kube-scheduler-ovn-control-plane            1/1     Running   0          119m   172.18.0.4   ovn-control-plane   <none>           <none>
local-path-storage   local-path-provisioner-78776bfc44-cxrxs     1/1     Running   0          119m   10.244.0.4   ovn-worker2         <none>           <none>
ovn-kubernetes       ovnkube-db-0                                3/3     Running   0          118m   172.18.0.2   ovn-worker          <none>           <none>
ovn-kubernetes       ovnkube-db-1                                3/3     Running   0          118m   172.18.0.3   ovn-worker2         <none>           <none>
ovn-kubernetes       ovnkube-db-2                                3/3     Running   0          118m   172.18.0.4   ovn-control-plane   <none>           <none>
ovn-kubernetes       ovnkube-master-848f8c6f4f-7chnx             3/3     Running   3          118m   172.18.0.2   ovn-worker          <none>           <none>
ovn-kubernetes       ovnkube-master-848f8c6f4f-kn8vc             3/3     Running   3          118m   172.18.0.4   ovn-control-plane   <none>           <none>
ovn-kubernetes       ovnkube-master-848f8c6f4f-vrtfr             3/3     Running   3          118m   172.18.0.3   ovn-worker2         <none>           <none>
ovn-kubernetes       ovnkube-node-8shdq                          3/3     Running   0          118m   172.18.0.2   ovn-worker          <none>           <none>
ovn-kubernetes       ovnkube-node-hfkml                          3/3     Running   0          118m   172.18.0.3   ovn-worker2         <none>           <none>
ovn-kubernetes       ovnkube-node-sc4n2                          3/3     Running   0          118m   172.18.0.4   ovn-control-plane   <none>           <none>
ovn-kubernetes       ovs-node-7svcl                              1/1     Running   0          118m   172.18.0.2   ovn-worker          <none>           <none>
ovn-kubernetes       ovs-node-9tfs4                              1/1     Running   0          118m   172.18.0.3   ovn-worker2         <none>           <none>
ovn-kubernetes       ovs-node-zgxbs                              1/1     Running   0          118m   172.18.0.4   ovn-control-plane   <none>           <none>

oc logs -c ovn-controller ovnkube-node-8shdq -n ovn-kubernetes
...
2021-01-08T12:08:42.862Z|00038|ovn_bfd|INFO|Enabled BFD on interface ovn-bf5d9f-0
2021-01-08T12:08:42.862Z|00039|ovn_bfd|INFO|Enabled BFD on interface ovn-99e049-0
...

The OpenFlow rules programmed as a result of RFE https://bugzilla.redhat.com/show_bug.cgi?id=1881826 create, on ovn-worker, a group of type select, as follows:

$ ovs-ofctl -O OpenFlow13 dump-groups br-int
...
group_id=2,type=select,bucket=weight:100,actions=load:0x1->OXM_OF_PKT_REG4[48..63],resubmit(,21),bucket=weight:100,actions=load:0x2->OXM_OF_PKT_REG4[48..63],resubmit(,21)

By manually modifying that group to:

group_id=2,type=select,bucket=weight:100,watch_port:"ovn-bf5d9f-0",actions=load:0x1->OXM_OF_PKT_REG4[48..63],resubmit(,21),bucket=weight:100,actions=load:0x2->OXM_OF_PKT_REG4[48..63],resubmit(,21)

I have been able to implement a liveness check on the OVS port which determines the routing decision for that bucket. This has proven to work (i.e. the bucket is avoided) when the node corresponding to that OVS port goes down (in this case, node ovn-worker2).

This RFE thus requests that ovn-controller programs the OVS group with a watch_port for each nexthop specified in the logical router policy. If the OVS port cannot be determined from the nexthop IP provided by the CMS, then it would also be acceptable for the CMS to provide another identifier as nexthop which could help ovn-controller determine the OVS port.

Some information concerning my cluster setup:

3 OpenShift nodes:
- ovn-control-plane
- ovn-worker
- ovn-worker2

1 test pod:
- netserver-0, which runs on ovn-worker

1 test node for the liveness check (which is stopped in my KIND setup when I need to test the watch_port behavior):
- ovn-worker2

netserver-0 targets an external service continuously in a loop. My setup configures 2 OVN SNATs, on ovn-worker and ovn-worker2. It also configures a logical router policy on the distributed gateway router (ovn_cluster_router) with the nexthop IPs set to the join gateway router IPs of ovn-worker and ovn-worker2. It then finally creates the BFD sessions, as mentioned above, and manually programs the watch_port on the group created by ovn-controller following the creation of that logical router policy.

You will find the NB and SB DB from ovn-control-plane and ovn-worker, as well as other OVS information, in the attachments.

Version-Release number of selected component (if applicable):
I am not sure which OVN version I am supposed to set, so I just set it to FDP 20.H. Feel free to change it.

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
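
For completeness, the manual group modification described above can be applied on ovn-worker with something like the following. This is only a sketch of one way to do it, not necessarily the exact command used in the POC; ovs-ofctl mod-group replaces the whole bucket list, so all buckets are repeated, with watch_port added to the first bucket as shown above:

# Rewrite the select group installed by ovn-controller so that the first
# bucket is only considered live while the watched tunnel port is up.
$ ovs-ofctl -O OpenFlow13 mod-group br-int \
    'group_id=2,type=select,bucket=weight:100,watch_port:ovn-bf5d9f-0,actions=load:0x1->OXM_OF_PKT_REG4[48..63],resubmit(,21),bucket=weight:100,actions=load:0x2->OXM_OF_PKT_REG4[48..63],resubmit(,21)'

Note that ovn-controller owns the groups on br-int and may overwrite such a manual change on its next recompute, which is why this RFE asks for the watch_port to be programmed by ovn-controller itself.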
Assigning to Numan, since he's most familiar with what is being requested.
Hi Alex, is this RFE still relevant? I remember we spoke some time back and I thought the conclusion was that watch_port won't help you here, since it only checks whether the local port is up. I thought what you needed was BFD, but that is not supported on a distributed router.
Yes, the RFE is still relevant. I am not sure what you mean specifically, as I managed to make it work using a watch_port with a BFD session (see comment #0).
Uploading some fresh databases, as discussed off-band with Numan. The OVN-Kubernetes topology has changed slightly since this RFE was created; I have thus implemented the same POC as I originally did, but on this slightly different topology. I am attaching the NB and SB DBs.

This is the final outcome of my POC: the logical router policy

_uuid               : a23ed366-90a5-4af3-a0d9-b655e3a73c5f
action              : reroute
external_ids        : {name=egressip}
match               : "ip4.src == 10.244.0.3"
nexthop             : []
nexthops            : ["100.64.0.2", "100.64.0.4"]
options             : {}
priority            : 100

ends up writing an OVS group on the node:

group_id=5,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=load:0x1->OXM_OF_PKT_REG4[48..63],resubmit(,21),bucket=bucket_id:1,weight:100,actions=load:0x2->OXM_OF_PKT_REG4[48..63],resubmit(,21)

In this POC I have enabled BFD sessions on the tunnel ports between all cluster nodes. On the node in question (ovn-worker) we have the following OVS ports:

_uuid               : d9900576-0f01-4246-bdbc-19adee42d782
admin_state         : up
bfd                 : {enable="true"}
bfd_status          : {diagnostic="No Diagnostic", flap_count="1", forwarding="true", remote_diagnostic="No Diagnostic", remote_state=up, state=up}
cfm_fault           : []
cfm_fault_status    : []
cfm_flap_count      : []
cfm_health          : []
cfm_mpid            : []
cfm_remote_mpids    : []
cfm_remote_opstate  : []
duplex              : []
error               : []
external_ids        : {}
ifindex             : 4
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current        : []
link_resets         : 0
link_speed          : []
link_state          : up
lldp                : {}
mac                 : []
mac_in_use          : "26:e7:06:54:d4:69"
mtu                 : []
mtu_request         : []
name                : ovn-ce36ce-0
ofport              : 3
ofport_request      : []
options             : {csum="true", key=flow, remote_ip="172.18.0.4"}
other_config        : {}
statistics          : {rx_bytes=70624, rx_packets=1049, tx_bytes=70304, tx_packets=1059}
status              : {tunnel_egress_iface=breth0, tunnel_egress_iface_carrier=up}
type                : geneve

_uuid               : 5f669662-e6a6-4e48-a68d-adb6c7b9d757
admin_state         : up
bfd                 : {enable="true"}
bfd_status          : {diagnostic="No Diagnostic", flap_count="1", forwarding="true", remote_diagnostic="No Diagnostic", remote_state=up, state=up}
cfm_fault           : []
cfm_fault_status    : []
cfm_flap_count      : []
cfm_health          : []
cfm_mpid            : []
cfm_remote_mpids    : []
cfm_remote_opstate  : []
duplex              : []
error               : []
external_ids        : {}
ifindex             : 4
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current        : []
link_resets         : 0
link_speed          : []
link_state          : up
lldp                : {}
mac                 : []
mac_in_use          : "d6:b5:a2:61:58:e5"
mtu                 : []
mtu_request         : []
name                : ovn-f4d5f8-0
ofport              : 1
ofport_request      : []
options             : {csum="true", key=flow, remote_ip="172.18.0.2"}
other_config        : {}
statistics          : {rx_bytes=76520, rx_packets=1092, tx_bytes=74968, tx_packets=1116}
status              : {tunnel_egress_iface=breth0, tunnel_egress_iface_carrier=up}
type                : geneve

The final outcome of this RFE is to have the logical router policy write out the OVS group as:

group_id=5,type=select,selection_method=dp_hash,bucket=bucket_id:0,watch_port:ovn-f4d5f8-0,weight:100,actions=load:0x1->OXM_OF_PKT_REG4[48..63],resubmit(,21),bucket=bucket_id:1,watch_port:ovn-ce36ce-0,weight:100,actions=load:0x2->OXM_OF_PKT_REG4[48..63],resubmit(,21)
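
For reference, a minimal sketch of how the pieces above could be configured by hand. Assumptions: the policy lives on ovn_cluster_router as in comment #0, lr-policy-add accepts a comma-separated nexthop list for ECMP reroute, and BFD is enabled directly on the geneve interfaces; in this setup the policy is actually written by ovn-kubernetes (as the external_ids name=egressip suggests), so these are not the exact commands used in the POC:

# ECMP reroute policy equivalent to the Logical_Router_Policy record above.
$ ovn-nbctl lr-policy-add ovn_cluster_router 100 'ip4.src == 10.244.0.3' reroute 100.64.0.2,100.64.0.4

# Enable BFD on the geneve tunnel interfaces toward the other nodes
# (this populates the bfd/bfd_status columns shown above).
$ ovs-vsctl set Interface ovn-ce36ce-0 bfd:enable=true
$ ovs-vsctl set Interface ovn-f4d5f8-0 bfd:enable=true

# Check the BFD session state of a tunnel interface.
$ ovs-vsctl get Interface ovn-ce36ce-0 bfd_status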
Created attachment 1788565 [details] NB DB
Created attachment 1788566 [details] SB DB