Bug 1914282 - [RFE] implement watch_port for logical router policies with multiple nexthops
Summary: [RFE] implement watch_port for logical router policies with multiple nexthops
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn2.13
Version: FDP 20.H
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Numan Siddique
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-08 14:20 UTC by Alexander Constantinescu
Modified: 2023-07-13 07:25 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Attachments
Setup information from ovn-control-plane and ovn-worker (366.46 KB, application/gzip), 2021-01-08 14:20 UTC, Alexander Constantinescu
NB DB (94.33 KB, text/plain), 2021-06-01 19:48 UTC, Alexander Constantinescu
SB DB (457.98 KB, text/plain), 2021-06-01 19:49 UTC, Alexander Constantinescu

Description Alexander Constantinescu 2021-01-08 14:20:06 UTC
Created attachment 1745602 [details]
Setup information from ovn-control-plane and ovn-worker

Description of problem:

RFE https://bugzilla.redhat.com/show_bug.cgi?id=1881826 implemented ECMP for logical router policies: traffic is load balanced between the nexthops programmed by the CMS in the logical router policy column "nexthops". The next step for a complete feature is to support a BFD-like liveness check on these nexthops, so that routing decisions can take into account whether the route to each nexthop is healthy.

OVN already allows the CMS to enable BFD sessions between gateway routers (http://www.openvswitch.org/support/dist-docs/ovn-nb.5.html) by specifying multiple gateway_chassis on a logical router port of the distributed gateway router. I have used this feature, in combination with manual modification of the OVS flows, to implement the wanted behavior in a POC. You can find all the information attached to this BZ.
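
For reference, attaching gateway chassis to a logical router port can be done with ovn-nbctl. This is only a minimal sketch, using the chassis UUIDs from the listings below; ovn-kubernetes may populate the NB DB directly rather than run these exact commands:

$ ovn-nbctl lrp-set-gateway-chassis rtos-node_local_switch c0cfd470-ff02-409d-ac28-c0959af86e95 100
$ ovn-nbctl lrp-set-gateway-chassis rtos-node_local_switch bf5d9f87-3a42-4573-b6f8-52e4349bece8 100
$ ovn-nbctl lrp-set-gateway-chassis rtos-node_local_switch 99e049b5-a881-4a51-a24d-e01d7cee446a 100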

Essentially what I've done is create a gateway_chassis for each node in my cluster:

$ ovn-nbctl list gateway_chassis
_uuid               : bd11e875-178f-4405-908f-c27e8b21c717
chassis_name        : "c0cfd470-ff02-409d-ac28-c0959af86e95"
external_ids        : {dgp_name=rtos-node_local_switch}
name                : rtos-node_local_switch_c0cfd470-ff02-409d-ac28-c0959af86e95
options             : {}
priority            : 100

_uuid               : 7ff87c9e-8520-4f6a-9fc8-097343c03340
chassis_name        : "bf5d9f87-3a42-4573-b6f8-52e4349bece8"
external_ids        : {dgp_name=rtos-node_local_switch}
name                : rtos-node_local_switch_bf5d9f87-3a42-4573-b6f8-52e4349bece8
options             : {}
priority            : 100

_uuid               : 20ad871b-f7c2-4cef-ab51-08ddc0531a5f
chassis_name        : "99e049b5-a881-4a51-a24d-e01d7cee446a"
external_ids        : {dgp_name=rtos-node_local_switch}
name                : rtos-node_local_switch_99e049b5-a881-4a51-a24d-e01d7cee446a
options             : {}
priority            : 100


These gateway_chassis are then attached to the logical router port of the distributed gateway router:

$ ovn-nbctl list logical_router_port
_uuid               : 80f9f38a-a58e-401d-aca6-7a33c7be86b4
enabled             : []
external_ids        : {}
gateway_chassis     : [20ad871b-f7c2-4cef-ab51-08ddc0531a5f, 7ff87c9e-8520-4f6a-9fc8-097343c03340, bd11e875-178f-4405-908f-c27e8b21c717]
ha_chassis_group    : []
ipv6_prefix         : []
ipv6_ra_configs     : {}
mac                 : "0a:58:a9:fe:00:02"
name                : rtos-node_local_switch
networks            : ["169.254.0.2/20"]
options             : {}
peer                : []

$ ovn-nbctl list logical_router
_uuid               : cb712e31-0250-4e73-9090-0d6a37df0996
enabled             : []
external_ids        : {k8s-cluster-router=yes, k8s-ovn-topo-version="1"}
load_balancer       : []
name                : ovn_cluster_router
nat                 : [35e33d26-be8a-4680-9c4e-caf28762bc24, c6853103-3e93-4d5b-908f-a0b47cf51178, c73262f2-1708-4dae-847f-61e570550812]
options             : {}
policies            : [12a0e838-d5f8-4a19-9c82-3c9178ecc318, 1343ae93-7011-4bc7-ab5f-01235db02159, 30716998-4391-4801-8f04-d4fc15cf5c15, 30ad8058-bcc4-47f3-99db-592063ecf002, 380a22c5-f5c9-4771-bb80-ee1179fed569, 46a61e99-b3df-4768-9db6-3954a4535254, 55d06d99-5bf8-4c18-9016-b24a0937cd5b, 5eb37b4c-14bb-47c4-8899-d629b3c22601, 6777dae3-5380-4229-b136-17ecbe09e454, 6a128443-9ac8-4559-af2f-a769ab24a7d8, 7acdd6e6-8ef3-4a5d-b88b-a2c6ba3f676b, 9ac20de8-37dd-47eb-84b2-1e41d64d7801, b541bf15-fd7d-468b-bee7-93d0845cc263, e05c9c57-558e-4286-935e-e1174ca1a7ee]
ports               : [10ec8a5b-554e-40e2-999f-32e753e37ead, 3fe07cc8-db7e-47c2-86ca-cb4be5ddc3de, 490ca095-755a-457b-ac53-8eb24faa0ddb, 56f003c7-61a6-4a55-8928-ed29e0d747f5, 80f9f38a-a58e-401d-aca6-7a33c7be86b4]
static_routes       : [01d0e96c-45b8-43b4-9c37-fe3f3f213786, 0a113866-465b-4d19-a2d1-2937eb22e223, 2878ca40-05d5-4cfd-8417-e5b107d1d31c, 5d315d42-2954-44c4-99c4-b85a95671a72, 6635fb84-4ee8-4eab-b15d-6fdb6ab377f3, 7e7ea132-5973-431b-9605-c5a5ab5d4247, a1aae7b6-7f6c-47fa-a378-11f39822e107, a79841e5-a701-43d0-97e1-b8683a1283bf, c312443d-784d-4215-81a2-bbbe7591a2f0]

Doing this causes ovn-controller to enable BFD sessions between all nodes, as seen in its logs:

oc get pod -A -owide
NAMESPACE            NAME                                        READY   STATUS    RESTARTS   AGE    IP           NODE                NOMINATED NODE   READINESS GATES
default              netserver-0                                 1/1     Running   0          56m    10.244.2.4   ovn-worker          <none>           <none>
kube-system          coredns-f9fd979d6-dd479                     1/1     Running   0          119m   10.244.0.3   ovn-worker2         <none>           <none>
kube-system          coredns-f9fd979d6-jwpwd                     1/1     Running   0          119m   10.244.2.3   ovn-worker          <none>           <none>
kube-system          etcd-ovn-control-plane                      1/1     Running   0          119m   172.18.0.4   ovn-control-plane   <none>           <none>
kube-system          kube-apiserver-ovn-control-plane            1/1     Running   0          119m   172.18.0.4   ovn-control-plane   <none>           <none>
kube-system          kube-controller-manager-ovn-control-plane   1/1     Running   0          119m   172.18.0.4   ovn-control-plane   <none>           <none>
kube-system          kube-scheduler-ovn-control-plane            1/1     Running   0          119m   172.18.0.4   ovn-control-plane   <none>           <none>
local-path-storage   local-path-provisioner-78776bfc44-cxrxs     1/1     Running   0          119m   10.244.0.4   ovn-worker2         <none>           <none>
ovn-kubernetes       ovnkube-db-0                                3/3     Running   0          118m   172.18.0.2   ovn-worker          <none>           <none>
ovn-kubernetes       ovnkube-db-1                                3/3     Running   0          118m   172.18.0.3   ovn-worker2         <none>           <none>
ovn-kubernetes       ovnkube-db-2                                3/3     Running   0          118m   172.18.0.4   ovn-control-plane   <none>           <none>
ovn-kubernetes       ovnkube-master-848f8c6f4f-7chnx             3/3     Running   3          118m   172.18.0.2   ovn-worker          <none>           <none>
ovn-kubernetes       ovnkube-master-848f8c6f4f-kn8vc             3/3     Running   3          118m   172.18.0.4   ovn-control-plane   <none>           <none>
ovn-kubernetes       ovnkube-master-848f8c6f4f-vrtfr             3/3     Running   3          118m   172.18.0.3   ovn-worker2         <none>           <none>
ovn-kubernetes       ovnkube-node-8shdq                          3/3     Running   0          118m   172.18.0.2   ovn-worker          <none>           <none>
ovn-kubernetes       ovnkube-node-hfkml                          3/3     Running   0          118m   172.18.0.3   ovn-worker2         <none>           <none>
ovn-kubernetes       ovnkube-node-sc4n2                          3/3     Running   0          118m   172.18.0.4   ovn-control-plane   <none>           <none>
ovn-kubernetes       ovs-node-7svcl                              1/1     Running   0          118m   172.18.0.2   ovn-worker          <none>           <none>
ovn-kubernetes       ovs-node-9tfs4                              1/1     Running   0          118m   172.18.0.3   ovn-worker2         <none>           <none>
ovn-kubernetes       ovs-node-zgxbs                              1/1     Running   0          118m   172.18.0.4   ovn-control-plane   <none>           <none>

oc logs -c ovn-controller ovnkube-node-8shdq -n ovn-kubernetes 
...
2021-01-08T12:08:42.862Z|00038|ovn_bfd|INFO|Enabled BFD on interface ovn-bf5d9f-0
2021-01-08T12:08:42.862Z|00039|ovn_bfd|INFO|Enabled BFD on interface ovn-99e049-0
...
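
The resulting BFD session state can be checked from OVS on that node, e.g. (a sketch; the interface names are the tunnel ports from the log above):

$ ovs-vsctl get Interface ovn-bf5d9f-0 bfd_status
$ ovs-appctl bfd/show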

As a result of RFE https://bugzilla.redhat.com/show_bug.cgi?id=1881826, the OpenFlow rules programmed on ovn-worker include a group of type select, as follows:

$ ovs-ofctl -O OpenFlow13 dump-groups br-int
...
 group_id=2,type=select,bucket=weight:100,actions=load:0x1->OXM_OF_PKT_REG4[48..63],resubmit(,21),bucket=weight:100,actions=load:0x2->OXM_OF_PKT_REG4[48..63],resubmit(,21)

By manually modifying that group to:

group_id=2,type=select,bucket=weight:100,watch_port:"ovn-bf5d9f-0",actions=load:0x1->OXM_OF_PKT_REG4[48..63],resubmit(,21),bucket=weight:100,actions=load:0x2->OXM_OF_PKT_REG4[48..63],resubmit(,21)

I have been able to implement a liveness check on the OVS port that determines the routing decision for that bucket. This has proven to work (i.e. the bucket is avoided) when the node corresponding to that OVS port goes down (in this case, node ovn-worker2).
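
(For reference, such a manual modification can be applied with ovs-ofctl mod-group. The following is only a sketch based on the modified group text above, not necessarily the exact command used in the POC:)

$ ovs-ofctl -O OpenFlow13 mod-group br-int 'group_id=2,type=select,bucket=weight:100,watch_port:ovn-bf5d9f-0,actions=load:0x1->OXM_OF_PKT_REG4[48..63],resubmit(,21),bucket=weight:100,actions=load:0x2->OXM_OF_PKT_REG4[48..63],resubmit(,21)'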

This RFE thus requests that ovn-controller program the OVS group with a watch_port for each nexthop specified in the logical router policy. If the OVS port cannot be determined from the nexthop IP provided by the CMS, it would also be acceptable for the CMS to provide another identifier as the nexthop to help ovn-controller determine the OVS port.

Some information concerning my cluster setup:

3 OpenShift nodes: 

-ovn-control-plane
-ovn-worker
-ovn-worker2

1 test pod:

- netserver-0 which runs on ovn-worker 

1 test node for the liveness check (which is stopped in my KIND setup when I need to test the watch_port behavior):

- ovn-worker2

netserver-0 targets an external service continuously in a loop. 

My setup configures 2 OVN SNATs, on ovn-worker and ovn-worker2. It also configures a logical router policy on the distributed gateway router (ovn_cluster_router) with the join gateway router IPs of ovn-worker and ovn-worker2 as nexthops. It then creates the BFD sessions, as mentioned above, and manually programs the watch_port on the group created by ovn-controller following the creation of that logical router policy.
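
(For illustration, the ECMP reroute policy itself can be created with ovn-nbctl lr-policy-add, which accepts a comma-separated list of nexthops. The match and nexthop IPs below are placeholders; the real values depend on the cluster:)

$ ovn-nbctl lr-policy-add ovn_cluster_router 100 "ip4.src == 10.244.0.3" reroute 100.64.0.2,100.64.0.4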

You will find the NB and SB DB from ovn-control-plane and ovn-worker, as well as other OVS information in the attachments.  


Version-Release number of selected component (if applicable):

I am not sure which OVN version I am supposed to set, so I just set it to FDP 20.H. Feel free to change it.

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Alexander Constantinescu 2021-01-08 14:22:42 UTC
Assigning to Numan, since he's most familiar with what is being requested.

Comment 2 Tim Rozet 2021-03-29 14:49:40 UTC
Hi Alex, is this RFE still relevant? I remember we spoke some time back, and I thought the conclusion was that watch_port won't help you here since it only checks whether the local port is up. I thought what you needed was BFD, but that is not supported on a distributed router.

Comment 3 Alexander Constantinescu 2021-04-15 10:20:39 UTC
Yes, the RFE is still relevant. I am not sure what you mean specifically, as I managed to make it work using a watch_port with a BFD session (see comment #0).

Comment 4 Alexander Constantinescu 2021-06-01 19:47:42 UTC
Uploading some fresh databases, as discussed off-band with Numan. The OVN-Kubernetes topology has changed slightly since this RFE was created, so I have re-implemented the same POC on this slightly different topology. I am attaching the NB and SB DBs.

This is the final outcome of my POC. The following logical router policy:

_uuid               : a23ed366-90a5-4af3-a0d9-b655e3a73c5f
action              : reroute
external_ids        : {name=egressip}
match               : "ip4.src == 10.244.0.3"
nexthop             : []
nexthops            : ["100.64.0.2", "100.64.0.4"]
options             : {}
priority            : 100

ends up being translated into the following OVS group on the node:

group_id=5,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=load:0x1->OXM_OF_PKT_REG4[48..63],resubmit(,21),bucket=bucket_id:1,weight:100,actions=load:0x2->OXM_OF_PKT_REG4[48..63],resubmit(,21)

In my POC I have enabled BFD sessions on the tunnel ports between all cluster nodes. On the node in question (ovn-worker) we have the following OVS ports:


 _uuid               : d9900576-0f01-4246-bdbc-19adee42d782
admin_state         : up
bfd                 : {enable="true"}
bfd_status          : {diagnostic="No Diagnostic", flap_count="1", forwarding="true", remote_diagnostic="No Diagnostic", remote_state=up, state=up}
cfm_fault           : []
cfm_fault_status    : []
cfm_flap_count      : []
cfm_health          : []
cfm_mpid            : []
cfm_remote_mpids    : []
cfm_remote_opstate  : []
duplex              : []
error               : []
external_ids        : {}
ifindex             : 4
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current        : []
link_resets         : 0
link_speed          : []
link_state          : up
lldp                : {}
mac                 : []
mac_in_use          : "26:e7:06:54:d4:69"
mtu                 : []
mtu_request         : []
name                : ovn-ce36ce-0
ofport              : 3
ofport_request      : []
options             : {csum="true", key=flow, remote_ip="172.18.0.4"}
other_config        : {}
statistics          : {rx_bytes=70624, rx_packets=1049, tx_bytes=70304, tx_packets=1059}
status              : {tunnel_egress_iface=breth0, tunnel_egress_iface_carrier=up}
type                : geneve


_uuid               : 5f669662-e6a6-4e48-a68d-adb6c7b9d757
admin_state         : up
bfd                 : {enable="true"}
bfd_status          : {diagnostic="No Diagnostic", flap_count="1", forwarding="true", remote_diagnostic="No Diagnostic", remote_state=up, state=up}
cfm_fault           : []
cfm_fault_status    : []
cfm_flap_count      : []
cfm_health          : []
cfm_mpid            : []
cfm_remote_mpids    : []
cfm_remote_opstate  : []
duplex              : []
error               : []
external_ids        : {}
ifindex             : 4
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current        : []
link_resets         : 0
link_speed          : []
link_state          : up
lldp                : {}
mac                 : []
mac_in_use          : "d6:b5:a2:61:58:e5"
mtu                 : []
mtu_request         : []
name                : ovn-f4d5f8-0
ofport              : 1
ofport_request      : []
options             : {csum="true", key=flow, remote_ip="172.18.0.2"}
other_config        : {}
statistics          : {rx_bytes=76520, rx_packets=1092, tx_bytes=74968, tx_packets=1116}
status              : {tunnel_egress_iface=breth0, tunnel_egress_iface_carrier=up}
type                : geneve
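
(For reference, BFD can be enabled on an OVS tunnel interface directly with ovs-vsctl; a sketch, assuming the interface names above. Note that ovn-controller may itself manage the bfd column on tunnel interfaces it knows about:)

$ ovs-vsctl set Interface ovn-ce36ce-0 bfd:enable=true
$ ovs-vsctl set Interface ovn-f4d5f8-0 bfd:enable=true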

The final outcome of this RFE would be to have the logical router policy written out as the following OVS group:

group_id=5,type=select,selection_method=dp_hash,bucket=bucket_id:0,watch_port:ovn-f4d5f8-0,weight:100,actions=load:0x1->OXM_OF_PKT_REG4[48..63],resubmit(,21),bucket=bucket_id:1,watch_port:ovn-ce36ce-0,weight:100,actions=load:0x2->OXM_OF_PKT_REG4[48..63],resubmit(,21)
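
(Once such a group is in place, per-bucket traffic distribution can be checked via group statistics to confirm that the dead bucket is being avoided, e.g.:)

$ ovs-ofctl -O OpenFlow13 dump-group-stats br-int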

Comment 5 Alexander Constantinescu 2021-06-01 19:48:39 UTC
Created attachment 1788565 [details]
NB DB

Comment 6 Alexander Constantinescu 2021-06-01 19:49:19 UTC
Created attachment 1788566 [details]
SB DB

