Description of problem:
In our tests we create 35 app namespaces and then 35 SPK namespaces, each SPK namespace with 4 SPK emulation pods. This is on a 120-node cluster. Every app namespace is served by 4 SPK pods in a separate namespace. The annotations on each SPK pod look like:

  k8s.ovn.org/bfd-enabled: true
  k8s.ovn.org/routing-namespaces: served-ns-5
  k8s.ovn.org/routing-network: sriov-net-ens2f0-1
  k8s.v1.cni.cncf.io/network-status: [{ "name": "sriov-net-ens2f0-1", "interface": "net1", "ips": [ "192.168.221.4" ], "dns": {} }]
  k8s.v1.cni.cncf.io/networks: [{ "name": "sriov-net-ens2f0-1", "ips": [ "192.168.221.4/21" ]}]

Even after launching several application pods in each of the 35 namespaces, not all SPK pods establish BFD sessions with all the worker nodes that host app pods served by those SPK pods. Instead, each SPK pod establishes one BFD session with some seemingly random worker node.

Version-Release number of selected component (if applicable): 4.7.28

How reproducible: 100%

Steps to Reproduce:
1. Use kube-burner to create the serving (SPK) and served (app) namespaces.
2.
3.

Actual results:
BFD sessions are established with only some worker nodes, seemingly at random, not with all the workers that host app pods for a given SPK pod.

Expected results:
Every SPK pod should establish a BFD session with every worker node that hosts app pods served by that SPK pod.

Additional info:
In our test we had 35 SPK namespaces and 35 app namespaces, each app namespace with 50 app pods. Attaching the must-gather and the output of "ovn-nbctl list bfd", along with the OVN DBs, in a comment below.
Ongoing Slack Discussion: https://coreos.slack.com/archives/C01G7T6SYSD/p1632469535137600

Update: OK, so even on a KIND cluster this behaves the same way as in the scale environment. Let me explain what happens from an OVN viewpoint, and we can debate whether that's an OVN bug or not.

We have 2 worker nodes: ovn-worker and ovn-worker2. I have 2 SPK pods, one serving the bar namespace and one serving the foo namespace. There are 3 app pods in the bar namespace (1 on ovn-worker, 2 on ovn-worker2) and 0 app pods in the foo namespace. Naturally we'd expect 2 BFD sessions to be established, one on ovn-worker and the other on ovn-worker2, since both nodes have app pods served by an SPK pod.

However, the way OVN creates a BFD session is: it checks whether lr-route-add was given a --bfd <uuid> when creating the ECMP route. If it's not provided (OVN-K doesn't provide a BFD UUID), OVN does a lookup in the BFD table based on the nexthop, which in BFD terms is the dst_ip field. If it finds an existing BFD entry with that dst_ip, it won't create another session. If it doesn't find a BFD entry with that nexthop/dst_ip, it creates a new BFD session with the specified outport/logical_port.

So as of now, @sai or @murali, neither of you is doing anything wrong with the setup. Even on KIND I can't see a BFD session against each worker that has an app pod served by that SPK pod. The question is whether this is a design decision or a bug. Will wait for updates from the OVN team.
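To make the lookup behavior above concrete, here is a minimal Python sketch (not OVN's actual code) of how a BFD entry gets picked when no --bfd <uuid> is passed to lr-route-add. The port names and IP are illustrative, loosely following OVN-K's gateway-router port naming; the point is only that the lookup keys on dst_ip alone, so a second route toward the same SPK nexthop on a different outport reuses the first entry instead of creating a second session:

```python
# Sketch of OVN's BFD-table lookup when lr-route-add gets no --bfd <uuid>.
# NOT actual ovn-northd code; names below are hypothetical examples.
from dataclasses import dataclass


@dataclass(frozen=True)
class BFDEntry:
    logical_port: str  # outport the BFD session runs on
    dst_ip: str        # route nexthop, i.e. the SPK pod's secondary IP


class BFDTable:
    def __init__(self):
        self.entries = []

    def entry_for_route(self, outport, nexthop):
        # Lookup is by nexthop (dst_ip) ONLY: an existing entry is
        # reused even if its logical_port differs from this route's.
        for e in self.entries:
            if e.dst_ip == nexthop:
                return e
        entry = BFDEntry(outport, nexthop)
        self.entries.append(entry)
        return entry


table = BFDTable()
spk_ip = "192.168.221.4"  # SPK pod's SR-IOV IP from the annotations above

# ECMP route for the bar app pod on ovn-worker: creates a BFD session there.
first = table.entry_for_route("rtoe-GR_ovn-worker", spk_ip)

# ECMP route for the bar app pods on ovn-worker2: same nexthop, so the
# ovn-worker entry is reused and no session appears on ovn-worker2.
second = table.entry_for_route("rtoe-GR_ovn-worker2", spk_ip)

assert second is first
assert len(table.entries) == 1  # only one worker ends up with a session
```

Under this model, which worker "wins" the single session simply depends on whichever route was added first, matching the random-looking worker selection seen in the scale test.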
I have opened https://bugzilla.redhat.com/show_bug.cgi?id=2007549 to track a proper fix on the OVN side. Meanwhile, I have created ovn-org/ovn-kubernetes/pull/2513 as a workaround.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056