Bug 2007443

Summary: [ICNI 2.0] Loadbalancer pods do not establish BFD sessions with all workers that host pods for the routed namespace
Product: OpenShift Container Platform Reporter: Sai Sindhur Malleni <smalleni>
Component: Networking Assignee: Surya Seetharaman <surya>
Networking sub component: ovn-kubernetes QA Contact: Yurii Prokulevych <yprokule>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: astoycos, dcbw, jlema, murali, surya, trozet, yprokule
Version: 4.7   
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2012025 (view as bug list) Environment:
Last Closed: 2022-03-12 04:38:27 UTC Type: Bug
Bug Depends On: 2007549    
Bug Blocks: 2012025    

Description Sai Sindhur Malleni 2021-09-23 21:25:50 UTC
Description of problem:

In our tests we create 35 app namespaces and then launch 35 more namespaces, each with 4 SPK emulation pods. This is on a 120-node cluster. Every app namespace is served by the 4 SPK pods in a separate namespace.
The annotations on each SPK pod look like this:

              k8s.ovn.org/bfd-enabled: true
              k8s.ovn.org/routing-namespaces: served-ns-5
              k8s.ovn.org/routing-network: sriov-net-ens2f0-1
              k8s.v1.cni.cncf.io/network-status:
                [{
                  "name": "sriov-net-ens2f0-1",
                  "interface": "net1",
                  "ips": [
                      "192.168.221.4"
                  ],
                  "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks: [{ "name": "sriov-net-ens2f0-1", "ips": [ "192.168.221.4/21" ]}]
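
For readers who want to see which address ends up as the nexthop, here is a minimal Python sketch that reads it from the annotations above. It assumes that the gateway IP is the address of the network-status entry whose name matches k8s.ovn.org/routing-network; that reading is an assumption made for illustration, and gateway_ips is a hypothetical helper, not part of OVN-Kubernetes.

```python
import json


def gateway_ips(annotations):
    """Return the IPs of the network named by the routing-network annotation.

    Assumption: the gateway address is taken from the matching
    network-status entry. gateway_ips is a hypothetical helper.
    """
    network = annotations["k8s.ovn.org/routing-network"]
    status = json.loads(annotations["k8s.v1.cni.cncf.io/network-status"])
    return [ip for entry in status if entry["name"] == network for ip in entry["ips"]]


# Annotation values copied from the report; network-status is stored as a JSON string.
annotations = {
    "k8s.ovn.org/bfd-enabled": "true",
    "k8s.ovn.org/routing-namespaces": "served-ns-5",
    "k8s.ovn.org/routing-network": "sriov-net-ens2f0-1",
    "k8s.v1.cni.cncf.io/network-status": (
        '[{"name": "sriov-net-ens2f0-1", "interface": "net1", '
        '"ips": ["192.168.221.4"], "dns": {}}]'
    ),
}

print(gateway_ips(annotations))  # ['192.168.221.4']
```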


Even after launching several application pods in each of the 35 namespaces, not all SPK pods establish BFD sessions with all the worker nodes that host app pods they serve. Instead, in our testing each SPK pod establishes one BFD session with a seemingly random worker node.

Version-Release number of selected component (if applicable):

4.7.28
How reproducible:
100%

Steps to Reproduce:
1. Use kube-burner to create the serving and served namespaces.

Actual results:
BFD sessions are established with only some, seemingly random, worker nodes, not with all the workers that host app pods for that SPK.

Expected results:
Every SPK pod should establish BFD sessions with every worker node that hosts app pods served by that SPK pod.
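
The expectation can be written down as a small Python sketch, which may be handy for comparing against the output of ovn-nbctl list bfd. The pod names, node names, and the expected_bfd_sessions helper are hypothetical and only illustrate the rule stated above.

```python
from collections import defaultdict


def expected_bfd_sessions(app_pods, spk_pods):
    """Map each SPK pod to the worker nodes it should have a BFD session with.

    app_pods: iterable of (namespace, node) pairs, one per app pod.
    spk_pods: iterable of (spk_name, served_namespace) pairs.
    """
    nodes_by_namespace = defaultdict(set)
    for namespace, node in app_pods:
        nodes_by_namespace[namespace].add(node)
    return {
        spk: sorted(nodes_by_namespace.get(namespace, set()))
        for spk, namespace in spk_pods
    }


# Hypothetical inventory: three app pods on two workers, one idle SPK pod.
app_pods = [
    ("served-ns-5", "worker-1"),
    ("served-ns-5", "worker-2"),
    ("served-ns-5", "worker-2"),
]
spk_pods = [("spk-a", "served-ns-5"), ("spk-b", "served-ns-6")]

print(expected_bfd_sessions(app_pods, spk_pods))
# {'spk-a': ['worker-1', 'worker-2'], 'spk-b': []}
```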

Additional info:

Comment 1 Sai Sindhur Malleni 2021-09-23 23:27:22 UTC
In our test we had 35 SPK namespaces and 35 app namespaces, each with 50 app pods. I am attaching the must-gather and the output of ovn-nbctl list bfd, along with the OVN DBs, in the comment below.

Comment 3 Surya Seetharaman 2021-09-24 07:58:55 UTC
Ongoing Slack Discussion: https://coreos.slack.com/archives/C01G7T6SYSD/p1632469535137600

Update:

OK, so even on a KIND cluster it behaves the same way as it does in the scale environment. Let me explain what happens from an OVN viewpoint, and we can debate whether that's an OVN bug or not.


We have 2 worker nodes: ovn-worker and ovn-worker2. I have 2 SPK pods, one serving the bar namespace and one serving the foo namespace. There are 3 app pods in the bar namespace (1 on ovn-worker and 2 on ovn-worker2) and 0 app pods in the foo namespace. Naturally we'd expect 2 BFD sessions to be established, one on ovn-worker and the other on ovn-worker2, since both nodes have app pods that are served by an SPK pod.

However, the way OVN creates a BFD session is this: it checks whether lr-route-add was given a --bfd <uuid> when creating the ECMP route. If it was not (OVN-K doesn't provide a UUID for BFD), OVN does a lookup in the BFD table based on the nexthop, which in BFD terms is the dst-ip field. If it finds an existing BFD entry with that dst-ip, it won't create another session. If it doesn't find a BFD entry with that nexthop/dst-ip, it creates a new BFD session with the specified outport/logical_port. Because the routes for both workers point at the same SPK nexthop, only the first route added ends up with its own session.
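
The lookup described above can be illustrated with a toy Python model. This is a sketch under stated assumptions, not OVN code: ToyNorthbound, add_ecmp_route, and the port names are made up for illustration, and match_port=True is a hypothetical alternative lookup included only for contrast. It is not a claim about how the OVN fix or the OVN-K workaround was implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNorthbound:
    """Toy stand-in for the OVN BFD table; rows are (logical_port, dst_ip)."""

    bfd: list = field(default_factory=list)

    def add_ecmp_route(self, logical_port, nexthop, match_port=False):
        """Add a route without an explicit BFD UUID; return the BFD row used."""
        for row in self.bfd:
            same_ip = row[1] == nexthop
            same_port = row[0] == logical_port
            # Reported behaviour (match_port=False): dst-ip alone decides reuse.
            if same_ip and (same_port or not match_port):
                return row
        row = (logical_port, nexthop)
        self.bfd.append(row)
        return row


def sessions(match_port):
    """BFD rows after adding one route per worker toward the same SPK IP."""
    nb = ToyNorthbound()
    # Illustrative port names, one per worker node.
    for port in ("rtoe-GR_ovn-worker", "rtoe-GR_ovn-worker2"):
        nb.add_ecmp_route(port, "192.168.221.4", match_port=match_port)
    return nb.bfd


print(len(sessions(match_port=False)))  # 1: the second route reuses the first row
print(len(sessions(match_port=True)))   # 2: one row per worker
```

With the dst-ip-only lookup, the worker whose route is added first is the only one that gets a session, which matches the "one session with some random worker" symptom in the description.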


So as of now, @sai or @murali, neither of you is doing anything wrong with the setup. Even on KIND I can't see a BFD session against each worker with an app pod served by that SPK pod. The question is: is this by design or a bug?

Will wait for updates from OVN team.

Comment 4 Surya Seetharaman 2021-09-24 11:38:48 UTC
I have opened https://bugzilla.redhat.com/show_bug.cgi?id=2007549 to track a proper fix on the OVN side. Meanwhile I have created ovn-org/ovn-kubernetes/pull/2513 as a workaround to fix this.

Comment 13 errata-xmlrpc 2022-03-12 04:38:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056