Description of problem:
In our tests we create 35 app namespaces and then 35 SPK namespaces, each SPK namespace with 4 SPK emulation pods. This is on a 120-node cluster. Every app namespace is served by 4 SPK pods in a separate namespace. The annotations on each SPK pod look like:

  k8s.ovn.org/bfd-enabled: true
  k8s.ovn.org/routing-namespaces: served-ns-5
  k8s.ovn.org/routing-network: sriov-net-ens2f0-1
  k8s.v1.cni.cncf.io/network-status: [{ "name": "sriov-net-ens2f0-1", "interface": "net1", "ips": [ "192.168.221.4" ], "dns": {} }]
  k8s.v1.cni.cncf.io/networks: [{ "name": "sriov-net-ens2f0-1", "ips": [ "192.168.221.4/21" ]}]

Even after launching several application pods in each of the 35 namespaces, not all SPK pods establish BFD sessions with all the worker nodes that host app pods served by those SPK pods. Instead, each SPK pod establishes one BFD session with some seemingly random worker node.

Version-Release number of selected component (if applicable): 4.7.28

How reproducible: 100%

Steps to Reproduce:
1. Use kube-burner to create the serving (SPK) and served (app) namespaces.
2.
3.

Actual results:
BFD sessions are established with only some worker nodes, seemingly at random, not with all the workers that host app pods for a given SPK pod.

Expected results:
Every SPK pod should establish a BFD session with every worker node that hosts app pods served by that SPK pod.

Additional info:
In our test we had 35 SPK namespaces and 35 app namespaces, each app namespace with 50 app pods. Attaching the must-gather and the output of "ovn-nbctl list bfd", along with the OVN DBs, in a comment below.
Ongoing Slack Discussion: https://coreos.slack.com/archives/C01G7T6SYSD/p1632469535137600

Update: OK, so even on a KIND cluster this behaves the same way as in the scale environment. Let me explain what happens from an OVN viewpoint, and we can debate whether that's an OVN bug or not.

We have 2 worker nodes: ovn-worker and ovn-worker2. I have 2 SPK pods, one serving the bar namespace and one serving the foo namespace. There are 3 app pods in the bar namespace (1 on ovn-worker, 2 on ovn-worker2) and 0 app pods in the foo namespace. Naturally we'd expect 2 BFD sessions to be established, one on ovn-worker and the other on ovn-worker2, since both nodes have app pods served by an SPK pod.

However, the way OVN creates a BFD session is: it checks whether lr-route-add was given a --bfd <uuid> when creating the ECMP route. If it's not provided (OVN-K doesn't provide a BFD UUID), OVN does a lookup in the BFD table based on the nexthop, which in BFD terms is the dst_ip field. If it finds an existing BFD entry with that dst_ip, it won't create another session. If it doesn't find a BFD entry with that nexthop/dst_ip, it creates a new BFD session with the specified outport/logical_port.

So as of now, @sai or @murali, neither of you is doing anything wrong with the setup. Even on KIND I can't see a BFD session against each worker that has an app pod served by that SPK pod. The question is whether this is a design decision or a bug. Will wait for updates from the OVN team.
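To make the lookup behavior above concrete, here is a minimal Python sketch (not OVN's actual code) of how a BFD entry gets picked when no --bfd <uuid> is passed to lr-route-add. The port names and IP are illustrative, loosely following OVN-K's gateway-router port naming; the point is only that the lookup keys on dst_ip alone, so a second route toward the same SPK nexthop on a different outport reuses the first entry instead of creating a second session:

```python
# Sketch of OVN's BFD-table lookup when lr-route-add gets no --bfd <uuid>.
# NOT actual ovn-northd code; names below are hypothetical examples.
from dataclasses import dataclass


@dataclass(frozen=True)
class BFDEntry:
    logical_port: str  # outport the BFD session runs on
    dst_ip: str        # route nexthop, i.e. the SPK pod's secondary IP


class BFDTable:
    def __init__(self):
        self.entries = []

    def entry_for_route(self, outport, nexthop):
        # Lookup is by nexthop (dst_ip) ONLY: an existing entry is
        # reused even if its logical_port differs from this route's.
        for e in self.entries:
            if e.dst_ip == nexthop:
                return e
        entry = BFDEntry(outport, nexthop)
        self.entries.append(entry)
        return entry


table = BFDTable()
spk_ip = "192.168.221.4"  # SPK pod's SR-IOV IP from the annotations above

# ECMP route for the bar app pod on ovn-worker: creates a BFD session there.
first = table.entry_for_route("rtoe-GR_ovn-worker", spk_ip)

# ECMP route for the bar app pods on ovn-worker2: same nexthop, so the
# ovn-worker entry is reused and no session appears on ovn-worker2.
second = table.entry_for_route("rtoe-GR_ovn-worker2", spk_ip)

assert second is first
assert len(table.entries) == 1  # only one worker ends up with a session
```

Under this model, which worker "wins" the single session simply depends on whichever route was added first, matching the random-looking worker selection seen in the scale test.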
I have opened https://bugzilla.redhat.com/show_bug.cgi?id=2007549 to track a proper fix on the OVN side. Meanwhile, I have created ovn-org/ovn-kubernetes/pull/2513 as a workaround.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056