Bug 2007443
Summary: | [ICNI 2.0] Loadbalancer pods do not establish BFD sessions with all workers that host pods for the routed namespace | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Sai Sindhur Malleni <smalleni> | |
Component: | Networking | Assignee: | Surya Seetharaman <surya> | |
Networking sub component: | ovn-kubernetes | QA Contact: | Yurii Prokulevych <yprokule> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | high | |||
Priority: | high | CC: | astoycos, dcbw, jlema, murali, surya, trozet, yprokule | |
Version: | 4.7 | |||
Target Milestone: | --- | |||
Target Release: | 4.10.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | | Doc Type: | No Doc Update | |
Doc Text: | | Story Points: | --- | |
Clone Of: | ||||
: | 2012025 | Environment: | |
Last Closed: | 2022-03-12 04:38:27 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 2007549 | |||
Bug Blocks: | 2012025 | |||
Description
Sai Sindhur Malleni
2021-09-23 21:25:50 UTC
In our test we had 35 SPK namespaces and 35 app namespaces, each with 50 app pods. Attaching the must-gather and the output of ovn-nbctl list bfd, along with the OVN DBs, in the comment below.

Ongoing Slack discussion: https://coreos.slack.com/archives/C01G7T6SYSD/p1632469535137600

Update: OK, so even on a KIND cluster it behaves the same way as it does on the scale environment. Let me explain what happens from an OVN viewpoint, and we can debate whether that's an OVN bug or not.

We have 2 worker nodes: ovn-worker and ovn-worker2. I have 2 SPK pods, one serving the bar namespace and one serving the foo namespace. There are 3 app pods in the bar namespace (1 on ovn-worker and 2 on ovn-worker2) and 0 app pods in the foo namespace. Naturally we'd expect to have 2 BFD sessions established, 1 on ovn-worker and the other on ovn-worker2, since both nodes have app pods that are served by an SPK pod.

However, the way OVN creates a BFD session is this: it checks whether lr-route-add was given a --bfd <uuid> when creating the ECMP route. If it's not provided (OVN-K doesn't provide a UUID for BFD), OVN does a lookup in the BFD table based on the nexthop, which in BFD terms is the dst-ip field. If it finds an existing BFD entry with that dst-ip, it won't create another session. If it doesn't find a BFD entry with that nexthop/dst-ip, it creates a new BFD session with the specified outport/logical_port.

So as of now, @sai or @murali, neither of you is doing anything wrong with the setup. Even on KIND I can't see a BFD session against each worker with an app pod served by that SPK pod. The question is whether this is by design or a bug. Will wait for updates from the OVN team.

I have opened https://bugzilla.redhat.com/show_bug.cgi?id=2007549 to track a proper fix on the OVN side. Meanwhile I have created ovn-org/ovn-kubernetes/pull/2513 as a workaround to fix this.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056
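
For readers following the BFD lookup described in the update above, here is a toy Python model of that behaviour. It is only a sketch of the logic as described in this report, not OVN's actual code: the class `ToyNorthbound`, its methods, the port names and the nexthop address `192.0.2.10` are all hypothetical and chosen for illustration. It shows why two ECMP routes that share a nexthop but leave via different ports end up with a single BFD entry when no explicit BFD UUID is passed.

```python
import itertools
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BfdEntry:
    uuid: str
    dst_ip: str        # the route's nexthop, in BFD terms
    logical_port: str  # the port the session is bound to


class ToyNorthbound:
    """Hypothetical stand-in for the BFD table; not a real OVN API."""

    def __init__(self) -> None:
        self.bfd: List[BfdEntry] = []
        self._ids = itertools.count(1)

    def create_bfd(self, dst_ip: str, logical_port: str) -> BfdEntry:
        entry = BfdEntry(f"bfd-{next(self._ids)}", dst_ip, logical_port)
        self.bfd.append(entry)
        return entry

    def route_add(self, nexthop: str, outport: str,
                  bfd_uuid: Optional[str] = None) -> BfdEntry:
        # Explicit UUID given: use exactly that entry.
        if bfd_uuid is not None:
            return next(e for e in self.bfd if e.uuid == bfd_uuid)
        # No UUID: look up by dst_ip only; the outport is not compared.
        for entry in self.bfd:
            if entry.dst_ip == nexthop:
                return entry
        # Nothing found for this nexthop: create a session on this outport.
        return self.create_bfd(nexthop, outport)


SPK_IP = "192.0.2.10"  # hypothetical nexthop (documentation address range)
PORTS = ["port-ovn-worker", "port-ovn-worker2"]  # hypothetical port names

# Case 1: no UUID passed, as in the behaviour described in this report.
nb = ToyNorthbound()
first = nb.route_add(SPK_IP, PORTS[0])
second = nb.route_add(SPK_IP, PORTS[1])
print(len(nb.bfd))  # 1: the second route reused the first entry

# Case 2: one explicit entry per port, passed by UUID.
nb_explicit = ToyNorthbound()
for port in PORTS:
    entry = nb_explicit.create_bfd(SPK_IP, port)
    nb_explicit.route_add(SPK_IP, port, bfd_uuid=entry.uuid)
print(len(nb_explicit.bfd))  # 2: one entry per port
```

The second case is just one way to get a session per port in this toy model. The report does not say how the linked workaround or the OVN-side fix is implemented, so it should not be read as a description of either.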