Bug 1996886
| Summary: | timedout waiting for flows during pod creation and ovn-controller pegged on worker nodes | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Sai Sindhur Malleni <smalleni> | |
| Component: | Networking | Assignee: | Surya Seetharaman <surya> | |
| Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | urgent | |||
| Priority: | urgent | CC: | astoycos, dblack, dcbw, jlema, murali, numan.siddique, rkhan, trozet, yprokule | |
| Version: | 4.7 | |||
| Target Milestone: | --- | |||
| Target Release: | 4.10.0 | |||
| Hardware: | x86_64 | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | No Doc Update | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2011385 | Environment: | |
| Last Closed: | 2022-03-10 16:05:43 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1978605 | |||
| Bug Blocks: | 2011385 | |||
Description
Sai Sindhur Malleni
2021-08-23 21:57:21 UTC
We also observed several nbdb leader elections in this case.

Could be related to https://bugzilla.redhat.com/show_bug.cgi?id=1962344

(In reply to Tim Rozet from comment #3)
> Could be related to https://bugzilla.redhat.com/show_bug.cgi?id=1962344

Tim,

But seems like the fix to that is only for shared gateway mode but not local gateway mode? We are using local gateway mode in our tests.

(In reply to Sai Sindhur Malleni from comment #4)
> (In reply to Tim Rozet from comment #3)
> > Could be related to https://bugzilla.redhat.com/show_bug.cgi?id=1962344
>
> Tim,
>
> But seems like the fix to that is only for shared gateway mode but not local
> gateway mode? We are using local gateway mode in our tests.

Tim has been talking to Han and team about doing the same for LGW; if not, we'll take it up ourselves. Btw, I know we have a lot of 4.7.z scale bugs related to staleness, pod latency, and/or creation/deletion. I'll need to do some live debugging on the cluster to discover what's happening. At the very least, when you see such problems happening, could you grab a full must-gather, also with gather_network_logs, and attach it to the bz please? I can try to go through that at least. cc @smalleni and @jlema

This bug will be used to track the local gateway changes, which will be irrelevant for 4.9 and later.

Filed a bug in OVN to track the dependency there: https://bugzilla.redhat.com/show_bug.cgi?id=2007694

4.7.24 and later versions include OVN 20.12-140, which has a number of fixes that greatly reduce Logical Flows, including for ICNIv2. Do we see an improvement with 4.7.24 and later in the tests?

Seeing this in 4.7.28 as well - provided the must-gather and DBs to Dan over Slack.

Root cause of this was determined to be https://bugzilla.redhat.com/show_bug.cgi?id=1978605. The OVN version with the fix is already present in 4.10 and 4.9.

*** Bug 2011110 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.