Bug 2100249 - Revert Bug 2082599: add upper bound to number of failed attempts
Summary: Revert Bug 2082599: add upper bound to number of failed attempts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.12.0
Assignee: Tim Rozet
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks: 2101498
Reported: 2022-06-22 20:48 UTC by Tim Rozet
Modified: 2023-01-17 19:50 UTC
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-17 19:50:29 UTC
Target Upstream Version:
Embargoed:




Links
System: Github; ID: openshift ovn-kubernetes pull 1161; Status: Merged; Summary: Bug 2100249: Revert "Bug 2082599: add upper bound to number of failed attempts"; Last Updated: 2022-06-24 21:35:56 UTC
System: Red Hat Product Errata; ID: RHSA-2022:7399; Last Updated: 2023-01-17 19:50:45 UTC

Description Tim Rozet 2022-06-22 20:48:55 UTC
Description of problem:
The fix for Bug 2082599 introduced an upper limit on the number of retries that ovnk (ovn-kubernetes) makes for failed attempts. However, combined with some other issues, it can cause pods to never come up. The scenario is as follows:

1. A pod is added but fails to come up because the node is not ready.
2. A pod update occurs and also fails because the node is not ready (the failed-attempt counter is incremented).
3. Nodes 1..n come up, and each one immediately requests a retry of the pod.
4. Because these are immediate retry requests, backoff is ignored: the pod is retried n times, every attempt fails, and the failed-attempt counter keeps incrementing until it exceeds the limit and the pod is dropped.
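The failure mode in steps 1 to 4 can be reproduced with a small simulation. This is a minimal sketch under stated assumptions: the `podRetry` type, the `attempt` and `simulate` functions, the linear backoff, and the cap of 5 are all hypothetical and are not the ovn-kubernetes code or its actual limit. It only shows how immediate retries that skip backoff can exhaust a failed-attempt cap within seconds, so the pod is already dropped when the node finally becomes ready.

```go
package main

import (
	"fmt"
	"time"
)

// maxFailedAttempts is a hypothetical cap chosen for illustration; the
// actual limit added by Bug 2082599 is not stated in this report.
const maxFailedAttempts = 5

// podRetry is a hypothetical per-pod retry record, not the real ovnk type.
type podRetry struct {
	failedAttempts int
	backoffUntil   time.Time
	dropped        bool
}

// attempt tries to set up the pod. immediate=true skips the backoff check,
// as happens when a node add requests an immediate retry. Setup always
// fails while the node is not ready.
func (p *podRetry) attempt(now time.Time, immediate, nodeReady bool) bool {
	if p.dropped {
		return false // a dropped pod is never retried again
	}
	if !immediate && now.Before(p.backoffUntil) {
		return false // still inside the backoff window; skip this attempt
	}
	if nodeReady {
		p.failedAttempts = 0
		return true
	}
	p.failedAttempts++
	// Linear backoff for simplicity (assumption; the real policy is not described).
	p.backoffUntil = now.Add(time.Duration(p.failedAttempts) * time.Second)
	if p.failedAttempts > maxFailedAttempts {
		p.dropped = true
	}
	return false
}

// simulate replays the four steps from the report with n nodes coming up.
func simulate(n int) *podRetry {
	p := &podRetry{}
	start := time.Now()
	p.attempt(start, false, false)                    // 1. pod add fails: node not ready
	p.attempt(start.Add(2*time.Second), false, false) // 2. pod update fails
	for i := 0; i < n; i++ {
		// 3-4. each node add forces an immediate retry that ignores backoff
		p.attempt(start.Add(3*time.Second), true, false)
	}
	// The node finally becomes ready, but a dropped pod is never set up.
	p.attempt(start.Add(30*time.Second), false, true)
	return p
}

func main() {
	p := simulate(10)
	fmt.Printf("failedAttempts=%d dropped=%v\n", p.failedAttempts, p.dropped)
}
```

With 10 nodes the pod is dropped after its sixth failure, before the node is ready. With only 2 nodes the counter never reaches the cap and the final attempt succeeds.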

All of these steps happen within about 30 seconds, just before the node finally becomes ready. The retry logic in ovnk needs a few adjustments before an upper limit can be reintroduced.

Comment 6 errata-xmlrpc 2023-01-17 19:50:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

