Bug 2100249

Summary: Revert Bug 2082599: add upper bound to number of failed attempts
Product: OpenShift Container Platform Reporter: Tim Rozet <trozet>
Component: Networking Assignee: Tim Rozet <trozet>
Networking sub component: ovn-kubernetes QA Contact: Anurag saxena <anusaxen>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: surya
Version: 4.11   
Target Milestone: ---   
Target Release: 4.12.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-01-17 19:50:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2101498    

Description Tim Rozet 2022-06-22 20:48:55 UTC
Description of problem:
The fix for Bug 2082599 introduced an upper limit on the number of retries in ovnk (ovn-kubernetes) for failed attempts. However, due to some compounding issues, it causes pods to never come up. The scenario is as follows:

1. A pod is added and fails to come up because the node is not ready.
2. A pod update happens and the pod again fails to come up because the node is not ready (failed attempts incremented).
3. Nodes 1..n come up, all immediately requesting a retry of pods.
4. Backoff is ignored because immediate requests were made; the pod is retried n times, every retry fails, and failed attempts keep incrementing until the pod is dropped.

These steps happen over a period of 30 seconds, just before the node is finally ready. We need a few adjustments to the logic in ovnk before we can introduce an upper limit.
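
The interaction described above can be replayed with a small simulation. This is only a sketch under stated assumptions, not the ovn-kubernetes code: the names RetryEntry, try_pod and simulate, the limit of 15 failed attempts, and the 30 second backoff are all hypothetical and chosen for illustration. It shows how immediate retry requests that skip the backoff check can exhaust an upper bound before the pod's node is ready.

```python
from dataclasses import dataclass

# Hypothetical upper bound; the real value is not stated in this report.
MAX_FAILED_ATTEMPTS = 15


@dataclass
class RetryEntry:
    """Hypothetical bookkeeping for one pod waiting to be retried."""
    failed_attempts: int = 0
    next_allowed: float = 0.0  # earliest time a backoff-respecting retry may run
    dropped: bool = False


def try_pod(entry, now, node_ready, immediate, backoff=30.0,
            limit=MAX_FAILED_ATTEMPTS):
    """Attempt to set up the pod once; return True if it came up."""
    if entry.dropped:
        return False  # the pod was given up on, nothing retries it anymore
    # An immediate request skips the backoff check, as in step 4.
    if not immediate and now < entry.next_allowed:
        return False  # still backing off: no attempt, no increment
    if node_ready:
        entry.failed_attempts = 0
        return True
    entry.failed_attempts += 1
    entry.next_allowed = now + backoff
    if entry.failed_attempts >= limit:
        entry.dropped = True  # the upper bound drops the pod
    return False


def simulate(num_nodes, immediate, limit=MAX_FAILED_ATTEMPTS):
    """Replay steps 1-4, then try once more after the pod's node is ready."""
    entry = RetryEntry()
    # Step 1: pod add, node not ready.
    try_pod(entry, now=0.0, node_ready=False, immediate=True, limit=limit)
    # Step 2: pod update, node still not ready.
    try_pod(entry, now=1.0, node_ready=False, immediate=True, limit=limit)
    # Steps 3 and 4: each node that comes up requests a retry of the pod.
    for i in range(num_nodes):
        try_pod(entry, now=2.0 + i, node_ready=False,
                immediate=immediate, limit=limit)
    # The pod's own node is finally ready, roughly 30 seconds in.
    came_up = try_pod(entry, now=35.0, node_ready=True,
                      immediate=True, limit=limit)
    return came_up, entry.failed_attempts, entry.dropped


if __name__ == "__main__":
    print("immediate retries:", simulate(20, immediate=True))
    print("backoff honored:  ", simulate(20, immediate=False))
```

In this sketch, 20 nodes issuing immediate retries push the counter to the limit and the pod is dropped before its node is ready, whereas the same 20 requests are skipped when they honor the backoff and the pod comes up on the final attempt. That matches the point of the report: the retry logic needs adjusting before an upper limit is safe to introduce.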

Comment 6 errata-xmlrpc 2023-01-17 19:50:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform
4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399