Description of problem:
This fix introduced an upper limit on the number of retries in ovnk for failed attempts. However, due to some compounding issues, it causes pods to never come up. The scenario is as follows:

1. A pod is added and fails to come up because the node is not ready.
2. A pod update happens; the pod again fails to come up because the node is not ready (failed attempts incremented).
3. Nodes 1..n come up, all immediately requesting a retry of pods.
4. Backoff is ignored because immediate requests were made, so the pod is retried n times. All retries fail and the failed-attempt count is incremented each time, causing the pod to be dropped.

These steps happen over a period of 30 seconds, just before the node is finally ready. We need a few adjustments to the logic in ovnk before we can introduce an upper limit.
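
To make the scenario above easier to reason about, here is a small simulation of it. This is only a sketch and not ovnk code: the names `RetryEntry`, `attempt`, and `simulate`, the limit of 15, the doubling backoff, and the event timings are all assumptions made for illustration. The `honor_backoff=True` branch shows one possible adjustment (skipping immediate requests that arrive inside the backoff window without counting them as failures); it is not necessarily the change that was made in ovnk.

```python
from dataclasses import dataclass

# Hypothetical upper limit; the report does not state the real value.
MAX_FAILED_ATTEMPTS = 15


@dataclass
class RetryEntry:
    """Per-pod retry state (illustrative model, not the ovnk data structure)."""
    failed_attempts: int = 0
    next_allowed: float = 0.0  # earliest time a backoff-respecting retry may run
    backoff: float = 1.0       # seconds; doubled after each failure (assumption)
    dropped: bool = False


def attempt(entry, now, node_ready, honor_backoff):
    """Try to bring the pod up once. Return True if it came up."""
    if entry.dropped:
        return False
    if honor_backoff and now < entry.next_allowed:
        return False  # skipped, so it is not counted as a failed attempt
    if node_ready:
        return True
    entry.failed_attempts += 1
    entry.next_allowed = now + entry.backoff
    entry.backoff *= 2
    if entry.failed_attempts >= MAX_FAILED_ATTEMPTS:
        entry.dropped = True  # upper limit reached: the pod is never retried again
    return False


def simulate(node_count, honor_backoff):
    """Replay steps 1-4 and return (pod_came_up, failed_attempts)."""
    entry = RetryEntry()
    attempt(entry, 0.0, False, honor_backoff)  # step 1: pod add, node not ready
    attempt(entry, 1.0, False, honor_backoff)  # step 2: pod update, node not ready
    # Steps 3-4: other nodes come up within the 30 second window, and each one
    # requests an immediate retry while the pod's own node is still not ready.
    for i in range(node_count):
        now = 2.0 + 28.0 * i / node_count
        attempt(entry, now, False, honor_backoff)
    # The pod's node becomes ready at t=30; retry as soon as that is permitted.
    came_up = attempt(entry, max(30.0, entry.next_allowed), True, honor_backoff)
    return came_up, entry.failed_attempts


if __name__ == "__main__":
    print("backoff ignored:", simulate(20, honor_backoff=False))
    print("backoff honored:", simulate(20, honor_backoff=True))
```

With 20 nodes and backoff ignored, the pod reaches the limit and is dropped before its node is ready. With backoff honored, most of the immediate requests are skipped and the pod comes up once its node is ready.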
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399