2100249 – Revert Bug 2082599: add upper bound to number of failed attempts

Summary: Revert Bug 2082599: add upper bound to number of failed attempts

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.11
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.12.0
Assignee:	Tim Rozet
QA Contact:	Anurag saxena
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2101498

Reported:	2022-06-22 20:48 UTC by Tim Rozet
Modified:	2023-01-17 19:50 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-01-17 19:50:29 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift ovn-kubernetes pull 1161	0	None	Merged	Bug 2100249: Revert "Bug 2082599: add upper bound to number of failed attempts"	2022-06-24 21:35:56 UTC
Red Hat Product Errata	RHSA-2022:7399	0	None	None	None	2023-01-17 19:50:45 UTC

Description Tim Rozet 2022-06-22 20:48:55 UTC

Description of problem:
The fix for Bug 2082599 introduced an upper limit on the number of retries that ovn-kubernetes (ovnk) makes for failed attempts. However, because of some compounding issues, that limit can cause pods to never come up. The scenario is as follows:

1. A pod is added and fails to come up because the node is not ready.
2. A pod update happens, and the pod again fails to come up because the node is not ready (failed attempts is incremented).
3. Nodes 1..n come up, each immediately requesting a retry of pods.
4. Backoff is ignored because the retry requests were immediate, so the pod is retried n times. All of the retries fail and failed attempts is incremented each time, which causes the pod to be dropped.

These steps happen over a period of 30 seconds, just before the node is finally ready. We need a few adjustments to the logic in ovnk before we can introduce an upper limit.
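
The sequence above can be sketched as a small Go simulation. This is only an illustration of the failure mode, not the ovnk code: the retryEntry type, the attempt and simulate functions, the linear backoff, and the limit of 15 are all hypothetical names and values chosen for the example. It also assumes that pod updates and node events both request an immediate retry that skips backoff.

```go
package main

import "fmt"

// retryEntry is a hypothetical stand-in for the per-pod retry state.
type retryEntry struct {
	failedAttempts int
	backoffTicks   int  // ticks left before the next periodic retry is allowed
	dropped        bool // true once the pod has been given up on
}

// attempt tries to bring the pod up once and reports whether it succeeded.
// immediate models a retry requested by an event, which skips the backoff.
// maxFailed <= 0 means there is no upper bound (the behavior after the revert).
func (e *retryEntry) attempt(nodeReady, immediate bool, maxFailed int) bool {
	if e.dropped {
		return false // a dropped pod is never retried again
	}
	if !immediate && e.backoffTicks > 0 {
		e.backoffTicks-- // periodic retries honor the backoff
		return false
	}
	if nodeReady {
		return true
	}
	e.failedAttempts++
	e.backoffTicks = e.failedAttempts // assumed linear backoff, illustration only
	if maxFailed > 0 && e.failedAttempts >= maxFailed {
		e.dropped = true
	}
	return false
}

// simulate replays the four steps from the description and then lets the
// node become ready. It returns whether the pod came up and how many
// failed attempts were recorded.
func simulate(nodeEvents, maxFailed int) (bool, int) {
	e := &retryEntry{}
	e.attempt(false, false, maxFailed) // step 1: pod add, node not ready
	e.attempt(false, true, maxFailed)  // step 2: pod update, node not ready
	for i := 0; i < nodeEvents; i++ {  // steps 3 and 4: each node event retries at once
		e.attempt(false, true, maxFailed)
	}
	up := e.attempt(true, true, maxFailed) // the node is finally ready
	return up, e.failedAttempts
}

func main() {
	up, n := simulate(20, 15)
	fmt.Printf("limit 15: up=%v failedAttempts=%d\n", up, n)
	up, n = simulate(20, 0)
	fmt.Printf("no limit: up=%v failedAttempts=%d\n", up, n)
}
```

With the hypothetical limit in place, the burst of immediate retries exhausts the failed attempts budget before the node is ready, so the final retry never runs. Without the limit, the same burst only inflates the counter and the pod still comes up once the node is ready.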

Comment 6 errata-xmlrpc 2023-01-17 19:50:29 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399
