2082599 – retry logic should have an upper bound on the number of failed attempts

Summary: retry logic should have an upper bound on the number of failed attempts

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.11
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.12.0
Assignee:	Riccardo Ravaioli
QA Contact:	Ross Brattain
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-05-06 14:08 UTC by Riccardo Ravaioli
Modified:	2023-01-17 19:49 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-01-17 19:48:48 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift ovn-kubernetes pull 1147	None	Merged	Bug 2082599: add upper bound to number of failed attempts	2022-07-25 16:33:46 UTC
Github	openshift ovn-kubernetes pull 1205	None	Merged	[DownstreamMerge] 4.12 initial merge from upstream: 7-18-22	2022-07-25 16:33:51 UTC
Red Hat Product Errata	RHSA-2022:7399	None	None	None	2023-01-17 19:49:04 UTC

Description Riccardo Ravaioli 2022-05-06 14:08:24 UTC

The generic retry logic now used for various resource types in the ovnk master keeps attempting to add or delete a given object indefinitely, until the operation succeeds.

As with level-driven controllers, we should add an upper bound on the number of retries, after which the retry entry is discarded.

Comment 1 Riccardo Ravaioli 2022-05-06 14:16:13 UTC

upstream PR: https://github.com/ovn-org/ovn-kubernetes/pull/2970

Comment 10 Ross Brattain 2022-08-05 18:59:46 UTC

Scaled the cluster up to 123 nodes a few times, but was unable to trigger any failed attempts.


4.12.0-0.nightly-2022-08-05-045104


log_ovnkube-master-gk925_ip.eu-central-1.compute.internal:I0805 17:35:39.639802       1 obj_retry.go:1245] Retry successful for *v1.Pod openshift-multus/multus-b2449 after 0 failed attempt(s)
log_ovnkube-master-gk925_ip.eu-central-1.compute.internal:I0805 17:35:39.639896       1 obj_retry.go:1245] Retry successful for *v1.Pod openshift-dns/node-resolver-ktqd7 after 0 failed attempt(s)
log_ovnkube-master-gk925_ip.eu-central-1.compute.internal:I0805 17:35:39.639942       1 obj_retry.go:1245] Retry successful for *v1.Pod openshift-machine-config-operator/machine-config-daemon-grxwt after 0 failed attempt(s)
log_ovnkube-master-gk925_ip.eu-central-1.compute.internal:I0805 17:35:39.642075       1 obj_retry.go:1245] Retry successful for *v1.Pod openshift-network-diagnostics/network-check-target-brvnm after 0 failed attempt(s)
log_ovnkube-master-gk925_ip.eu-central-1.compute.internal:I0805 17:35:39.642480       1 obj_retry.go:1245] Retry successful for *v1.Pod openshift-multus/network-metrics-daemon-d9n6v after 0 failed attempt(s)

Comment 13 errata-xmlrpc 2023-01-17 19:48:48 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399