1880161 – Actuator Update calls should have fixed retry time

Summary: Actuator Update calls should have fixed retry time

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cloud Compute
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.7.0
Assignee:	Alexander Demicev
QA Contact:	Milind Yadav
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-09-17 19:54 UTC by Michael Gugino
Modified:	2021-02-24 15:20 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-02-24 15:18:37 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift machine-api-operator pull 759	0	None	closed	Bug 1880161: Set retry timeout for actuator.Update() failures	2021-02-17 21:49:30 UTC
Red Hat Product Errata	RHSA-2020:5633	0	None	None	None	2021-02-24 15:20:53 UTC

Description Michael Gugino 2020-09-17 19:54:14 UTC

https://bugzilla.redhat.com/show_bug.cgi?id=1880110 is one example (it is not the root of the problem, but it did cause a delay).

For platforms that are slow to go from provisioning to provisioned (and possibly between other phases), exponential backoff can leave several minutes between reconciles after several failed attempts.

We should probably use a static 30s retry interval where appropriate.

Comment 1 Joel Speed 2020-09-30 16:14:48 UTC

This seems like a reasonable suggestion. I believe we already have something like this because of eventual consistency issues with creation in AWS? Let's get someone on the team to pick this up next sprint.

Comment 3 Milind Yadav 2020-12-10 07:47:29 UTC

Validated on a disconnected IPI vSphere cluster:

Steps:
1. Create a new machineset

Actual and expected: Machineset created successfully

2. oc get machines
Actual and expected:
[miyadav@miyadav ~]$ oc get machines
NAME                                 PHASE     TYPE   REGION   ZONE   AGE
miyadav1210vs-mf7kk-master-0         Running                          70m
miyadav1210vs-mf7kk-master-1         Running                          70m
miyadav1210vs-mf7kk-master-2         Running                          70m
miyadav1210vs-mf7kk-worker-mzbfh     Running                          66m
miyadav1210vs-mf7kk-worker-n-46zht   Running                          3m48s
miyadav1210vs-mf7kk-worker-n-8cxgl   Running                          3m48s
miyadav1210vs-mf7kk-worker-vhg9f     Running                          66m

3. oc logs
Actual and expected:
[miyadav@miyadav ~]$ oc logs  machine-api-controllers-7644ff75b-96nq9 -c machine-controller | grep -i retrying
E1210 06:36:47.953376       1 controller.go:281] miyadav1210vs-mf7kk-master-1: error updating machine: Operation cannot be fulfilled on machines.machine.openshift.io "miyadav1210vs-mf7kk-master-1": the object has been modified; please apply your changes to the latest version and try again, retrying in 30s seconds
E1210 06:42:50.606158       1 controller.go:281] miyadav1210vs-mf7kk-worker-mzbfh: error updating machine: Timeout: request did not complete within requested timeout 34s, retrying in 30s seconds
E1210 07:39:26.277411       1 controller.go:281] miyadav1210vs-mf7kk-worker-n-8cxgl: error updating machine: miyadav1210vs-mf7kk-worker-n-8cxgl: reconciler failed to Update machine: task task-607014 has not finished, retrying in 30s seconds
E1210 07:39:26.902082       1 controller.go:281] miyadav1210vs-mf7kk-worker-n-46zht: error updating machine: miyadav1210vs-mf7kk-worker-n-46zht: reconciler failed to Update machine: task task-607015 has not finished, retrying in 30s seconds
[miyadav@miyadav ~]$ 


Additional Info:
Moved to VERIFIED

Comment 4 Milind Yadav 2020-12-10 07:48:16 UTC

Cluster version used in the above validation: 4.7.0-0.nightly-2020-12-09-112139

Comment 7 errata-xmlrpc 2021-02-24 15:18:37 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
