Bug 1880161

Summary:	Actuator Update calls should have fixed retry time
Product:	OpenShift Container Platform	Reporter:	Michael Gugino <mgugino>
Component:	Cloud Compute	Assignee:	Alexander Demicev <ademicev>
Cloud Compute sub component:	Other Providers	QA Contact:	Milind Yadav <miyadav>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	medium
Priority:	unspecified	CC:	ademicev
Version:	4.6	Keywords:	NeedsTestCase
Target Milestone:	---
Target Release:	4.7.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-02-24 15:18:37 UTC	Type:	Bug

Description Michael Gugino 2020-09-17 19:54:14 UTC

https://bugzilla.redhat.com/show_bug.cgi?id=1880110 is one example (it is not the root of the problem, but it did cause a delay).

For platforms that are slow to go from provisioning to provisioned (and possibly between other phases), exponential backoff can leave several minutes between reconciles after several failed attempts.

We should probably set a static 30s retry where appropriate.

Comment 1 Joel Speed 2020-09-30 16:14:48 UTC

This seems like a reasonable suggestion. I believe we already have something similar because of eventual consistency issues with creation in AWS? Let's get someone on the team to pick this up next sprint.

Comment 3 Milind Yadav 2020-12-10 07:47:29 UTC

Validated on a disconnected IPI vSphere cluster:

Steps:
1. Create a new machineset

Actual and expected : Machineset created successfully

2. oc get machines
Actual and expected :
[miyadav@miyadav ~]$ oc get machines
NAME                                 PHASE     TYPE   REGION   ZONE   AGE
miyadav1210vs-mf7kk-master-0         Running                          70m
miyadav1210vs-mf7kk-master-1         Running                          70m
miyadav1210vs-mf7kk-master-2         Running                          70m
miyadav1210vs-mf7kk-worker-mzbfh     Running                          66m
miyadav1210vs-mf7kk-worker-n-46zht   Running                          3m48s
miyadav1210vs-mf7kk-worker-n-8cxgl   Running                          3m48s
miyadav1210vs-mf7kk-worker-vhg9f     Running                          66m

3. oc logs
Actual and expected :
[miyadav@miyadav ~]$ oc logs  machine-api-controllers-7644ff75b-96nq9 -c machine-controller | grep -i retrying
E1210 06:36:47.953376       1 controller.go:281] miyadav1210vs-mf7kk-master-1: error updating machine: Operation cannot be fulfilled on machines.machine.openshift.io "miyadav1210vs-mf7kk-master-1": the object has been modified; please apply your changes to the latest version and try again, retrying in 30s seconds
E1210 06:42:50.606158       1 controller.go:281] miyadav1210vs-mf7kk-worker-mzbfh: error updating machine: Timeout: request did not complete within requested timeout 34s, retrying in 30s seconds
E1210 07:39:26.277411       1 controller.go:281] miyadav1210vs-mf7kk-worker-n-8cxgl: error updating machine: miyadav1210vs-mf7kk-worker-n-8cxgl: reconciler failed to Update machine: task task-607014 has not finished, retrying in 30s seconds
E1210 07:39:26.902082       1 controller.go:281] miyadav1210vs-mf7kk-worker-n-46zht: error updating machine: miyadav1210vs-mf7kk-worker-n-46zht: reconciler failed to Update machine: task task-607015 has not finished, retrying in 30s seconds
[miyadav@miyadav ~]$ 


Additional Info:
Moved to VERIFIED

Comment 4 Milind Yadav 2020-12-10 07:48:16 UTC

Cluster version used in the above validation: 4.7.0-0.nightly-2020-12-09-112139

Comment 7 errata-xmlrpc 2021-02-24 15:18:37 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633