https://bugzilla.redhat.com/show_bug.cgi?id=1880110 is one example (although it is not the root of the problem, it did cause a delay).
For platforms that are slow to go from provisioning to provisioned (and possibly other phases), exponential backoff can cause several minutes between reconciles after several failed attempts.
We should probably use a static 30s retry interval where appropriate.
This seems like a reasonable suggestion. I believe we already have something like this because of eventual consistency issues with creation in AWS? Let's get someone on the team to pick this up next sprint.
Validated on a disconnected IPI vSphere cluster:
1. Create a new machineset
Actual and expected: Machineset created successfully
2. oc get machines
Actual and expected:
[miyadav@miyadav ~]$ oc get machines
NAME                                 PHASE     TYPE   REGION   ZONE   AGE
miyadav1210vs-mf7kk-master-0         Running                          70m
miyadav1210vs-mf7kk-master-1         Running                          70m
miyadav1210vs-mf7kk-master-2         Running                          70m
miyadav1210vs-mf7kk-worker-mzbfh     Running                          66m
miyadav1210vs-mf7kk-worker-n-46zht   Running                          3m48s
miyadav1210vs-mf7kk-worker-n-8cxgl   Running                          3m48s
miyadav1210vs-mf7kk-worker-vhg9f     Running                          66m
3. oc logs for the machine-controller container, filtered for retries
Actual and expected:
[miyadav@miyadav ~]$ oc logs machine-api-controllers-7644ff75b-96nq9 -c machine-controller | grep -i retrying
E1210 06:36:47.953376 1 controller.go:281] miyadav1210vs-mf7kk-master-1: error updating machine: Operation cannot be fulfilled on machines.machine.openshift.io "miyadav1210vs-mf7kk-master-1": the object has been modified; please apply your changes to the latest version and try again, retrying in 30s seconds
E1210 06:42:50.606158 1 controller.go:281] miyadav1210vs-mf7kk-worker-mzbfh: error updating machine: Timeout: request did not complete within requested timeout 34s, retrying in 30s seconds
E1210 07:39:26.277411 1 controller.go:281] miyadav1210vs-mf7kk-worker-n-8cxgl: error updating machine: miyadav1210vs-mf7kk-worker-n-8cxgl: reconciler failed to Update machine: task task-607014 has not finished, retrying in 30s seconds
E1210 07:39:26.902082 1 controller.go:281] miyadav1210vs-mf7kk-worker-n-46zht: error updating machine: miyadav1210vs-mf7kk-worker-n-46zht: reconciler failed to Update machine: task task-607015 has not finished, retrying in 30s seconds
Moved to VERIFIED
4.7.0-0.nightly-2020-12-09-112139 is the clusterversion used in the above validation.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.