https://bugzilla.redhat.com/show_bug.cgi?id=1880110 is one example (this was not the root cause of that problem, but it did cause a delay). On platforms that are slow to move a machine from provisioning to provisioned (and possibly between other phases), exponential backoff can leave several minutes between reconciles after several failed attempts. We should probably use a static 30s retry interval where appropriate.
This seems like a reasonable suggestion. I believe we already have something like this because of eventual consistency issues with creation in AWS? Let's get someone on the team to pick this up next sprint.
Validated on a disconnected IPI vSphere cluster.

Steps:

1. Create a new machineset
Actual and expected: Machineset created successfully

2. oc get machines
Actual and expected:
[miyadav@miyadav ~]$ oc get machines
NAME                                PHASE     TYPE   REGION   ZONE   AGE
miyadav1210vs-mf7kk-master-0        Running                          70m
miyadav1210vs-mf7kk-master-1        Running                          70m
miyadav1210vs-mf7kk-master-2        Running                          70m
miyadav1210vs-mf7kk-worker-mzbfh    Running                          66m
miyadav1210vs-mf7kk-worker-n-46zht  Running                          3m48s
miyadav1210vs-mf7kk-worker-n-8cxgl  Running                          3m48s
miyadav1210vs-mf7kk-worker-vhg9f    Running                          66m

3. Check the machine-controller logs
Actual and expected:
[miyadav@miyadav ~]$ oc logs machine-api-controllers-7644ff75b-96nq9 -c machine-controller | grep -i retrying
E1210 06:36:47.953376 1 controller.go:281] miyadav1210vs-mf7kk-master-1: error updating machine: Operation cannot be fulfilled on machines.machine.openshift.io "miyadav1210vs-mf7kk-master-1": the object has been modified; please apply your changes to the latest version and try again, retrying in 30s seconds
E1210 06:42:50.606158 1 controller.go:281] miyadav1210vs-mf7kk-worker-mzbfh: error updating machine: Timeout: request did not complete within requested timeout 34s, retrying in 30s seconds
E1210 07:39:26.277411 1 controller.go:281] miyadav1210vs-mf7kk-worker-n-8cxgl: error updating machine: miyadav1210vs-mf7kk-worker-n-8cxgl: reconciler failed to Update machine: task task-607014 has not finished, retrying in 30s seconds
E1210 07:39:26.902082 1 controller.go:281] miyadav1210vs-mf7kk-worker-n-46zht: error updating machine: miyadav1210vs-mf7kk-worker-n-46zht: reconciler failed to Update machine: task task-607015 has not finished, retrying in 30s seconds
[miyadav@miyadav ~]$

Additional info: Moved to VERIFIED
Cluster version used in the validation above: 4.7.0-0.nightly-2020-12-09-112139
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633