Bug 1880161 - Actuator Update calls should have fixed retry time
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.7.0
Assignee: Alexander Demicev
QA Contact: Milind Yadav
Depends On:
Reported: 2020-09-17 19:54 UTC by Michael Gugino
Modified: 2021-02-24 15:20 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Last Closed: 2021-02-24 15:18:37 UTC
Target Upstream Version:


System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-api-operator pull 759 | 0 | None | closed | Bug 1880161: Set retry timeout for actuator.Update() failures | 2021-02-17 21:49:30 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:20:53 UTC

Description Michael Gugino 2020-09-17 19:54:14 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1880110 is one example (it is not the root of the problem, but it did cause a delay).

For platforms that are slow to go from provisioning to provisioned (and possibly other phases), exponential backoff can cause several minutes between reconciles after several failed attempts.

We should probably use a static 30s retry interval wherever that is appropriate.

Comment 1 Joel Speed 2020-09-30 16:14:48 UTC
This seems like a reasonable suggestion. I believe we already have something similar because of eventual consistency issues with creation in AWS? Let's get someone on the team to pick this up next sprint.

Comment 3 Milind Yadav 2020-12-10 07:47:29 UTC
Validated on a disconnected IPI vSphere cluster:

Steps:
1. Create a new machineset

Actual and expected: Machineset created successfully

2. oc get machines
Actual and expected:
[miyadav@miyadav ~]$ oc get machines
NAME                                 PHASE     TYPE   REGION   ZONE   AGE
miyadav1210vs-mf7kk-master-0         Running                          70m
miyadav1210vs-mf7kk-master-1         Running                          70m
miyadav1210vs-mf7kk-master-2         Running                          70m
miyadav1210vs-mf7kk-worker-mzbfh     Running                          66m
miyadav1210vs-mf7kk-worker-n-46zht   Running                          3m48s
miyadav1210vs-mf7kk-worker-n-8cxgl   Running                          3m48s
miyadav1210vs-mf7kk-worker-vhg9f     Running                          66m

3. oc logs
Actual and expected:
[miyadav@miyadav ~]$ oc logs  machine-api-controllers-7644ff75b-96nq9 -c machine-controller | grep -i retrying
E1210 06:36:47.953376       1 controller.go:281] miyadav1210vs-mf7kk-master-1: error updating machine: Operation cannot be fulfilled on machines.machine.openshift.io "miyadav1210vs-mf7kk-master-1": the object has been modified; please apply your changes to the latest version and try again, retrying in 30s seconds
E1210 06:42:50.606158       1 controller.go:281] miyadav1210vs-mf7kk-worker-mzbfh: error updating machine: Timeout: request did not complete within requested timeout 34s, retrying in 30s seconds
E1210 07:39:26.277411       1 controller.go:281] miyadav1210vs-mf7kk-worker-n-8cxgl: error updating machine: miyadav1210vs-mf7kk-worker-n-8cxgl: reconciler failed to Update machine: task task-607014 has not finished, retrying in 30s seconds
E1210 07:39:26.902082       1 controller.go:281] miyadav1210vs-mf7kk-worker-n-46zht: error updating machine: miyadav1210vs-mf7kk-worker-n-46zht: reconciler failed to Update machine: task task-607015 has not finished, retrying in 30s seconds
[miyadav@miyadav ~]$ 

Additional Info:

Comment 4 Milind Yadav 2020-12-10 07:48:16 UTC
Cluster version used in the above validation: 4.7.0-0.nightly-2020-12-09-112139

Comment 7 errata-xmlrpc 2021-02-24 15:18:37 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

