Similar to the GCP issue fixed in bug 1736293, the machine controller is not respecting it's intended requeue delay. For an example, see the 4.3 CI run in [1]. Another example, provided by Nate from 4.2.2, is: $ grep ocp-m45ls-workerocs-us-east-1c-zxpfz must-gather.local.7751879504668897656/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-51162f73217480b3b5a47b392e251e7b7a22f6c10fa4e2050ca1b54558625b07/namespaces/openshift-machine-api/pods/machine-api-controllers-df6979876-tzjgk/machine-controller/machine-controller/logs/current.log ... 2019-11-05T18:45:38.848592929Z I1105 18:45:38.848388 1 controller.go:133] Reconciling Machine "ocp-m45ls-workerocs-us-east-1c-zxpfz" 2019-11-05T18:45:38.848592929Z I1105 18:45:38.848427 1 controller.go:304] Machine "ocp-m45ls-workerocs-us-east-1c-zxpfz" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster 2019-11-05T18:45:38.848706623Z I1105 18:45:38.848445 1 actuator.go:487] ocp-m45ls-workerocs-us-east-1c-zxpfz: Checking if machine exists 2019-11-05T18:45:38.898335622Z I1105 18:45:38.898295 1 actuator.go:495] ocp-m45ls-workerocs-us-east-1c-zxpfz: Instance does not exist 2019-11-05T18:45:38.898335622Z I1105 18:45:38.898317 1 controller.go:253] Reconciling machine object ocp-m45ls-workerocs-us-east-1c-zxpfz triggers idempotent create. 2019-11-05T18:45:38.898335622Z I1105 18:45:38.898325 1 actuator.go:113] ocp-m45ls-workerocs-us-east-1c-zxpfz: creating machine 2019-11-05T18:45:38.898512666Z E1105 18:45:38.898485 1 utils.go:191] NodeRef not found in machine ocp-m45ls-workerocs-us-east-1c-zxpfz 2019-11-05T18:45:38.934408372Z I1105 18:45:38.934370 1 instances.go:44] No stopped instances found for machine ocp-m45ls-workerocs-us-east-1c-zxpfz 2019-11-05T18:45:40.700248288Z I1105 18:45:40.700196 1 actuator.go:199] ocp-m45ls-workerocs-us-east-1c-zxpfz: ProviderID updated at machine spec: aws:///us-east-1c/i-0538e37f1107ce960 2019-11-05T18:45:40.708475471Z I1105 18:45:40.708439 1 actuator.go:585] ocp-m45ls-workerocs-us-east-1c-zxpfz: Updating status 2019-11-05T18:45:40.708590743Z I1105 18:45:40.708545 1 actuator.go:629] ocp-m45ls-workerocs-us-east-1c-zxpfz: finished calculating AWS status 2019-11-05T18:45:40.708661793Z I1105 18:45:40.708628 1 actuator.go:231] ocp-m45ls-workerocs-us-east-1c-zxpfz: machine status has changed, updating 2019-11-05T18:45:40.717126682Z I1105 18:45:40.717060 1 actuator.go:647] ocp-m45ls-workerocs-us-east-1c-zxpfz: Instance state still pending, returning an error to requeue 2019-11-05T18:45:40.717263389Z W1105 18:45:40.717227 1 controller.go:255] Failed to create machine "ocp-m45ls-workerocs-us-east-1c-zxpfz": requeue in: 20s 2019-11-05T18:45:40.71748592Z I1105 18:45:40.717460 1 controller.go:133] Reconciling Machine "ocp-m45ls-workerocs-us-east-1c-zxpfz" 2019-11-05T18:45:40.71758936Z I1105 18:45:40.717532 1 controller.go:304] Machine "ocp-m45ls-workerocs-us-east-1c-zxpfz" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster 2019-11-05T18:45:40.71758936Z I1105 18:45:40.717577 1 actuator.go:487] ocp-m45ls-workerocs-us-east-1c-zxpfz: Checking if machine exists 2019-11-05T18:45:40.801596164Z I1105 18:45:40.801550 1 actuator.go:495] ocp-m45ls-workerocs-us-east-1c-zxpfz: Instance does not exist 2019-11-05T18:45:40.801596164Z I1105 18:45:40.801579 1 controller.go:253] Reconciling machine object ocp-m45ls-workerocs-us-east-1c-zxpfz triggers idempotent create. 2019-11-05T18:45:40.801596164Z I1105 18:45:40.801588 1 actuator.go:113] ocp-m45ls-workerocs-us-east-1c-zxpfz: creating machine 2019-11-05T18:45:40.801784021Z E1105 18:45:40.801759 1 utils.go:191] NodeRef not found in machine ocp-m45ls-workerocs-us-east-1c-zxpfz 2019-11-05T18:45:40.83981453Z I1105 18:45:40.839771 1 instances.go:44] No stopped instances found for machine ocp-m45ls-workerocs-us-east-1c-zxpfz 2019-11-05T18:45:42.237560282Z W1105 18:45:42.237502 1 controller.go:255] Failed to create machine "ocp-m45ls-workerocs-us-east-1c-zxpfz": ocp-m45ls-workerocs-us-east-1c-zxpfz: failed to update machine object with providerID: ocp-m45ls-workerocs-us-east-1c-zxpfz: error updating machine spec ProviderID: Operation cannot be fulfilled on machines.machine.openshift.io "ocp-m45ls-workerocs-us-east-1c-zxpfz": the object has been modified; please apply your changes to the latest version and try again ... Michael looked over AWS in [2], but I'll go through [3] and see if anything looks suspicious. [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1772163#c0 [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1736293#c2 [3]: https://github.com/openshift/cluster-api-provider-gcp/pull/44/files
So at least the 4.3 CI thing [1] is not the GCP error wrapping thing. Here you can see delayIfRequeueAfterError triggering: $ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-openshift-ansible-e2e-aws-scaleup-rhel7-4.3/242/artifacts/e2e-aws-scaleup-rhel7/must-gather/registry-svc-ci-openshift-org-ci-op-49m8kdi7-stable-sha256-64c63eedf863406fbc6c7515026f909a7221472cf70283708fb7010dd5e6139e/namespaces/openshift-machine-api/pods/machine-api-controllers-7445cdcd69-vlqjh/machine-controller/machine-controller/logs/current.log | grep -1 'Actuator returned requeue-after error' | tail -n3 2019-11-13T16:57:16.780894255Z W1113 16:57:16.780861 1 controller.go:321] Failed to create machine "ci-op-49m8kdi7-6df53-9bmsm-worker-us-east-1b-centos-mwlr9": requeue in: 20s 2019-11-13T16:57:16.780947512Z I1113 16:57:16.780925 1 controller.go:409] Actuator returned requeue-after error: requeue in: 20s 2019-11-13T16:57:16.781080475Z I1113 16:57:16.781054 1 controller.go:164] Reconciling Machine "ci-op-49m8kdi7-6df53-9bmsm-worker-us-east-1b-centos-mwlr9" but still not stopping the subsequent reconciliation. [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1772163#c0
Validated on : [miyadav@miyadav ~]$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.5.0-0.nightly-2020-04-12-180647 True False 5m22s Cluster version is 4.5.0-0.nightly-2020-04-12-180647 Steps: 1.[miyadav@miyadav ~]$ oc project openshift-machine-api Now using project "openshift-machine-api" on server "https://api.miyadav-1304.qe.devcluster.openshift.com:6443". 2.[miyadav@miyadav ~]$ oc get machines NAME PHASE TYPE REGION ZONE AGE miyadav-1304-plrql-master-0 Running m4.xlarge us-east-2 us-east-2a 35m miyadav-1304-plrql-master-1 Running m4.xlarge us-east-2 us-east-2b 35m miyadav-1304-plrql-master-2 Running m4.xlarge us-east-2 us-east-2c 35m miyadav-1304-plrql-worker-us-east-2a-rrhzj Running m4.large us-east-2 us-east-2a 5m59s miyadav-1304-plrql-worker-us-east-2b-zdft4 Running m4.large us-east-2 us-east-2b 24m miyadav-1304-plrql-worker-us-east-2c-g2kn7 Running m4.large us-east-2 us-east-2c 24m 3.[miyadav@miyadav ~]$ oc delete machine miyadav-1304-plrql-worker-us-east-2b-zdft4 machine.machine.openshift.io "miyadav-1304-plrql-worker-us-east-2b-zdft4" deleted 4. oc logs -f machine-api-controllers-56b68ff6d7-qczp9 -c machine-controller Actual & Expected : I0413 03:44:29.525626 1 reconciler.go:352] miyadav-1304-plrql-worker-us-east-2b-wsnrv: Instance state still pending, returning an error to requeue I0413 03:44:29.525635 1 machine_scope.go:80] miyadav-1304-plrql-worker-us-east-2b-wsnrv: patching machine E0413 03:44:29.538575 1 actuator.go:65] miyadav-1304-plrql-worker-us-east-2b-wsnrv error: requeue in: 20s E0413 03:44:29.538628 1 controller.go:275] miyadav-1304-plrql-worker-us-east-2b-wsnrv: error updating machine: requeue in: 20s I0413 03:44:29.538639 1 controller.go:401] Actuator returned requeue-after error: requeue in: 20s I0413 03:44:29.538724 1 recorder.go:52] controller-runtime/manager/events "msg"="Warning" "message"="requeue in: 20s" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"miyadav-1304-plrql-worker-us-east-2b-wsnrv","uid":"1e209294-8f6e-462b-a52c-849576609dad","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"27546"} "reason"="FailedCreate" I0413 03:44:49.538922 1 controller.go:165] miyadav-1304-plrql-worker-us-east-2b-wsnrv: reconciling Machine I0413 03:44:49.538950 1 actuator.go:97] miyadav-1304-plrql-worker-us-east-2b-wsnrv: actuator checking if machine exists I0413 03:44:49.631983 1 reconciler.go:367] miyadav-1304-plrql-worker-us-east-2b-wsnrv: Found instance by id: i-0ad778699bb16aa1a I0413 03:44:49.632007 1 controller.go:273] miyadav-1304-plrql-worker-us-east-2b-wsnrv: reconciling machine triggers idempotent update I0413 03:44:49.632013 1 actuator.go:112] miyadav-1304-plrql-worker-us-east-2b-wsnrv: actuator updating machine . . . Additional Info : 20s difference is seen in logs honoring requeue time
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days