Description of problem:
The AWS machine actuator gets stuck in an error loop when a machine is deleted and its instance can no longer be found. In particular, it gets stuck here:
https://github.com/openshift/cluster-api-provider-aws/blob/a815e7e7e6f7e2241e3c9de66793cc9154945c1c/pkg/actuators/machine/reconciler.go#L260-L263

Version-Release number of selected component (if applicable):
4.9.0-rc.1

How reproducible:
Consistent

Steps to Reproduce:
1. Delete a `machine` object whose `.spec.providerID` refers to an instance that doesn't exist

Actual results:
The controller returns an error on reconcile, which causes the reconcile to be requeued. However, there is no breakout condition (such as a timeout or retry counter), so the loop continues indefinitely.

Expected results:
The machine actuator should eventually recognize that this is a delete operation, give up, and clean up the machine object.

Additional info:
*** Bug 2011089 has been marked as a duplicate of this bug. ***
Verified on clusterversion 4.10.0-0.nightly-2021-10-16-173656: the machine can be deleted after its instance has been removed.

$ oc get machine -o wide
NAME                                         PHASE     TYPE       REGION      ZONE         AGE   NODE                                        PROVIDERID                              STATE
zhsunaws1018-58vff-worker-us-east-2c-sfj2g   Running   m5.large   us-east-2   us-east-2c   14h   ip-10-0-219-50.us-east-2.compute.internal   aws:///us-east-2c/i-01f4559d15b9d2abc   shutting-down

$ oc delete machine zhsunaws1018-58vff-worker-us-east-2c-sfj2g
machine.machine.openshift.io "zhsunaws1018-58vff-worker-us-east-2c-sfj2g" deleted
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056