Description of problem:
If a Machine is deleted while it is in the Provisioning state, the OpenStack resources are not cleaned up and the Machine remains in the Deleting state indefinitely.

Version-Release number of selected component (if applicable):
4.8.2

How reproducible:
100% reproducible

Steps to Reproduce:
1. Delete a Machine owned by a MachineSet so that a new Machine is created.
2. When the new Machine reaches the Provisioning state, run `oc delete machine` on it.

Actual results:
* The machine-controller container in the machine-api-controller pod begins to reconcile the delete, and errors out trying to update the Machine with the resourceID (the OpenStack server ID).
* The OpenStack server and port are still created for this Machine, but machine-controller complains that they cannot be found because the `openstack-resourceId` annotation is missing on the Machine.
* The Machine is stuck in Deleting indefinitely, and the OpenStack resources remain for manual cleanup.

Expected results:
machine-controller retries until it is able to update the Machine with the `openstack-resourceId` annotation, allowing the delete reconcile to succeed.

Additional info:
In the customer environment, the Machine was deleted by a MachineHealthCheck; the reproducer deletes the Machine manually to mimic this behavior. Below are the controller logs that show this sequence.
```
I0814 10:12:40.032222 1 machine_webhook.go:490] Validate webhook called for Machine: np-rtp-01-bl4dg-ext-worker-1c-l94ks
I0814 10:12:40.042837 1 machinehealthcheck_controller.go:470] Reconciling openshift-machine-api/workers-notready-unknown/np-rtp-01-bl4dg-ext-worker-1c-l94ks/: is likely to go unhealthy in 10m0s
I0814 10:14:42.025450 1 controller.go:174] np-rtp-01-bl4dg-ext-worker-1c-l94ks: reconciling Machine
I0814 10:14:42.235133 1 controller.go:357] np-rtp-01-bl4dg-ext-worker-1c-l94ks: setting phase to Provisioning and requeuing
I0814 10:14:42.245750 1 machinehealthcheck_controller.go:470] Reconciling openshift-machine-api/workers-notready-unknown/np-rtp-01-bl4dg-ext-worker-1c-l94ks/: is likely to go unhealthy in 10m0.754252052s
I0814 10:23:55.706132 1 controller.go:174] np-rtp-01-bl4dg-ext-worker-1c-l94ks: reconciling Machine
I0814 10:23:55.946367 1 controller.go:364] np-rtp-01-bl4dg-ext-worker-1c-l94ks: reconciling machine triggers idempotent create
I0814 10:24:42.043465 1 machinehealthcheck_controller.go:652] openshift-machine-api/workers-notready-unknown/np-rtp-01-bl4dg-ext-worker-1c-l94ks/: deleting
I0814 10:24:42.050795 1 nodelink_controller.go:306] No-op: Machine "np-rtp-01-bl4dg-ext-worker-1c-l94ks" has a deletion timestamp
I0814 10:25:44.369860 1 actuator.go:595] Found the primary address for the machine np-rtp-01-bl4dg-ext-worker-1c-l94ks: 64.101.120.198
W0814 10:25:44.378811 1 controller.go:366] np-rtp-01-bl4dg-ext-worker-1c-l94ks: failed to create machine: Operation cannot be fulfilled on machines.machine.openshift.io "np-rtp-01-bl4dg-ext-worker-1c-l94ks": the object has been modified; please apply your changes to the latest version and try again
E0814 10:25:44.379006 1 controller.go:302] controller-runtime/manager/controller/machine_controller "msg"="Reconciler error" "error"="Operation cannot be fulfilled on machines.machine.openshift.io \"np-rtp-01-bl4dg-ext-worker-1c-l94ks\": the object has been modified; please apply your changes to the latest version and try again" "name"="np-rtp-01-bl4dg-ext-worker-1c-l94ks" "namespace"="openshift-machine-api"
I0814 10:26:35.892133 1 controller.go:174] np-rtp-01-bl4dg-ext-worker-1c-l94ks: reconciling Machine
I0814 10:26:35.892144 1 controller.go:482] np-rtp-01-bl4dg-ext-worker-1c-l94ks: going into phase "Deleting"
I0814 10:26:35.900778 1 controller.go:218] np-rtp-01-bl4dg-ext-worker-1c-l94ks: reconciling machine triggers delete
W0814 10:26:36.608880 1 machineservice.go:953] Couldn't delete all instance ports: Resource not found
E0814 10:26:36.628757 1 actuator.go:574] Machine error np-rtp-01-bl4dg-ext-worker-1c-l94ks: error deleting Openstack instance: Resource not found
E0814 10:26:36.628797 1 controller.go:239] np-rtp-01-bl4dg-ext-worker-1c-l94ks: failed to delete machine: error deleting Openstack instance: Resource not found
E0814 10:26:36.628867 1 controller.go:302] controller-runtime/manager/controller/machine_controller "msg"="Reconciler error" "error"="error deleting Openstack instance: Resource not found" "name"="np-rtp-01-bl4dg-ext-worker-1c-l94ks" "namespace"="openshift-machine-api"
```
@andrew, regarding step 1 of "Steps to Reproduce" ("Delete a Machine owned by a MachineSet such that a new Machine is created"): do you mean deleting it with the `oc` command (`oc delete machine`) or with the OpenStack CLI (`openstack server delete ...`)? Thanks.
I mean deleting the Machine API resource with the `oc` command, i.e. `oc delete machine`.
For what it's worth, the first step is only a means of creating a new Machine and getting it into the Provisioning state.