+++ This bug was initially created as a clone of Bug #1804738 +++

Description of problem:
If a cloud provider instance is removed, the Machine Autoscaler determines that the Machine has an unregistered node and, after 15 minutes, removes the unregistered node. This is currently not done idempotently: if the Machine takes some time to be deleted (for example, because the Machine controller is slow to remove the finalizer), the Autoscaler calls to scale down the replicaset a second or third time. This can result in healthy nodes being removed from the cluster for no reason.

Version-Release number of selected component (if applicable):
4.4

How reproducible:
Easily reproducible

Steps to Reproduce:
1. Deploy an OpenShift cluster with a Machine Autoscaler pointing to a MachineSet
2. Ensure there are at least 3 nodes in the MachineSet (you may need to add extra workloads)
3. Terminate an instance from the cloud provider side
4. Wait for 15 minutes and observe several Machines being deleted at once

Actual results:

Expected results:

Additional info:
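
The non-idempotent scale-down described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the autoscaler's actual code: the `Machine` and `MachineSet` classes, the `deleting` flag (standing in for a set deletion timestamp while the finalizer is pending), and `remove_unregistered` are all hypothetical names. It shows one way repeated removal requests for the same Machine could be made a no-op so that the replica count drops only once.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Machine:
    name: str
    has_node: bool = True   # False once the cloud instance has been terminated
    deleting: bool = False  # hypothetical stand-in for a set deletion timestamp


@dataclass
class MachineSet:
    replicas: int
    machines: Dict[str, Machine] = field(default_factory=dict)


def remove_unregistered(ms: MachineSet, name: str) -> bool:
    """Request deletion of one machine; return True only if state changed."""
    machine = ms.machines.get(name)
    if machine is None or machine.deleting:
        # Already gone or already being deleted: repeating the call is a no-op,
        # so the replica count is not decremented a second time.
        return False
    machine.deleting = True
    ms.replicas -= 1
    return True


ms = MachineSet(replicas=3, machines={n: Machine(n) for n in ("w-a", "w-b", "w-c")})
ms.machines["w-a"].has_node = False  # instance terminated on the cloud side

# Simulate three autoscaler loops while the finalizer is still pending.
results = [remove_unregistered(ms, "w-a") for _ in range(3)]
```

Without the `machine.deleting` check, each of the three loops would decrement `replicas`, which corresponds to the reported behaviour of healthy Machines being removed.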
Verified
4.4.0-0.nightly-2020-02-21-045519

Only the machine associated with the unregistered node was deleted.

$ oc get machine
NAME                     PHASE     TYPE            REGION        ZONE            AGE
zhsun2-k9bts-m-0         Running   n1-standard-4   us-central1   us-central1-a   142m
zhsun2-k9bts-m-1         Running   n1-standard-4   us-central1   us-central1-b   142m
zhsun2-k9bts-m-2         Running   n1-standard-4   us-central1   us-central1-c   142m
zhsun2-k9bts-w-a-45r2z   Failed    n1-standard-4   us-central1   us-central1-a   16m
zhsun2-k9bts-w-a-dd9n2   Running   n1-standard-4   us-central1   us-central1-a   136m
zhsun2-k9bts-w-a-jc79x   Running   n1-standard-4   us-central1   us-central1-a   16m
zhsun2-k9bts-w-a-z2c9x   Running   n1-standard-4   us-central1   us-central1-a   16m
zhsun2-k9bts-w-b-hvkfh   Running   n1-standard-4   us-central1   us-central1-b   136m
zhsun2-k9bts-w-c-g7h6v   Running   n1-standard-4   us-central1   us-central1-c   136m

$ oc get machine
NAME                     PHASE     TYPE            REGION        ZONE            AGE
zhsun2-k9bts-m-0         Running   n1-standard-4   us-central1   us-central1-a   158m
zhsun2-k9bts-m-1         Running   n1-standard-4   us-central1   us-central1-b   158m
zhsun2-k9bts-m-2         Running   n1-standard-4   us-central1   us-central1-c   158m
zhsun2-k9bts-w-a-dd9n2   Running   n1-standard-4   us-central1   us-central1-a   152m
zhsun2-k9bts-w-a-jc79x   Running   n1-standard-4   us-central1   us-central1-a   32m
zhsun2-k9bts-w-a-lh8jc   Running   n1-standard-4   us-central1   us-central1-a   8m26s
zhsun2-k9bts-w-a-z2c9x   Running   n1-standard-4   us-central1   us-central1-a   32m
zhsun2-k9bts-w-b-hvkfh   Running   n1-standard-4   us-central1   us-central1-b   152m
zhsun2-k9bts-w-c-g7h6v   Running   n1-standard-4   us-central1   us-central1-c   152m