+++ This bug was initially created as a clone of Bug #1713061 +++

Description of problem:

A MachineSet was scaled up and then scaled down. The nodes disappeared, but the machine objects remained.

Version-Release number of selected component (if applicable):

4.1.0-rc.4

Additional info:

NAME                          INSTANCE              STATE     TYPE        REGION      ZONE         AGE
cluster-4e40-c7df5-master-0   i-087186746072193f0   running   m4.xlarge   us-east-2   us-east-2a   24h
cluster-4e40-c7df5-master-1   i-0eafe7e9e69f6aaec   running   m4.xlarge   us-east-2   us-east-2b   24h
cluster-4e40-c7df5-master-2   i-03c13bba692694646   running   m4.xlarge   us-east-2   us-east-2c   24h
infranode-us-east-2a-t7xwt    i-0c6ce0f9d57708d22   running   m4.large    us-east-2   us-east-2a   173m
infranode-us-east-2a-z9nfh    i-0c3f83d4c9003f5d0   running   m4.large    us-east-2   us-east-2a   3h39m
nossd-1a-dczcf                i-00a207dab2c9e970d   running   m4.large    us-east-2   us-east-2a   3h57m
ssd-1a-5l9fh                  i-090acc4f9598a37f3   running   m4.large    us-east-2   us-east-2a   121m
ssd-1a-7cvrr                  i-0ccca476b234fc1da   running   m4.large    us-east-2   us-east-2a   69m
ssd-1a-q52pv                  i-0e9e6d01af5ca727a   running   m4.large    us-east-2   us-east-2a   121m
ssd-1a-q6hr9                  i-08f4a48151276ce90   running   m4.large    us-east-2   us-east-2a   121m
ssd-1a-sfhdm                  i-03eec775cb1ce8f3c   running   m4.large    us-east-2   us-east-2a   121m
ssd-1b-rtxxg                  i-08d06740a65e88be6   running   m4.large    us-east-2   us-east-2b   3h57m

The machines that are 121m old in the `ssd-1a` set are the "orphans" without corresponding nodes. Each of them has a deletionTimestamp.

--- Additional comment from Michael Gugino on 2019-05-22 20:37:20 UTC ---

I have investigated this. We're failing to retrieve the node from the nodeRef specified on the machine object. This happens either because the machine controller already deleted the node but failed to update that annotation for some reason, or because an admin removed the node manually before attempting to scale down. Either way, this is definitely a bug and is not easily correctable by the end user. I will get a patch out for master and pick it back to 4.1.

--- Additional comment from Michael Gugino on 2019-05-22 21:05:59 UTC ---

Added a reference to the 4.1 known-issue tracker: https://github.com/openshift/openshift-docs/issues/12487
Patch created against openshift/cluster-api 4.2 branch: https://github.com/openshift/cluster-api/pull/43
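For context, a minimal sketch of the behaviour the fix needs (this is not the actual patch in the PR above): when a machine is being deleted and its nodeRef points at a node that no longer exists, the NotFound error should be treated as "already deleted" rather than blocking deletion and leaving the machine object orphaned. The function name and structure below are hypothetical; it assumes a controller-runtime client.

```go
package machine

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteNodeForMachine is a hypothetical helper illustrating the required
// behaviour: if the node named in the machine's nodeRef is already gone
// (deleted earlier by the controller, or removed manually by an admin),
// a NotFound error is treated as success so the machine's own deletion
// can proceed instead of the machine object lingering with a
// deletionTimestamp.
func deleteNodeForMachine(ctx context.Context, c client.Client, nodeName string) error {
	node := &corev1.Node{}
	if err := c.Get(ctx, client.ObjectKey{Name: nodeName}, node); err != nil {
		if apierrors.IsNotFound(err) {
			// Node already deleted: nothing left to do for this machine.
			return nil
		}
		return fmt.Errorf("failed to get node %q: %w", nodeName, err)
	}

	if err := c.Delete(ctx, node); err != nil && !apierrors.IsNotFound(err) {
		return fmt.Errorf("failed to delete node %q: %w", nodeName, err)
	}
	return nil
}
```

Treating NotFound as idempotent success is the usual pattern for delete-style reconciliation, which is why a manually removed node should not be able to wedge machine deletion.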
Verified.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-06-24-160709    True        False         16m     Cluster version is 4.2.0-0.nightly-2019-06-24-160709

$ oc delete node ip-10-0-129-227.us-west-2.compute.internal
node "ip-10-0-129-227.us-west-2.compute.internal" deleted

$ oc get machine zhsun2-2rhzr-worker-us-west-2a-4czd4 -o yaml
status:
  addresses:
  - address: 10.0.129.227
    type: InternalIP
  - address: ""
    type: ExternalDNS
  - address: ip-10-0-129-227.us-west-2.compute.internal
    type: InternalDNS
  lastUpdated: "2019-06-25T06:00:15Z"
  nodeRef:
    kind: Node
    name: ip-10-0-129-227.us-west-2.compute.internal
    uid: 61d697a9-970e-11e9-9bdd-06fb8941e6f0
  providerStatus:
    apiVersion: awsproviderconfig.openshift.io/v1beta1
    conditions:
    - lastProbeTime: "2019-06-25T05:54:34Z"
      lastTransitionTime: "2019-06-25T05:54:34Z"
      message: machine successfully created
      reason: MachineCreationSucceeded
      status: "True"
      type: MachineCreation
    instanceId: i-02479ec98a9e04896
    instanceState: running
    kind: AWSMachineProviderStatus

$ oc delete machine zhsun2-2rhzr-worker-us-west-2a-4czd4
machine.machine.openshift.io "zhsun2-2rhzr-worker-us-west-2a-4czd4" deleted
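As a side note on the verification above: a quick way to spot the original symptom programmatically is to compare each machine's status.nodeRef against the live nodes. The sketch below assumes a controller-runtime client and reads machines as unstructured objects so no machine-api Go types are needed; findOrphanedMachines and the hard-coded namespace are illustrative only, not part of the product.

```go
package machine

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// findOrphanedMachines is a hypothetical diagnostic helper: it lists machines
// in the openshift-machine-api namespace whose status.nodeRef names a node
// that no longer exists -- the symptom reported in this bug.
func findOrphanedMachines(ctx context.Context, c client.Client) ([]string, error) {
	machines := &unstructured.UnstructuredList{}
	machines.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "machine.openshift.io",
		Version: "v1beta1",
		Kind:    "MachineList",
	})
	if err := c.List(ctx, machines, client.InNamespace("openshift-machine-api")); err != nil {
		return nil, err
	}

	var orphans []string
	for _, m := range machines.Items {
		nodeName, found, err := unstructured.NestedString(m.Object, "status", "nodeRef", "name")
		if err != nil || !found || nodeName == "" {
			continue // machine has no nodeRef yet, skip it
		}
		node := &corev1.Node{}
		if err := c.Get(ctx, client.ObjectKey{Name: nodeName}, node); err != nil {
			if apierrors.IsNotFound(err) {
				orphans = append(orphans, fmt.Sprintf("%s (nodeRef %s missing)", m.GetName(), nodeName))
				continue
			}
			return nil, err
		}
	}
	return orphans, nil
}
```

With the fix in place, any machine reported by such a check should finish deleting once its machineset is scaled down, as shown in the verification output above.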
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922