+++ This bug was initially created as a clone of Bug #1713010 +++

Description of problem:
If the cloud instance backing a machine has been stopped, and the machine is later reconciled again for any reason, the stopped instance will be deleted and a new instance will be created in its place. This behavior is undocumented, likely unexpected, and probably something we should remove.

--- Additional comment from Michael Gugino on 2019-06-07 11:42:25 UTC ---

Merged in master.
https://github.com/openshift/cluster-api-provider-aws/pull/222
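For illustration only, here is a minimal Go sketch of the kind of existence check involved. This is not the actual cluster-api-provider-aws code from the PR above; the types and function names (instance, existsBuggy, existsFixed) are hypothetical, and it only shows the general idea: if the reconciler counts only running instances as "existing", a stopped instance looks missing and gets deleted and recreated, whereas a check that also counts stopped instances leaves it alone.

// Hypothetical sketch; not the actual provider code. Names are illustrative.
package main

import "fmt"

// instance models the minimal EC2 state the reconciler cares about.
type instance struct {
	id    string
	state string // e.g. "running", "stopped", "terminated"
}

// existsBuggy treats only running instances as "existing", so a stopped
// instance looks missing and the machine gets deleted and recreated.
func existsBuggy(i instance) bool {
	return i.state == "running"
}

// existsFixed also counts pending/stopping/stopped instances, so a stopped
// instance is left alone and can simply be started again later.
func existsFixed(i instance) bool {
	switch i.state {
	case "running", "pending", "stopping", "stopped":
		return true
	}
	return false
}

func main() {
	stopped := instance{id: "i-0123456789abcdef0", state: "stopped"}

	if !existsBuggy(stopped) {
		fmt.Println("buggy check: instance looks missing -> delete and recreate")
	}
	if existsFixed(stopped) {
		fmt.Println("fixed check: instance exists -> no delete, no recreate")
	}
}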
How to verify (QE):

Prior to this patch:
1) Stop a worker instance in the AWS console.
2) Wait for the node to go NotReady.
3) After the node is NotReady, within a minute or two you should see a new instance provisioned in the AWS console with the same tag.Name as the instance you stopped.
4) The old instance will be terminated.

With this patch:
1) Stop a worker instance in the AWS console.
2) Wait for the node to go NotReady.
3) After the node is NotReady, wait a few minutes and verify that no new instance with the same tag.Name as the instance you stopped appears in the AWS console.
4) The stopped instance will not be terminated and can be successfully restarted.
Verified.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-06-27-204847   True        False         39m     Cluster version is 4.1.0-0.nightly-2019-06-27-204847

Stopped a worker instance in the AWS console. The node status became NotReady. After a few minutes, no new instances with the same tag.Name as the stopped instance had been provisioned in the AWS console. After restarting the stopped instance, the node became Ready again.

$ oc get node
NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-0-131-22.us-east-2.compute.internal    NotReady   worker   55m   v1.13.4+c9e4f28ff
ip-10-0-136-24.us-east-2.compute.internal    Ready      master   60m   v1.13.4+c9e4f28ff
ip-10-0-157-50.us-east-2.compute.internal    Ready      worker   55m   v1.13.4+c9e4f28ff
ip-10-0-158-200.us-east-2.compute.internal   Ready      master   60m   v1.13.4+c9e4f28ff
ip-10-0-168-191.us-east-2.compute.internal   Ready      master   60m   v1.13.4+c9e4f28ff

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-131-22.us-east-2.compute.internal    Ready    worker   58m   v1.13.4+c9e4f28ff
ip-10-0-136-24.us-east-2.compute.internal    Ready    master   64m   v1.13.4+c9e4f28ff
ip-10-0-157-50.us-east-2.compute.internal    Ready    worker   58m   v1.13.4+c9e4f28ff
ip-10-0-158-200.us-east-2.compute.internal   Ready    master   63m   v1.13.4+c9e4f28ff
ip-10-0-168-191.us-east-2.compute.internal   Ready    master   63m   v1.13.4+c9e4f28ff
Hi Michael,

Any idea when this fix will land in 4.1.1 or 4.1.2?
(In reply to Raz Tamir from comment #7)
> Hi Michael,
> Any idea when this fix will land in 4.1.1 or 4.1.2?

I believe it's now targeted for 4.1.4.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1635
*** Bug 1724968 has been marked as a duplicate of this bug. ***