Bug 1718265 - AWS provider removes stopped instances when reconciling machines
Summary: AWS provider removes stopped instances when reconciling machines
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.1.z
Assignee: Michael Gugino
QA Contact: Jianwei Hou
URL:
Whiteboard: 4.1.4
Duplicates: 1724968
Depends On: 1713010
Blocks:
 
Reported: 2019-06-07 11:43 UTC by Michael Gugino
Modified: 2019-10-17 08:39 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1713010
Environment:
Last Closed: 2019-07-04 09:01:24 UTC
Target Upstream Version:
Embargoed:


Links:
Red Hat Product Errata RHBA-2019:1635 (last updated 2019-07-04 09:01:33 UTC)

Description Michael Gugino 2019-06-07 11:43:18 UTC
+++ This bug was initially created as a clone of Bug #1713010 +++

Description of problem:

If a cloud instance backing a machine has stopped, and the machine is reconciled again later for some reason, the stopped instance will be deleted and a new instance will be created in its place.  This behavior is undocumented, likely unexpected, and probably something we should remove.
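
For context, the behavior comes down to how the reconcile path classifies the backing instance's EC2 state. The following is a minimal sketch of that classification, not the actual cluster-api-provider-aws code; the needsReplacement helper is illustrative, and the state constants are from aws-sdk-go v1:

// Illustrative sketch only; not the actual actuator code.
package sketch

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// needsReplacement reports whether a machine's backing instance should be
// recreated during reconciliation, based on its EC2 state.
func needsReplacement(instance *ec2.Instance) bool {
	if instance == nil {
		// No backing instance found at all: provision a new one.
		return true
	}
	switch aws.StringValue(instance.State.Name) {
	case ec2.InstanceStateNameTerminated, ec2.InstanceStateNameShuttingDown:
		// The instance is gone or going away; replacing it is reasonable.
		return true
	case ec2.InstanceStateNameStopped, ec2.InstanceStateNameStopping:
		// The pre-fix behavior effectively returned true here, so a stopped
		// instance was terminated and replaced. With the fix, it is left
		// alone so it can simply be restarted.
		return false
	default:
		// pending / running: nothing to do.
		return false
	}
}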

--- Additional comment from Michael Gugino on 2019-06-07 11:42:25 UTC ---

Merged in master.

Comment 2 Michael Gugino 2019-06-17 13:53:00 UTC
How QE can verify:

Prior to this patch:
1) Stop a worker instance in the AWS console.
2) Wait for the node to go unready.
3) Once the node is unready, within a minute or two you should see a new instance provisioned in the AWS console with the same tag.Name as the instance you stopped.
4) The old instance will be terminated.

With this patch:
1) Stop a worker instance in the AWS console.
2) Wait for the node to go unready.
3) Once the node is unready, wait a few minutes and verify that no new instance with the same tag.Name as the instance you stopped appears in the AWS console (a small helper for scripting this check is sketched below).
4) The instance will not be terminated and can be successfully restarted.
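
If it helps to script step 3, something like the Go snippet below will list all instances sharing the stopped worker's tag.Name together with their states. This is a sketch only: the region and the tag value are placeholders to substitute, and it assumes aws-sdk-go v1 with default credentials.

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	// Placeholder: substitute the tag.Name of the instance you stopped.
	name := "mycluster-worker-us-east-2a-xxxxx"

	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-east-2")))
	svc := ec2.New(sess)

	out, err := svc.DescribeInstances(&ec2.DescribeInstancesInput{
		Filters: []*ec2.Filter{
			{Name: aws.String("tag:Name"), Values: []*string{aws.String(name)}},
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	// With the fix, only the original (stopped) instance should be listed;
	// before the fix you would also see a freshly provisioned replacement.
	for _, r := range out.Reservations {
		for _, i := range r.Instances {
			fmt.Printf("%s\t%s\n", aws.StringValue(i.InstanceId), aws.StringValue(i.State.Name))
		}
	}
}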

Comment 5 sunzhaohua 2019-06-28 03:33:28 UTC
Verified.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-06-27-204847   True        False         39m     Cluster version is 4.1.0-0.nightly-2019-06-27-204847

Stopped a worker instance in the AWS console. The node status became NotReady. After a few minutes, no new instance with the same tag.Name as the stopped instance had been provisioned in the AWS console.
After restarting the stopped instance, the node became Ready again.

$ oc get node
NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-0-131-22.us-east-2.compute.internal    NotReady   worker   55m   v1.13.4+c9e4f28ff
ip-10-0-136-24.us-east-2.compute.internal    Ready      master   60m   v1.13.4+c9e4f28ff
ip-10-0-157-50.us-east-2.compute.internal    Ready      worker   55m   v1.13.4+c9e4f28ff
ip-10-0-158-200.us-east-2.compute.internal   Ready      master   60m   v1.13.4+c9e4f28ff
ip-10-0-168-191.us-east-2.compute.internal   Ready      master   60m   v1.13.4+c9e4f28ff

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-131-22.us-east-2.compute.internal    Ready    worker   58m   v1.13.4+c9e4f28ff
ip-10-0-136-24.us-east-2.compute.internal    Ready    master   64m   v1.13.4+c9e4f28ff
ip-10-0-157-50.us-east-2.compute.internal    Ready    worker   58m   v1.13.4+c9e4f28ff
ip-10-0-158-200.us-east-2.compute.internal   Ready    master   63m   v1.13.4+c9e4f28ff
ip-10-0-168-191.us-east-2.compute.internal   Ready    master   63m   v1.13.4+c9e4f28ff

Comment 7 Raz Tamir 2019-06-28 23:05:18 UTC
Hi Michael,
Any idea when this fix will land in 4.1.1 or 4.1.2?

Comment 8 Michael Gugino 2019-07-01 12:10:54 UTC
(In reply to Raz Tamir from comment #7)
> Hi Michael,
> Any idea when this fix will land in 4.1.1 or 4.1.2?

I believe it's now targeted for 4.1.4.

Comment 11 errata-xmlrpc 2019-07-04 09:01:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1635

Comment 12 Alberto 2019-07-26 14:19:05 UTC
*** Bug 1724968 has been marked as a duplicate of this bug. ***

