Bug 1718265 - AWS provider removes stopped instances when reconciling machines
Summary: AWS provider removes stopped instances when reconciling machines
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.1.z
Assignee: Michael Gugino
QA Contact: Jianwei Hou
URL:
Whiteboard: 4.1.4
Duplicates: 1724968
Depends On: 1713010
Blocks:
 
Reported: 2019-06-07 11:43 UTC by Michael Gugino
Modified: 2019-10-17 08:39 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1713010
Environment:
Last Closed: 2019-07-04 09:01:24 UTC
Target Upstream Version:
Embargoed:


Links:
Red Hat Product Errata RHBA-2019:1635 (last updated 2019-07-04 09:01:33 UTC)

Description Michael Gugino 2019-06-07 11:43:18 UTC
+++ This bug was initially created as a clone of Bug #1713010 +++

Description of problem:

If a cloud instance backing a machine has stopped, and the machine is reconciled again later for some reason, the stopped instance will be deleted and a new instance will be created in its place.  This behavior is undocumented, likely unexpected, and probably something we should remove.
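
For context, the behavior comes down to how the reconcile path classifies the backing instance's EC2 state. The following is a minimal sketch of that classification, not the actual cluster-api-provider-aws code; the needsReplacement helper is illustrative, and the state constants are from aws-sdk-go v1:

// Illustrative sketch only; not the actual actuator code.
package sketch

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// needsReplacement reports whether a machine's backing instance should be
// recreated during reconciliation, based on its EC2 state.
func needsReplacement(instance *ec2.Instance) bool {
	if instance == nil {
		// No backing instance found at all: provision a new one.
		return true
	}
	switch aws.StringValue(instance.State.Name) {
	case ec2.InstanceStateNameTerminated, ec2.InstanceStateNameShuttingDown:
		// The instance is gone or going away; replacing it is reasonable.
		return true
	case ec2.InstanceStateNameStopped, ec2.InstanceStateNameStopping:
		// The pre-fix behavior effectively returned true here, so a stopped
		// instance was terminated and replaced. With the fix, it is left
		// alone so it can simply be restarted.
		return false
	default:
		// pending / running: nothing to do.
		return false
	}
}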

--- Additional comment from Michael Gugino on 2019-06-07 11:42:25 UTC ---

Merged in master.

Comment 2 Michael Gugino 2019-06-17 13:53:00 UTC
How QE can verify:

Prior to this patch:
1) Stop a worker instance in the AWS console.
2) Wait for the node to go unready.
3) Once the node is unready, within a minute or two you should see a new instance provisioned in the AWS console with the same tag.Name as the instance you stopped.
4) The old instance will be terminated.

With this patch:
1) Stop a worker instance in the AWS console.
2) Wait for the node to go unready.
3) Once the node is unready, wait a few minutes and verify that no new instance with the same tag.Name as the instance you stopped appears in the AWS console (a small helper for scripting this check is sketched below).
4) The instance will not be terminated and can be successfully restarted.
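
If it helps to script step 3, something like the Go snippet below will list all instances sharing the stopped worker's tag.Name together with their states. This is a sketch only: the region and the tag value are placeholders to substitute, and it assumes aws-sdk-go v1 with default credentials.

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	// Placeholder: substitute the tag.Name of the instance you stopped.
	name := "mycluster-worker-us-east-2a-xxxxx"

	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-east-2")))
	svc := ec2.New(sess)

	out, err := svc.DescribeInstances(&ec2.DescribeInstancesInput{
		Filters: []*ec2.Filter{
			{Name: aws.String("tag:Name"), Values: []*string{aws.String(name)}},
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	// With the fix, only the original (stopped) instance should be listed;
	// before the fix you would also see a freshly provisioned replacement.
	for _, r := range out.Reservations {
		for _, i := range r.Instances {
			fmt.Printf("%s\t%s\n", aws.StringValue(i.InstanceId), aws.StringValue(i.State.Name))
		}
	}
}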

Comment 5 sunzhaohua 2019-06-28 03:33:28 UTC
Verified.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-06-27-204847   True        False         39m     Cluster version is 4.1.0-0.nightly-2019-06-27-204847

Stopped a worker instance in the AWS console. The node status became NotReady. After a few minutes, no new instance with the same tag.Name as the stopped instance had been provisioned in the AWS console.
After restarting the stopped instance, the node became Ready again.

$ oc get node
NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-0-131-22.us-east-2.compute.internal    NotReady   worker   55m   v1.13.4+c9e4f28ff
ip-10-0-136-24.us-east-2.compute.internal    Ready      master   60m   v1.13.4+c9e4f28ff
ip-10-0-157-50.us-east-2.compute.internal    Ready      worker   55m   v1.13.4+c9e4f28ff
ip-10-0-158-200.us-east-2.compute.internal   Ready      master   60m   v1.13.4+c9e4f28ff
ip-10-0-168-191.us-east-2.compute.internal   Ready      master   60m   v1.13.4+c9e4f28ff

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-131-22.us-east-2.compute.internal    Ready    worker   58m   v1.13.4+c9e4f28ff
ip-10-0-136-24.us-east-2.compute.internal    Ready    master   64m   v1.13.4+c9e4f28ff
ip-10-0-157-50.us-east-2.compute.internal    Ready    worker   58m   v1.13.4+c9e4f28ff
ip-10-0-158-200.us-east-2.compute.internal   Ready    master   63m   v1.13.4+c9e4f28ff
ip-10-0-168-191.us-east-2.compute.internal   Ready    master   63m   v1.13.4+c9e4f28ff

Comment 7 Raz Tamir 2019-06-28 23:05:18 UTC
Hi Michael,
Any idea when this fix will land in 4.1.1 or 4.1.2?

Comment 8 Michael Gugino 2019-07-01 12:10:54 UTC
(In reply to Raz Tamir from comment #7)
> Hi Michael,
> Any idea when this fix will land in 4.1.1 or 4.1.2?

I believe it's now targeted for 4.1.4.

Comment 11 errata-xmlrpc 2019-07-04 09:01:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1635

Comment 12 Alberto 2019-07-26 14:19:05 UTC
*** Bug 1724968 has been marked as a duplicate of this bug. ***

