Bug 1653630 - [cloud] delete machine couldn't trigger requeue error
Summary: [cloud] delete machine couldn't trigger requeue error
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Jan Chaloupka
QA Contact: sunzhaohua
Depends On:
Reported: 2018-11-27 10:20 UTC by sunzhaohua
Modified: 2019-03-12 14:24 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2019-01-15 13:30:55 UTC
Target Upstream Version:

Description sunzhaohua 2018-11-27 10:20:58 UTC
Description of problem:
Deleting a machine does not trigger a requeue error.

Version-Release number of selected component (if applicable):
$ ./openshift-install version
./openshift-install v0.4.0-8-gcc10f8027c30a37a8c3d78b793587126208c9d68-dirty
Terraform v0.11.8

$ ./terraform version
Terraform v0.11.8

$ oc version
oc v4.0.0-alpha.0+9750828-637
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

How reproducible:

Steps to Reproduce:
1. $ oc delete machine qe-zhsun-1-worker-us-east-2a-2
2. $ oc logs -f clusterapi-manager-controllers-5b4996fb88-bzst7 -c machine-controller

Actual results:
$ oc delete machine qe-zhsun-1-worker-us-east-2a-2
machine.cluster.k8s.io "qe-zhsun-1-worker-us-east-2a-2" deleted

$ oc logs -f clusterapi-manager-controllers-5b4996fb88-bzst7 -c machine-controller
I1126 09:26:22.779527       1 controller.go:113] Running reconcile Machine for qe-zhsun-1-worker-us-east-2a-2
I1126 09:26:22.779663       1 controller.go:136] reconciling machine object qe-zhsun-1-worker-us-east-2a-2 triggers delete.
I1126 09:26:22.779759       1 actuator.go:454] deleting machine
I1126 09:26:22.915894       1 utils.go:165] Cleaning up extraneous instance for machine: i-037313b896d114c3b, state: running, launchTime: 2018-11-26 08:41:23 +0000 UTC
I1126 09:26:22.915926       1 utils.go:169] Terminating i-037313b896d114c3b instance
I1126 09:26:23.015070       1 controller.go:143] machine object qe-zhsun-1-worker-us-east-2a-2 deletion successful, removing finalizer.

Expected results:
Deleting a machine should trigger a requeue error.

Additional info:

Comment 1 Jan Chaloupka 2018-12-07 13:30:50 UTC
Hi sunzhaohua,

Are you saying the machine object is deleted before the AWS instance is? Why would you expect deleting a machine to trigger a requeue error? I don't see any error message in the logs saying that the AWS instance destruction failed.

That said, we have https://github.com/kubernetes-sigs/cluster-api/pull/598, which will re-queue a machine object if the operation fails (for any reason).

Comment 2 sunzhaohua 2018-12-10 09:18:32 UTC
Hi Jan Chaloupka,

Compare this with creating a machine: while the instance status is pending, the machine status is also pending and a requeue error is returned. So for deleting a machine, while the instance status is shutting-down, I think we should set the machine status to shutting-down and return a requeue error until the instance status changes to terminated. If I'm wrong, please correct me.

Comment 3 Jan Chaloupka 2019-01-15 13:30:55 UTC
> For deleting a machine, while the instance status is shutting-down, I think we should set the machine status to shutting-down and return a requeue error until the instance status changes to terminated.

Every AWS instance in the shutting-down state will eventually be deleted. Also, according to the AWS documentation [1], instances that are not running are not charged.

Closing the bug as expected behavior until we identify non-trivial long-running processes that need to be re-queued on deletion.

[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/terminating-instances.html
