Bug 1662813
| Summary: | Local metadata and terraform files are not removed after running the destroy cluster command | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | liujia <jiajliu> |
| Component: | Installer | Assignee: | Alex Crawford <crawford> |
| Installer sub component: | openshift-installer | QA Contact: | Johnny Liu <jialiu> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | wking |
| Version: | 4.1.0 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-01-23 18:23:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
liujia
2019-01-02 04:27:31 UTC
> Local files should be removed after destroy cluster.

I disagree. Those assets and logs are immensely useful for debugging. Our docs suggest using a separate asset directory for each cluster to avoid this issue.
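The separate-asset-directory recommendation can be sketched as follows. This is a minimal illustration, not from the bug itself: the `clusters/prod` and `clusters/staging` names are hypothetical, and the installer invocations are shown as comments because the binary is environment-specific.

```shell
# One asset directory per cluster keeps each cluster's Terraform state,
# metadata, and logs isolated from the others.
mkdir -p clusters/prod clusters/staging

# Each cluster's create/destroy runs point at its own directory, e.g.:
#   ./openshift-install create cluster --dir clusters/prod
#   ./openshift-install destroy cluster --dir clusters/prod
# Destroying clusters/prod then never touches clusters/staging's assets.
```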
Contrast with another destroy scenario ("destroy cluster" against a normal cluster):
1. Run "create cluster" to create a new cluster successfully.
2. Run "destroy cluster" to destroy the cluster from step 1.
```
# ls -la auto/
total 1280
drwxr-xr-x. 2 root root   4096 Jan 7 01:45 .
drwxr-xr-x. 5 root root  20480 Jan 4 07:31 ..
-rw-r--r--. 1 root root 666288 Jan 7 01:44 .openshift_install.log
-rw-r--r--. 1 root root 611654 Jan 7 01:44 .openshift_install_state.json
```
Only .openshift_install.log and .openshift_install_state.json are left after running "destroy cluster" successfully (this was regarded as destroy complete).
In the scenario reported in this bug, "destroy cluster" did not clean up the remaining local files after the remote resources were cleaned, unlike the scenario above.
1) Users choose "destroy", which means they no longer want the cluster at all. If they wanted to keep the assets, they would debug or back them up before running "destroy". Since all remote resources have been cleaned up during "destroy", those local files are useless.
2) If we think "those assets and logs are immensely useful for debugging", then why were the local files cleaned in the scenario above? What is the difference between these two "destroy" runs? The inconsistency will mislead users into re-running destroy, because they will think the destroy failed if local files (other than the hidden `.*` files) are left behind.
> There are only .openshift_install.log and .openshift_install_state.json left after running "destroy cluster" successfully (this was regarded as destroy complete).

When you successfully destroy the cluster, there are no longer any remote cluster resources around. But when creation fails, there might still be remote resources. This is why having the Terraform state around after a failed 'create cluster' is useful, while having the Terraform state around after a successful 'destroy cluster' is not.

.openshift_install.log is still useful after a successful 'destroy cluster' to address "but I still see $RESOURCE, why didn't 'destroy cluster' remove it?" issues. I personally don't care one way or the other about whether .openshift_install_state.json survives a successful 'destroy cluster', but see [1], which Abhinav said was intentional (although he didn't make that comment in the public GitHub PR).

[1]: https://github.com/openshift/installer/pull/547#issuecomment-435565108

> When you successfully destroy the cluster, there are no longer any remote
> cluster resources around. But when creation fails, there might still be
> remote resources. This is why having the Terraform state around after a
> failed 'create cluster' is useful, while having the Terraform state around
> after a successful 'destroy cluster' is not.

Thanks for your reply; I agree with what you said. But my scenario is the following:
1) Creation fails, and the Terraform state is left behind (as expected). Users then have to run 'destroy cluster', because the creation failed and the cluster is half-built.
2) Run 'destroy cluster'; the command finishes without errors.
```
# ./openshift-install destroy cluster --dir demo
INFO Removed role jliu-master-role from instance profile jliu-master-profile
INFO deleted profile jliu-master-profile
INFO deleted role jliu-master-role
INFO Removed role jliu-worker-role from instance profile jliu-worker-profile
INFO deleted profile jliu-worker-profile
INFO deleted role jliu-worker-role
INFO Removed role jliu-bootstrap-role from instance profile jliu-bootstrap-profile
INFO deleted profile jliu-bootstrap-profile
INFO deleted role jliu-bootstrap-role
INFO Emptied bucket name=terraform-20190102034846078300000001
INFO Deleted bucket name=terraform-20190102034846078300000001
```

My concern is: can this be regarded as a successful 'destroy'? If yes (the destroy succeeded), the installer should act the same as after any other successful destroy, per "having the Terraform state around after a successful 'destroy cluster' is not [useful]". If not (the destroy failed), there is no explicit hint or error output to tell users that the destroy failed.

https://github.com/openshift/installer/pull/1086 is in flight to empty the state on a successful 'destroy cluster'.

> If not (the destroy failed), there is no explicit hint or error output to tell users that the destroy failed.

If the installer exits zero, it's a successful destroy (just like every other command-line command). For some failure modes, the installer will log a fatal error (both to .openshift_install.log and the terminal) when it fails. For "I tried to delete $RESOURCE but couldn't" errors, the installer will continue to attempt those deletions forever. The user will be clued in to the failed deletion by the fact that they need to kill the 'destroy cluster' process in order to get their terminal back.

This should be fixed in 0.10.1.
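The exit-zero convention described above can be sketched in shell. This is a minimal illustration, not installer code: the `openshift_install` function is a hypothetical stand-in so the sketch runs anywhere; the real invocation would be `./openshift-install destroy cluster --dir demo`.

```shell
# Stand-in for the real binary (assumption: exits 0 on a successful destroy,
# matching the standard command-line convention described in the comment).
openshift_install() { return 0; }

if openshift_install destroy cluster --dir demo; then
  # Exit status 0: treat the destroy as successful.
  result="destroy succeeded"
else
  # Non-zero exit: keep the asset directory and check .openshift_install.log.
  result="destroy failed"
fi
echo "$result"
```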