Bug 1806471 (OCPRHV-137-4.6)
| Summary: | OCPRHV-137: Bootstrap node is left in the engine after destroying failed cluster | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jan Zmeskal <jzmeskal> |
| Component: | Installer | Assignee: | Douglas Schilling Landgraf <dougsland> |
| Installer sub component: | OpenShift on RHV | QA Contact: | Guilherme Santos <gdeolive> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | low | | |
| Priority: | medium | CC: | dougsland, gshereme, gzaidman, jlee, plarsen, rgolan, wking |
| Version: | 4.4 | Keywords: | Improvement |
| Target Milestone: | --- | | |
| Target Release: | 4.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-27 15:55:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Jan Zmeskal, 2020-02-24 09:48:06 UTC)
*** Bug 1818529 has been marked as a duplicate of this bug. ***

Also those "errors: %s%!(EXTRA <nil>)" prints should be cleaned up.

> Also those "errors: %s%!(EXTRA <nil>)" prints should be cleaned up.

This was handled for 4.5 via [1]. But if you want that backported to earlier 4.y, you should create a separate bug series, because the logging fix has nothing to do with the bootstrap cleanup issue this bug is about.

[1]: https://github.com/openshift/installer/pull/3445

(In reply to W. Trevor King from comment #3)
> > Also those "errors: %s%!(EXTRA <nil>)" prints should be cleaned up.
>
> This was handled for 4.5 via [1]. But if you want that backported to
> earlier 4.y, you should create a separate bug series, because the logging
> fix has nothing to do with the bootstrap cleanup issue this bug is about.
>
> [1]: https://github.com/openshift/installer/pull/3445

Well, I had opened Bug 1818529 but marked it as a dupe. You have a point -- it's probably not a dupe.

*** Bug 1836342 has been marked as a duplicate of this bug. ***

I don't see any problem with the code in installer/pkg/destroy/ovirt/destroyer.go. What I have found: the bootstrap VM is not tagged the way the tmp VM is. Working on a possible patch.

Here is a PR with an example in Go of how to detect whether the VMs are tagged or not:
https://github.com/oVirt/ovirt-engine-sdk-go/pull/205/commits/0da9b64a27fedc3b41ec0057d42c80223b559dc9

(In reply to Greg Sheremeta from comment #2)
> Also those "errors: %s%!(EXTRA <nil>)" prints should be cleaned up.

+1

(In reply to Douglas Schilling Landgraf from comment #10)
> (In reply to Greg Sheremeta from comment #2)
> > Also those "errors: %s%!(EXTRA <nil>)" prints should be cleaned up.
>
> +1

Per comment 3, already done.

(In reply to Greg Sheremeta from comment #11)
> (In reply to Douglas Schilling Landgraf from comment #10)
> > (In reply to Greg Sheremeta from comment #2)
> > > Also those "errors: %s%!(EXTRA <nil>)" prints should be cleaned up.
> >
> > +1
>
> Per comment 3, already done.

Yep, I have just tried that in my local environment. I can't see it anymore.

Moving back to ASSIGNED to keep this on my radar. We will need a second patch.

I think the temp VM should have a different tag name than $cluster_id, because that means it is only going to be removed when destroying the cluster, so it keeps being a waste during the lifetime of the cluster.

Instead we should destroy it on destroy bootstrap, where it more logically belongs. To do that, all we need is to tag it with $cluster_id-bootstrap and then, in the installer code, remove all VMs by this tag. The bootstrap VM should also have the same tag, so both get removed.

To overcome the problem where you can't define the tag twice and have it updated, what we need is to declare the tag once, but have it assigned a list of VM IDs, just like the masters:

```hcl
resource "ovirt_tag" "cluster_tag" {
  name   = var.cluster_id
  vm_ids = [for instance in ovirt_vm.master.* : instance.id]
}
```

But instead of ovirt_vm.master.* we should replace it with ovirt_vm.bootstrap.* and concat ovirt_vm.tmp_import_vm, so:

```hcl
resource "ovirt_tag" "cluster_bootstrap_tag" {
  name   = "${var.cluster_id}-bootstrap"
  vm_ids = [concat(ovirt_vm.bootstrap.id, tmp_import_vm_id)]
}
```

You would need to pass the tmp_import_vm_id to the bootstrap module.
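To make the installer-side half of this concrete, below is a minimal sketch of "remove every VM that carries a given tag" against the oVirt engine API. This is not the installer's actual destroy code; it assumes the ovirt-engine-sdk-go v4 client (github.com/ovirt/go-ovirt) referenced in the PR above, and the engine URL, credentials, and tag value are placeholders.

```go
package main

import (
	"fmt"
	"log"

	ovirtsdk4 "github.com/ovirt/go-ovirt"
)

// removeVMsByTag searches the engine for every VM carrying the given tag and
// stops/removes it. With a "<cluster_id>-bootstrap" tag on both the bootstrap
// VM and the temporary import VM, running this during bootstrap destroy would
// clean up both, which is the behaviour proposed in comment #14.
func removeVMsByTag(conn *ovirtsdk4.Connection, tag string) error {
	vmsService := conn.SystemService().VmsService()

	// Engine search syntax: "tag=<name>" matches all VMs carrying the tag.
	resp, err := vmsService.List().Search(fmt.Sprintf("tag=%s", tag)).Send()
	if err != nil {
		return fmt.Errorf("searching VMs by tag %q: %w", tag, err)
	}

	vms, ok := resp.Vms()
	if !ok {
		return nil // nothing is tagged, nothing to clean up
	}

	for _, vm := range vms.Slice() {
		vmService := vmsService.VmService(vm.MustId())

		// Best effort: a bootstrap or tmp VM may already be down. A real
		// implementation would also wait for the VM to reach the DOWN
		// state before removing it.
		if _, err := vmService.Stop().Send(); err != nil {
			log.Printf("stopping VM %s: %v (continuing)", vm.MustId(), err)
		}
		if _, err := vmService.Remove().Send(); err != nil {
			return fmt.Errorf("removing VM %s: %w", vm.MustId(), err)
		}
	}
	return nil
}

func main() {
	// Engine URL and credentials are placeholders for illustration only.
	conn, err := ovirtsdk4.NewConnectionBuilder().
		URL("https://engine.example.com/ovirt-engine/api").
		Username("admin@internal").
		Password("secret").
		Insecure(true).
		Build()
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	if err := removeVMsByTag(conn, "mycluster-abcde-bootstrap"); err != nil {
		log.Fatal(err)
	}
}
```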
(In reply to Roy Golan from comment #14)
> I think the temp VM should have a different tag name than $cluster_id,
> because that means it is only going to be removed when destroying the
> cluster, so it keeps being a waste during the lifetime of the cluster.
>
> Instead we should destroy it on destroy bootstrap, where it more logically
> belongs. To do that, all we need is to tag it with $cluster_id-bootstrap
> and then, in the installer code, remove all VMs by this tag. The bootstrap
> VM should also have the same tag, so both get removed.
>
> To overcome the problem where you can't define the tag twice and have it
> updated, what we need is to declare the tag once, but have it assigned a
> list of VM IDs, just like the masters:
>
>     resource "ovirt_tag" "cluster_tag" {
>       name   = var.cluster_id
>       vm_ids = [for instance in ovirt_vm.master.* : instance.id]
>     }
>
> But instead of ovirt_vm.master.* we should replace it with
> ovirt_vm.bootstrap.* and concat ovirt_vm.tmp_import_vm, so:
>
>     resource "ovirt_tag" "cluster_bootstrap_tag" {
>       name   = "${var.cluster_id}-bootstrap"
>       vm_ids = [concat(ovirt_vm.bootstrap.id, tmp_import_vm_id)]
>     }
>
> You would need to pass the tmp_import_vm_id to the bootstrap module.

Replied on GitHub.

Due to capacity constraints we will be revisiting this bug in the upcoming sprint.

Due to capacity constraints we will be revisiting this bug in the upcoming sprint.

*** Bug 1855861 has been marked as a duplicate of this bug. ***

Verified on: 4.6.0-0.nightly-2020-07-22-074636

Steps:
1. # openshift-install create cluster --log-level=debug --dir=resources
2. Somehow cancel the installation once it passes the "Creating infrastructure resources" step (Ctrl+C, kill the process, turn off the DNS, ...), then:
   # openshift-install destroy cluster --dir=resources
3. Check on the engine UI whether the bootstrap VM is there (Compute -> Virtual Machines).

Results:
Bootstrap VM deleted.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
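The manual check in the verification steps above can also be scripted against the engine API instead of the UI. The helper below is a hedged sketch, not part of the installer or the QA tooling: it assumes the same ovirt-engine-sdk-go client and connection setup as the earlier example, and that the bootstrap VM follows the installer's "<infra_id>-bootstrap" naming; the package and function names are hypothetical.

```go
package ovirtcheck

import (
	"fmt"

	ovirtsdk4 "github.com/ovirt/go-ovirt"
)

// bootstrapGone reports whether no "<infraID>-bootstrap" VM is left in the
// engine after "openshift-install destroy cluster".
func bootstrapGone(conn *ovirtsdk4.Connection, infraID string) (bool, error) {
	resp, err := conn.SystemService().VmsService().
		List().
		Search(fmt.Sprintf("name=%s-bootstrap", infraID)).
		Send()
	if err != nil {
		return false, fmt.Errorf("searching for bootstrap VM: %w", err)
	}
	vms, ok := resp.Vms()
	return !ok || len(vms.Slice()) == 0, nil
}
```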