Bug 2021041
Summary: | [vsphere] Not found TagCategory when destroying ipi cluster | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | jima |
Component: | Installer | Assignee: | Rafael Fonseca <rdossant> |
Installer sub component: | openshift-installer | QA Contact: | jima |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | medium | CC: | beth.white, rdossant |
Version: | 4.10 | ||
Target Milestone: | --- | ||
Target Release: | 4.11.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: A bug in the vmware/govmomi library
Consequence: When destroying multiple clusters in parallel, one of the destroys can fail because a tag belonging to another cluster was deleted in the meantime, resulting in a 404 error that aborts the destroy.
Fix: Ignore not found tags and continue with the destroy process.
Result: "openshift-install destroy cluster" finishes without error.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2022-08-10 10:39:46 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 2062748 |
Description
jima
2021-11-08 08:22:46 UTC
As far as I can tell, the destroyer is doing the correct thing. It appears that the call to GetCategory is using the correct category name but returning the incorrect ID. One note here is that the vSphere destroyer is incorrectly exiting when it encounters an error. Instead, the destroyer should continue attempting to destroy resources until the user cancels the destroy.

The ID mismatch in this case is misleading and is not the cause of the problem. When trying to get a category by name, the API does a GET request for all the existing categories [1], and one of those requests results in a 404. But the way that error is reported [2] makes it appear as if the `openshift-*` category was the one to fail.

[1] https://github.com/vmware/govmomi/blob/master/vapi/tags/categories.go#L151-L160
[2] https://github.com/openshift/installer/blob/master/pkg/destroy/vsphere/vsphere.go#L277-L279

The issue happens frequently on the QE side when more than two clusters are destroyed at the same time.

Verified on 4.11.0-0.ci-2022-02-28-224450 and passed; moving bug to VERIFIED.

1. Install two clusters (A, B).
2. Destroy cluster A (by running the command ./openshift-install destroy cluster --dir ...).
3. Once the installer has found all objects attached to the related tag on cluster A, start destroying cluster B.
4. When the installer tries to delete the tag on cluster A, the destroy process for cluster B reaches "Find attached objects on tag". After the tag on cluster A is deleted, observe that the destroy process on cluster B finds the objects attached to its tag, continues to delete resources, and no longer throws a "404 Not Found" error.

Destroying log on cluster A:
03-01 13:44:40.130 level=debug msg=OpenShift Installer 4.11.0-0.ci-2022-02-28-224450
03-01 13:44:40.130 level=debug msg=Built from commit 5171f6b9ad5def883839990054f5068278232dd5
03-01 13:44:41.090 level=debug msg=Find attached objects on tag
03-01 13:46:02.645 level=debug msg=No VirtualMachines found
03-01 13:46:02.645 level=debug msg=No managed Folder found
03-01 13:46:02.645 level=debug msg=Delete tag
03-01 13:47:10.442 level=info msg=Destroyed Tag=jima0301bug01-559fk
03-01 13:47:10.442 level=debug msg=Delete tag category
03-01 13:49:31.937 level=info msg=Destroyed TagCategory=openshift-jima0301bug01-559fk
03-01 13:49:31.937 level=debug msg=Purging asset "Metadata" from disk
03-01 13:49:31.937 level=debug msg=Purging asset "Master Ignition Customization Check" from disk
03-01 13:49:31.937 level=debug msg=Purging asset "Worker Ignition Customization Check" from disk
03-01 13:49:31.937 level=debug msg=Purging asset "Terraform Variables" from disk
03-01 13:49:31.937 level=debug msg=Purging asset "Kubeconfig Admin Client" from disk
03-01 13:49:31.937 level=debug msg=Purging asset "Kubeadmin Password" from disk
03-01 13:49:31.937 level=debug msg=Purging asset "Certificate (journal-gatewayd)" from disk
03-01 13:49:31.937 level=debug msg=Purging asset "Cluster" from disk
03-01 13:49:31.937 level=info msg=Time elapsed: 4m39s

Destroying log on cluster B:
03-01 13:46:21.866 level=debug msg=OpenShift Installer 4.11.0-0.ci-2022-02-28-224450
03-01 13:46:21.866 level=debug msg=Built from commit 5171f6b9ad5def883839990054f5068278232dd5
03-01 13:46:22.790 level=debug msg=Find attached objects on tag
03-01 13:47:44.222 level=debug msg=Find VirtualMachine objects
03-01 13:47:44.222 level=debug msg=Delete VirtualMachines
03-01 13:47:44.222 level=info msg=Destroyed VirtualMachine=jima0301bug02-xsnj6-rhcos
03-01 13:47:44.222 level=debug msg=Powered off VirtualMachine=jima0301bug02-xsnj6-master-0
03-01 13:47:44.222 level=info msg=Destroyed VirtualMachine=jima0301bug02-xsnj6-master-0
03-01 13:47:44.222 level=debug msg=Powered off VirtualMachine=jima0301bug02-xsnj6-master-2
03-01 13:47:44.222 level=info msg=Destroyed VirtualMachine=jima0301bug02-xsnj6-master-2
03-01 13:47:44.222 level=debug msg=Powered off VirtualMachine=jima0301bug02-xsnj6-master-1
03-01 13:47:44.222 level=info msg=Destroyed VirtualMachine=jima0301bug02-xsnj6-master-1
03-01 13:47:44.781 level=debug msg=Powered off VirtualMachine=jima0301bug02-xsnj6-worker-rf4m9
03-01 13:47:45.340 level=info msg=Destroyed VirtualMachine=jima0301bug02-xsnj6-worker-rf4m9
03-01 13:47:46.700 level=debug msg=Powered off VirtualMachine=jima0301bug02-xsnj6-worker-wknnv
03-01 13:47:47.270 level=info msg=Destroyed VirtualMachine=jima0301bug02-xsnj6-worker-wknnv
03-01 13:47:47.270 level=debug msg=Find Folder objects
03-01 13:47:47.270 level=debug msg=Delete Folder
03-01 13:47:47.836 level=info msg=Destroyed Folder=jima0301bug02-xsnj6
03-01 13:47:50.345 level=info msg=Destroyed StoragePolicy=openshift-storage-policy-jima0301bug02-xsnj6
03-01 13:47:50.345 level=debug msg=Delete tag
03-01 13:49:11.743 level=info msg=Destroyed Tag=jima0301bug02-xsnj6
03-01 13:49:11.743 level=debug msg=Delete tag category
03-01 13:51:18.147 level=info msg=Destroyed TagCategory=openshift-jima0301bug02-xsnj6
03-01 13:51:18.147 level=debug msg=Purging asset "Metadata" from disk
03-01 13:51:18.147 level=debug msg=Purging asset "Master Ignition Customization Check" from disk
03-01 13:51:18.147 level=debug msg=Purging asset "Worker Ignition Customization Check" from disk
03-01 13:51:18.147 level=debug msg=Purging asset "Terraform Variables" from disk
03-01 13:51:18.147 level=debug msg=Purging asset "Kubeconfig Admin Client" from disk
03-01 13:51:18.147 level=debug msg=Purging asset "Kubeadmin Password" from disk
03-01 13:51:18.147 level=debug msg=Purging asset "Certificate (journal-gatewayd)" from disk
03-01 13:51:18.147 level=debug msg=Purging asset "Cluster" from disk
03-01 13:51:18.147 level=info msg=Time elapsed: 4m54s

Is this planned to be backported to previous releases? The IPI cluster destroy job fails frequently on QE CI due to this issue and leaves leftover resources on VMC.

(In reply to jima from comment #7)
> Is this planned to be backported to previous releases? The IPI cluster destroy job
> fails frequently on QE CI due to this issue and leaves leftover resources on VMC.

Yes. Thanks for the reminder. This should be backported.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069 |