Bug 2020480 - [vsphere] installation failure sometimes with error: the object 'vim.Folder:group-v594204' has already been deleted or has not been completely created
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Robert Bost
QA Contact: jima
URL:
Whiteboard:
Depends On:
Blocks: 2038925 2038926
Reported: 2021-11-05 02:56 UTC by jima
Modified: 2022-10-18 15:23 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-18 15:23:42 UTC
Target Upstream Version:


Attachments
openshift_install.log (1.69 MB, application/x-tar)
2021-11-05 02:56 UTC, jima


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 5495 | 0 | None | open | Bug 2020480: [vsphere] installation failure sometimes with error: the object 'vim.Folder:group-v******' has already bee... | 2021-12-17 02:26:49 UTC

Description jima 2021-11-05 02:56:28 UTC
Created attachment 1839999
openshift_install.log

Version: the issue occurs occasionally across releases (4.5 through 4.10)

Platform: vSphere (IPI and UPI)

What happened?
In QE CI and in manual installs, IPI installation sometimes fails with the error below when importing the OVA template or deleting the bootstrap server.

11-03 22:22:31.654  level=debug msg=vsphereprivate_import_ova.import: Still creating... [10s elapsed]
11-03 22:22:41.686  level=debug msg=vsphereprivate_import_ova.import: Still creating... [20s elapsed]
11-03 22:22:51.690  level=debug msg=vsphereprivate_import_ova.import: Still creating... [30s elapsed]
11-03 22:23:01.683  level=debug msg=vsphereprivate_import_ova.import: Still creating... [40s elapsed]
11-03 22:23:11.678  level=debug msg=vsphereprivate_import_ova.import: Still creating... [50s elapsed]
11-03 22:23:21.700  level=debug msg=vsphereprivate_import_ova.import: Still creating... [1m0s elapsed]
11-03 22:23:31.670  level=debug msg=vsphereprivate_import_ova.import: Still creating... [1m10s elapsed]
11-03 22:23:32.233  level=debug msg=vsphereprivate_import_ova.import: Creation complete after 1m11s [id=vm-594271]
11-03 22:23:32.233  level=debug msg=data.vsphere_virtual_machine.template: Refreshing state...
11-03 22:23:54.402  level=error
11-03 22:23:54.402  level=error msg=Error: error fetching virtual machine: ServerFaultCode: The object 'vim.Folder:group-v594204' has already been deleted or has not been completely created
11-03 22:23:54.402  level=error
11-03 22:23:54.402  level=error msg=  on ../../../../../../../../tmp/openshift-install-456326068/main.tf line 39, in data "vsphere_virtual_machine" "template":
11-03 22:23:54.402  level=error msg=  39: data "vsphere_virtual_machine" "template" {
11-03 22:23:54.402  level=error
11-03 22:23:54.402  level=error
11-03 22:23:54.402  level=fatal msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change

A search for similar failures in CI shows that the error occurs on both UPI and IPI:
https://search.ci.openshift.org/?search=+has+already+been+deleted+or+has+not+been+completely+created&maxAge=336h&context=0&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Attached `.openshift_install.log`
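
The failure signature quoted above can be picked out of an installer log mechanically. The following is a minimal sketch, assuming log text in the format shown in this report; the function name `find_stale_object_errors` and the regular expression are illustrative and are not part of the installer or of the CI search tool.

```python
import re

# Matches the vCenter fault text seen in this report and captures the
# managed object type and ID, for example ('vim.Folder', 'group-v594204').
STALE_OBJECT_RE = re.compile(
    r"The object '(?P<type>vim\.\w+):(?P<id>[\w-]+)' has already been deleted "
    r"or has not been completely created"
)


def find_stale_object_errors(log_text):
    """Return a list of (object_type, object_id) tuples found in log_text.

    Hypothetical helper: it only scans text and does not talk to vCenter.
    """
    return [
        (match.group("type"), match.group("id"))
        for match in STALE_OBJECT_RE.finditer(log_text)
    ]
```

Running it over the attached log would list every managed object reference that vCenter reported as deleted or incomplete, which may help correlate failures with folder cleanup in the same vCenter.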

What did you expect to happen?

Infrastructure should be created successfully and the installation should complete.

How to reproduce it (as minimally and precisely as possible)?
Install an IPI or UPI cluster on the vSphere platform.

Comment 4 Matthew Staebler 2022-01-07 20:56:49 UTC
I assume, given the conversation in the PR, that this BZ is not actually ready for QA. I am going to move this back to ASSIGNED.

Comment 7 jima 2022-01-12 03:01:50 UTC
Based on comment 4 and issue also hit on QE UPI installation, move bug back to ASSIGNED.

Comment 8 Matthew Staebler 2022-01-12 16:52:28 UTC
(In reply to jima from comment #7)
> Based on comment 4 and issue also hit on QE UPI installation, move bug back
> to ASSIGNED.

How is this BZ related to failures seen in a UPI installation? This BZ is about errors coming from Terraform, which is specific to IPI.

Comment 9 jima 2022-01-13 02:40:08 UTC
A similar error is also reported in UPI installations. For example, the log below is from a recent failure on a 4.10 nightly build.

01-11 12:42:19.948  + echo 'INFO: terraform version'
01-11 12:42:19.948  INFO: terraform version
01-11 12:42:19.948  + terraform-0.12 version
01-11 12:42:19.948  Terraform v0.12.24
01-11 12:42:20.884  + provider.aws v3.37.0
01-11 12:42:20.884  + provider.http v2.1.0
01-11 12:42:20.884  + provider.ignition (unversioned)
01-11 12:42:20.884  + provider.null v3.1.0
01-11 12:42:20.884  + provider.vsphere v1.26.0
01-11 12:42:20.884  
01-11 12:42:20.884  Your version of Terraform is out of date! The latest version
01-11 12:42:20.884  is 1.1.3. You can update by downloading from https://www.terraform.io/downloads.html
01-11 12:42:20.884  + terraform-0.12 apply -var-file=/home/jenkins/ws/workspace/ocp-common/Flexy-install/flexy/workdir/install-dir/terraform.tfvars -auto-approve /home/jenkins/ws/workspace/ocp-common/Flexy-install/flexy/workdir/install-dir/upi_on_vsphere-terraform-scripts
01-11 12:42:29.278  module.lb.data.ignition_systemd_unit.haproxy: Refreshing state...
01-11 12:42:29.278  module.lb.data.ignition_user.core: Refreshing state...
01-11 12:42:29.278  module.lb.data.ignition_file.podman: Refreshing state...
01-11 12:42:29.839  data.vsphere_datacenter.dc: Refreshing state...
01-11 12:42:29.839  data.vsphere_datastore.datastore: Refreshing state...
01-11 12:42:29.839  data.vsphere_network.network: Refreshing state...
01-11 12:42:29.839  data.vsphere_compute_cluster.compute_cluster: Refreshing state...
01-11 12:42:29.839  data.vsphere_virtual_machine.template: Refreshing state...
01-11 12:42:30.775  module.dns_cluster_domain.data.aws_route53_zone.base: Refreshing state...
01-11 12:43:02.857  
01-11 12:43:02.857  Error: error fetching virtual machine: ServerFaultCode: The object 'vim.Folder:group-v686107' has already been deleted or has not been completely created

QE uses the same Terraform code as CI:
https://github.com/openshift/installer/tree/master/upi/vsphere

Comment 19 Robert Bost 2022-09-29 15:28:30 UTC
We found that this issue may be coming from vCenter when VM Info is requested at the same time as folders are being deleted (this is happening in CI very often). More detailed research can be found in https://issues.redhat.com/browse/SPLAT-683.

Comment 20 Patrick Dillon 2022-10-18 15:23:42 UTC
I am closing this bug. I am marking it as WONTFIX, but work on this issue is still ongoing in https://issues.redhat.com/browse/SPLAT-683. I would mark it as a duplicate, but that does not work with the transition to Jira.

The latest update on that card indicates that this issue seems to arise when a folder is being deleted at the same time as the resource is being created. So this is an upstream/environmental issue that is only likely to emerge in high-volume environments like CI.
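
Because the fault appears to be a transient race with concurrent folder deletion, one possible caller-side mitigation is to retry the lookup when this specific error text appears. The following is a minimal sketch of that idea only; it is not the change proposed in installer pull 5495 and not the outcome of SPLAT-683. The names `retry_on_transient_fault` and `fetch`, and the attempt and delay defaults, are assumptions made for illustration.

```python
import time

# Fault text reported by vCenter in this bug.
TRANSIENT_FAULT = "has already been deleted or has not been completely created"


def retry_on_transient_fault(fetch, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts are exhausted.

    Only errors whose message contains TRANSIENT_FAULT are retried; any
    other error, and the final transient error, is re-raised to the caller.
    `fetch` is a hypothetical zero-argument callable wrapping the VM lookup.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as err:
            if TRANSIENT_FAULT not in str(err) or attempt == attempts:
                raise
            # Linear backoff: wait a little longer after each failed attempt.
            sleep(delay_seconds * attempt)
```

Matching on the message text is brittle and is used here only because the message is the one detail this report provides. Whether retrying is appropriate depends on the caller, since the same message is also returned for objects that really have been deleted.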

