Created attachment 1839999: openshift_install.log

Version: the issue happens occasionally on several releases (4.5 to 4.10)
Platform: vSphere (IPI and UPI)

What happened?
In QE CI and manual installs, the IPI installation sometimes fails with the error below when importing the OVA template or deleting the bootstrap server.

11-03 22:22:31.654 level=debug msg=vsphereprivate_import_ova.import: Still creating... [10s elapsed]
11-03 22:22:41.686 level=debug msg=vsphereprivate_import_ova.import: Still creating... [20s elapsed]
11-03 22:22:51.690 level=debug msg=vsphereprivate_import_ova.import: Still creating... [30s elapsed]
11-03 22:23:01.683 level=debug msg=vsphereprivate_import_ova.import: Still creating... [40s elapsed]
11-03 22:23:11.678 level=debug msg=vsphereprivate_import_ova.import: Still creating... [50s elapsed]
11-03 22:23:21.700 level=debug msg=vsphereprivate_import_ova.import: Still creating... [1m0s elapsed]
11-03 22:23:31.670 level=debug msg=vsphereprivate_import_ova.import: Still creating... [1m10s elapsed]
11-03 22:23:32.233 level=debug msg=vsphereprivate_import_ova.import: Creation complete after 1m11s [id=vm-594271]
11-03 22:23:32.233 level=debug msg=data.vsphere_virtual_machine.template: Refreshing state...
11-03 22:23:54.402 level=error
11-03 22:23:54.402 level=error msg=Error: error fetching virtual machine: ServerFaultCode: The object 'vim.Folder:group-v594204' has already been deleted or has not been completely created
11-03 22:23:54.402 level=error
11-03 22:23:54.402 level=error msg= on ../../../../../../../../tmp/openshift-install-456326068/main.tf line 39, in data "vsphere_virtual_machine" "template":
11-03 22:23:54.402 level=error msg= 39: data "vsphere_virtual_machine" "template" {
11-03 22:23:54.402 level=error
11-03 22:23:54.402 level=error
11-03 22:23:54.402 level=fatal msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change

A search of CI for similar failures shows the error occurring in both UPI and IPI jobs:
https://search.ci.openshift.org/?search=+has+already+been+deleted+or+has+not+been+completely+created&maxAge=336h&context=0&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

The `.openshift_install.log` is attached.

What did you expect to happen?
The infrastructure should be created successfully and the installation should complete.

How to reproduce it (as minimally and precisely as possible)?
Install an IPI or UPI cluster on the vSphere platform.
https://github.com/hashicorp/terraform-provider-vsphere/search?q=has+already+been+deleted+or+has+not+been+completely+created&type=issues
I assume, given the conversation in the PR, that this BZ is not actually ready for QA. I am going to move it back to ASSIGNED.
Based on comment 4, and because the issue was also hit in a QE UPI installation, moving the bug back to ASSIGNED.
(In reply to jima from comment #7)
> Based on comment 4 and issue also hit on QE UPI installation, move bug back to ASSIGNED.

How is this BZ related to failures seen in a UPI installation? This BZ concerns errors coming from Terraform, which is specific to IPI.
Because a similar error is also reported in UPI installations. For example, the log below is from a recent failure on a 4.10 nightly build:

01-11 12:42:19.948 + echo 'INFO: terraform version'
01-11 12:42:19.948 INFO: terraform version
01-11 12:42:19.948 + terraform-0.12 version
01-11 12:42:19.948 Terraform v0.12.24
01-11 12:42:20.884 + provider.aws v3.37.0
01-11 12:42:20.884 + provider.http v2.1.0
01-11 12:42:20.884 + provider.ignition (unversioned)
01-11 12:42:20.884 + provider.null v3.1.0
01-11 12:42:20.884 + provider.vsphere v1.26.0
01-11 12:42:20.884
01-11 12:42:20.884 Your version of Terraform is out of date! The latest version
01-11 12:42:20.884 is 1.1.3. You can update by downloading from https://www.terraform.io/downloads.html
01-11 12:42:20.884 + terraform-0.12 apply -var-file=/home/jenkins/ws/workspace/ocp-common/Flexy-install/flexy/workdir/install-dir/terraform.tfvars -auto-approve /home/jenkins/ws/workspace/ocp-common/Flexy-install/flexy/workdir/install-dir/upi_on_vsphere-terraform-scripts
01-11 12:42:29.278 module.lb.data.ignition_systemd_unit.haproxy: Refreshing state...
01-11 12:42:29.278 module.lb.data.ignition_user.core: Refreshing state...
01-11 12:42:29.278 module.lb.data.ignition_file.podman: Refreshing state...
01-11 12:42:29.839 data.vsphere_datacenter.dc: Refreshing state...
01-11 12:42:29.839 data.vsphere_datastore.datastore: Refreshing state...
01-11 12:42:29.839 data.vsphere_network.network: Refreshing state...
01-11 12:42:29.839 data.vsphere_compute_cluster.compute_cluster: Refreshing state...
01-11 12:42:29.839 data.vsphere_virtual_machine.template: Refreshing state...
01-11 12:42:30.775 module.dns_cluster_domain.data.aws_route53_zone.base: Refreshing state...
01-11 12:43:02.857
01-11 12:43:02.857 Error: error fetching virtual machine: ServerFaultCode: The object 'vim.Folder:group-v686107' has already been deleted or has not been completely created

QE uses the same Terraform code as CI:
https://github.com/openshift/installer/tree/master/upi/vsphere
We found that this issue may originate in vCenter when VM info is requested while folders are being deleted (which happens very often in CI). More detailed research can be found in https://issues.redhat.com/browse/SPLAT-683.
I am closing this bug. I am marking it as WONTFIX, but work on this issue is still ongoing in https://issues.redhat.com/browse/SPLAT-683. I would mark it as a duplicate, but that does not work across the transition to Jira. The latest update on that card indicates that this issue seems to arise when a folder is being deleted at the same time as the resource is being created. So this is an upstream/environmental issue that is likely to emerge only in high-volume environments like CI.