Created attachment 1711354: openshift_install.log

Description of problem:

This is certainly an issue, but I am not yet sure which vSphere cluster configuration causes it. While I was working with Chris (cahl), Terraform crashed whenever he selected "VM Network". The network is valid, but its vSwitch has no uplinks.

Log snippet:

time="2020-08-12T14:26:53-04:00" level=debug msg="2020/08/12 14:26:53 [DEBUG] vsphereprivate_import_ova.import: applying the planned Create change"
time="2020-08-12T14:26:53-04:00" level=debug msg="2020-08-12T14:26:53.949-0400 [DEBUG] plugin.terraform-provider-vsphereprivate: 2020/08/12 14:26:53 [DEBUG] /Users/cahl/Library/Caches/openshift-installer/image_cache/abc7fccbe43d10b0fa665c80e3865ac7: Beginning import ova create"
time="2020-08-12T14:26:54-04:00" level=debug msg="2020-08-12T14:26:54.847-0400 [DEBUG] plugin.terraform-provider-vsphereprivate: panic: runtime error: invalid memory address or nil pointer dereference"
time="2020-08-12T14:26:54-04:00" level=debug msg="2020-08-12T14:26:54.847-0400 [DEBUG] plugin.terraform-provider-vsphereprivate: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xa92f677]"
time="2020-08-12T14:26:54-04:00" level=debug msg="2020-08-12T14:26:54.847-0400 [DEBUG] plugin.terraform-provider-vsphereprivate: "
time="2020-08-12T14:26:54-04:00" level=debug msg="2020-08-12T14:26:54.847-0400 [DEBUG] plugin.terraform-provider-vsphereprivate: goroutine 61 [running]:"
time="2020-08-12T14:26:54-04:00" level=debug msg="2020-08-12T14:26:54.847-0400 [DEBUG] plugin.terraform-provider-vsphereprivate: github.com/openshift/installer/pkg/terraform/exec/plugins/vsphereprivate.findImportOvaParams(0xc000257080, 0xc000b94210, 0xb, 0xc000b941f0, 0xe, 0xc000b94230, 0xd, 0xc000b94260, 0xa, 0xc000b96240, ...)"
time="2020-08-12T14:26:54-04:00" level=debug msg="2020-08-12T14:26:54.847-0400 [DEBUG] plugin.terraform-provider-vsphereprivate: \t/Users/cahl/go/src/github.com/openshift/installer/pkg/terraform/exec/plugins/vsphereprivate/resource_vsphereprivate_import_ova.go:199 +0xb47"

Port groups for cluster:

govc ls -L -t DistributedVirtualPortgroup '*'
/CICD-Plot01/network/cluster-vlan-canary-i2e-lv
/CICD-Plot01/network/Plot01 Cluster N-DVUplinks-376

govc ls -L -t Network '*'
/CICD-Plot01/network/VM Network
/CICD-Plot01/network/Public Network

And where the crash happens:
https://github.com/openshift/installer/blob/master/pkg/terraform/exec/plugins/vsphereprivate/resource_vsphereprivate_import_ova.go#L198-L203
Chris built the installer from the release-4.5 branch; when I checked, it was at the most recent commit. I also asked for access to the cluster. We will see if I can get it.
Can you include the steps to reproduce this on our end? Some details that would be useful:
- How is the environment set up? Is there anything in the environment that might be causing this?
- The exact steps you used to install the cluster.
- Maybe also the install-config.yaml (_remove the password for vCenter_).
The vSphere setup was done by a member of the Advanced Cluster Management (ACM) team. One thing I noticed: although openshift-installer offers 4 options when choosing the network (matching the govc output shown previously), the vSphere client UI shows the following for the datacenter:

Networks:
  Public Network
  VM Network
Distributed Port Groups:
  cluster-vlan-canary-i2e-lv
Uplink Port Groups:
  Plot01 Cluster N-DVUplinks-376

The cluster shows the following networks:
  cluster-vlan-canary-i2e-lv
  Plot01 Cluster N-DVUplinks-376
  Public Network

VM Network is not listed for the cluster, so that may be the reason for the exception. Note that I am currently using the Public Network option and have been able to deploy.
I wonder whether this is because the code at https://github.com/openshift/installer/blob/077850081b131c098a65b067a063fe8feb8e5d25/pkg/terraform/exec/plugins/vsphereprivate/resource_vsphereprivate_import_ova.go#L149-152 never executes, so importOvaParams.Network is never set. That could then cause the problem when https://github.com/openshift/installer/blob/077850081b131c098a65b067a063fe8feb8e5d25/pkg/terraform/exec/plugins/vsphereprivate/resource_vsphereprivate_import_ova.go#L193 is executed.
Can you help us by providing some reproduction steps? We are looking for things we can do to set up our environment to reproduce this.
I provided info previously that showed the VMware setup. Here it is again:

```
The vSphere setup was done by a member of the Advanced Cluster Management (ACM) team. One thing I noticed: although openshift-installer offers 4 options when choosing the network (matching the govc output shown previously), the vSphere client UI shows the following for the datacenter:

Networks:
  Public Network
  VM Network
Distributed Port Groups:
  cluster-vlan-canary-i2e-lv
Uplink Port Groups:
  Plot01 Cluster N-DVUplinks-376

The cluster shows the following networks:
  cluster-vlan-canary-i2e-lv
  Plot01 Cluster N-DVUplinks-376
  Public Network

VM Network is not listed for the cluster, so that may be the reason for the exception. Note that I am currently using the Public Network option and have been able to deploy.
```

I was using the `VM Network` option because it was shown as valid, but picking it caused the error. It only worked when I used `Public Network`. So it appears that the error occurs when a network is defined for the datacenter but not for the cluster.
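As a rough way to spot this condition ahead of time, the two lists above can be compared directly. The following Go sketch is only an illustration: `missingFromCluster` is a hypothetical helper, and the network names are copied by hand from this report rather than queried from vCenter.

```go
package main

import (
	"fmt"
	"sort"
)

// missingFromCluster returns the networks that are defined at the
// datacenter level but are not attached to the cluster.
func missingFromCluster(datacenter, cluster []string) []string {
	inCluster := make(map[string]struct{}, len(cluster))
	for _, n := range cluster {
		inCluster[n] = struct{}{}
	}

	var missing []string
	for _, n := range datacenter {
		if _, ok := inCluster[n]; !ok {
			missing = append(missing, n)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// The four choices offered by the installer (datacenter level).
	datacenter := []string{
		"Public Network",
		"VM Network",
		"cluster-vlan-canary-i2e-lv",
		"Plot01 Cluster N-DVUplinks-376",
	}
	// The networks the vSphere client UI lists for the cluster.
	cluster := []string{
		"cluster-vlan-canary-i2e-lv",
		"Plot01 Cluster N-DVUplinks-376",
		"Public Network",
	}

	fmt.Println(missingFromCluster(datacenter, cluster)) // prints [VM Network]
}
```

With the lists from this environment, the only network in the difference is `VM Network`, which is the choice that triggered the crash.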
We plan to get to this in 4.7.
We don't believe it is possible to succeed in this scenario, so this is mostly a matter of improving an error message. We are therefore no longer marking this for 4.7.
Verified on Jeremiah's environment, since QE does not have such a specific environment, and it passed.

Reproduced the issue on 4.7.0-0.nightly-2021-03-01-085007: set TF_LOG to DEBUG and ran openshift_install to create a cluster, which hit the same nil pointer error as in the Description.

time="2021-03-02T07:10:04-05:00" level=debug msg="2021-03-02T07:10:04.434-0500 [DEBUG] plugin.terraform-provider-vsphereprivate: 2021/03/02 07:10:04 [DEBUG] /home/admin/.cache/openshift-installer/image_cache/3b90b8f621548d33b166787e8d70207d: Beginning import ova create"
time="2021-03-02T07:10:04-05:00" level=debug msg="2021-03-02T07:10:04.485-0500 [DEBUG] plugin.terraform-provider-vsphereprivate: panic: runtime error: invalid memory address or nil pointer dereference"
time="2021-03-02T07:10:04-05:00" level=debug msg="2021-03-02T07:10:04.485-0500 [DEBUG] plugin.terraform-provider-vsphereprivate: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xabc7ece]"
...
time="2021-03-02T07:10:04-05:00" level=debug msg="2021/03/02 07:10:04 [DEBUG] vsphereprivate_import_ova.import: apply errored, but we're indicating that via the Error pointer rather than returning it: rpc error: code = Canceled desc = context canceled"
time="2021-03-02T07:10:04-05:00" level=debug msg="2021/03/02 07:10:04 [ERROR] <root>: eval: *terraform.EvalApplyPost, err: rpc error: code = Canceled desc = context canceled"
time="2021-03-02T07:10:04-05:00" level=debug msg="2021/03/02 07:10:04 [ERROR] <root>: eval: *terraform.EvalSequence, err: rpc error: code = Canceled desc = context canceled"
time="2021-03-02T07:10:04-05:00" level=debug msg="2021-03-02T07:10:04.501-0500 [DEBUG] plugin: plugin process exited: path=/tmp/openshift-install-803644701/plugins/terraform-provider-vsphereprivate pid=2462 error=\"exit status 2\""
time="2021-03-02T07:10:04-05:00" level=debug msg="2021-03-02T07:10:04.501-0500 [WARN] plugin.stdio: received EOF, stopping recv loop: err=\"rpc error: code = Unavailable desc = transport is closing\""
time="2021-03-02T07:10:04-05:00" level=error
time="2021-03-02T07:10:04-05:00" level=error msg="Error: rpc error: code = Canceled desc = context canceled"
time="2021-03-02T07:10:04-05:00" level=error
time="2021-03-02T07:10:04-05:00" level=error
time="2021-03-02T07:10:04-05:00" level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply Terraform: failed to complete the change"

The same command was then run on nightly build 4.8.0-0.nightly-2021-03-01-143026, which reports a detailed error message explaining why cluster creation failed.

time="2021-03-02T06:54:03-05:00" level=debug msg="vsphereprivate_import_ova.import: Creating..."
time="2021-03-02T06:54:03-05:00" level=error
time="2021-03-02T06:54:03-05:00" level=error msg="Error: failed to find provided vSphere objects: failed to find a host in the cluster that contains the provided network"
time="2021-03-02T06:54:03-05:00" level=error
time="2021-03-02T06:54:03-05:00" level=error msg=" on ../../../tmp/openshift-install-781371026/main.tf line 43, in resource \"vsphereprivate_import_ova\" \"import\":"
time="2021-03-02T06:54:03-05:00" level=error msg=" 43: resource \"vsphereprivate_import_ova\" \"import\" {"
time="2021-03-02T06:54:03-05:00" level=error
time="2021-03-02T06:54:03-05:00" level=error
time="2021-03-02T06:54:03-05:00" level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply Terraform: failed to complete the change"
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438