Bug 1868755
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | [vsphere] terraform provider vsphereprivate crashes when network is unavailable on host | | |
| Product | OpenShift Container Platform | Reporter | Joseph Callen <jcallen> |
| Component | Installer | Assignee | Jeremiah Stuever <jstuever> |
| Installer sub component | openshift-installer | QA Contact | jima |
| Status | CLOSED ERRATA | Docs Contact | |
| Severity | medium | | |
| Priority | medium | CC | adahiya, bleanhar, cahl, jima, mstaeble |
| Version | 4.6 | | |
| Target Milestone | --- | | |
| Target Release | 4.8.0 | | |
| Hardware | Unspecified | | |
| OS | Unspecified | | |
| Whiteboard | | | |
| Fixed In Version | | Doc Type | No Doc Update |
| Doc Text | | Story Points | --- |
| Clone Of | | | |
| | 1874240 | Environment | |
| Last Closed | 2021-07-27 22:32:47 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Bug Depends On | | | |
| Bug Blocks | 1874240 | | |
| Attachments | | | |
Description
Joseph Callen
2020-08-13 18:01:03 UTC
Chris built the installer from the release-4.5 branch; when I checked, it was the most recent commit. I also asked for access to the cluster, so we will see if I can get it.

Can you include the steps to reproduce this on our end? Some details that would be useful:

- How is the environment set up? Is there anything in the environment that might be causing this?
- The exact steps you used to install the cluster.
- If possible, the install-config.yaml (_remove the password for vCenter_).

The vSphere setup was done by a member of the Advanced Cluster Management (ACM) team. One thing I noticed: even though the openshift-installer offers 4 options for choosing the network (as also shown by the earlier govc commands), the vSphere client UI shows the following for the datacenter:

- Networks: Public Network, VM Network
- Distributed Port Groups: cluster-vlan-canary-i2e-lv
- Uplink Port Groups: Plot01 Cluster N-DVUplinks-376

The cluster shows the following networks:

- cluster-vlan-canary-i2e-lv
- Plot01 Cluster N-DVUplinks-376
- Public Network

The VM Network choice is not listed for the cluster, so this may be the reason for the exception. Note that I am currently using the Public Network option and have been able to deploy.

I wonder whether this is because the code at https://github.com/openshift/installer/blob/077850081b131c098a65b067a063fe8feb8e5d25/pkg/terraform/exec/plugins/vsphereprivate/resource_vsphereprivate_import_ova.go#L149-152 never executes. In that case, importOvaParams.Network is never set, which could then cause a problem when line https://github.com/openshift/installer/blob/077850081b131c098a65b067a063fe8feb8e5d25/pkg/terraform/exec/plugins/vsphereprivate/resource_vsphereprivate_import_ova.go#L193 is executed.

Can you help us by providing some reproduction steps, such as things we can do to set up our environment to reproduce this?

I provided info previously that showed the VMware setup. Here it is again:

```
The vsphere setup was done by a member of the advanced cluster management (ACM) team.
One thing I noticed is even though the openshift-installer gives 4 options for choosing the network, (and as shown by govc commands previously) is on vSphere client UI the datacenter shows the following:

Networks:
Public Network
VM Network
Distributed Port Groups:
cluster-vlan-canary-i2e-lv
Uplink Port Groups:
Plot01 Cluster N-DVUplinks-376

The cluster shows the following
Networks:
cluster-vlan-canary-i2e-lv
Plot01 Cluster N-DVUplinks-376
Public Network

The choice VM Network is not listed for the cluster. So maybe this is the reason for the exception
Note that I am currently using the Public Network option and have been able to deploy.
```

I was using the `VM Network` option because it was shown as valid, but picking that option caused the error. It only worked when I used `Public Network`. So it appears that the error occurs when a network is defined for the datacenter but not for the cluster.

We plan to get to this in 4.7.

We don't believe it is possible to succeed in this scenario, so this change is mostly about improving the error message. For that reason, we are no longer targeting this for 4.7.

Verified on Jeremiah's environment, since QE doesn't have such a specific environment, and it passed.

Reproduced the issue on 4.7.0-0.nightly-2021-03-01-085007: set TF_LOG=DEBUG, ran openshift-install to create the cluster, and got the same nil pointer error as in the Description.
time="2021-03-02T07:10:04-05:00" level=debug msg="2021-03-02T07:10:04.434-0500 [DEBUG] plugin.terraform-provider-vsphereprivate: 2021/03/02 07:10:04 [DEBUG] /home/admin/.cache/openshift-installer/image_cache/3b90b8f621548d33b166787e8d70207d: Beginning import ova create"
time="2021-03-02T07:10:04-05:00" level=debug msg="2021-03-02T07:10:04.485-0500 [DEBUG] plugin.terraform-provider-vsphereprivate: panic: runtime error: invalid memory address or nil pointer dereference"
time="2021-03-02T07:10:04-05:00" level=debug msg="2021-03-02T07:10:04.485-0500 [DEBUG] plugin.terraform-provider-vsphereprivate: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xabc7ece]"
...
time="2021-03-02T07:10:04-05:00" level=debug msg="2021/03/02 07:10:04 [DEBUG] vsphereprivate_import_ova.import: apply errored, but we're indicating that via the Error pointer rather than returning it: rpc error: code = Canceled desc = context canceled"
time="2021-03-02T07:10:04-05:00" level=debug msg="2021/03/02 07:10:04 [ERROR] <root>: eval: *terraform.EvalApplyPost, err: rpc error: code = Canceled desc = context canceled"
time="2021-03-02T07:10:04-05:00" level=debug msg="2021/03/02 07:10:04 [ERROR] <root>: eval: *terraform.EvalSequence, err: rpc error: code = Canceled desc = context canceled"
time="2021-03-02T07:10:04-05:00" level=debug msg="2021-03-02T07:10:04.501-0500 [DEBUG] plugin: plugin process exited: path=/tmp/openshift-install-803644701/plugins/terraform-provider-vsphereprivate pid=2462 error=\"exit status 2\""
time="2021-03-02T07:10:04-05:00" level=debug msg="2021-03-02T07:10:04.501-0500 [WARN] plugin.stdio: received EOF, stopping recv loop: err=\"rpc error: code = Unavailable desc = transport is closing\""
time="2021-03-02T07:10:04-05:00" level=error
time="2021-03-02T07:10:04-05:00" level=error msg="Error: rpc error: code = Canceled desc = context canceled"
time="2021-03-02T07:10:04-05:00" level=error
time="2021-03-02T07:10:04-05:00" level=error
time="2021-03-02T07:10:04-05:00" level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply Terraform: failed to complete the change"

Then the same command was run on nightly build 4.8.0-0.nightly-2021-03-01-143026. This time it reports a detailed error message explaining why the cluster could not be created:

time="2021-03-02T06:54:03-05:00" level=debug msg="vsphereprivate_import_ova.import: Creating..."
time="2021-03-02T06:54:03-05:00" level=error
time="2021-03-02T06:54:03-05:00" level=error msg="Error: failed to find provided vSphere objects: failed to find a host in the cluster that contains the provided network"
time="2021-03-02T06:54:03-05:00" level=error
time="2021-03-02T06:54:03-05:00" level=error msg=" on ../../../tmp/openshift-install-781371026/main.tf line 43, in resource \"vsphereprivate_import_ova\" \"import\":"
time="2021-03-02T06:54:03-05:00" level=error msg=" 43: resource \"vsphereprivate_import_ova\" \"import\" {"
time="2021-03-02T06:54:03-05:00" level=error
time="2021-03-02T06:54:03-05:00" level=error
time="2021-03-02T06:54:03-05:00" level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply Terraform: failed to complete the change"

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438