Description of the problem:
Installation of cluster version 4.10.11 via ACM fails with error:

level=fatal msg=failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": platform.vsphere.network: Invalid value: "My-network_ds": could not find vSphere cluster at /Y/host/Z: cluster '/Y/host/Z' not found

The correct path is: "/X/Y/Z"
Structure is: Vcenter -> X -> Y -> Z

Relevant config is:

platform:
  vsphere:
    clusterOSImage: xxxxxx
    vCenter: xxxxxxxxx
    username: xxxxxx
    password: xxxx
    datacenter: X/Y
    folder: /X/Y/vm/Openshift
    defaultDatastore: Datastore
    cluster: Z
    apiVIP: xxxxx
    ingressVIP: 1xxxx
    network: xxxx

The bug seems similar to 1882022 and 2063829.
Installation of version 4.10.8 works fine with the same configuration.

Release version:
* ACM 2.4

Operator snapshot version:

OCP version:
* OCP 4.10.9 to 4.10.11

Browser Info:

Steps to reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
@efried Could you help take a look?
Sorry for the delay. Can we please involve the installer team here? (I'll say in general if the same version of hive succeeds on one Z and fails on another, it'll be less likely to be a hive problem, and thus more expeditious to start with OCP engineering.) I did a quick skim based on the error message. This *might* be related to https://github.com/openshift/installer/pull/5773 (backport of https://github.com/openshift/installer/pull/5673). Regardless, it looks like the author of that PR may be the SME in this space, and a good person to consult. @rbost would you mind having a look?
It looks like the installer is failing at the following line which changed in 4.10.11 in the PR that Eric mentioned in the previous comment: https://github.com/openshift/installer/blob/release-4.10/pkg/asset/installconfig/vsphere/client.go#L88-L93 In the original pull request we acknowledged the line as a risk and did not change it since similar lines were used elsewhere (and we hadn't heard of reports of failure for those similar lines of code). Given this bug report, we probably need to address it! I've reviewed the case and see that the customer does indeed have a Datacenter embedded in a Folder which would cause the error. Leaving needinfo.
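To illustrate the failure mode described above, here is a minimal Python sketch. It is not the installer's code (the installer is written in Go), and both function names are hypothetical. It assumes the usual vSphere inventory layout, where a cluster sits under `<datacenter path>/host/<cluster>`, and assumes that the failing lookup kept only the last element of the datacenter path, which is consistent with the `/Y/host/Z` path in the reported error.

```python
import posixpath


def cluster_inventory_path(datacenter: str, cluster: str) -> str:
    """Build the cluster path from the full datacenter path.

    `datacenter` may include parent folders, e.g. "X/Y".
    """
    # Normalise to an absolute path so "X/Y" and "/X/Y" behave the same.
    dc_path = "/" + datacenter.strip("/")
    return posixpath.join(dc_path, "host", cluster.strip("/"))


def cluster_path_from_name_only(datacenter: str, cluster: str) -> str:
    """Hypothetical faulty variant: keeps only the datacenter's last path element."""
    # Dropping the parent folder ("X") loses the location of the datacenter.
    dc_name = posixpath.basename(datacenter.strip("/"))
    return posixpath.join("/", dc_name, "host", cluster.strip("/"))


if __name__ == "__main__":
    # Values taken from the reporter's config: datacenter "X/Y", cluster "Z".
    print(cluster_inventory_path("X/Y", "Z"))
    print(cluster_path_from_name_only("X/Y", "Z"))
```

With the reporter's values, the first function keeps the folder prefix while the second reproduces the path from the error message, so the lookup fails only when the datacenter is nested inside a folder.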
Thanks @efried and @rbost. I will transfer the issue to the installer team.
In one QE environment, we also reproduced the same issue on 4.11: when the datacenter is embedded in a folder, cluster deployment fails because the installer cannot find the expected vSphere cluster.

FATAL failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": platform.vsphere.network: Invalid value: "VM Network": could not find vSphere cluster at /Datacenter/host/jima/reliability: cluster '/Datacenter/host/jima/reliability' not found
Dropping my needinfo since someone else submitted a fix to this bug (https://github.com/openshift/installer/pull/6105).
Verified on 4.12.0-0.nightly-2022-07-17-215842 and passed; moving bug to VERIFIED.

The cluster installed successfully in an environment where the datacenter is embedded in a folder.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-07-17-215842   True        False         70m     Cluster version is 4.12.0-0.nightly-2022-07-17-215842

$ oc get cm cloud-provider-config -n openshift-config -o yaml
apiVersion: v1
data:
  config: |
    [Global]
    secret-name = "vsphere-creds"
    secret-namespace = "kube-system"
    insecure-flag = "1"

    [Workspace]
    server = "xxx"
    datacenter = "qedc/sub-qe-dc/Datacenter"
    default-datastore = "datastore3"
    folder = "/qedc/sub-qe-dc/Datacenter/vm/jima23a-6qv4d"

    [VirtualCenter "dhcp-8-30-198.lab.eng.rdu2.redhat.com"]
    datacenters = "qedc/sub-qe-dc/Datacenter"
kind: ConfigMap
metadata:
  creationTimestamp: "2022-07-19T10:25:10Z"
  name: cloud-provider-config
  namespace: openshift-config
  resourceVersion: "1912"
  uid: 57ee3323-fd7e-401a-ac2b-6e8d1bf7686b
Thank you for fixing this bug. Just one question: when will this fix be backported to 4.10 and 4.11?
I plan on doing the backport to 4.10 as well.
Please let us know if you have a more detailed ETA for the backport.
We are waiting on 4.11.z to open so we can merge the changes.
How is the 4.10.z backport looking? It is urgent for the customer now.

--mheppler
(In reply to mheppler from comment #14)
> How is the 4.10.z backport looking? It is urgent for the customer now.
>
> --mheppler

The change has to make its way into 4.11 first. At the moment it is pending verification by the QE team. You can follow the current status here: https://bugzilla.redhat.com/show_bug.cgi?id=2110482
FYI, the fix has been merged into the installer 4.10 branch.
Hi,

which version of 4.10 will contain the fix, please?

Thanks...
(In reply to mheppler from comment #17)
> Hi,
>
> which version of 4.10 will contain the fix, please?
>
> Thanks...

According to https://bugzilla.redhat.com/show_bug.cgi?id=2111258#c6, the fix is in 4.10.31 onwards.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399