Bug 2097691 - [vsphere] failed to create cluster if datacenter is embedded in a Folder
Summary: [vsphere] failed to create cluster if datacenter is embedded in a Folder
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.12.0
Assignee: OCP Installer
QA Contact: jima
URL:
Whiteboard:
Depends On:
Blocks: 2110482
Reported: 2022-06-16 10:17 UTC by mheppler
Modified: 2023-01-17 19:50 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
* Previously, when installing a cluster on vSphere using a datacenter that is embedded inside a folder, the installation program could not locate the datacenter object, causing the installation to fail. In this update, the installation program can traverse the directory that contains the datacenter object, allowing the installation to succeed. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2097691[*BZ2097691*])
Clone Of:
Environment:
Last Closed: 2023-01-17 19:50:02 UTC
Target Upstream Version:
Embargoed:
efried: needinfo-



Links
System                  ID                               Private  Priority  Status  Summary                                                                     Last Updated
Github                  openshift installer pull 6105    0        None      open    Bug 2097691: vsphere installconfig: use full dc path in network validation  2022-07-12 16:00:26 UTC
Github                  stolostron backlog issues 23575  0        None      None    None                                                                        2022-06-16 12:50:59 UTC
Red Hat Product Errata  RHSA-2022:7399                   0        None      None    None                                                                        2023-01-17 19:50:34 UTC

Description mheppler 2022-06-16 10:17:53 UTC
Description of the problem:

Installation of cluster version 4.10.11 via ACM fails with error:

level=fatal msg=failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": platform.vsphere.network: Invalid value: "My-network_ds": could not find vSphere cluster at /Y/host/Z: cluster '/Y/host/Z' not found

The correct path is: "/X/Y/Z"

The inventory structure is: vCenter -> X -> Y -> Z

Relevant config is: 

platform:
  vsphere:
    clusterOSImage: xxxxxx
    vCenter: xxxxxxxxx
    username: xxxxxx
    password: xxxx
    datacenter: X/Y
    folder: /X/Y/vm/Openshift
    defaultDatastore: Datastore
    cluster: Z
    apiVIP: xxxxx
    ingressVIP: 1xxxx
    network: xxxx
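
For illustration only, the mismatch between the two paths can be sketched as follows. This is not the installer's code (the installer is written in Go); the function names are hypothetical, and the assumption that the failing check used only the last element of the datacenter path is inferred from the error message above. The /<datacenter>/host/<cluster> layout is taken from that same message.

```python
import posixpath


def cluster_path_from_name(datacenter, cluster):
    """Hypothetical reconstruction of the failing lookup.

    Assumption: only the last element of the datacenter path is used,
    so any parent folders are lost.
    """
    name = posixpath.basename(datacenter.strip("/"))
    return "/{}/host/{}".format(name, cluster)


def cluster_path_from_full_path(datacenter, cluster):
    """Sketch of the expected lookup: keep the full datacenter path."""
    return "/{}/host/{}".format(datacenter.strip("/"), cluster)


if __name__ == "__main__":
    # Values from the install config above: datacenter "X/Y", cluster "Z".
    print(cluster_path_from_name("X/Y", "Z"))       # /Y/host/Z
    print(cluster_path_from_full_path("X/Y", "Z"))  # /X/Y/host/Z
```

With the values from the config, the first function reproduces the path in the error message, and the second yields a path that includes folder X.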

The bug seems similar to 1882022 and 2063829.

Installation of version 4.10.8 works fine with the same configuration.


Release version:

  * ACM 2.4

Operator snapshot version:

OCP version:

  * OCP 4.10.9 to 4.10.11


Comment 1 daliu 2022-06-20 03:09:03 UTC
@efried Could you help to take a look?

Comment 2 Eric Fried 2022-06-23 13:42:51 UTC
Sorry for the delay. Can we please involve the installer team here? (I'll say in general if the same version of hive succeeds on one Z and fails on another, it'll be less likely to be a hive problem, and thus more expeditious to start with OCP engineering.)

I did a quick skim based on the error message. This *might* be related to https://github.com/openshift/installer/pull/5773 (backport of https://github.com/openshift/installer/pull/5673). Regardless, it looks like the author of that PR may be the SME in this space, and a good person to consult. @rbost would you mind having a look?

Comment 3 Robert Bost 2022-06-23 14:30:28 UTC
It looks like the installer is failing at the following line which changed in 4.10.11 in the PR that Eric mentioned in the previous comment:

  https://github.com/openshift/installer/blob/release-4.10/pkg/asset/installconfig/vsphere/client.go#L88-L93

In the original pull request we acknowledged the line as a risk and did not change it since similar lines were used elsewhere (and we hadn't heard of reports of failure for those similar lines of code). Given this bug report, we probably need to address it!

I've reviewed the case and see that the customer does indeed have a Datacenter embedded in a Folder which would cause the error. 

Leaving needinfo.

Comment 4 daliu 2022-06-24 00:39:45 UTC
Thanks @efried and @rbost 
I will transfer the issue to installer team.

Comment 5 jima 2022-06-24 07:59:32 UTC
In one QE environment, we also reproduced the same issue on 4.11: when the datacenter is embedded in a folder, cluster deployment fails because the installer cannot find the expected vSphere cluster.

FATAL failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": platform.vsphere.network: Invalid value: "VM Network": could not find vSphere cluster at /Datacenter/host/jima/reliability: cluster '/Datacenter/host/jima/reliability' not found

Comment 6 Robert Bost 2022-07-12 18:54:44 UTC
Dropping my needinfo since someone else submitted a fix to this bug (https://github.com/openshift/installer/pull/6105).

Comment 8 jima 2022-07-19 12:16:57 UTC
Verified on 4.12.0-0.nightly-2022-07-17-215842 and passed; moving the bug to VERIFIED.

The cluster installed successfully in an environment where the datacenter is embedded in a folder.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-07-17-215842   True        False         70m     Cluster version is 4.12.0-0.nightly-2022-07-17-215842


$ oc get cm cloud-provider-config -n openshift-config -o yaml
apiVersion: v1
data:
  config: |
    [Global]
    secret-name = "vsphere-creds"
    secret-namespace = "kube-system"
    insecure-flag = "1"

    [Workspace]
    server = "xxx"
    datacenter = "qedc/sub-qe-dc/Datacenter"
    default-datastore = "datastore3"
    folder = "/qedc/sub-qe-dc/Datacenter/vm/jima23a-6qv4d"

    [VirtualCenter "dhcp-8-30-198.lab.eng.rdu2.redhat.com"]
    datacenters = "qedc/sub-qe-dc/Datacenter"
kind: ConfigMap
metadata:
  creationTimestamp: "2022-07-19T10:25:10Z"
  name: cloud-provider-config
  namespace: openshift-config
  resourceVersion: "1912"
  uid: 57ee3323-fd7e-401a-ac2b-6e8d1bf7686b
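
As a rough sketch of how the output above could be checked programmatically: the snippet below parses an abbreviated copy of the config payload and confirms that the folder sits under the full datacenter path. The helper names are hypothetical, and the rule that the VM folder begins with /<datacenter>/vm/ is an assumption drawn from the values shown here, not a documented requirement.

```python
import configparser

# Abbreviated copy of the "config" payload from the ConfigMap above.
CONFIG = """\
[Global]
secret-name = "vsphere-creds"

[Workspace]
server = "xxx"
datacenter = "qedc/sub-qe-dc/Datacenter"
default-datastore = "datastore3"
folder = "/qedc/sub-qe-dc/Datacenter/vm/jima23a-6qv4d"
"""


def workspace_paths(ini_text):
    """Return (datacenter, folder) from the [Workspace] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    workspace = parser["Workspace"]
    # Values are stored with surrounding double quotes; strip them.
    return workspace["datacenter"].strip('"'), workspace["folder"].strip('"')


def folder_under_datacenter(datacenter, folder):
    """Assumed check: the folder starts with /<full datacenter path>/vm/."""
    return folder.startswith("/" + datacenter.strip("/") + "/vm/")


if __name__ == "__main__":
    datacenter, folder = workspace_paths(CONFIG)
    print(datacenter, folder_under_datacenter(datacenter, folder))
```

A datacenter value containing slashes, as here, indicates that the nested path was carried through to the cloud provider config.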

Comment 9 mheppler 2022-07-25 08:05:54 UTC
Thank you for fixing this bug. Just a question: when will this fix be backported to 4.10 and 4.11?

Comment 11 Rafael Fonseca 2022-07-26 20:57:13 UTC
I plan on doing the backport to 4.10 as well.

Comment 12 Gellert Kis 2022-08-02 13:26:34 UTC
Please let us know if you have a more detailed ETA for the backport.

Comment 13 Rafael Fonseca 2022-08-02 14:53:29 UTC
We are waiting on 4.11.z to open so we can merge the changes.

Comment 14 mheppler 2022-08-16 08:28:36 UTC
How is the 4.10.z backport looking? It is now urgent for the customer.

--mheppler

Comment 15 Rafael Fonseca 2022-08-16 12:03:49 UTC
(In reply to mheppler from comment #14)
> How is the 4.10.z backport looking? It is now urgent for the customer.
> 
> --mheppler

The change has to make its way into 4.11 first. At the moment it is pending verification by the QE team. You can follow the current status here: https://bugzilla.redhat.com/show_bug.cgi?id=2110482

Comment 16 Rafael Fonseca 2022-08-26 08:38:06 UTC
FYI, the fix has been merged into the installer 4.10 branch.

Comment 17 mheppler 2022-10-10 12:13:23 UTC
Hi,

Which version of 4.10 will contain the fix, please?

Thanks...

Comment 18 Rafael Fonseca 2022-10-10 12:33:02 UTC
(In reply to mheppler from comment #17)
> Hi,
> 
> Which version of 4.10 will contain the fix, please?
> 
> Thanks...

From https://bugzilla.redhat.com/show_bug.cgi?id=2111258#c6, it's 4.10.31 onwards.

Comment 21 errata-xmlrpc 2023-01-17 19:50:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

