Bug 1955697

Summary: [vsphere] If there are multiple datacenters with the same name installation fails
Product: OpenShift Container Platform Reporter: Jeremiah Stuever <jstuever>
Component: Installer Assignee: OCP Installer <ocp-installer>
Installer sub component: openshift-installer QA Contact: jima
Status: CLOSED DEFERRED Docs Contact:
Severity: medium    
Priority: medium CC: bleanhar, jcallen, jima, morgan.peterman, nstielau, padillon
Version: 4.8   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1918005 Environment:
Last Closed: 2023-03-09 01:28:22 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1981941    
Bug Blocks:    

Comment 2 jima 2021-05-20 02:36:03 UTC
Used Jeremiah's embedded vSphere environment on VMC to verify the bug, since QE does not have such an environment.

There are two datacenters with the same name under different folders in this embedded vSphere environment:
SDDC-Datacenter
foo/SDDC-Datacenter

Reproduced the issue on nightly build 4.8.0-0.nightly-2021-05-17-075254:
In install-config.yaml:
platform:
  vsphere:
    apiVIP: 192.168.1.2
    cluster: bar/Cluster-1
    datacenter: SDDC-Datacenter
    defaultDatastore: WorkloadDatastore
    ingressVIP: 192.168.1.3
    network: internal

Error reported when running the command "openshift-install create cluster --dir ipi":

ERROR                                              
ERROR Error: error fetching datacenter: path 'SDDC-Datacenter' resolves to multiple datacenters 
ERROR                                              
ERROR   on ../../../../tmp/openshift-install-308467941/main.tf line 20, in data "vsphere_datacenter" "datacenter": 
ERROR   20: data "vsphere_datacenter" "datacenter" { 
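
The ambiguity behind this error can be illustrated with a toy model. The sketch below is not the Terraform provider's or govmomi's lookup code; the function name find_datacenter, the exception classes, and the leading-slash inventory paths are all hypothetical, chosen only to show why a bare name can match two datacenters that live in different folders.

```python
# Toy model of name-based datacenter lookup (hypothetical, for illustration only).
# Assumption: a query without "/" is treated as a bare name and compared against
# the last path component of every datacenter; a query with "/" is compared
# against the full inventory path.


class NotFoundError(Exception):
    """No datacenter matched the query."""


class MultipleMatchError(Exception):
    """More than one datacenter matched the query."""


def find_datacenter(inventory, query):
    """Return the single inventory path matching query, or raise."""
    if "/" in query:
        # Path form: require an exact match on the full inventory path.
        matches = [path for path in inventory if path == query]
    else:
        # Bare name: match on the last path component, regardless of folder.
        matches = [path for path in inventory if path.rsplit("/", 1)[-1] == query]

    if not matches:
        raise NotFoundError(f"datacenter '{query}' not found")
    if len(matches) > 1:
        raise MultipleMatchError(
            f"path '{query}' resolves to multiple datacenters"
        )
    return matches[0]


# The two datacenters from this environment, written as assumed full paths.
inventory = ["/SDDC-Datacenter", "/foo/SDDC-Datacenter"]
```

In this model, find_datacenter(inventory, "SDDC-Datacenter") raises MultipleMatchError, while a full path such as "/foo/SDDC-Datacenter" matches exactly one entry. That only describes the toy model; it does not imply that a path-form value is accepted by every component that reads the datacenter setting.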


Verified on nightly build 4.8.0-0.nightly-2021-05-18-205323, which contains the fix. Creating a cluster with the same install-config.yaml file no longer hits the above error, but worker nodes cannot be created, and the same error appears in the MAC pod log:

E0520 02:25:10.494534       1 controller.go:281] jstuevervcsa-v8fbg-master-0: failed to check if machine exists: jstuevervcsa-v8fbg-master-0: failed to create scope for machine: failed to create vSphere session: unable to find datacenter "SDDC-Datacenter": path 'SDDC-Datacenter' resolves to multiple datacenters
E0520 02:25:10.528348       1 controller.go:302] controller-runtime/manager/controller/machine_controller "msg"="Reconciler error" "error"="jstuevervcsa-v8fbg-master-0: failed to create scope for machine: failed to create vSphere session: unable to find datacenter \"SDDC-Datacenter\": path 'SDDC-Datacenter' resolves to multiple datacenters" "name"="jstuevervcsa-v8fbg-master-0" "namespace"="openshift-machine-api" 
I0520 02:25:10.528435       1 controller.go:174] jstuevervcsa-v8fbg-master-1: reconciling Machine 

The cluster is still up and has not been destroyed; you can access it to check.

Comment 4 Jeremiah Stuever 2021-08-02 16:54:11 UTC
If I recall, we paused on this bug because there isn't currently a combination that works for this value between the vsphere-private Terraform provider and the in-cluster cloud controller. We were discussing upgrading the community Terraform vSphere provider to a newer version, which would make the vsphere-private code obsolete. However, that change is blocked by the work to split the Terraform templates into stages to enable us to upgrade to newer versions.

https://issues.redhat.com/browse/CORS-1696

Comment 5 Russell Teague 2021-08-24 17:32:11 UTC
Waiting on terraform upgrade.

Comment 9 jima 2022-06-24 08:15:54 UTC
After the Terraform upgrade (https://issues.redhat.com/browse/CORS-1696), the issue still exists when there are multiple datacenters with the same name.
Tested on 4.11.0-0.nightly-2022-06-21-151125:

$ ./openshift-install create cluster --dir ipi-dc
? SSH Public Key /home/jima/.ssh/ssh.pub
? Platform vsphere
? vCenter xxxxxx
? Username openshift-qe
? Password [? for help] **********
INFO Connecting to vCenter xxxxxx
INFO Defaulting to only available datacenter: Datacenter 
? Cluster jima/reliability
? Default Datastore datastore3
? Network VM Network
? Virtual IP Address for API 10.8.33.103
? Virtual IP Address for Ingress 10.8.33.104
? Base Domain qe.devcluster.openshift.com
? Cluster Name jima23a
? Pull Secret [? for help] ****************
ERROR                                              
ERROR Error: error fetching datacenter: path 'Datacenter' resolves to multiple datacenters 
ERROR                                              
ERROR   with data.vsphere_datacenter.datacenter,   
ERROR   on main.tf line 20, in data "vsphere_datacenter" "datacenter": 
ERROR   20: data "vsphere_datacenter" "datacenter" { 
ERROR                                              
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "pre-bootstrap" stage: failed to create cluster: failed to apply Terraform: exit status 1 
ERROR                                              
ERROR Error: error fetching datacenter: path 'Datacenter' resolves to multiple datacenters 
ERROR                                              
ERROR   with data.vsphere_datacenter.datacenter,   
ERROR   on main.tf line 20, in data "vsphere_datacenter" "datacenter": 
ERROR   20: data "vsphere_datacenter" "datacenter" { 
ERROR                                              
ERROR

Comment 11 Shiftzilla 2023-03-09 01:28:22 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9699