Bug 1918005 - [vsphere] If there are multiple port groups with the same name installation fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Nobody
QA Contact: jima
URL:
Whiteboard:
Depends On: 1981941
Blocks:
Reported: 2021-01-19 19:52 UTC by Joseph Callen
Modified: 2022-08-10 10:36 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The installer passed an ambiguous network name to the Terraform provider.
Consequence: If the Terraform provider found more than one network with that name, it could not decide which one was correct and failed.
Fix: The installer now passes the ID of the network.
Result: The Terraform provider knows exactly which network to use, and the installation succeeds. There is no difference in behavior for users (they still provide the same information as before).
Clone Of:
Clones: 1955697
Environment:
Last Closed: 2022-08-10 10:35:38 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 5673 | 0 | None | open | Bug 1918005: vsphere: Use Managed Object ID for networks instead of potentially duplicate name. | 2022-03-10 18:53:03 UTC
Github | openshift machine-api-operator pull 961 | 0 | None | Merged | Bug 1918005: Use known vSphere cluster to uniquely identify networks. | 2022-03-10 18:52:22 UTC
Red Hat Knowledge Base (Solution) | 5720901 | 0 | None | None | None | 2021-01-20 20:00:01 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 10:36:00 UTC

Description Joseph Callen 2021-01-19 19:52:58 UTC
If two port groups with the same name exist on the same vCenter instance, but in two different datacenters, Terraform fails with:

error fetching network: path '3214-pxe' resolves to multiple networks, Please specify

For the requirements that apply when multiple port groups exist with the same name, see:

https://registry.terraform.io/providers/hashicorp/vsphere/latest/docs/data-sources/network#distributed_virtual_switch_uuid

Will need to determine the distributed virtual switch name (if distributed virtual switches are being used)
Provide the name to Terraform
Add the distributed_virtual_switch_uuid argument to the vsphere_network data source
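
The failure and the proposed remedy can be illustrated with a small, self-contained Go model. This is only a sketch under stated assumptions: the `network` type, the sample inventory, the `uuid-a`/`uuid-b` values, and the `lookup` helper are hypothetical and are not the Terraform provider's code. The error text is modeled on the message quoted above.

```go
package main

import (
	"fmt"
)

// network is a hypothetical, simplified view of a vSphere port group.
type network struct {
	Name       string
	Datacenter string
	DVSUUID    string // empty for a standard (non-distributed) port group
}

// lookup returns the single network that matches name and, when dvsUUID is
// non-empty, also belongs to the given distributed virtual switch.
func lookup(inv []network, name, dvsUUID string) (network, error) {
	var matches []network
	for _, n := range inv {
		if n.Name != name {
			continue
		}
		if dvsUUID != "" && n.DVSUUID != dvsUUID {
			continue
		}
		matches = append(matches, n)
	}
	switch len(matches) {
	case 0:
		return network{}, fmt.Errorf("network %q not found", name)
	case 1:
		return matches[0], nil
	default:
		return network{}, fmt.Errorf("path '%s' resolves to multiple networks", name)
	}
}

func main() {
	// Two port groups with the same name in different datacenters.
	inv := []network{
		{Name: "3214-pxe", Datacenter: "dc1", DVSUUID: "uuid-a"},
		{Name: "3214-pxe", Datacenter: "dc2", DVSUUID: "uuid-b"},
	}

	// The name alone is ambiguous.
	_, err := lookup(inv, "3214-pxe", "")
	fmt.Println("name only:", err)

	// Adding the switch UUID narrows the match to one network.
	n, err := lookup(inv, "3214-pxe", "uuid-b")
	fmt.Println("name + switch UUID:", n.Datacenter, err)
}
```

In this model the extra argument only helps for distributed port groups, because a standard port group has no switch UUID to filter on.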

Comment 2 Joseph Callen 2021-01-19 19:55:25 UTC
It might be easier to fix the terraform provider. There is no reason why you should need to provide the distributed switch uuid if you already provide the datacenter that the port group belongs to.
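
The idea in this comment, that the datacenter should already be enough to disambiguate, can be sketched as follows. The `portGroup` type and `findInDatacenter` are hypothetical names used for illustration; this is not how the upstream provider is implemented.

```go
package main

import (
	"errors"
	"fmt"
)

// portGroup is a hypothetical record: a port group name plus the datacenter
// that contains it.
type portGroup struct {
	Name       string
	Datacenter string
}

var errAmbiguous = errors.New("resolves to multiple networks")

// findInDatacenter restricts the search to one datacenter before matching
// on the port group name.
func findInDatacenter(inv []portGroup, datacenter, name string) (portGroup, error) {
	var found []portGroup
	for _, pg := range inv {
		if pg.Datacenter == datacenter && pg.Name == name {
			found = append(found, pg)
		}
	}
	if len(found) == 0 {
		return portGroup{}, fmt.Errorf("no network %q in datacenter %q", name, datacenter)
	}
	if len(found) > 1 {
		return portGroup{}, fmt.Errorf("%q in %q: %w", name, datacenter, errAmbiguous)
	}
	return found[0], nil
}

func main() {
	inv := []portGroup{
		{Name: "3214-pxe", Datacenter: "dc1"},
		{Name: "3214-pxe", Datacenter: "dc2"},
	}
	pg, err := findInDatacenter(inv, "dc2", "3214-pxe")
	fmt.Println(pg.Datacenter, err)
}
```

Scoping by datacenter resolves the case in the description. It would not help if two port groups with the same name lived in the same datacenter, which is why the sketch still reports ambiguity in that situation.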

Comment 3 Nick Stielau 2021-02-03 19:16:25 UTC
This seems edge-casey enough that I'm confident it isn't a release blocker.

Comment 4 Matthew Staebler 2021-02-03 19:21:59 UTC
There is a workaround for this, so lowering the severity to medium. Nick beat me to setting "blocker-".

Comment 5 Brenton Leanhardt 2021-02-04 18:45:07 UTC
We're still planning to fix this.

Comment 6 Jeremiah Stuever 2021-04-30 16:58:34 UTC
I split out the datacenter portion of this bug because it will be significantly easier to fix. As for the port groups (networks), our vsphereprivate Terraform provider complicates the fix because we are matching the name of the network without regard to path. Still working to find a solution here.
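
The difference between matching on the network name and matching on the full inventory path can be shown with a short Go sketch. The inventory paths below and the helper names `matchByName` and `matchByPath` are assumptions made for illustration; they are not taken from the vsphereprivate provider.

```go
package main

import (
	"fmt"
	"path"
)

// matchByName returns every inventory path whose last element equals name,
// ignoring the datacenter and folders that lead to it.
func matchByName(paths []string, name string) []string {
	var out []string
	for _, p := range paths {
		if path.Base(p) == name {
			out = append(out, p)
		}
	}
	return out
}

// matchByPath returns only the entries equal to the full inventory path.
func matchByPath(paths []string, full string) []string {
	var out []string
	for _, p := range paths {
		if path.Clean(p) == path.Clean(full) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// Hypothetical inventory: the same port group name in two datacenters.
	inventory := []string{
		"/dc1/network/3214-pxe",
		"/dc2/network/3214-pxe",
	}
	fmt.Println(len(matchByName(inventory, "3214-pxe")))
	fmt.Println(len(matchByPath(inventory, "/dc2/network/3214-pxe")))
}
```

Matching on the last path element discards exactly the information that tells the two networks apart.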

Comment 7 Jeremiah Stuever 2021-05-19 17:21:21 UTC
I have looked at this from various angles. The difficulty is that the vsphereprivate provider selects a host that has both the datastore and the network. However, during this process we only have the network name (excluding the path). At this point, I think the best path forward is to proceed with CORS-1476 and deprecate vsphereprivate in favor of the OVA import from the upstream provider. However, I believe this is blocked by the work in CORS-1511 to upgrade Terraform in general.

https://issues.redhat.com/browse/CORS-1476
https://issues.redhat.com/browse/CORS-1511
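
To make the host selection problem concrete, here is a minimal Go sketch of choosing hosts that have both a datastore and a network when only the network name is known. The `host` type, the host names, and the network IDs are hypothetical; the actual vsphereprivate logic is not shown in this bug.

```go
package main

import "fmt"

// host is a hypothetical record of what an ESXi host is attached to.
type host struct {
	Name       string
	Datastores []string
	Networks   map[string]string // network name -> illustrative network ID
}

func hasDatastore(h host, datastore string) bool {
	for _, d := range h.Datastores {
		if d == datastore {
			return true
		}
	}
	return false
}

// candidates returns every host that has the datastore and a network with
// the given name. Only the name is compared, as described in the comment.
func candidates(hosts []host, datastore, networkName string) []host {
	var out []host
	for _, h := range hosts {
		if _, ok := h.Networks[networkName]; ok && hasDatastore(h, datastore) {
			out = append(out, h)
		}
	}
	return out
}

func main() {
	hosts := []host{
		{Name: "esx1", Datastores: []string{"ds1"}, Networks: map[string]string{"VM Network": "network-11"}},
		{Name: "esx2", Datastores: []string{"ds1"}, Networks: map[string]string{"VM Network": "dvportgroup-22"}},
	}
	for _, h := range candidates(hosts, "ds1", "VM Network") {
		fmt.Println(h.Name, h.Networks["VM Network"])
	}
}
```

Both hosts qualify, although they are attached to different networks that happen to share a name, so a name-only check cannot tell which host is the right one.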

Comment 8 Russell Teague 2021-07-12 17:32:20 UTC
Still waiting for terraform upgrade.

Comment 9 Russell Teague 2021-08-02 17:21:58 UTC
Still waiting for terraform upgrade.

Comment 19 Patrick Dillon 2022-03-11 19:31:17 UTC
Fix is still in progress.

Comment 22 jima 2022-04-08 01:07:16 UTC
Verified on a local QE vSphere environment that has a standard port group and a distributed port group with the same name (VM Network).
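
The Doc Text for this bug describes the fix as passing the ID of the network instead of its name. A hedged Go sketch of why an ID is unambiguous in an environment like this one is shown below. The `entry` type, the ID values, and the helpers are hypothetical, and the sketch assumes both port groups are visible in the same inventory.

```go
package main

import "fmt"

// entry is a hypothetical inventory record for a port group.
type entry struct {
	ID   string // illustrative managed object ID; unique per object
	Type string // "Network" or "DistributedVirtualPortgroup"
	Name string
}

// idsForName lists the IDs of all entries that share a name.
func idsForName(inv []entry, name string) []string {
	var ids []string
	for _, e := range inv {
		if e.Name == name {
			ids = append(ids, e.ID)
		}
	}
	return ids
}

// byID returns the entry with the given ID, if any. Because IDs are unique,
// at most one entry can match.
func byID(inv []entry, id string) (entry, bool) {
	for _, e := range inv {
		if e.ID == id {
			return e, true
		}
	}
	return entry{}, false
}

func main() {
	inv := []entry{
		{ID: "network-11", Type: "Network", Name: "VM Network"},
		{ID: "dvportgroup-22", Type: "DistributedVirtualPortgroup", Name: "VM Network"},
	}
	fmt.Println(idsForName(inv, "VM Network")) // two IDs behind one name
	e, ok := byID(inv, "dvportgroup-22")
	fmt.Println(e.Type, ok)
}
```

Resolving the name once, in a context where the intended network is known, and passing the resulting ID onward means later lookups never have to choose between same-named networks.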

Reproduced the issue on 4.11.0-0.nightly-2022-03-26-130745.

$ ./openshift-install create cluster --dir ipi --log-level debug
...
INFO Creating infrastructure resources...         
...           
DEBUG [INFO] running Terraform command: /tmp/openshift-install-pre-bootstrap-909263706/bin/terraform init -no-color -force-copy -input=false -backend=true -get=true -upgrade=false -plugin-dir=/tmp/openshift-install-pre-bootstrap-909263706/plugins 
..
DEBUG                                              
DEBUG Terraform has been successfully initialized! 
DEBUG [INFO] running Terraform command: /tmp/openshift-install-pre-bootstrap-909263706/bin/terraform apply -no-color -auto-approve -input=false -var-file=/tmp/openshift-install-pre-bootstrap-909263706/terraform.tfvars.json -var-file=/tmp/openshift-install-pre-bootstrap-909263706/terraform.platform.auto.tfvars.json -lock=true -parallelism=10 -refresh=true 
ERROR                                              
ERROR Error: error fetching network: path 'VM Network' resolves to multiple networks, Please specify 
ERROR                                              
ERROR   with data.vsphere_network.network,         
ERROR   on main.tf line 38, in data "vsphere_network" "network": 
ERROR   38: data "vsphere_network" "network" {     
ERROR                                              
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: exit status 1 
FATAL                                              
FATAL Error: error fetching network: path 'VM Network' resolves to multiple networks, Please specify 
FATAL                                              
FATAL   with data.vsphere_network.network,         
FATAL   on main.tf line 38, in data "vsphere_network" "network": 
FATAL   38: data "vsphere_network" "network" {     
FATAL                                              
FATAL                                              
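
When triaging installer logs for this failure, the ambiguous network name can be pulled out of the error line with a small helper. This is a sketch only: the regular expression is written against the message shown in the log above, and `ambiguousName` is a hypothetical helper, not part of openshift-install.

```go
package main

import (
	"fmt"
	"regexp"
)

// ambiguousNetwork matches the error text shown in the log above and
// captures the network name between the single quotes.
var ambiguousNetwork = regexp.MustCompile(`path '([^']+)' resolves to multiple networks`)

// ambiguousName returns the network name from a log line that reports the
// duplicate-name failure, or false if the line does not report it.
func ambiguousName(line string) (string, bool) {
	m := ambiguousNetwork.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	line := "ERROR Error: error fetching network: path 'VM Network' resolves to multiple networks, Please specify"
	if name, ok := ambiguousName(line); ok {
		fmt.Println("ambiguous network name:", name)
	}
}
```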


Verified on 4.11.0-0.nightly-2022-04-06-213816; the installer part passed.
$ ./openshift-install create cluster --dir ipi1 --log-level debug
INFO Creating infrastructure resources...         
...                       
DEBUG [INFO] running Terraform command: /tmp/openshift-install-pre-bootstrap-3262909223/bin/terraform init -no-color -force-copy -input=false -backend=true -get=true -upgrade=false -plugin-dir=/tmp/openshift-install-pre-bootstrap-3262909223/plugins 
...
DEBUG Terraform has been successfully initialized! 
DEBUG [INFO] running Terraform command: /tmp/openshift-install-pre-bootstrap-3262909223/bin/terraform apply -no-color -auto-approve -input=false -var-file=/tmp/openshift-install-pre-bootstrap-3262909223/terraform.tfvars.json -var-file=/tmp/openshift-install-pre-bootstrap-3262909223/terraform.platform.auto.tfvars.json -lock=true -parallelism=10 -refresh=true 
DEBUG                                              
DEBUG Terraform used the selected providers to generate the following execution 
DEBUG plan. Resource actions are indicated with the following symbols: 
DEBUG   + create                                   
DEBUG  <= read (data resources)                    
DEBUG                                              
DEBUG Terraform will perform the following actions: 
DEBUG                                              

This BZ also contains a machine-api PR (machine-api-operator#961), which needs to be verified on the machine-api side before moving the bug to "VERIFIED" status.
Installation on the local env is currently blocked by https://bugzilla.redhat.com/show_bug.cgi?id=2073021; will check the machine-api function when BZ#2073021 is fixed.

Comment 23 Patrick Dillon 2022-04-11 19:54:24 UTC
The PR for this BZ also fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2063829

Comment 24 sunzhaohua 2022-04-19 04:15:25 UTC
Verified the machine-api part. Ran some regression testing for machine scale up/down; all works well.
add workload
$ oc get machine                                                                                                                                                        [11:50:16]
NAME                                PHASE     TYPE   REGION   ZONE   AGE
reliability01-vb569-master-0        Running                          109m
reliability01-vb569-master-1        Running                          109m
reliability01-vb569-master-2        Running                          109m
reliability01-vb569-worker-8sthp    Running                          96m
reliability01-vb569-worker-b2nxs    Running                          96m
reliability01-vb569-worker-x4h5x    Running                          96m
reliability01-vb569-worker1-255bx   Running                          12m
reliability01-vb569-worker1-c9fn4   Running                          12m
reliability01-vb569-worker1-snwc4   Running                          24m

remove workload
$ oc get machine                                                                                                                                                        [11:50:47]
NAME                                PHASE     TYPE   REGION   ZONE   AGE
reliability01-vb569-master-0        Running                          116m
reliability01-vb569-master-1        Running                          116m
reliability01-vb569-master-2        Running                          116m
reliability01-vb569-worker-8sthp    Running                          103m
reliability01-vb569-worker-b2nxs    Running                          103m
reliability01-vb569-worker-x4h5x    Running                          103m
reliability01-vb569-worker1-255bx   Running                          19m

Comment 25 jima 2022-04-19 05:20:40 UTC
Thanks, Zhaohua. Based on comment#22 and comment#24, moving the bug to VERIFIED.

Comment 34 errata-xmlrpc 2022-08-10 10:35:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

