Bug 1918005
Summary: | [vsphere] If there are multiple port groups with the same name installation fails | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Joseph Callen <jcallen> | |
Component: | Installer | Assignee: | Nobody <nobody> | |
Installer sub component: | openshift-installer | QA Contact: | jima | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | medium | |||
Priority: | medium | CC: | asadawar, bleanhar, morgan.peterman, nstielau, padillon, rbost, snetting, zhsun | |
Version: | 4.8 | |||
Target Milestone: | --- | |||
Target Release: | 4.11.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause: The installer passed an ambiguous network name to the Terraform provider.
Consequence: If the Terraform provider found more than one network with that name, it could not decide which was correct and failed.
Fix: The installer now passes the network ID.
Result: The Terraform provider knows exactly which network to use and installation succeeds. There is no difference in behavior for users (they still provide the same information as before).
|
Story Points: | --- | |
Clone Of: | ||||
: | 1955697 | Environment: | ||
Last Closed: | 2022-08-10 10:35:38 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1981941 | |||
Bug Blocks: |
Description
Joseph Callen
2021-01-19 19:52:58 UTC
It might be easier to fix the Terraform provider. There is no reason why you should need to provide the distributed switch UUID if you already provide the datacenter that the port group belongs to.

Seems edge-casey enough that I'm confident that it isn't a release blocker.

There is a workaround for this, so lowering the severity to medium.

Nick beat me to setting "blocker-". We're still planning to fix this.

I split out the datacenter portion of this bug because it will be significantly easier to fix. As for the port groups (networks), our vsphereprivate Terraform provider complicates the fix because we are matching the name of the network without regard to path. Still working to find a solution here.

I have looked at this from various angles. The difficulty is that the vsphereprivate provider selects a host that has both the datastore and the network. However, during this process we only have the network name (excluding path). At this point, I think the best path forward is to move forward with CORS-1476 and deprecate vsphereprivate, using the OVA import from the upstream provider instead. However, I believe this is blocked by the work in CORS-1511 to upgrade Terraform in general.
https://issues.redhat.com/browse/CORS-1476
https://issues.redhat.com/browse/CORS-1511

Still waiting for Terraform upgrade.

Still waiting for Terraform upgrade.

Fix is still in progress.

Verified on a QE local vSphere environment that has a standard port group and a distributed port group with the same name (VM Network).

Reproduced the issue on 4.11.0-0.nightly-2022-03-26-130745.

$ ./openshift-install create cluster --dir ipi --log-level debug
...
INFO Creating infrastructure resources...
...
DEBUG [INFO] running Terraform command: /tmp/openshift-install-pre-bootstrap-909263706/bin/terraform init -no-color -force-copy -input=false -backend=true -get=true -upgrade=false -plugin-dir=/tmp/openshift-install-pre-bootstrap-909263706/plugins
..
DEBUG
DEBUG Terraform has been successfully initialized!
DEBUG [INFO] running Terraform command: /tmp/openshift-install-pre-bootstrap-909263706/bin/terraform apply -no-color -auto-approve -input=false -var-file=/tmp/openshift-install-pre-bootstrap-909263706/terraform.tfvars.json -var-file=/tmp/openshift-install-pre-bootstrap-909263706/terraform.platform.auto.tfvars.json -lock=true -parallelism=10 -refresh=true
ERROR
ERROR Error: error fetching network: path 'VM Network' resolves to multiple networks, Please specify
ERROR
ERROR with data.vsphere_network.network,
ERROR on main.tf line 38, in data "vsphere_network" "network":
ERROR 38: data "vsphere_network" "network" {
ERROR
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: exit status 1
FATAL
FATAL Error: error fetching network: path 'VM Network' resolves to multiple networks, Please specify
FATAL
FATAL with data.vsphere_network.network,
FATAL on main.tf line 38, in data "vsphere_network" "network":
FATAL 38: data "vsphere_network" "network" {
FATAL
FATAL

Verified on 4.11.0-0.nightly-2022-04-06-213816; the installer part passed.

$ ./openshift-install create cluster --dir ipi1 --log-level debug
INFO Creating infrastructure resources...
...
DEBUG [INFO] running Terraform command: /tmp/openshift-install-pre-bootstrap-3262909223/bin/terraform init -no-color -force-copy -input=false -backend=true -get=true -upgrade=false -plugin-dir=/tmp/openshift-install-pre-bootstrap-3262909223/plugins
...
DEBUG Terraform has been successfully initialized!
DEBUG [INFO] running Terraform command: /tmp/openshift-install-pre-bootstrap-3262909223/bin/terraform apply -no-color -auto-approve -input=false -var-file=/tmp/openshift-install-pre-bootstrap-3262909223/terraform.tfvars.json -var-file=/tmp/openshift-install-pre-bootstrap-3262909223/terraform.platform.auto.tfvars.json -lock=true -parallelism=10 -refresh=true
DEBUG
DEBUG Terraform used the selected providers to generate the following execution
DEBUG plan. Resource actions are indicated with the following symbols:
DEBUG + create
DEBUG <= read (data resources)
DEBUG
DEBUG Terraform will perform the following actions:
DEBUG

This BZ also contains a machine-api PR (machine-api-operator#961), so it needs to be verified on the machine-api side before moving the bug to "VERIFIED" status. Installation on the local environment is currently blocked by https://bugzilla.redhat.com/show_bug.cgi?id=2073021; will check the machine-api function when BZ#2073021 is fixed.

The PR for this BZ also fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2063829

Verified the machine-api part. Ran some regression testing for machine scale up/down; all works well.

add workload
$ oc get machine [11:50:16]
NAME                                PHASE     TYPE   REGION   ZONE   AGE
reliability01-vb569-master-0        Running                          109m
reliability01-vb569-master-1        Running                          109m
reliability01-vb569-master-2        Running                          109m
reliability01-vb569-worker-8sthp    Running                          96m
reliability01-vb569-worker-b2nxs    Running                          96m
reliability01-vb569-worker-x4h5x    Running                          96m
reliability01-vb569-worker1-255bx   Running                          12m
reliability01-vb569-worker1-c9fn4   Running                          12m
reliability01-vb569-worker1-snwc4   Running                          24m

remove workload
$ oc get machine [11:50:47]
NAME                                PHASE     TYPE   REGION   ZONE   AGE
reliability01-vb569-master-0        Running                          116m
reliability01-vb569-master-1        Running                          116m
reliability01-vb569-master-2        Running                          116m
reliability01-vb569-worker-8sthp    Running                          103m
reliability01-vb569-worker-b2nxs    Running                          103m
reliability01-vb569-worker-x4h5x    Running                          103m
reliability01-vb569-worker1-255bx   Running                          19m

Thanks Zhaohua. Based on comment#22 and comment#24, moving the bug to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:5069
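
For readers who want to see why matching on the network name alone fails, here is a minimal sketch of the ambiguity this bug describes. It is not the installer's or the Terraform provider's code: the `Network` class, the `INVENTORY` list, the lookup functions, and the object IDs are all hypothetical and exist only for illustration. The point is that a name-only lookup has no way to choose between a standard port group and a distributed port group that share a name, while a lookup by ID always resolves to exactly one object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    """Hypothetical stand-in for a vSphere network object."""
    moid: str  # made-up object ID, for illustration only
    name: str
    kind: str  # "standard" or "distributed"


class AmbiguousNetworkError(LookupError):
    """Raised when a name matches more than one network."""


# Hypothetical inventory: two port groups share the name "VM Network".
INVENTORY = [
    Network("network-101", "VM Network", "standard"),
    Network("dvportgroup-202", "VM Network", "distributed"),
    Network("network-303", "Lab Network", "standard"),
]


def find_by_name(inventory, name):
    """Name-only lookup: fails as soon as the name is not unique."""
    matches = [n for n in inventory if n.name == name]
    if not matches:
        raise LookupError(f"network {name!r} not found")
    if len(matches) > 1:
        raise AmbiguousNetworkError(
            f"name {name!r} resolves to {len(matches)} networks"
        )
    return matches[0]


def find_by_id(inventory, moid):
    """ID lookup: an ID identifies at most one network."""
    for n in inventory:
        if n.moid == moid:
            return n
    raise LookupError(f"network id {moid!r} not found")
```

In this sketch, `find_by_name(INVENTORY, "VM Network")` raises `AmbiguousNetworkError`, which mirrors the "resolves to multiple networks" failure in the log above, while `find_by_id(INVENTORY, "dvportgroup-202")` returns the distributed port group without ambiguity.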