Description of problem:

Using nightly https://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/latest-4.5/openshift-install-linux-4.5.0-0.nightly-2020-06-05-214616.tar.gz

Similar to https://bugzilla.redhat.com/show_bug.cgi?id=1833256

Worker Machines are stuck:

status:
  lastUpdated: "2020-06-29T20:57:26Z"
  phase: Provisioning
  providerStatus:
    conditions:
    - lastProbeTime: "2020-06-29T20:57:27Z"
      lastTransitionTime: "2020-06-29T20:57:27Z"
      message: 'unable to get resource pool for <nil>: default resource pool resolves to multiple instances, please specify'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreation

Version-Release number of the following components:
4.5.0-0.nightly-2020-06-05-214616

How reproducible:
Only one install performed so far, but the error suggests it is likely to reproduce on every attempt.

Steps to Reproduce:
1. Run the OpenShift IPI installer on vSphere with inputs given from install-config.yaml.

Actual results:
Masters come up, but workers fail to provision.

Expected results:
Working cluster.
Moving this to the machine-api operator.

This vCenter has multiple datacenters/clusters. It looks like the machine-api operator is failing to find a default resource pool because there are multiple default resource pools. The issue could probably be resolved by finding the default resource pool for the datacenter/cluster in the provider spec.

The error "unable to get resource pool for <nil>: default resource pool resolves to multiple instances, please specify" seems to come from the call to this function:

https://github.com/openshift/machine-api-operator/blob/release-4.5/pkg/controller/vsphere/reconciler.go#L473

The resourcePoolPath argument is read from the provider spec, which is generated by the installer, which omits the optional resourcePool field:

https://github.com/openshift/machine-api-operator/blob/release-4.5/pkg/controller/vsphere/reconciler.go#L454
https://github.com/openshift/installer/blob/master/pkg/asset/machines/vsphere/machines.go#L87-L91

So the value for resourcePoolPath is empty. MAO passes that empty value through to Finder.ResourcePoolOrDefault, which seems to fail with the above error when there are multiple datacenters/clusters (see the sketch below).

A workaround was successfully achieved by specifying the path /SDDC-Datacenter/host/Cluster-1/Resources in the machineset:

$ oc get machinesets.machine.openshift.io -n openshift-machine-api jcallen-d2q7j-worker --template '{{.spec.template.spec.providerSpec.value.workspace.resourcePool}}'
/SDDC-Datacenter/host/Cluster-1/Resources

$ oc get machinesets.machine.openshift.io
NAME                   DESIRED   CURRENT   READY   AVAILABLE   AGE
jcallen-d2q7j-worker   3         3                             9m45s

MAO could potentially construct a similar path from the provider spec.
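For reference, a minimal govmomi sketch of the failing lookup. This is not the actual MAO code; the vCenter URL, credentials, and datacenter name are placeholders. It shows how an empty resource pool path falls through to the default lookup, which cannot pick a single pool when the datacenter contains multiple clusters (each cluster contributes its own root "Resources" pool).

package main

import (
	"context"
	"fmt"
	"net/url"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/find"
)

func main() {
	ctx := context.Background()

	// Placeholder vCenter endpoint, for illustration only.
	u, _ := url.Parse("https://user:pass@vcenter.example.com/sdk")
	c, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		panic(err)
	}

	// The datacenter comes from the Machine's workspace.
	finder := find.NewFinder(c.Client, false)
	dc, err := finder.Datacenter(ctx, "/SDDC-Datacenter")
	if err != nil {
		panic(err)
	}
	finder.SetDatacenter(dc)

	// The installer omits the optional resourcePool field, so the
	// provider spec hands MAO an empty path.
	resourcePoolPath := ""

	// With more than one cluster in the datacenter, the default lookup
	// matches multiple pools and returns an error along the lines of
	// "default resource pool resolves to multiple instances, please specify".
	pool, err := finder.ResourcePoolOrDefault(ctx, resourcePoolPath)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(pool.InventoryPath)
}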
A relevant bug is https://bugzilla.redhat.com/show_bug.cgi?id=1833256; there is a workaround, but it is not user friendly. As a user experience improvement, I suggest there should be a way to specify a resource pool at install time, since we can't assume the customer has only one resource pool in their vSphere configuration.
Thanks for reporting. This is currently expected behaviour.

> Issue could probably be resolved by finding the default resource pool for the datacenter/cluster in the provider spec?

We'll explore doing this.
On closer inspection, MAO would still be very limited in determining the default resource pool because no cluster is provided in the machine provider spec. Therefore, if there are multiple clusters in a datacenter, MAO will not be able to resolve the issue. Moving this back to the installer, which should populate the resource pool with the root resource pool of the provided cluster, which the installer has access to. Providing a non-root resource pool would be a new feature and would require changes to terraform and the vsphere provider.
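If the installer goes this route, the fix is essentially path construction: the root resource pool of a compute cluster lives at /<datacenter>/host/<cluster>/Resources in the vSphere inventory. A minimal sketch of what that could look like (names are illustrative, not the actual installer code):

package main

import "fmt"

// rootResourcePoolPath returns the inventory path of the root resource
// pool for a compute cluster. Illustrative only; the real installer
// change would plumb this value through the machine provider spec.
func rootResourcePoolPath(datacenter, cluster string) string {
	return fmt.Sprintf("/%s/host/%s/Resources", datacenter, cluster)
}

func main() {
	// Prints /SDDC-Datacenter/host/Cluster-1/Resources
	fmt.Println(rootResourcePoolPath("SDDC-Datacenter", "Cluster-1"))
}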
This is under active code review, but it may not merge today, so we are adding UpcomingSprint.
Hi - back in September (the last time I tried using the Terraform install, not through the openshift-install binary, which was available in the openshift-installer project), terraform would create the resource pool and put all the VMs inside of it. I think this would still be the behavior now that it's done through openshift-install. This would imply that documentation on the user being able to create resource pools needs to be added as well.
Verified on 4.6.0-0.nightly-2020-07-25-091217 and passed.

$ oc get machinesets.machine.openshift.io -n openshift-machine-api wduan0729a-5zjt4-worker --template '{{.spec.template.spec.providerSpec.value.workspace.resourcePool}}'
/dc1/host/devel/Resources
*** Bug 1861954 has been marked as a duplicate of this bug. ***
(In reply to David Barreda from comment #12)
> Hi - back in September (the last time I tried using the Terraform install,
> not through the openshift-install binary, which was available in the
> openshift-installer project), terraform would create the resource pool and
> put all the VMs inside of it. I think this would still be the behavior now
> that it's done through openshift-install.
>
> This would imply that documentation on the user being able to create
> resource pools needs to be added as well.

openshift-install uses the root resource pool for the cluster designated in the install-config.
We have the same problem in an installation on vSphere using installer 4.5.4. We solved it by placing the relative path in the resourcepool setting on the machineset. The complete DC path didn't work.

=== This didn't work ===
(...)
    workspace:
      datacenter: DC
      datastore: DATASTORE05
      folder: /DC/vm/prd-47q4m
      resourcepool: /DC/Cluster/Resources
      server: customervcenter.com.br
(...)

=== This worked ===
(...)
    workspace:
      datacenter: DC
      datastore: DATASTORE05
      folder: /DC/vm/prd-47q4m
      resourcepool: Cluster/Resources
      server: customervcenter.com.br
(...)
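That behavior is consistent with how govmomi resolves inventory paths, assuming the controller sets the datacenter on its finder: relative paths are resolved under the datacenter's folders, while an absolute compute path must include the "host" inventory folder. So /DC/Cluster/Resources likely fails because it is missing the /host/ segment; the absolute equivalent of the working value would be /DC/host/Cluster/Resources. A hedged sketch (placeholder names, not the actual controller code):

package example

import (
	"context"
	"fmt"

	"github.com/vmware/govmomi/find"
)

// lookupPool shows the two path forms, assuming finder already has its
// datacenter set to DC, as MAO does from the workspace.
func lookupPool(ctx context.Context, finder *find.Finder) {
	// An absolute compute path must include the "host" inventory
	// folder, so "/DC/Cluster/Resources" does not resolve:
	if _, err := finder.ResourcePool(ctx, "/DC/Cluster/Resources"); err != nil {
		fmt.Println("absolute path without /host/:", err)
	}

	// A relative path is resolved under the datacenter's folders,
	// which is why "Cluster/Resources" worked in the machineset:
	if pool, err := finder.ResourcePool(ctx, "Cluster/Resources"); err == nil {
		fmt.Println("resolved:", pool.InventoryPath) // /DC/host/Cluster/Resources
	}
}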
FYI, we also ran into this bug when no resource pool had been explicitly created and the customer was just using the default resource pool. We tried setting resourcepool to:

- Resources
- /<full path>/Resources
- the vmware id: ResourcePool-resgroup-194
- root

Finally we created a resource pool OpenShift-RP, which then worked. The working config looked like:

    workspace:
      datacenter: <DC>
      datastore: <DATASTORE>
      folder: ocppoc-ld5s7
      resourcepool: OpenShift-RP
      server: <SERVER>
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196