1825323 – vSphere IPI Failure to clone VMs on vSAN storage

Summary: vSphere IPI Failure to clone VMs on vSAN storage

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.5.0
Assignee:	Patrick Dillon
QA Contact:	jima
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-04-17 17:28 UTC by davis phillips
Modified:	2020-07-23 12:26 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The upstream vSphere Terraform provider produces an inconsistent Terraform plan when installing to a vCenter that forces thin provisioning. Consequence: The install fails during provisioning. Fix: The upstream provider was patched so that the disk type is not pre-set but is instead computed on apply. Result: The disk type is set based on the storage policy (defaults to thick) and the install succeeds.
Clone Of:
Environment:
Last Closed:	2020-07-13 17:28:35 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift installer pull 3603	None	closed	Bug 1825323: replace terraform-provider-vsphere with OpenShift fork	2021-02-10 21:22:31 UTC
Github	openshift terraform-provider-vsphere pull 1	None	closed	carry patch for skipping diskDiffOperation when cloning, partial fix of Bug 1825323	2021-02-10 21:22:31 UTC
Red Hat Product Errata	RHBA-2020:2409	None	None	None	2020-07-13 17:28:58 UTC

Description davis phillips 2020-04-17 17:28:17 UTC

Description of problem:
After running "./openshift-install create cluster", the master and bootstrap instances are cloned and the installer waits for the Kubernetes API to come up. During that wait, the task that clones the master VMs is launched again and again without stopping.

Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

Client Version: 4.5.0-0.nightly-2020-04-17-114625
Server Version: 4.5.0-0.nightly-2020-04-17-114625
Kubernetes Version: v1.18.0-rc.1
[root@rh8-tools ipi]# ./openshift-install version
./openshift-install 4.5.0-0.nightly-2020-04-17-114625
built from commit 19d22e67d372145f24bed3030aa049da9ebc398e
release image quay.io/openshift-release-dev/ocp-release-nightly@sha256:2357cbed27ba57912fc92392ef7cacb1a79546587946771ab6a71fbf397056a3

Default vSAN storage policy:
https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.virtualsan.doc/GUID-C228168F-6807-4C2A-9D74-E584CAF49A2A.html

How reproducible:


Steps to Reproduce:
1. Create an install-config.yaml that targets a vSAN-backed datastore.
2. Run the command "./openshift-install create cluster".
3. The clone fails because the default vSAN storage policy is thin provisioned.

Actual results:
VM clone fails

Expected results:
Clone is successful

Comment 1 davis phillips 2020-04-17 22:25:12 UTC

ERROR
ERROR Error: Provider produced inconsistent final plan
ERROR
ERROR When expanding the plan for module.master.vsphere_virtual_machine.vm[1] to
ERROR include new values learned so far during apply, provider
ERROR "registry.terraform.io/-/vsphere" produced an invalid new value for
ERROR .disk[0].thin_provisioned: was cty.False, but now cty.True.
ERROR
ERROR This is a bug in the provider, which should be reported in the provider's own
ERROR issue tracker.
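
When triaging installer logs for this failure, it can help to pull out which attribute changed between plan and apply. The following is a minimal Go sketch, not part of the installer: the names findMismatch and mismatch are hypothetical, and the pattern is inferred only from the error text above, so it handles simple values such as cty.False and cty.True and nothing more elaborate.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// inconsistentValue matches the line format seen in the error above, e.g.
// ".disk[0].thin_provisioned: was cty.False, but now cty.True."
// Assumption: only simple cty values (such as True/False) are handled.
var inconsistentValue = regexp.MustCompile(`(\.\S+): was cty\.(\w+), but now cty\.(\w+)\.`)

// mismatch is a hypothetical helper type for this sketch.
type mismatch struct {
	Attribute string // attribute path reported by Terraform
	Planned   string // value known at plan time
	Applied   string // value found during apply
}

// findMismatch scans installer output line by line, strips the leading
// "ERROR" prefix, and returns the first plan/apply mismatch it finds.
func findMismatch(log string) (mismatch, bool) {
	for _, line := range strings.Split(log, "\n") {
		line = strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(line), "ERROR"))
		if m := inconsistentValue.FindStringSubmatch(line); m != nil {
			return mismatch{Attribute: m[1], Planned: m[2], Applied: m[3]}, true
		}
	}
	return mismatch{}, false
}

func main() {
	log := `ERROR "registry.terraform.io/-/vsphere" produced an invalid new value for
ERROR .disk[0].thin_provisioned: was cty.False, but now cty.True.`
	if m, ok := findMismatch(log); ok {
		fmt.Printf("%s planned=%s applied=%s\n", m.Attribute, m.Planned, m.Applied)
	}
}
```

For the log in this comment, the sketch reports .disk[0].thin_provisioned with a planned value of False and an applied value of True, which matches a thick-provisioned plan meeting a thin-provisioning storage policy.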

Comment 2 Patrick Dillon 2020-05-06 20:45:33 UTC

Terraform plan is producing a plan that is inconsistent with the values found during apply. The plan shows that the VM will be thick-provisioned, but the vSAN storage policy enforces thin provisioning.

These docs provide background on the general problem: https://www.terraform.io/docs/extend/terraform-0.12-compatibility.html#inaccurate-plans

The document points out that the solution is: "If you see either of these errors, the remedy is the same: implement CustomizeDiff for the resource type that is causing the problem, and write logic to more accurately predict the outcome of any changes to Computed attributes."

Interestingly, the vSphere provider recently merged a pull request which implements import OVA functionality similar to our private vSphere provider. In that pull request they skip CustomizeDiff in the case where they import the OVA: https://github.com/terraform-providers/terraform-provider-vsphere/blob/master/vsphere/resource_vsphere_virtual_machine.go#L904-L907

Skipping this CustomizeDiff allows a proof of concept to install successfully.
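
The effect of skipping the disk diff can be illustrated with a small model. This is only a sketch in plain Go that does not use the Terraform SDK or the provider's code: the names planned, planThinProvisioned, and consistent are hypothetical, and the consistency rule is a simplified reading of the Terraform 0.12 behavior described in the docs linked above (a value known at plan time must not change during apply, while a value left unknown may resolve to anything).

```go
package main

import "fmt"

// planned models a boolean attribute at plan time. When known is false the
// value is "computed", that is, left unknown until apply.
// Hypothetical type for illustration only.
type planned struct {
	known bool
	value bool
}

// planThinProvisioned mimics the two behaviors discussed in this bug.
// Without the skip, the provider predicts the configured value at plan time.
// With the skip, the attribute is left unknown and resolved on apply.
func planThinProvisioned(configured, skipDiskDiff bool) planned {
	if skipDiskDiff {
		return planned{known: false}
	}
	return planned{known: true, value: configured}
}

// consistent is a simplified version of the plan/apply check: a known
// planned value must equal the applied value, an unknown one always passes.
func consistent(p planned, applied bool) bool {
	if !p.known {
		return true
	}
	return p.value == applied
}

func main() {
	configured := false      // thick provisioning requested (the default)
	policyForcesThin := true // what the vSAN storage policy yields on apply

	for _, skip := range []bool{false, true} {
		p := planThinProvisioned(configured, skip)
		fmt.Printf("skipDiskDiff=%v consistent=%v\n", skip, consistent(p, policyForcesThin))
	}
}
```

In this model the first case reproduces the "inconsistent final plan" condition, and the second case passes because nothing was promised at plan time. The trade-off is that the plan no longer shows the final disk type.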

Comment 3 Patrick Dillon 2020-05-14 15:47:34 UTC

Upstream PR: https://github.com/terraform-providers/terraform-provider-vsphere/pull/1075
OpenShift Fork: https://github.com/openshift/terraform-provider-vsphere

Comment 4 Patrick Dillon 2020-05-14 15:54:29 UTC

https://github.com/openshift/terraform-provider-vsphere/pull/1 is a partial fix. It will need to be vendored into the installer once merged.

Comment 6 Abhinav Dahiya 2020-05-26 16:44:03 UTC

The installer PR has merged.

Comment 7 jima 2020-06-01 06:58:41 UTC

@davis phillips, there is currently no vSAN storage in the QE vSphere environment. Do you have a vSphere server with vSAN storage that I could use to verify the issue?

Comment 8 jima 2020-06-10 02:40:01 UTC

Verified the issue on nightly build 4.5.0-0.nightly-2020-06-09-030606: the task of cloning the master VM is launched only once.

Comment 10 errata-xmlrpc 2020-07-13 17:28:35 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
