Bug 1862290 - [vSphere][IPI] cannot change the value of "thin_provisioned" - (old: true new: false)"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Russell Teague
QA Contact: jima
URL:
Whiteboard:
Depends On:
Blocks: 1873537
Reported: 2020-07-30 22:48 UTC by Robert Bost
Modified: 2020-10-27 16:21 UTC
CC List: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The DiskPostCloneOperation of the Terraform vSphere provider performed a check on the "thin_provisioned" and "eagerly_scrub" properties of VMs cloned from the uploaded RHCOS OVA.
Consequence: The check failed because the underlying provisioning type silently changed during cloning and no longer matched the source provisioning type.
Fix: The Terraform vSphere provider was patched to ignore the "thin_provisioned" and "eagerly_scrub" properties during DiskPostCloneOperation. Terraform should not be opinionated about these properties, since it does not support modifying them.
Result: Cloning of the RHCOS OVA succeeds and the installer is able to proceed with deploying OpenShift.
Clone Of:
Clones: 1873537
Environment:
Last Closed: 2020-10-27 16:21:22 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
GitHub | openshift/installer pull 4060 | 0 | None | closed | Bug 1862290: vendor/terraform-provider-vsphere: DiskPostCloneOperation patch carry | 2021-01-26 20:11:15 UTC
Red Hat Knowledge Base (Solution) | 5298191 | 0 | None | None | None | 2020-08-05 16:27:30 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:21:49 UTC

Description Robert Bost 2020-07-30 22:48:45 UTC
OpenShift 4.5.4 IPI on vSphere failing due to:

Error: error reconfiguring virtual machine: error processing disk changes post-clone: disk.0: cannot change the value of "thin_provisioned" - (old: true new: false)"

Full install log incoming.

Comment 2 Robert Bost 2020-07-30 22:50:23 UTC
Just wanted to note that this error is the inverse of what was addressed in https://github.com/openshift/installer/commit/d37b4b9d62243605139d638c45d4ca3397a7d345

Comment 5 Abhinav Dahiya 2020-08-03 16:39:14 UTC
Can you include some steps to reproduce this bug?

- Can you include details about your environment, such as defaults, that could cause this to happen?
- Is there anything else we can do on our side to reproduce this?

Comment 8 Michael Washer 2020-08-05 04:18:42 UTC
@Abhinav, is there anything else we can provide to move this along?

Comment 20 Russell Teague 2020-08-07 18:16:02 UTC
Given that cloning the VM in the vSphere UI reported an unexpected change in provisioning type, it will be important to get some feedback from VMware on possible causes.  If you already have contacts with VMware on this please let us know.  Ultimately we need to understand this behavior before we can reliably provide a workaround.

I have been able to test a potential workaround by patching the Terraform vSphere provider. This change allowed me to successfully clone VMs when the thin_provisioned value of the cloned VM was different from that of the source VM. The patches have been submitted upstream [1] and internally [2] for feedback.

Once we have the requested install log with TF_LOG=debug we can review that output to see if there could be any problems with this patch.  Additionally, we could provide a test build of openshift-install if the customer would be willing to test it in a development environment.

[1] https://github.com/hashicorp/terraform-provider-vsphere/pull/1161
[2] https://github.com/openshift/terraform-provider-vsphere/pull/3

Comment 23 Russell Teague 2020-08-07 19:30:08 UTC
The logs are confirming what we are seeing in the vSphere UI and the suspected underlying cause of failure.

The terraform provider is reporting that the uploaded OVA resulted in a thick provisioned VM [thin_provisioned:false].

DEBUG 2020-08-07T11:13:17.829-0700 [DEBUG] plugin.terraform-provider-vsphere: 2020/08/07 11:13:17 [DEBUG] ReadDiskAttrsForDataSource: Attributes returned: [map[eagerly_scrub:true size:16 thin_provisioned:false]]


The logs also show that the cloned VM (in this case, the bootstrap VM) came out thin provisioned ["thin_provisioned" - (old: true...], and that the terraform provider errored when trying to set it to thick [...new: false)], which matches the source.

DEBUG 2020/08/07 11:13:23 [DEBUG] module.bootstrap.vsphere_virtual_machine.vm: apply errored, but we're indicating that via the Error pointer rather than returning it: error reconfiguring virtual machine: error processing disk changes post-clone: disk.0: cannot change the value of "thin_provisioned" - (old: true new: false)


Given that it *is* possible for the disk provisioning type of a VM to change when cloning (e.g. due to vSAN storage policies or underlying storage APIs), the terraform provider should not be opinionated about 'thin_provisioned', especially since it does not support changing the disk provisioning type.

If VMware/NetApp can resolve the issue of the unexpected change in disk provisioning type, the existing openshift-installer should proceed as expected when cloning VMs.  If it is determined that the change in disk provisioning type is expected and desired, the referenced terraform provider patch would be necessary to resolve the cloning issue during installation.
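To make the reasoning above concrete, here is a minimal, illustrative sketch in Go (the language of the Terraform vSphere provider). It is not the provider's code: `diskAttrs` and `checkPostClone` are hypothetical names, and the real post-clone processing is not modeled beyond these two properties. The sketch contrasts a strict check, which reproduces the shape of the error in this bug when a thick source comes back as a thin clone, with the relaxed behavior the patch aims for, where "thin_provisioned" and "eagerly_scrub" differences are ignored.

```go
package main

import "fmt"

// diskAttrs is a hypothetical, simplified view of the per-disk attributes
// compared after cloning (field names mirror the keys in the debug log).
type diskAttrs struct {
	ThinProvisioned bool // "thin_provisioned"
	EagerlyScrub    bool // "eagerly_scrub"
}

// checkPostClone is a hypothetical stand-in for the post-clone disk check.
// actual is what vSphere reports for the clone; desired is what the
// configuration asks for (here, copied from the source template).
// strict=true refuses any provisioning change, mimicking the error in this
// bug; strict=false ignores both properties, which is the patch's intent.
func checkPostClone(name string, actual, desired diskAttrs, strict bool) error {
	if !strict {
		// The storage layer decides the provisioning type during the clone
		// (e.g. Nutanix returns thin disks), so nothing is enforced here.
		return nil
	}
	if actual.ThinProvisioned != desired.ThinProvisioned {
		return fmt.Errorf("%s: cannot change the value of %q - (old: %t new: %t)",
			name, "thin_provisioned", actual.ThinProvisioned, desired.ThinProvisioned)
	}
	if actual.EagerlyScrub != desired.EagerlyScrub {
		return fmt.Errorf("%s: cannot change the value of %q - (old: %t new: %t)",
			name, "eagerly_scrub", actual.EagerlyScrub, desired.EagerlyScrub)
	}
	return nil
}

func main() {
	// Source template as logged in this bug: thick and eagerly scrubbed.
	source := diskAttrs{ThinProvisioned: false, EagerlyScrub: true}
	// Clone as returned by the datastore: thin. EagerlyScrub=false is an
	// assumption; the log only shows the thin_provisioned change.
	clone := diskAttrs{ThinProvisioned: true, EagerlyScrub: false}

	fmt.Println("strict: ", checkPostClone("disk.0", clone, source, true))
	fmt.Println("relaxed:", checkPostClone("disk.0", clone, source, false))
}
```

Running it prints the strict-mode error `disk.0: cannot change the value of "thin_provisioned" - (old: true new: false)` followed by `relaxed: <nil>`. The thin_provisioned comparison comes first, which is why only that property appears in the error even though eagerly_scrub would also differ.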

Comment 36 jima 2020-08-21 09:53:25 UTC
Checking with Davis to get an environment to verify the bug.

Comment 51 Russell Teague 2020-08-26 20:58:13 UTC
Now that we know that we are dealing with Nutanix storage, we can get a bit more understanding.

https://portal.nutanix.com/page/documents/kbs/details?targetId=kA0600000008eKFCAY
"When cloning any type of disk format (thin, thick lazy zeroed or thick eager zeroed) to the same Nutanix datastore, the resulting VM will have a thin disk regardless of the explicit choice of a disk format in the vSphere client."

This at least explains why the provisioning type was changing silently.  It was due to the Nutanix storage API.

For more detailed information:
https://next.nutanix.com/how-it-works-22/so-is-it-thin-or-thick-after-all-vm-disks-provisioning-37987
"Nutanix is different"

Comment 52 jima 2020-08-28 02:06:12 UTC
Moving bug to "VERIFIED": the fix has been applied in the customer environment and passed, and a normal installation with a payload that includes the fix succeeds without any VM cloning issue.

Comment 55 errata-xmlrpc 2020-10-27 16:21:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

