Bug 2162095 - [vSphere on Nutanix] cannot change the value of "thin_provisioned" - (old: true newValue: false) [NEEDINFO]
Summary: [vSphere on Nutanix] cannot change the value of "thin_provisioned" - (old: true newValue: false)
Keywords:
Status: NEW
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Cluster Lifecycle
Version: rhacm-2.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Le Yang
QA Contact: Hui Chen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-01-18 18:57 UTC by Tyler Bevan
Modified: 2024-11-01 15:22 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:
padillon: needinfo-
tyler.bevan: needinfo? (daliu)



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github stolostron backlog issues | 27133 | 0 | None | None | None | 2023-01-18 22:55:02 UTC
Red Hat Bugzilla | 1862290 | 0 | high | CLOSED | [vSphere][IPI] cannot change the value of "thin_provisioned" - (old: true new: false)" | 2024-03-25 16:14:41 UTC

Description Tyler Bevan 2023-01-18 18:57:02 UTC
Description of the problem:
The issue described in https://bugzilla.redhat.com/show_bug.cgi?id=1862290 appears to affect ACM's installer on ACM 2.6 and OCP 4.11.
The vSphere provider for Terraform fails to delete a VM that was cloned from a thick-provisioned template on a Nutanix storage backend.

Release version: 2.6.3

Operator snapshot version: 2.6.3

OCP version: 4.11.18

Browser Info: Firefox 108

Steps to reproduce:
1. Deploy a new cluster via ACM using the vSphere provider against a vCenter cluster with Nutanix-backed VMware storage.
2. The installation deploys the cluster, but the job fails when it cannot remove the bootstrap node.

Actual results: Installation failure on bootstrap cleanup.

Expected results: Installation succeeds and the cluster is imported into ACM management.

Additional info:
Logs from the hive container:

time="2023-01-18T17:24:29Z" level=debug msg="Bootstrap status: complete"
time="2023-01-18T17:24:29Z" level=info msg="Destroying the bootstrap resources..."
time="2023-01-18T17:24:29Z" level=debug msg="creating /output/terraform/bin/terraform file"
time="2023-01-18T17:24:29Z" level=debug msg="creating /output/terraform/plugins/openshift/local/vsphere directory"
time="2023-01-18T17:24:29Z" level=debug msg="creating /output/terraform/plugins/openshift/local/vsphere/terraform-provider-vsphere_1.0.0_linux_amd64.zip file"
time="2023-01-18T17:24:29Z" level=debug msg="creating /output/terraform/plugins/openshift/local/vsphereprivate directory"
time="2023-01-18T17:24:29Z" level=debug msg="creating /output/terraform/plugins/openshift/local/vsphereprivate/terraform-provider-vsphereprivate_1.0.0_linux_amd64.zip file"
time="2023-01-18T17:24:29Z" level=debug msg="[INFO] running Terraform command: /output/terraform/bin/terraform version -json"
time="2023-01-18T17:24:29Z" level=debug msg="{"
time="2023-01-18T17:24:29Z" level=debug msg="  \"terraform_version\": \"1.0.11\","
time="2023-01-18T17:24:29Z" level=debug msg="  \"platform\": \"linux_amd64\","
time="2023-01-18T17:24:29Z" level=debug msg="  \"provider_selections\": {},"
time="2023-01-18T17:24:29Z" level=debug msg="  \"terraform_outdated\": true"
time="2023-01-18T17:24:29Z" level=debug msg="}"
time="2023-01-18T17:24:29Z" level=debug msg="[INFO] running Terraform command: /output/terraform/bin/terraform init -no-color -force-copy -input=false -backend=true -get=true -upgrade=false -plugin-dir=/output/terraform/plugins"
time="2023-01-18T17:24:29Z" level=debug
time="2023-01-18T17:24:29Z" level=debug msg="Initializing the backend..."
time="2023-01-18T17:24:29Z" level=debug
time="2023-01-18T17:24:29Z" level=debug msg="Initializing provider plugins..."
time="2023-01-18T17:24:29Z" level=debug msg="- Finding latest version of openshift/local/vsphere..."
time="2023-01-18T17:24:29Z" level=debug msg="- Installing openshift/local/vsphere v1.0.0..."
time="2023-01-18T17:24:29Z" level=debug msg="- Installed openshift/local/vsphere v1.0.0 (unauthenticated)"
time="2023-01-18T17:24:29Z" level=debug
time="2023-01-18T17:24:29Z" level=debug msg="Terraform has created a lock file .terraform.lock.hcl to record the provider"
time="2023-01-18T17:24:29Z" level=debug msg="selections it made above. Include this file in your version control repository"
time="2023-01-18T17:24:29Z" level=debug msg="so that Terraform can guarantee to make the same selections by default when"
time="2023-01-18T17:24:29Z" level=debug msg="you run \"terraform init\" in the future."
time="2023-01-18T17:24:29Z" level=debug
time="2023-01-18T17:24:29Z" level=debug msg="Terraform has been successfully initialized!"
time="2023-01-18T17:24:29Z" level=debug msg="[INFO] running Terraform command: /output/terraform/bin/terraform destroy -no-color -auto-approve -input=false -lock-timeout=0s -var-file=/tmp/openshift-install-bootstrap-2782656941/terraform.tfvars.json -var-file=/tmp/openshift-install-bootstrap-2782656941/terraform.platform.auto.tfvars.json -var-file=/tmp/openshift-install-bootstrap-2782656941/pre-bootstrap.tfvars.json -var-file=/tmp/openshift-install-bootstrap-2782656941/bootstrap.tfvars.json -var-file=/tmp/openshift-install-bootstrap-2782656941/master.tfvars.json -lock=true -parallelism=10 -refresh=true"
time="2023-01-18T17:24:31Z" level=debug msg="vsphere_virtual_machine.vm_bootstrap: Refreshing state... [id=421986c6-c60c-6c6f-024c-a41506351310]"
time="2023-01-18T17:24:31Z" level=error
time="2023-01-18T17:24:31Z" level=error msg="Error: disk.0: virtual disk \"disk0\": cannot change the value of \"thin_provisioned\" - (old: true newValue: false)"
time="2023-01-18T17:24:31Z" level=error
time="2023-01-18T17:24:31Z" level=error msg="  with vsphere_virtual_machine.vm_bootstrap,"
time="2023-01-18T17:24:31Z" level=error msg="  on main.tf line 12, in resource \"vsphere_virtual_machine\" \"vm_bootstrap\":"
time="2023-01-18T17:24:31Z" level=error msg="  12: resource \"vsphere_virtual_machine\" \"vm_bootstrap\" {"
time="2023-01-18T17:24:31Z" level=error
time="2023-01-18T17:24:31Z" level=fatal msg="terraform destroy: failed doing terraform destroy: exit status 1\n\nError: disk.0: virtual disk \"disk0\": cannot change the value of \"thin_provisioned\" - (old: true newValue: false)\n\n  with vsphere_virtual_machine.vm_bootstrap,\n  on main.tf line 12, in resource \"vsphere_virtual_machine\" \"vm_bootstrap\":\n  12: resource \"vsphere_virtual_machine\" \"vm_bootstrap\" {\n\n"
time="2023-01-18T17:24:32Z" level=error msg="error after waiting for command completion" error="exit status 1" installID=mpwn4n4j
time="2023-01-18T17:24:32Z" level=error msg="error provisioning cluster" error="exit status 1" installID=mpwn4n4j
time="2023-01-18T17:24:32Z" level=error msg="error running openshift-install, running deprovision to clean up" error="exit status 1" installID=mpwn4n4j
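The failure is identifiable from the Terraform error line in the log above. As a triage aid, here is a hypothetical Python helper (not part of hive or the installer) that scans log lines for that message and extracts the old/new thin_provisioned values. It assumes the message text matches the log above, with or without the backslash-escaped quotes that appear when hive wraps the message in msg="...".

```python
import re

# Hypothetical triage helper. The pattern mirrors the Terraform error seen above;
# the optional backslash handles the escaped quotes inside hive's msg="..." field.
THIN_PROVISIONED_ERR = re.compile(
    r'cannot change the value of \\?"thin_provisioned\\?" - '
    r'\(old: (true|false) newValue: (true|false)\)'
)


def find_thin_provisioned_conflict(log_lines):
    """Return (old, new) as booleans for the first thin_provisioned conflict, else None."""
    for line in log_lines:
        match = THIN_PROVISIONED_ERR.search(line)
        if match:
            old, new = (value == "true" for value in match.groups())
            return old, new
    return None
```

Run against the error line above, it would report (True, False): vSphere reports the disk as thin while the Terraform configuration expects a thick disk.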

Comment 1 Jakob 2023-01-24 16:24:09 UTC
Retagging to cluster-lifecycle as it relates to cluster provisioning

Comment 2 daliu 2023-01-29 02:39:47 UTC
@efried Could you help to take a look?

Comment 3 Eric Fried 2023-01-30 19:03:02 UTC
This falls squarely in the installer's wheelhouse. @padillon could you find a pair of eyes for this please?

Comment 4 daliu 2023-02-01 09:17:27 UTC
@tyler.bevan 
I have created a discussion about this issue in the openshift-installer channel: https://redhat-internal.slack.com/archives/CH06KMDRV/p1675213341255269
Could you try the following workaround?

"Change install-config diskType to thin"
"Making the assumption there is a storage policy that is forcing thin underneath terraform, which it does not support well"

Note: In an ACM environment, to update the install-config you need to update the secret named <Cluster Namespace>/<Cluster Namespace>-install-config.

Comment 5 Tyler Bevan 2023-02-01 19:43:31 UTC
An install attempt using "diskType: thin" on both machine pools did not resolve the issue. The error message was identical to the original log.
I don't see any way in the install-config documentation to customize the bootstrap VM's specifications, and the bootstrap VM seems to be where the failure occurs.

As a side note, the documentation at https://github.com/openshift/installer/blob/master/docs/user/vsphere/customization.md#machine-pools is unclear on whether the key is disk_type or diskType. I tested both to be sure.

Tyler

Comment 6 Tyler Bevan 2023-02-01 20:11:28 UTC
For context here is the sanitized install-config:

---
apiVersion: v1
metadata:
  name: ocp-lab
baseDomain: xxxxxxxxxx
controlPlane:
  name: master
  architecture: amd64
  hyperthreading: Enabled
  replicas: 3
  platform:
    vsphere:
      cpus: 8
      coresPerSocket: 8
      memoryMB: 16384
      osDisk:
        diskSizeGB: 120
      disk_type: thin
compute:
  - name: worker
    hyperthreading: Enabled
    architecture: amd64
    replicas: 0
    platform:
      vsphere:
        cpus: 4
        coresPerSocket: 2
        memoryMB: 16384
        osDisk:
          diskSizeGB: 120
        disk_type: thin
platform:
  vsphere:
    vCenter: xxxxxxxxxxxx
    username: xxxxxxxxxx
    password: xxxxxxxxxxxxxxxxxxx
    datacenter: xxxxxxxxxxxxxxx
    defaultDatastore: xxxxxxxxxxxx
    folder: /xxxxxxxx/vm/Openshift/ocp-lab
    cluster: xxxxxxxx
    apiVIP: 10.10.x.x
    ingressVIP: 10.10.x.x
    network: xxxxxxxxxxxxxxxxx

Comment 7 daliu 2023-02-02 02:38:03 UTC
@jcallen Any more suggestions?

Comment 10 Joseph Callen 2023-02-10 18:27:12 UTC
Sorry, I didn't see that the reporter was not a Red Hat employee. Repeating what I already stated in a private comment:

The install-config was incorrect: diskType is not set on the machine pool; it belongs in the platform spec.

https://github.com/openshift/installer/blob/master/pkg/types/vsphere/platform.go#L92

For example:
platform:
  vsphere:
    vCenter: xxxxxxxxxxxx
    username: xxxxxxxxxx
    password: xxxxxxxxxxxxxxxxxxx
    datacenter: xxxxxxxxxxxxxxx
    defaultDatastore: xxxxxxxxxxxx
    folder: /xxxxxxxx/vm/Openshift/ocp-lab
    cluster: xxxxxxxx
    apiVIP: 10.10.x.x
    ingressVIP: 10.10.x.x
    network: xxxxxxxxxxxxxxxxx
    diskType: thin
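The placement rule above can be checked mechanically. The following Python sketch is hypothetical and is not an installer API. It takes an already-parsed install-config as a plain dict, flags diskType/disk_type keys under the controlPlane or compute machine pools, and reads diskType only from platform.vsphere, following the comment above. expected_thin_provisioned assumes that only "thin" yields thin_provisioned = true. It treats an unset diskType as false, which is consistent with "newValue: false" in the original log but is not confirmed installer behavior.

```python
# Hypothetical sketch: install_config is a dict such as a YAML parser would produce.
# Assumption (per the comment above): diskType is read from platform.vsphere only.


def vsphere_disk_type(install_config):
    """Return platform.vsphere.diskType, or None when it is not set."""
    return install_config.get("platform", {}).get("vsphere", {}).get("diskType")


def misplaced_disk_type_keys(install_config):
    """List machine-pool paths that set diskType/disk_type; this thread found they had no effect."""
    pools = [("controlPlane", install_config.get("controlPlane", {}))]
    pools += [(f"compute[{i}]", pool) for i, pool in enumerate(install_config.get("compute", []))]
    problems = []
    for path, pool in pools:
        vsphere = pool.get("platform", {}).get("vsphere", {})
        for key in ("diskType", "disk_type"):
            if key in vsphere:
                problems.append(f"{path}.platform.vsphere.{key}")
    return problems


def expected_thin_provisioned(install_config):
    """Assumed mapping: only diskType 'thin' gives thin_provisioned=True; unset gives False."""
    return vsphere_disk_type(install_config) == "thin"
```

With the install-config from Comment 6, the helper would flag both machine-pool disk_type keys and report thin_provisioned = false, which conflicts with the thin disk vSphere reports (old: true). With diskType: thin under platform.vsphere, both values appear to agree, which matches the outcome reported later in this bug.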

If this still fails, we will need a Jira bug assigned to the installer team. Please link it here.
We previously had issues with the vSphere Terraform provider, but I thought those were resolved.


This is also probably related to: https://kb.vmware.com/s/article/68107
Changing the state of objects that Terraform created is a reliable way to cause a failure.

Comment 11 daliu 2023-02-13 01:18:36 UTC
@tyler.bevan Could you try again following https://bugzilla.redhat.com/show_bug.cgi?id=2162095#c10?

Comment 12 Tyler Bevan 2023-02-15 16:36:23 UTC
@daliu That change does appear to fix the problem; the provisioning finished as expected.
So should we presume that, on VMware with a Nutanix storage backend, adding diskType: thin to the platform spec is mandatory?

Comment 13 Joseph Callen 2023-02-15 16:48:53 UTC
I can open a bug to fix it (it will change in 4.13 once the PR below is merged).

It is a single-line change that makes Terraform ignore the disk type state mismatch.

https://github.com/openshift/installer/pull/6770/files#diff-b4dbed356c5acdaefe0d1716089c2fa5efacfa5cb6ca4ad2000e1b5a5ddb7194R55

Comment 14 Joseph Callen 2023-02-15 18:18:59 UTC
I am not the owner of this BZ, but I believe it can be closed. I will work on this one in Jira:
https://issues.redhat.com/browse/OCPBUGS-7551

