Bug 1875939

Summary: vsphere - cloning a Windows 2019 (1909) template fails with Invalid operation for device '0'.
Product: OpenShift Container Platform Reporter: Joseph Callen <jcallen>
Component: Cloud Compute Assignee: Danil Grigorev <dgrigore>
Cloud Compute sub component: Other Providers QA Contact: Milind Yadav <miyadav>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: aravindh, dgrigore, mimccune, zhsun
Version: 4.6   
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 16:37:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Joseph Callen 2020-09-04 16:41:16 UTC
Description of problem:

When creating a machineset for a Windows guest in vSphere, the Windows template fails to clone with:
"Failed to clone 1909-template on 10.2.32.6, WorkloadDatastore in SDDC-Datacenter to windows-worker-skss8 on 10.2.32.9, WorkloadDatastore in jcallen-test-mxcwh in SDDC-Datacenter: Invalid operation for device '0'."

This has to do with the deviceChange:
https://github.com/openshift/machine-api-operator/blob/master/pkg/controller/vsphere/reconciler.go#L582

I am guessing, but I think the process of removing and creating network interfaces contributes to this error. It could also be the disk change, but since that is a minor change I don't think it's the cause.



Comment 1 Joseph Callen 2020-09-04 16:43:19 UTC
I have a branch that I have been using to test and modify to support a customization spec. With that branch I confirmed that deviceChange does cause the "Invalid operation for device '0'" error.
See:
https://github.com/openshift/machine-api-operator/compare/master...jcpowermac:vsphere_windows_customization

Please reach out for additional context.

Comment 2 Joseph Callen 2020-09-04 16:44:28 UTC
I also tried removing the CD-ROM from the template, but that didn't work. I have seen the CD-ROM cause issues in cloning processes previously.

Comment 3 Aravindh Puthiyaparambil 2020-09-04 17:24:34 UTC
@mimccune this is needed for Windows Containers to GA in the 4.6 timeframe.

Comment 4 Michael McCune 2020-09-04 18:18:59 UTC
thanks for the update Aravindh, i wasn't sure =)

Comment 5 Danil Grigorev 2020-09-10 23:18:23 UTC
The issue appears when a user clones a disk from the template (probably from a snapshot as well) and specifies a smaller size than the one set on the template. That would require a specific `shrink` operation on the volume in advance. The fix will include a check of the template disk size before requesting an "edit" operation on its size.
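
To illustrate the kind of check described above, here is a minimal sketch in Go. It is not the code from the actual fix: `checkDiskResize` and `kibPerGiB` are hypothetical names, and rejecting the request with an error (rather than silently keeping the template size) is an assumption. It only assumes that the template disk capacity is available in KiB and the requested size in GiB.

```go
package main

import "fmt"

// kibPerGiB converts GiB to KiB (hypothetical helper constant).
const kibPerGiB = int64(1024 * 1024)

// checkDiskResize is a hypothetical helper, not the operator's real function.
// It returns the requested capacity in KiB, or an error when the request is
// smaller than the template disk, since that would need a shrink operation.
func checkDiskResize(templateCapacityKiB int64, requestedGiB int32) (int64, error) {
	requestedKiB := int64(requestedGiB) * kibPerGiB
	if requestedKiB < templateCapacityKiB {
		return 0, fmt.Errorf(
			"cannot shrink template disk: requested %d KiB < template %d KiB",
			requestedKiB, templateCapacityKiB)
	}
	return requestedKiB, nil
}

func main() {
	// A 60 GiB request against a 128 GiB template disk is rejected up front.
	if _, err := checkDiskResize(128*kibPerGiB, 60); err != nil {
		fmt.Println("rejected:", err)
	}
	// A 120 GiB request against a 40 GiB template disk is a valid grow.
	if kib, err := checkDiskResize(40*kibPerGiB, 120); err == nil {
		fmt.Println("edit disk to KiB:", kib)
	}
}
```

With a check like this, the disk "edit" entry would only be added to the clone's device changes when the requested size is at least the template size, so the undersized request fails with a clear message instead of "Invalid operation for device '0'".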

Comment 7 Milind Yadav 2020-09-24 09:05:57 UTC
VERIFIED on:
4.6.0-0.nightly-2020-09-23-022756

Steps:
1. Created a machineset with the specs below:
https://gist.github.com/miyadav/1944a899291c96126dc793b1ded5162b (used jcallen-win-test as the template)

Actual and expected: machineset created successfully.

2. The machine is in the Provisioned state, as expected:
[miyadav@miyadav vsphere]$ oc get machines --config vsp
Flag --config has been deprecated, use --kubeconfig instead
NAME                                 PHASE         TYPE   REGION   ZONE   AGE
jimavmc2401-99gxm-master-0           Running                              4h49m
jimavmc2401-99gxm-master-1           Running                              4h49m
jimavmc2401-99gxm-master-2           Running                              4h49m
jimavmc2401-99gxm-worker-4qtnh       Running                              4h40m
jimavmc2401-99gxm-worker-dkv2q       Running                              4h40m
jimavmc2401-99gxm-worker-srwpf       Running                              4h40m
jimavmc2401-99gxm-worker-win-8xvgq   Provisioned                          20m


Additional info:
Moving to VERIFIED.

Comment 10 errata-xmlrpc 2020-10-27 16:37:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196