Bug 2072696

Summary: Creating ESX compute resource on vcenter 7.x fails with InvalidArgument: A specified parameter was not correct: deviceChange[1].device.key
Product: Red Hat Satellite
Reporter: Pablo Hess <phess>
Component: Compute Resources - VMWare
Assignee: Chris Roberts <chrobert>
Status: CLOSED ERRATA
QA Contact: Lukáš Hellebrandt <lhellebr>
Severity: high
Docs Contact:
Priority: high
Version: 6.10.0
CC: bbuckingham, chrobert, mhulan, mjia, pcreech, pratshar, rlavi, sadas, wclark, zhunting
Target Milestone: 6.12.0
Keywords: Regression, Triaged
Target Release: Unused
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: rubygem-fog-vsphere-3.5.2
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2093401 2112349
Environment:
Last Closed: 2022-11-16 13:33:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: Trivial patch to always add a random integer as device key to new devices. Flags: none
Description: Hotfix RPM for Satellite 6.11.1 on RHEL7 Flags: none
Description: Hotfix RPM for Satellite 6.11.1 on RHEL8 Flags: none

Description Pablo Hess 2022-04-06 19:07:07 UTC
Created attachment 1871138 [details]
Trivial patch to always add a random integer as device key to new devices.

Description of problem:
Creating a new VM from a template on VMware vCenter 7.x fails: ESX returns an error and Foreman rolls back the task. Foreman logs the following error in production.log:

"InvalidArgument: A specified parameter was not correct: deviceChange[1].device.key"


Version-Release number of selected component (if applicable):
This potentially affects Satellite versions newer than 6.8, but it has only been verified on 6.8.

tfm-rubygem-rbvmomi-2.2.0-3.el7sat.noarch
Potentially also:
tfm-rubygem-rbvmomi-2.2.0-4.el7sat.noarch


How reproducible:
100% of VM creations that clone an existing VM (i.e. create it from a template) against VMware ESX 7.0.

Steps to Reproduce:
1. In Satellite, create a VM by cloning it from a template.

Actual results:

VM creation fails, and /var/log/foreman/production.log contains this error and backtrace:
~~~
2022-04-05T09:19:18 [I|app|dd11837a] Started POST "/api/hosts" for 192.168.1.15 at 2022-04-05 09:19:18 +0100
   (...some lines...)
2022-04-05T09:19:20 [I|app|dd11837a] Adding Compute instance for new-host.example.com
2022-04-05T09:19:28 [W|app|dd11837a] Failed to create a compute My_Cluster (VMware) instance new-host.example.com: InvalidArgument: A specified parameter was not correct: deviceChange[1].device.key
 dd11837a |  
2022-04-05T09:19:28 [I|app|dd11837a] Backtrace for 'Failed to create a compute My_Cluster (VMware) instance new-host.example.com: InvalidArgument: A specified parameter was not correct: deviceChange[1].device.key
 dd11837a |  ' error (RbVmomi::Fault): InvalidArgument: A specified parameter was not correct: deviceChange[1].device.key
 dd11837a | /opt/theforeman/tfm/root/usr/share/gems/gems/rbvmomi-2.2.0/lib/rbvmomi/vim/Task.rb:14:in `wait_for_completion'
 dd11837a | /opt/theforeman/tfm/root/usr/share/gems/gems/fog-vsphere-3.4.0/lib/fog/vsphere/requests/compute/vm_clone.rb:725:in `vm_clone'
 dd11837a | /usr/share/foreman/app/models/compute_resources/foreman/model/vmware.rb:567:in `clone_vm'
 dd11837a | /usr/share/foreman/app/models/compute_resources/foreman/model/vmware.rb:485:in `create_vm'
   (...long traceback...)
~~~


Expected results:

The new VM is created.

Additional info:
Many references on the Web describe a VMware API change between 6.x and 7.0: the device key field is no longer ignored, and no two devices on the same VM can have the same key. New devices used to be created with a key of 0 (zero), so adding more than one new device now produces duplicate keys, which the API rejects.

I have built a trivial patch (attached; it adds a single line to a single file) that makes rbvmomi always set a newly generated random integer in the 1..99999 range as the device key whenever a new device is added to a VM. I have not tested it, though. Apply the patch from the rbvmomi directory at /opt/theforeman/tfm/root/usr/share/gems/gems/rbvmomi-2.2.0.
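The idea described above can be illustrated with a minimal Ruby sketch. This is not the attached patch and does not use rbvmomi; `NewDevice` and `assign_random_keys` are hypothetical names introduced only for illustration. It shows each new device getting a random key in 1..99999 instead of the shared default of 0, and it deliberately performs no uniqueness check.

```ruby
# Hypothetical illustration only; not the attached patch and not rbvmomi code.
NewDevice = Struct.new(:label, :key)

# Give every new device a random key in 1..99999 instead of the default 0.
# Uniqueness is NOT checked here, so two devices can still get the same key.
def assign_random_keys(devices, rng: Random.new)
  devices.each { |device| device.key = rng.rand(1..99_999) }
end

devices = [NewDevice.new("disk", 0), NewDevice.new("nic", 0)]
assign_random_keys(devices)
devices.each { |device| puts "#{device.label}: key=#{device.key}" }
```

Because the keys are drawn independently, this approach only makes a clash unlikely; it does not rule one out.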

Comment 10 Pablo Hess 2022-05-12 20:33:22 UTC
Created issue and PR on fog-vsphere github.
Issue: https://github.com/fog/fog-vsphere/issues/272
PR: https://github.com/fog/fog-vsphere/pull/273

Hopefully this will pass all the tests. If not, please let me know what issues exist and I will gladly pursue their resolution.

Comment 11 Chris Roberts 2022-05-16 18:13:05 UTC
Packaging PR to bring it into Foreman Nightly:

https://github.com/theforeman/foreman-packaging/pull/7919/files

Brad, can this get into a z-stream?

Comment 13 Lukáš Hellebrandt 2022-06-02 13:01:00 UTC
Tried with Sat 6.11 snap 22.0 and vSphere Client version 7.0.3.00500 (thanks Chris Roberts for the instance).

Added the VMware instance as a CR, created an image based on a template existing on VMware, and created a new host from that template. The host and VM were created, the correct template was used, and the correct interfaces were created.

To verify, this conversation needs to be resolved: https://github.com/fog/fog-vsphere/pull/273

Comment 14 Lukáš Hellebrandt 2022-06-14 10:49:11 UTC
Since my comments haven't been resolved yet, I'm inclined to fail this, as I think the solution may not scale or at least is not optimal. Even though this fix works in my environment, I suggest looking into it again. Any thoughts on this?

Comment 15 Lukáš Hellebrandt 2022-06-14 16:24:54 UTC
Failed QA.

This usually works at low scale, see comment 13. However, looking at the code, an ID is selected at random from an interval of arbitrary size 5000, and its uniqueness is not checked.

According to the developers, the IDs only need to be unique within a single VM. The first limitation is that a VM can have at most 5000 interfaces, which I think is unnecessarily low but acceptable, since a VM with more interfaces would be quite rare.

The important issue, however, is that the IDs are not checked for uniqueness. When a VM has more than one interface, the same ID can be randomly generated for two of them, leading to an error. The chances of this are not small: for example, it would happen on average for one in every 5000 VMs with 2 interfaces. Some of the larger customers are almost sure to hit this issue.
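The estimate above is an instance of the birthday problem. The following Ruby sketch computes it, assuming keys are drawn uniformly and independently from 5000 values as described in this comment; both function names are hypothetical.

```ruby
# Probability that at least two of `count` keys drawn uniformly at random
# from `range_size` possible values are equal (the birthday problem).
def collision_probability(count, range_size)
  return 1.0 if count > range_size
  no_collision = (0...count).reduce(1.0) do |prob, i|
    prob * (range_size - i) / range_size
  end
  1.0 - no_collision
end

# Probability that at least one of `vm_count` independently created VMs
# hits a collision, given the per-VM collision probability.
def fleet_hit_probability(per_vm_probability, vm_count)
  1.0 - (1.0 - per_vm_probability)**vm_count
end

per_vm = collision_probability(2, 5000)
puts "per VM with 2 interfaces: #{per_vm}"
puts "across 5000 such VMs:     #{fleet_hit_probability(per_vm, 5000)}"
```

For 2 interfaces the per-VM probability is 1/5000 (0.02%), which matches the figure in the comment. Across 5000 such VMs, the chance that at least one creation fails is roughly 63%, and the per-VM probability grows quickly with the number of interfaces.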

While this workaround isn't a real fix and doesn't scale, it makes VM provisioning on ESX 7 possible in most cases at small scale with a reasonable number of interfaces. Therefore, I think this shouldn't be considered fixed, but it also shouldn't be a release blocker, given that the workaround code is present in the release.
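One way to address the uniqueness concern raised here is to remember which keys are already in use on the VM and retry on a clash. The Ruby sketch below shows that idea only; it is not the change that was merged into fog-vsphere, and `DeviceKeyAllocator`, its parameters, and the default range are assumptions made for illustration.

```ruby
require "set"

# Hypothetical sketch; not the fix merged into fog-vsphere.
# Hands out keys that are unique within one VM by tracking every key in use.
class DeviceKeyAllocator
  def initialize(existing_keys = [], range: 1..99_999, rng: Random.new)
    @used = Set.new(existing_keys)
    @range = range
    @rng = rng
  end

  # Returns a key not yet used on this VM; raises if the range is exhausted.
  def next_key
    used_in_range = @used.count { |key| @range.cover?(key) }
    raise ArgumentError, "no free device keys left" if used_in_range >= @range.size

    loop do
      candidate = @rng.rand(@range)
      # Set#add? returns nil when the candidate is already present.
      return candidate if @used.add?(candidate)
    end
  end
end

allocator = DeviceKeyAllocator.new([2000, 4000])
puts allocator.next_key
puts allocator.next_key
```

Seeding the allocator with the keys of the devices already present on the template keeps new devices from clashing with existing ones as well as with each other.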

Comment 17 Brad Buckingham 2022-06-27 14:00:09 UTC
*** Bug 2093401 has been marked as a duplicate of this bug. ***

Comment 18 Ron Lavi 2022-06-30 09:38:08 UTC
The upstream PR was merged and rubygem-fog-vsphere-3.5.2 has been released;
moving to POST.

Comment 19 wclark 2022-07-27 20:15:40 UTC
Created attachment 1899801 [details]
Hotfix RPM for Satellite 6.11.1 on RHEL7

INSTALL INSTRUCTIONS:

1. Take a complete backup or snapshot of the Satellite 6.11.1 server

2. Download the hotfix RPM for Satellite 6.11.1 on RHEL7 attached to this BZ and copy it to the Satellite server

3. # yum install ./tfm-rubygem-fog-vsphere-3.5.2-1.el7sat.noarch.rpm --disableplugin=foreman-protector

4. # satellite-maintain service restart

NOTE: This hotfix additionally contains the fix for https://bugzilla.redhat.com/show_bug.cgi?id=2101986

Comment 20 wclark 2022-07-27 20:17:27 UTC
Created attachment 1899802 [details]
Hotfix RPM for Satellite 6.11.1 on RHEL8

INSTALL INSTRUCTIONS:

1. Take a complete backup or snapshot of the Satellite 6.11.1 server

2. Download the hotfix RPM for Satellite 6.11.1 on RHEL8 attached to this BZ and copy it to the Satellite server

3. # dnf install ./rubygem-fog-vsphere-3.5.2-1.el8sat.noarch.rpm --disableplugin=foreman-protector

4. # satellite-maintain service restart

NOTE: This hotfix additionally contains the fix for https://bugzilla.redhat.com/show_bug.cgi?id=2101986

Comment 21 Lukáš Hellebrandt 2022-09-08 14:57:14 UTC
Verified with Sat 6.12 snap 9.0.

Successfully created a VM with two interfaces from a template through Satellite on ESXi 7. The VM was created and is running, has the correct interfaces, and used the correct template. Also, the fix [0] is present in /usr/share/gems/gems/fog-vsphere-3.5.2/lib/fog/vsphere/requests/compute/vm_clone.rb.

[0] https://github.com/fog/fog-vsphere/pull/275/files
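A quick way to reason about whether a given fog-vsphere version should contain the fix is to compare it against the "Fixed In Version" recorded on this bug. The Ruby sketch below uses `Gem::Version` for the comparison; `contains_fix?` is a hypothetical helper, and treating 3.5.2 as the first fixed release is an assumption taken from that field, not something the check itself verifies.

```ruby
require "rubygems"

# Assumption: 3.5.2 is the first fog-vsphere release carrying the fix,
# per the "Fixed In Version" field on this bug.
FIXED_IN = Gem::Version.new("3.5.2")

# True when the given version string is at or above the fixed release.
def contains_fix?(installed_version)
  Gem::Version.new(installed_version) >= FIXED_IN
end

puts contains_fix?("3.4.0")
puts contains_fix?("3.5.2")
```

`Gem::Version` compares segments numerically, so a version such as 3.10.0 is correctly ordered after 3.5.2, which a plain string comparison would get wrong.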

Comment 25 errata-xmlrpc 2022-11-16 13:33:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.12 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8506

Comment 26 Red Hat Bugzilla 2023-09-18 04:34:58 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days