Bug 1518058 - Provision Error "A specified parameter was not correct: spec. nicSettingMap.adapter.ip" under VMware after VM cloning from template.
Summary: Provision Error "A specified parameter was not correct: spec. nicSettingMap.a...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Automate
Version: 5.7.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: GA
: 5.10.0
Assignee: Tina Fitzgerald
QA Contact: Dmitry Misharov
URL:
Whiteboard:
: 1422903 (view as bug list)
Depends On:
Blocks: 1524617 1525551 1549629
TreeView+ depends on / blocked
 
Reported: 2017-11-28 04:38 UTC by Sachin
Modified: 2021-03-11 16:25 UTC (History)
13 users (show)

Fixed In Version: 5.10.0.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1524617 1525551 1549629 (view as bug list)
Environment:
Last Closed: 2018-06-21 20:43:12 UTC
Category: ---
Cloudforms Team: CFME Core
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Minimal reproducer domain and steps (4.52 KB, application/zip)
2018-01-04 14:18 UTC, Milan Falešník
no flags Details

Description Sachin 2017-11-28 04:38:23 UTC
Description of problem:

After the hotfix was applied in BZ#1481867, provisioning failed with below error:

~~~
Error: A specified parameter was not correct: spec.nicSettingMap.adapter.ip
~~~

In the automation log:
~~~
[----] I, [2017-11-21T14:48:00.445959 #13906:7137e8]  INFO -- : Q-task_id([miq_provision_30000000000020]) Validating Notification type: automate_user_error
[----] I, [2017-11-21T14:48:00.449901 #13906:7137e8]  INFO -- : Q-task_id([miq_provision_30000000000020]) Calling Create Notification type: automate_user_error subject type
: MiqRequest id: 30000000000015 options: {:message=>"VM Provision Error: [IMC4 VMware Mgt ST] VM [CLDST4LAP5002] Step [CheckProvisioned] Status [A specified parameter was n
ot correct: spec.nicSettingMap.adapter.ip] Message [Error: A specified parameter was not correct: spec.nicSettingMap.adapter.ip] "}
[----] I, [2017-11-21T14:48:00.657061 #13906:719134]  INFO -- : Q-task_id([miq_provision_30000000000020]) <AEMethod [/CPA/Infrastructure/VM/Provisioning/StateMachines/VMPro
vision_VM/update_provision_status]> Ending
[----] I, [2017-11-21T14:48:00.657248 #13906:719134]  INFO -- : Q-task_id([miq_provision_30000000000020]) Method exited with rc=MIQ_OK
~~~




Version-Release number of selected component (if applicable):

# rpm -qa|grep cfme
cfme-appliance-5.7.3.2-1.el7cf.x86_64
cfme-gemset-5.7.3.2-2.el7cf.x86_64
cfme-5.7.3.2-14.el7cf.x86_64
rh-ruby23-rubygem-redhat_access_cfme-1.1.0-1.el7cf.noarch


How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

CU have custom datastore involved. This works well on dev environment where HF is not applied. with exactly same code as mentioned by CU

Comment 5 Greg McCullough 2017-11-28 12:59:39 UTC
Sachin, along with the automate domains can we get logs showing the "Config spec:" and "Customization spec:" and "Prov Options:" logging for the failed provisioning task.

This logging will show exactly what is being passed to VMware.

Comment 6 Andes Chau 2017-11-29 06:51:32 UTC
Hello Sachin and Greg

The logs have uploaded.

Andes

Comment 9 Tina Fitzgerald 2017-11-30 18:52:44 UTC
Hi Sachin,

As Greg mentioned in comment 8, the customer log shows the CPA/Infrastructure/VM/Provisioning/StateMachines/Methods/setup_networking custom Automate method with the log messages:

[----] I, [2017-11-29T14:23:07.577945 #1813:e94e7c]  INFO -- : Q-task_id([miq_provision_30000000000036]) <AEMethod setup_networking> vm_provisioning_setup_network - Selected primary network: Non-Prod-TN|Non-Prod-AP|Green-2314-IMC3-EPG
[----] I, [2017-11-29T14:23:07.579135 #1813:e94e7c]  INFO -- : Q-task_id([miq_provision_30000000000036]) <AEMethod setup_networking> vm_provisioning_setup_network - Adding network with idx: 0, network: Non-Prod-TN|Non-Prod-AP|Green-2314-IMC3-EPG, deviceType: VirtualVmxnet3, ipAddress: , subnetMask: , gateway: , dvs: true

The above log lines are generated from the custom setup_networking method code snippet:

    @task.set_option(:network_usage, "Data")
    $evm.instantiate '/Integration/IPAM/FusionLayerInfinity/StateMachines/IpReserve?suffix=primary'
    # TODO add friendly error message
    exit MIQ_ABORT unless $evm.root['ae_result'] == 'ok'
    ipAddress = @task.get_option :ipam_fusionlayer_infinity_ip_address_primary
    subnetMask = @task.get_option :network_subnet_mask
    gateway = @task.get_option :network_gateway
    idx = 0
    network = selectNetwork zone, dataCenter, env, usage
>>>>>>    log :info, "Selected primary network: #{network}"
    addNetwork idx, network, deviceType, ipAddress, subnetMask, gateway

.....
   def addNetwork(idx, network, deviceType, ipAddress, subnetMask, gateway)
>>>>>>     log :info, "Adding network with idx: #{idx}, network: #{network}, 
     deviceType: #{deviceType}, ipAddress: #{ipAddress}, subnetMask: #
     {subnetMask}, gateway: #{gateway}, dvs: #{@is_dvs}"
     @task.set_network_adapter idx, {network: network, devicetype: deviceType, 
     is_dvs: @is_dvs}
     @task.set_nic_settings idx, {ip_addr: ipAddress, subnet_mask: subnetMask, 
     gateway: gateway, addr_mode: ['static', 'Static']}
   end

I have some questions/concerns about the $evm.instantiate
'/Integration/IPAM/FusionLayerInfinity/StateMachines/IpReserve?suffix=primary'

The IpReserve state machine instance calls the network_query method on_entry, then calls the ip_reserve method as shown below:

IpReserve state machine instance YAML
....
 - step1:
      value: method::ip_reserve
      on_entry: network_query

There are log messages in both these custom methods that I don't see in the attached logs. Both method have this line immediately following the begin:

  if ($evm.root['vmdb_object_type'] == 'miq_provision') || ($evm.root['vmdb_object_type'] == 'service_template_provision_task')

And there is no else condition which means the methods would do nothing if the above condition was false.

Can you ask the customer:
1. Modify both these Automate methods, adding an else condition with a log message stating that the code is not being executed.
2. Add a log message right after the above condition to say that it is executing the code.
3. Rerun the provision, supplying the automation.log and evm.log?

Let me know if you have any questions.

Thanks,
Tina

Comment 10 Tina Fitzgerald 2017-11-30 20:07:05 UTC
Please ask the customer to add $evm.root['vmdb_object_type'] to the log messages in steps 1 and 2 above.

Comment 11 Tina Fitzgerald 2017-11-30 20:07:17 UTC
Please ask the customer to add $evm.root['vmdb_object_type'] to the log messages in steps 1 and 2 above.

Comment 14 Tina Fitzgerald 2017-12-04 18:48:27 UTC
Update from cf-prio email thread
...

On Tue, 5 Dec 2017 at 12:52 AM, Sadique Puthen <sputhenp> wrote:
On Tue, Dec 5, 2017 at 12:37 AM, Tina Fitzgerald <tfitzger> wrote:
Hi Sadique,

Thank you for setting up the session.

What is the difference in versions/fixes between when provision worked properly and when the customer encountered this issue?

 Andes will be able to provide exact version details tomorrow morning HKT. This code works in the production CFME 5.7.1, but not in Staging 5.7.3.2-14 where the latter has a higher version of rest client. 

working:
cfme-5.7.1.3-1
[root@hkgx3ecf5001 ~]# gem list | grep rest-client
rest-client (2.0.1)

Not working: Staging CFME -  cfme-5.7.3.2-14
[root@hkgx4ecf0010 log]# gem list | grep rest-client
rest-client (2.0.2)

Can you ask the customer to send us a set of logs from a successful provision?

Andes,  

Can you provide this?

How much longer will you be available today?

Already 1:00 AM here. Don't think going to remain online for more time.

Regards,


Thanks,
Tina

Comment 15 Tina Fitzgerald 2017-12-04 18:50:30 UTC
We have a session setup with customer for Dec 5 and are waiting for the customer to provide a set logs to show a successful provisioning.

Comment 16 Tina Fitzgerald 2017-12-04 18:52:37 UTC
Andes validated the provided versions.

---------- Forwarded message ----------
From: Andes Chau <andeschau>
Date: Mon, Dec 4, 2017 at 11:54 AM
Subject: Re: [cf-prio] [Cathay Pacific Airways][SEV2][BZ:1518058][Automate] Error: A specified parameter was not correct: spec.nicSettingMap.adapter.ip
To: Sadique Puthen <sputhenp>
Cc: Csaba Eber <ceber>, Daniel Wessels <daniel.wessels>, Greg McCullough <gmccullo>, Madhu Kanoor <mkanoor>, Peggy Li <peli>, Sachin Patil <psachin>, Tina Fitzgerald <tfitzger>, cf-prio <cf-prio>


Hello Sadique,

Your provided version is correct.

Andes

Comment 17 Tina Fitzgerald 2017-12-04 21:55:07 UTC
Hi Sadique,

In addition to our current request for a set of logs from the production environment showing a successful provision, can you ask them to send us the production environment Automate datastore?

Thanks,
Tina

Comment 18 Sadique Puthen 2017-12-05 05:22:52 UTC
(In reply to Tina Fitzgerald from comment #17)
> Hi Sadique,
> 
> In addition to our current request for a set of logs from the production
> environment showing a successful provision, can you ask them to send us the
> production environment Automate datastore?
> 
> Thanks,
> Tina

All the requested logs are on collab-shell.

/cases/01977736

Datastore: evm_current_hkgx3ecf5001.nwow001.corp.ete.cathaypacific.com_20171205_091110.tar.

Log:

/cases/01977736/x-text/*

Comment 20 Sadique Puthen 2017-12-06 06:44:12 UTC
I was not able to follow up the call completely. I heard from Andes that we managed to find a solution for this during the call and provisioning is working after that.

I would appreciate if we can add details on whether the problem was with code or product and how we managed to solve it.

Comment 21 Tina Fitzgerald 2017-12-06 19:35:38 UTC
---------- Forwarded message ----------
From: Tina Fitzgerald <tfitzger>
Date: Wed, Dec 6, 2017 at 12:36 PM
Subject: Re: [cf-prio] [Cathay Pacific Airways][SEV2][BZ:1518058][Automate] Error: A specified parameter was not correct: spec.nicSettingMap.adapter.ip
To: Daniel Wessels <daniel.wessels>
Cc: Sachin Patil <psachin>, Andes Chau <andeschau>, Sadique Puthen <sputhenp>, Csaba Eber <ceber>, Greg McCullough <gmccullo>, Madhu Kanoor <mkanoor>, Peggy Li <peli>, cf-prio <cf-prio>, rpai


Hi Sachin,
The root cause is that we don't support imported/exported passwords between different environments and the encryption key was different between the environments.
The decryption failure stopped the current leg of resolution without logging the failure. I'll update the BZ for this issue with the ticket details for adding the log message.
We noticed the /Integration/IPAM/FusionLayerInfinity/StateMachines class schema contains attributes and states. We strongly advise against defining attributes in State Machine classes.
Many customer implementations make use of a "config" Automate domain where all passwords and other configuration settings are stored. The separate domain makes it easier to identify the password fields to modify for each environment.  The customer might want to investigate modifying the Automate model to move all passwords to a separate domain.
We're working on a new Automate enhancement that would enable substitution to access objects anywhere in the workspace. This enhancement would allow the customer to resolve/instantiate a config domain instance(containing password information) once, and be able to reference it from anywhere in the resolution path. We currently support substitution only within the same leg of resolution. 
Thanks,
Tina

Comment 22 Tina Fitzgerald 2017-12-06 19:53:25 UTC
Decryption error logging:
https://bugzilla.redhat.com/show_bug.cgi?id=1422903

Ticket for Substitution Enhancement:
https://bugzilla.redhat.com/show_bug.cgi?id=1350560

Comment 24 CFME Bot 2017-12-11 14:05:11 UTC
New commit detected on ManageIQ/manageiq-automation_engine/master:
https://github.com/ManageIQ/manageiq-automation_engine/commit/f51eb01f88f6e8173be0a1ebbc6a3e9f3490ccd2

commit f51eb01f88f6e8173be0a1ebbc6a3e9f3490ccd2
Author:     mkanoor <mkanoor>
AuthorDate: Wed Dec 6 15:17:39 2017 -0500
Commit:     mkanoor <mkanoor>
CommitDate: Wed Dec 6 16:21:32 2017 -0500

    Log encryption failures
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1518058
    
    Customers might export/import models between different environments.
    There typically is a different key for each environment. If the export
    deck has encrypted fields, they are imported into the new environment
    unchanged. This field cannot be decrypted at runtime in the Automate
    engine, because of key mismatch.
    During $evm.instantiate we were not logging any error messages and
    customer has a tough time debugging the issue.
    
    This PR logs the encryption error.

 .../engine/miq_ae_engine/miq_ae_object.rb               | 10 +++++++++-
 .../engine/miq_ae_method_service/miq_ae_service.rb      |  1 +
 spec/miq_ae_object_spec.rb                              | 17 +++++++++++++++++
 3 files changed, 27 insertions(+), 1 deletion(-)

Comment 27 Tina Fitzgerald 2017-12-11 18:33:04 UTC
Hi Satoe,

Yes, thanks. It is okay to backport the fix to the Fine/Euwe branches.

Thanks,
Tina

Comment 29 mkanoor 2018-01-03 21:22:37 UTC
*** Bug 1422903 has been marked as a duplicate of this bug. ***

Comment 30 Milan Falešník 2018-01-04 14:18:02 UTC
Created attachment 1376986 [details]
Minimal reproducer domain and steps

Import the domain, make sure it is enabled, in console, make sure to do:
tail -fn0 log/automation.log | egrep -i 'TEST|ERROR'

In Simulation, put this into Request (assuming all other fields are default): BZ1518058

Simulate and watch the tail.

This bug should be verified, if you see something like this in the output:
ERROR -- : Error decrypting password can not decrypt v2_key encrypted string. Possible cause: Password value was encrypted with a different encryption key


Note You need to log in before you can comment on or make changes to this bug.