Created attachment 1366312: provision test script
Created attachment 1370497: new messages in miq_provision_virt_workflow.rb from 5.8.2.3
https://github.com/ManageIQ/manageiq/pull/16702
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/9c94f30b232d09be49b6d0952bf25f0243573bb1

commit 9c94f30b232d09be49b6d0952bf25f0243573bb1
Author:     Lucy Fu <lufu>
AuthorDate: Wed Dec 20 09:50:58 2017 -0500
Commit:     Lucy Fu <lufu>
CommitDate: Thu Dec 21 10:29:51 2017 -0500

    Fix allowed_vlans to call preload correctly.

    https://bugzilla.redhat.com/show_bug.cgi?id=1510069

 app/models/miq_provision_virt_workflow.rb       | 2 +-
 spec/models/miq_provision_virt_workflow_spec.rb | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)
Because of the network latency between the 2 zones, we are seeing the service provisioning process not getting started within the 10 minute window allocated for a task to complete.

Since CFME is highly asynchronous in task management, it has strict constraints on how long a task can run, and that timeout is 10 minutes. Longer running tasks can be made asynchronous by exiting at appropriate points, called states in the Automate Model. The states can be restarted at different times.

So the suggestion is to force a retry in pre4/pre5 before we start the service provisioning.

In the following instance

  CBTS-Public/Service/Provisioning/StateMachines/ServiceProvision_Template/VMware_Build_VMProvisionRequest

find an open slot before the provision state

  provision  /Service/Provisioning/StateMachines/Methods/Provision

I am guessing it would be pre4 or pre5. Set its value to

  METHOD::set_retry_once

This calls a method that triggers a restart of the state, which keeps the task controller from terminating the task as non-responsive.

Add a new Ruby method called set_retry_once in the class

  CBTS-Public/Service/Provisioning/StateMachines/ServiceProvision_Template

The Ruby method set_retry_once looks like this:

  #
  # Description: This method sets the retry once to force a break in the
  # processing of long running tasks.
  #
  $evm.log(:info, "Checking if retry needs to be set")
  if $evm.state_var_exist?('retry_once')
    ae_result = 'ok'
  else
    $evm.set_state_var('retry_once', '1')
    ae_result = 'retry'
    $evm.log(:info, 'setting a retry once in the beginning')
  end
  $evm.root['ae_result'] = ae_result
  $evm.root['retry_interval'] = 1.minute
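To sanity-check the retry-once logic outside of CFME, here is a minimal sketch that runs the same branch twice against a stand-in object. FakeEvm and new_pass are hypothetical names made up for this illustration; they only mimic the handful of $evm calls used by set_retry_once and are not the ManageIQ implementation. The interval is written as 60 seconds so the script runs in plain Ruby without ActiveSupport.

```ruby
# Hypothetical stand-in for the Automate $evm service object. It mimics only
# the calls used by set_retry_once and is NOT the real ManageIQ object.
class FakeEvm
  attr_reader :root

  def initialize
    @state_vars = {} # assumed to survive across retries of the same state
    @root = {}
  end

  # Logging is a no-op in this sketch.
  def log(_level, _message); end

  def state_var_exist?(name)
    @state_vars.key?(name)
  end

  def set_state_var(name, value)
    @state_vars[name] = value
  end

  # Hypothetical helper: simulates the state being re-entered after a retry.
  # The root attributes are reset, the state variables are kept.
  def new_pass
    @root = {}
  end
end

# Same decision logic as the Automate method, taking the evm object as an
# argument instead of using the $evm global.
def set_retry_once(evm)
  if evm.state_var_exist?('retry_once')
    ae_result = 'ok'
  else
    evm.set_state_var('retry_once', '1')
    ae_result = 'retry'
  end
  evm.root['ae_result'] = ae_result
  evm.root['retry_interval'] = 60 # seconds; stands in for 1.minute
  ae_result
end

evm = FakeEvm.new
first_pass = set_retry_once(evm)  # no state var yet, so a retry is requested
evm.new_pass
second_pass = set_retry_once(evm) # state var is present, so the state proceeds
puts "first pass: #{first_pass}, second pass: #{second_pass}"
```

The point of the sketch is that the state variable is what makes the retry happen only once: the first pass exits early with 'retry', and the re-entered state sees the variable and returns 'ok' so provisioning can continue.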
Once the change has been made in the Automate Database you don't have to restart the servers. The Automate Model changes are picked up automatically at runtime.
I am not sure what the recommended way of doing this is. Can you check with Josh? One way I can think of is to turn off the Automate Role in the CIN Zone, which will end up routing all the Automate work to CAR. It might overwhelm the CAR zone, so it might have to be done during off-peak hours. How did this get tested with Lucy's fix? What was done to route the work to the CAR zone during that testing?
Since the work starts as a generic service it is not tied to any zone, just to the Automate Role. As suggested above, one possibility would be to disable the Automate Role in the other zone to force the work to the CAR zone. Maybe this is something the customer could attempt during off-hours to avoid performance issues.
Hi Michael, Any updates on this one from the customer? Were they able to test the suggestions? Thanks, Madhu
Moving to POST based on performance changes from Comment #19.