Bug 1657934 - [RFE] SCVMM Provision Failing With 600 Second Timeout CheckProvisioned State
Summary: [RFE] SCVMM Provision Failing With 600 Second Timeout CheckProvisioned State
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.9.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.12.0
Assignee: Adam Grare
QA Contact: Sudhir Mallamprabhakara
Docs Contact: Red Hat CloudForms Documentation
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-12-10 18:51 UTC by myoder
Modified: 2019-12-05 18:40 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-05 18:40:43 UTC
Category: Bug
Cloudforms Team: CFME Core
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3747611 None None None 2018-12-10 19:47:46 UTC

Description myoder 2018-12-10 18:51:30 UTC
Description of problem:

Seeing this error during VM template provisions for SCVMM provider:

[----] E, [2018-12-07T10:17:57.997351 #7836:7f5a58] ERROR -- : Q-task_id([miq_provision_85]) <AEMethod check_provisioned> ProvisionCheck error <Error: timed out after 604.941330321 seconds.  Timeout threshold [600]>


Below is the last time we pick up the automate task for the CheckProvisioned state retry (currently at 8 retries; the state started at 2018-12-07 10:06:57 UTC):

[----] I, [2018-12-07T10:17:47.235527 #7836:803108]  INFO -- : MIQ(MiqGenericWorker::Runner#get_message_via_drb) Message id: [431382], MiqWorker id: [1900], Zone: [default], Role: [automate], Server: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [MiqAeEngine.deliver], Timeout: [3600], Priority: [100], State: [dequeue], Deliver On: [2018-12-07 10:17:41 UTC], Data: [], Args: [{:object_type=>"ManageIQ::Providers::Microsoft::InfraManager::Provision", :object_id=>85, :attrs=>{"request"=>"vm_provision"}, :instance_name=>"AUTOMATION", :user_id=>1, :miq_group_id=>2, :tenant_id=>1, :state=>"CheckProvisioned", :ae_fsm_started=>nil, :ae_state_started=>"2018-12-07 10:06:57 UTC", :ae_state_retries=>8, :ae_state_previous=>"---\n\"/ManageIQ/Infrastructure/VM/Provisioning/StateMachines/VMProvision_vm/template\":\n  ae_state: CheckProvisioned\n  ae_state_retries: 8\n  ae_state_started: 2018-12-07 10:06:57 UTC\n  ae_state_max_retries: 100\n"}], Dequeued in: [65.321968155] seconds



Last time we process the CheckProvisioned state and error out:

[----] I, [2018-12-07T10:17:52.422505 #7836:803108]  INFO -- : Q-task_id([miq_provision_85]) Processing State=[CheckProvisioned]
[----] I, [2018-12-07T10:17:52.422796 #7836:803108]  INFO -- : Q-task_id([miq_provision_85]) Following Relationship [miqaedb:/Infrastructure/VM/Provisioning/StateMachines/Methods/CheckProvisioned#create]
[----] I, [2018-12-07T10:17:54.732781 #7836:803108]  INFO -- : Q-task_id([miq_provision_85]) Updated namespace [miqaedb:/Infrastructure/VM/Provisioning/StateMachines/Methods/CheckProvisioned#create  ManageIQ/Infrastructure/VM/Provisionin
[----] I, [2018-12-07T10:17:57.127233 #7836:803108]  INFO -- : Q-task_id([miq_provision_85]) Updated namespace [Infrastructure/VM/Provisioning/StateMachines/Methods/check_provisioned  ManageIQ/Infrastructure/VM/Provisioning/StateMachines
[----] I, [2018-12-07T10:17:57.152331 #7836:803108]  INFO -- : Q-task_id([miq_provision_85]) Invoking [inline] method [/ManageIQ/Infrastructure/VM/Provisioning/StateMachines/Methods/check_provisioned] with inputs [{}]
[----] I, [2018-12-07T10:17:57.155485 #7836:803108]  INFO -- : Q-task_id([miq_provision_85]) <AEMethod [/ManageIQ/Infrastructure/VM/Provisioning/StateMachines/Methods/check_provisioned]> Starting 
[----] I, [2018-12-07T10:17:57.981266 #7836:7f5a58]  INFO -- : Q-task_id([miq_provision_85]) <AEMethod check_provisioned> ProvisionCheck returned <error> for state <finished> and status <Error>
[----] E, [2018-12-07T10:17:57.997351 #7836:7f5a58] ERROR -- : Q-task_id([miq_provision_85]) <AEMethod check_provisioned> ProvisionCheck error <Error: timed out after 604.941330321 seconds.  Timeout threshold [600]>
[----] I, [2018-12-07T10:17:58.041484 #7836:803108]  INFO -- : Q-task_id([miq_provision_85]) <AEMethod [/ManageIQ/Infrastructure/VM/Provisioning/StateMachines/Methods/check_provisioned]> Ending
[----] I, [2018-12-07T10:17:58.041787 #7836:803108]  INFO -- : Q-task_id([miq_provision_85]) Method exited with rc=MIQ_OK
[----] I, [2018-12-07T10:17:58.042656 #7836:803108]  INFO -- : Q-task_id([miq_provision_85]) Followed  Relationship [miqaedb:/Infrastructure/VM/Provisioning/StateMachines/Methods/CheckProvisioned#create]
[----] I, [2018-12-07T10:17:58.042942 #7836:803108]  INFO -- : Q-task_id([miq_provision_85]) Processed State=[CheckProvisioned] with Result=[error]
[----] W, [2018-12-07T10:17:58.043204 #7836:803108]  WARN -- : Q-task_id([miq_provision_85]) Error in State=[CheckProvisioned]
[----] I, [2018-12-07T10:17:58.043558 #7836:803108]  INFO -- : Q-task_id([miq_provision_85]) In State=[CheckProvisioned], invoking [on_error] method=[update_provision_status(status => 'Error Creating VM')]
[----] I, [2018-12-07T10:18:00.316995 #7836:803108]  INFO -- : Q-task_id([miq_provision_85]) Updated namespace [Infrastructure/VM/Provisioning/StateMachines/VMProvision_vm/update_provision_status  ManageIQ/Infrastructure/VM/Provisioning/
[----] I, [2018-12-07T10:18:00.347770 #7836:803108]  INFO -- : Q-task_id([miq_provision_85]) Invoking [inline] method [/ManageIQ/Infrastructure/VM/Provisioning/StateMachines/VMProvision_VM/update_provision_status] with inputs [{"status"=
[----] I, [2018-12-07T10:18:00.350763 #7836:803108]  INFO -- : Q-task_id([miq_provision_85]) <AEMethod [/ManageIQ/Infrastructure/VM/Provisioning/StateMachines/VMProvision_VM/update_provision_status]> Starting 
[----] E, [2018-12-07T10:18:01.783943 #7836:7f5a44] ERROR -- : Q-task_id([miq_provision_85]) <AEMethod update_provision_status> VM Provision Error: [] VM [scc009] Step [CheckProvisioned] Status [Error Creating VM] Message [Error: timed o
[----] I, [2018-12-07T10:18:01.805802 #7836:803108]  INFO -- : Q-task_id([miq_provision_85]) <AEMethod [/ManageIQ/Infrastructure/VM/Provisioning/StateMachines/VMProvision_VM/update_provision_status]> Ending
[----] I, [2018-12-07T10:18:01.805963 #7836:803108]  INFO -- : Q-task_id([miq_provision_85]) Method exited with rc=MIQ_OK



It looks like we are erroring out after 10 cumulative minutes in the CheckProvisioned state (measured from ae_state_started, across all retries), rather than after 10 minutes in the current pass through the CheckProvisioned state.
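For illustration, the difference between the two timeout interpretations can be sketched in plain Ruby using the timestamps from the log above. The helper names here are hypothetical; the real logic lives in the automate state machine engine and the check_provisioned method:

```ruby
require "time"

STATE_TIMEOUT = 600 # seconds; the threshold reported in the error message

# Cumulative interpretation (what the log shows): elapsed time is measured
# from ae_state_started, so it keeps growing across every retry of the state.
def cumulative_timed_out?(ae_state_started, now)
  (now - ae_state_started) > STATE_TIMEOUT
end

# Per-retry interpretation (what the reporter expected): the clock would
# reset each time the state is re-entered, so only the current pass counts.
def per_retry_timed_out?(retry_started, now)
  (now - retry_started) > STATE_TIMEOUT
end

state_started = Time.parse("2018-12-07 10:06:57 UTC") # ae_state_started from the log
now           = Time.parse("2018-12-07 10:17:57 UTC") # when the error was raised
retry_started = Time.parse("2018-12-07 10:17:47 UTC") # dequeue time of retry 8

puts cumulative_timed_out?(state_started, now) # true  -> state errors out
puts per_retry_timed_out?(retry_started, now)  # false -> would keep retrying
```

With the cumulative clock, retry 8 is already past the 600-second threshold even though the retry itself ran for only a few seconds.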

Version-Release number of selected component (if applicable):
CFME 5.9.5

How reproducible:
custom environment

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

This is a POC with a single zone and 2 appliances. We only have logs from one appliance; we are gathering logs from the second appliance and will attach them to the case.

Comment 3 Tina Fitzgerald 2018-12-10 20:29:13 UTC
Hi Yoder,

The customer-supplied evm.log contains many authentication errors, as shown below.
We need to resolve the authentication issue before looking at the provisioning error because they're likely directly related.

Let me know if you have any questions.

Thanks,
Tina 

[----] I, [2018-12-07T03:27:41.384124 #7836:803108]  INFO -- : MIQ(MiqQueue#deliver) Message id: [423106], Delivering...
[----] E, [2018-12-07T03:27:41.645958 #7836:803108] ERROR -- : An error occurred while invoking remote tasks...Requeueing for 1 minute from now.
[----] E, [2018-12-07T03:27:41.646537 #7836:803108] ERROR -- : [RuntimeError]: Invalid System Authentication Token specified  Method:[block in method_missing]
[----] E, [2018-12-07T03:27:41.646987 #7836:803108] ERROR -- : /opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.0/lib/manageiq/api/client/connection.rb:108:in `check_response'
/opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.0/lib/manageiq/api/client/connection.rb:102:in `send_request'
/opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.0/lib/manageiq/api/client/connection.rb:24:in `get'
/opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.0/lib/manageiq/api/client/client.rb:37:in `load_definitions'
/opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.0/lib/manageiq/api/client/client.rb:62:in `reconnect'
/opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.0/lib/manageiq/api/client/client.rb:33:in `initialize'
/var/www/miq/vmdb/app/models/mixins/inter_region_api_method_relay.rb:65:in `new'
/var/www/miq/vmdb/app/models/mixins/inter_region_api_method_relay.rb:65:in `api_client_connection_for_region'
/var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:64:in `block in invoke_tasks_remote'
/var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:60:in `each'
/var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:60:in `invoke_tasks_remote'
/var/www/miq/vmdb/app/models/miq_queue.rb:451:in `block in dispatch_method'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:91:in `block in timeout'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `block in catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:106:in `timeout'
/var/www/miq/vmdb/app/models/miq_queue.rb:450:in `dispatch_method'
/var/www/miq/vmdb/app/models/miq_queue.rb:427:in `block in deliver'
/var/www/miq/vmdb/app/models/user.rb:254:in `with_user'
/var/www/miq/vmdb/app/models/user.rb:269:in `with_user_group'
/var/www/miq/vmdb/app/models/miq_queue.rb:427:in `deliver'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:104:in `deliver_queue_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:134:in `deliver_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:152:in `block in do_work'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `loo

Comment 22 Adam Grare 2019-07-25 18:08:48 UTC
We should investigate using a playbook to provision a VM with ansible-runner. This would allow us to start the provision and easily check back on the progress/status in a well-structured way, and we already have a library (lib/ansible/runner) to help us out.

There are ansible modules for winrm and hyper-v that we can look at using (e.g. https://docs.ansible.com/ansible/latest/plugins/connection/winrm.html and https://github.com/tsailiming/ansible-hyperv).

This will be a good deal of refactoring, but it is also the only "good" way to fix this IMO.
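A rough sketch of the check-back-on-progress pattern described above, in plain Ruby with a stubbed status query. The job_status helper is a hypothetical stand-in; a real implementation would go through lib/ansible/runner and read the runner's job artifacts rather than an in-memory hash:

```ruby
# Poll a long-running provision job instead of blocking on it. Each
# CheckProvisioned-style pass does one cheap status query and either
# finishes, fails, or asks the state machine to retry later.
ProvisionResult = Struct.new(:state, :status, :message)

# Hypothetical stand-in for querying a runner job's status; here it just
# consumes a queue of simulated status events, ending in "successful".
def job_status(job)
  job[:events].shift || "successful"
end

def check_provisioned(job)
  case job_status(job)
  when "running"    then ProvisionResult.new("retry", "ok", "still provisioning")
  when "successful" then ProvisionResult.new("finished", "ok", "provisioned")
  else                   ProvisionResult.new("finished", "error", "provision failed")
  end
end

job = { :events => ["running", "running"] }
results = []
3.times { results << check_provisioned(job).state }
puts results.inspect # ["retry", "retry", "finished"]
```

Because each pass is a quick status check rather than a blocking wait, a per-pass timeout like the 600-second threshold in this bug would no longer be hit by a slow SCVMM clone; the state machine simply retries until the job reports a terminal status.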

