Bug 1692892 - [v2v][OSP] Migration stuck in refresh inventory state
Summary: [v2v][OSP] Migration stuck in refresh inventory state
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: V2V
Version: 5.10.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: GA
Target Release: 5.11.0
Assignee: Marek Aufart
QA Contact: Yadnyawalk Tale
Docs Contact: Red Hat CloudForms Documentation
URL:
Whiteboard: v2v
Depends On: 1743438
Blocks: 1694190 1712049 1734338
 
Reported: 2019-03-26 16:12 UTC by Yadnyawalk Tale
Modified: 2019-12-13 14:57 UTC
CC: 10 users

Fixed In Version: 5.11.0.23
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1694190
Environment:
Last Closed: 2019-12-13 14:57:13 UTC
Category: Bug
Cloudforms Team: V2V
Target Upstream Version:
Embargoed:


Attachments
expected_throttling.png (96.25 KB, image/png), 2019-07-29 12:54 UTC, Yadnyawalk Tale
Plan has no mapping (116.52 KB, image/png), 2019-08-20 07:13 UTC, Fabien Dupont

Description Yadnyawalk Tale 2019-03-26 16:12:53 UTC
Description of problem:
Migration gets stuck at the refresh inventory state when migrating via the SSH transformation method.


Version-Release number of selected component (if applicable):
CFME 5.10.2.2.20190319204241_4304a7d
OSP 13
VMware 6.5 (nested) and 6.7 (bare metal)
(Tried with QE's environment setup)


How reproducible:
100%


Steps to Reproduce:
1. Migrate vm with SSH transformation method


Actual results:
The migration gets stuck in the refresh inventory state. If you refresh the inventory manually, the migration passes; otherwise it should fail with a timeout error within the next 30 minutes (it failed after 48 minutes in my case).


Expected results:
Migration should pass without manual inventory refresh.


Additional info:

Comment 3 Fabien Dupont 2019-03-26 16:18:01 UTC
I confirm the same problem when using VDDK. This only happens when migrating to OpenStack, not RHV.

@marek, any idea of what may have changed since https://github.com/ManageIQ/manageiq-providers-openstack/pull/433 ?

Comment 5 Fabien Dupont 2019-03-28 11:24:45 UTC
I have done some extra tests in our lab to try to identify the root cause. I used two providers, VMware and OpenStack, to compare the behaviour provider-wise. I also used two CloudForms appliances, 5.10.1.1 and 5.10.2.2, to compare the behaviour CloudForms-wise. Both CloudForms appliances are connected to the two providers.

As the investigation so far led me to think the problem was linked to OpenStack events, I performed the following actions directly on the providers and watched the CloudForms logs:

1. Create an instance on OpenStack
  => The instance successfully starts on OpenStack
  => CFME 5.10.1.1 catches the instance.create event and runs a refresh
  => CFME 5.10.2.2 doesn't catch any event within the 5 minutes following the instance launch
  => CFME 5.10.2.2 finds the instance only after a manual refresh of the OpenStack provider

2. Start a powered off VM on VMware
  => The VM successfully starts on VMware
  => CFME 5.10.1.1 catches the VM power on event
  => CFME 5.10.2.2 catches the VM power on event

From these tests, I'm inclined to think that the problem is that CloudForms is unable to catch the OpenStack provider events.
This has an impact on V2V, but I doubt it is caused by V2V.
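
For reference, a quick way to check whether CFME has recorded any events from the provider is the appliance Rails console (a sketch assuming the standard ManageIQ EmsEvent model; class and column names per the upstream schema):

irb> ems = ManageIQ::Providers::Openstack::CloudManager.first
irb> EmsEvent.where(:ems_id => ems.id).order(:timestamp).last   # most recent caught event; nil means nothing was caught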

Comment 6 Nenad Peric 2019-03-28 11:52:45 UTC
Just to add one more configuration with the same problem.
I tried migrating 20 VMs to OSP using CFME 5.10.2.1 (cfme-rhos-5.10.2.1-1.x86_64.qcow2) and am hitting the same issue.
I will start downgrading CFME to find the most recent version with which the migration completes.

A dangerous effect of this failure is that the environment ended up with the same VMs powered on both on VMware AND OpenStack.

Comment 9 CFME Bot 2019-03-29 15:47:43 UTC
New commit detected on ManageIQ/manageiq-providers-openstack/master:

https://github.com/ManageIQ/manageiq-providers-openstack/commit/3091bbdf8ff21124cff04b3c0730f4be89ec8a0e
commit 3091bbdf8ff21124cff04b3c0730f4be89ec8a0e
Author:     Marek Aufart <maufart>
AuthorDate: Thu Mar 28 11:36:29 2019 -0400
Commit:     Marek Aufart <maufart>
CommitDate: Thu Mar 28 11:36:29 2019 -0400

    Parse OpenStack Event Timestamps as UTC

    Timestamps returned in Events from Ceilometer/Panko service doesn't
    contain Timezone information, but OpenStack expects it to use UTC [1].

    It should be parsed as UTC to ensure it will not be considered as MIQ machine
    local Timezone which could break time-based OpenStack Event queries.

    Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1692892

    [1] https://specs.openstack.org/openstack/api-wg/guidelines/time.html

 lib/manageiq/providers/openstack/legacy/events/openstack_ceilometer_event_monitor.rb | 10 +-
 1 file changed, 7 insertions(+), 3 deletions(-)
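
To illustrate the pitfall this commit fixes, a minimal sketch using Ruby's stdlib Time.parse (the actual change lives in openstack_ceilometer_event_monitor.rb):

irb> require 'time'
irb> Time.parse("2019-03-28 11:36:29")        # no zone info: interpreted in the MIQ machine's local timezone
irb> Time.parse("2019-03-28 11:36:29 UTC")    # forced to UTC, per the OpenStack API time guideline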

Comment 11 Yadnyawalk Tale 2019-07-29 12:54:30 UTC
Created attachment 1594259 [details]
expected_throttling.png

Comment 12 Shveta 2019-08-20 05:41:12 UTC
Migration is still failing with the refresh inventory issue in both VDDK and SSH modes.
Appliance: https://10.8.197.205
Plan: plan_HVIt80G6dT

Comment 13 Fabien Dupont 2019-08-20 07:13:41 UTC
Created attachment 1605998 [details]
Plan has no mapping

I connected to the appliance and the migration plan "plan_HVIt80G6dT" has an error message: "Infrastructure mapping doesn't exist" (see screenshot).

So, I checked using the Rails console, and it's confirmed:

irb> ServiceTemplateTransformationPlan.find_by(:name => 'plan_HVIt80G6dT').transformation_mapping
=> nil

And the automation.log is consistent with that error:

[----] I, [2019-08-20T01:29:47.449163 #7863:2ad4283545b8]  INFO -- : Q-task_id([r1_service_template_transformation_plan_task_1]) <AEMethod [/ManageIQ/Transformation/Infrastructure/VM/Common/CheckVmInInventory]> Starting
[----] E, [2019-08-20T01:29:47.901814 #7863:2ad4360959b8] ERROR -- : The following error occurred during instance method <destination_ems> for AR object <#<ServiceTemplateTransformationPlanTask id: 1, description: "Transforming VM [test-v2v-a-cdf6]", state: "migrat
e", request_type: "transformation_plan", userid: "admin", options: {:dialog=>nil, :workflow_settings=>{:resource_action_id=>44}, :initiator=>nil, :src_id=>1, :request_options=>{:submit_workflow=>true, :init_defaults=>false}, :cart_state=>"ordered", :requester_group
=>"EvmGroup-super_administrator", :executed_on_servers=>[1], :delivered_on=>2019-08-20 04:58:47 UTC, :infra_conversion_job_id=>1, :progress=>{"current_state"=>"/State11/State8", "current_description"=>"Refresh inventory", "percent"=>96.34, "states"=>{"/State5"=>{"s
tatus"=>"finished", "weight"=>1, "description"=>"Waiting for PreflightCheck", "message"=>"State5 is finished.", "started_on"=>2019-08-20 04:59:20 UTC, "percent"=>100.0, "updated_on"=>2019-08-20 04:59:21 UTC}, "/State8"=>{"status"=>"finished", "weight"=>2, "descript
ion"=>"Pre Transform VM", "message"=>"State8 is finished.", "started_on"=>2019-08-20 04:59:18 UTC, "percent"=>100.0, "updated_on"=>2019-08-20 04:59:57 UTC}, "/State8/State2"=>{"status"=>"finished", "weight"=>30, "description"=>"Run pre-migration playbook", "message
"=>"State2 is finished.", "started_on"=>2019-08-20 04:59:18 UTC, "percent"=>100.0, "updated_on"=>2019-08-20 04:59:22 UTC}, "/State2"=>{"status"=>"finished", "weight"=>20, "description"=>"Launch pre migration playbook", "message"=>"State2 is finished.", "started_on"
=>2019-08-20 04:59:19 UTC, "percent"=>100.0, "updated_on"=>2019-08-20 04:59:20 UTC}, "/State8/State5"=>{"status"=>"finished", "weight"=>40, "description"=>"Power off", "message"=>"State5 is finished.", "started_on"=>2019-08-20 04:59:22 UTC, "percent"=>100.0, "updat
ed_on"=>2019-08-20 04:59:55 UTC}, "/State8/State8"=>{"status"=>"finished", "weight"=>40, "description"=>"Collapse Snapshots", "message"=>"State8 is finished.", "started_on"=>2019-08-20 04:59:55 UTC, "percent"=>100.0, "updated_on"=>2019-08-20 04:59:57 UTC}, "/State1
1"=>{"status"=>"active", "weight"=>85, "description"=>"Transform VM", "message"=>"State11 is not finished yet [225/0 retries].", "started_on"=>2019-08-20 04:59:58 UTC, "percent"=>Infinity, "updated_on"=>2019-08-20 05:29:27 UTC}, "/State11/State2"=>{"status"=>"finis
hed", "weight"=>5, "description"=>"Convert disks", "message"=>"State2 is finished.", "started_on"=>2019-08-20 04:59:58 UTC, "percent"=>100.0, "updated_on"=>2019-08-20 05:00:00 UTC}, "/State11/State5"=>{"status"=>"finished", "weight"=>80, "description"=>"Convert dis
ks", "message"=>"Disks transformation succeeded.", "started_on"=>2019-08-20 05:00:01 UTC, "percent"=>100.0, "updated_on"=>2019-08-20 05:24:31 UTC}, "/State11/State8"=>{"status"=>"active", "weight"=>15, "description"=>"Refresh inventory", "message"=>"State8 is not f
inished yet [14/200 retries].", "started_on"=>2019-08-20 05:24:31 UTC, "percent"=>7.000000000000001, "updated_on"=>2019-08-20 05:29:26 UTC}}}, :conversion_host_name=>"auto-conv-instance2", :source_vm_power_state=>"on", :virtv2v_disks=>[{:path=>"[NFS_Datastore_1] te
st-v2v-a-cdf6/test-v2v-a-cdf6.vmdk", :size=>8589934592, :percent=>100, :weight=>100.0}], :network_mappings=>[{:source=>"VM Network", :destination=>"fff214ec-87eb-4814-992d-75ffcc6115dd", :mac_address=>"00:50:56:be:b6:61"}], :virtv2v_wrapper=>{"wrapper_log"=>"/var/l
og/virt-v2v/v2v-import-20190820T010028-11079-wrapper.log", "v2v_log"=>"/var/log/virt-v2v/v2v-import-20190820T010028-11079.log", "state_file"=>"/tmp/v2v-import-20190820T010028-11079.state", "throttling_file"=>"/tmp/v2v-import-20190820T010028-11079.throttle"}, :virtv
2v_started_on=>"2019-08-20 04:59:59", :virtv2v_status=>"succeeded", :virtv2v_pid=>11095, :virtv2v_finished_on=>"2019-08-20 05:24:28"}, created_on: "2019-08-20 04:58:47", updated_on: "2019-08-20 05:29:27", message: "Migrating", status: "Ok", type: "ServiceTemplateTr
ansformationPlanTask", miq_request_id: 1, source_id: 63, source_type: "VmOrTemplate", destination_id: nil, destination_type: nil, miq_request_task_id: nil, phase: nil, phase_context: {}, tenant_id: 1, cancelation_status: nil, conversion_host_id: 6>>
[----] E, [2019-08-20T01:29:47.902148 #7863:2ad4360959b8] ERROR -- : MiqAeServiceModelBase.ar_method raised: <RuntimeError>: <[test-v2v-a-cdf6] Cluster  has no mapping.>
[----] E, [2019-08-20T01:29:47.902247 #7863:2ad4360959b8] ERROR -- : /var/www/miq/vmdb/app/models/service_template_transformation_plan_task.rb:68:in `destination_cluster'
/var/www/miq/vmdb/app/models/service_template_transformation_plan_task.rb:77:in `destination_ems'
/opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:333:in `public_send'
/opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:333:in `block in object_send'
/opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:352:in `ar_method'
/opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:362:in `ar_method'
/opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:331:in `object_send'
/opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:182:in `block (3 levels) in expose'
/var/www/miq/vmdb/app/models/user.rb:283:in `with_user'
/opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:182:in `block (2 levels) in expose'
/usr/share/ruby/drb/drb.rb:1630:in `perform_without_block'
/usr/share/ruby/drb/drb.rb:1590:in `perform'
/usr/share/ruby/drb/drb.rb:1674:in `block (2 levels) in main_loop'
/usr/share/ruby/drb/drb.rb:1670:in `loop'
/usr/share/ruby/drb/drb.rb:1670:in `block in main_loop'
[----] E, [2019-08-20T01:29:47.906632 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) <AEMethod checkvmininventory> The following error occurred during method evaluation:
[----] E, [2019-08-20T01:29:47.907095 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) <AEMethod checkvmininventory>   RuntimeError: [test-v2v-a-cdf6] Cluster  has no mapping.
[----] E, [2019-08-20T01:29:47.908134 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) <AEMethod checkvmininventory>   (drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /var/www/miq/vmdb/app/models/service_template_transformation_plan_task.rb:68:in `destination_cluster'
(drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /var/www/miq/vmdb/app/models/service_template_transformation_plan_task.rb:77:in `destination_ems'
(drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:333:in `public_send'
(drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:333:in `block in object_send'
(drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:352:in `ar_method'
(drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:362:in `ar_method'
(drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:331:in `object_send'
(drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:182:in `block (3 levels) in expose'
(drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /var/www/miq/vmdb/app/models/user.rb:283:in `with_user'
(drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:182:in `block (2 levels) in expose'
(drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /usr/share/ruby/drb/drb.rb:1630:in `perform_without_block'
(drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /usr/share/ruby/drb/drb.rb:1590:in `perform'
(drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /usr/share/ruby/drb/drb.rb:1674:in `block (2 levels) in main_loop'
(drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /usr/share/ruby/drb/drb.rb:1670:in `loop'
(drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /usr/share/ruby/drb/drb.rb:1670:in `block in main_loop'
/ManageIQ/Transformation/Infrastructure/VM/Common/CheckVmInInventory:15:in `main'
[----] E, [2019-08-20T01:29:47.911973 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR: (drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /var/www/miq/vmdb/app/models/service_template_transformation_plan_task.rb:68:in `destination_cluster': [test-v2v-a-cdf6] Cluster  has no mapping. (RuntimeError)
[----] E, [2019-08-20T01:29:47.912463 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR:         from (drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /var/www/miq/vmdb/app/models/service_template_transformation_plan_task.rb:77:in `destination_ems'
[----] E, [2019-08-20T01:29:47.912821 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR:         from (drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:333:in `public_send'
[----] E, [2019-08-20T01:29:47.913201 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR:         from (drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:333:in `block in object_send'
[----] E, [2019-08-20T01:29:47.913573 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR:         from (drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:352:in `ar_method'
[----] E, [2019-08-20T01:29:47.914040 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR:         from (drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:362:in `ar_method'
[----] E, [2019-08-20T01:29:47.914446 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR:         from (drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:331:in `object_send'
[----] E, [2019-08-20T01:29:47.914820 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR:         from (drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:182:in `block (3 levels) in expose'
[----] E, [2019-08-20T01:29:47.915223 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR:         from (drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /var/www/miq/vmdb/app/models/user.rb:283:in `with_user'
[----] E, [2019-08-20T01:29:47.915705 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR:         from (drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /opt/rh/cfme-gemset/bundler/gems/cfme-automation_engine-960830d3f106/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:182:in `block (2 levels) in expose'
[----] E, [2019-08-20T01:29:47.916200 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR:         from (drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /usr/share/ruby/drb/drb.rb:1630:in `perform_without_block'
[----] E, [2019-08-20T01:29:47.916731 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR:         from (drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /usr/share/ruby/drb/drb.rb:1590:in `perform'
[----] E, [2019-08-20T01:29:47.917101 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR:         from (drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /usr/share/ruby/drb/drb.rb:1674:in `block (2 levels) in main_loop'
[----] E, [2019-08-20T01:29:47.917473 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR:         from (drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /usr/share/ruby/drb/drb.rb:1670:in `loop'
[----] E, [2019-08-20T01:29:47.917846 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR:         from (drbunix:///tmp/automation_engine20190820-7863-17ch7hr) /usr/share/ruby/drb/drb.rb:1670:in `block in main_loop'
[----] E, [2019-08-20T01:29:47.918399 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR:         from /ManageIQ/Transformation/Infrastructure/VM/Common/CheckVmInInventory:15:in `main'
[----] E, [2019-08-20T01:29:47.918916 #7863:2ad4350689c8] ERROR -- : Q-task_id([r1_service_template_transformation_plan_task_1]) Method STDERR:         from /ManageIQ/Transformation/Infrastructure/VM/Common/CheckVmInInventory:34:in `<main>'
[----] I, [2019-08-20T01:29:47.939016 #7863:2ad4283545b8]  INFO -- : Q-task_id([r1_service_template_transformation_plan_task_1]) <AEMethod [/ManageIQ/Transformation/Infrastructure/VM/Common/CheckVmInInventory]> Ending

Comment 14 Shveta 2019-08-20 14:44:23 UTC
Fabien,

The mapping was deleted by some automation after the test failed.
Please check plan_x68MIiG24k, which has a mapping and failed at the refresh inventory state.

Thanks, 
Shveta

Comment 15 Fabien Dupont 2019-08-20 20:19:47 UTC
Thanks @shveta. Indeed, it failed with a timeout on the inventory refresh. Can you reset the environment? I mean, remove the instance in OSP and recreate the migration plans. I'd like to troubleshoot it live, so I'll start the plan myself.

Comment 16 Shveta 2019-08-20 20:35:48 UTC
You can start the plan 'test_plan' at https://10.8.197.205.

Comment 21 Fabien Dupont 2019-08-28 15:12:44 UTC
Here is what I can see on the reproducer appliance.
When I connected, the migration was still waiting for an inventory refresh for 'test-v2v-a-262h', and the VMDB was consistent with that observation:

irb> Vm.find_by(:name => 'test-v2v-a-262h', :vendor => 'openstack')
PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 1, in use: 1, waiting_in_queue: 0
=> nil

irb> Vm.where(:name => 'test-v2v-a-262h').length
=> 1

Looking at evm.log, there was a very high number of events from the OpenStack provider.
I manually refreshed the provider from the UI; within a few seconds the VM was in the inventory, and the migration resumed and succeeded. The VMDB was consistent:

irb> Vm.find_by(:name => 'test-v2v-a-262h', :vendor => 'openstack')
=> #<ManageIQ::Providers::Openstack::CloudManager::Vm id: 88, vendor: "openstack", format: nil, version: nil, name: "test-v2v-a-262h", description: nil, location: "unknown", config_xml: nil, autostart: nil, host_id: nil, last_sync_on: nil, created_on: "2019-08-28 15:02:36", updated_on: "2019-08-28 15:03:08", storage_id: nil, guid: "e5759b2a-e055-4bd0-a22b-7fccff4fb23d", ems_id: 37, last_scan_on: nil, last_scan_attempt_on: nil, uid_ems: "105e4f5c-418c-41c4-8e8f-1474a141f4e2", retires_on: nil, retired: nil, boot_time: nil, tools_status: nil, standby_action: nil, power_state: "on", state_changed_on: "2019-08-28 15:02:36", previous_state: nil, connection_state: "connected", last_perf_capture_on: nil, registered: nil, busy: nil, smart: nil, memory_reserve: nil, memory_reserve_expand: nil, memory_limit: nil, memory_shares: nil, memory_shares_level: nil, cpu_reserve: nil, cpu_reserve_expand: nil, cpu_limit: nil, cpu_shares: nil, cpu_shares_level: nil, cpu_affinity: nil, ems_created_on: nil, template: false, evm_owner_id: nil, ems_ref_obj: nil, miq_group_id: 1, linked_clone: nil, fault_tolerance: nil, type: "ManageIQ::Providers::Openstack::CloudManager::Vm", ems_ref: "105e4f5c-418c-41c4-8e8f-1474a141f4e2", ems_cluster_id: nil, retirement_warn: nil, retirement_last_warn: nil, vnc_port: nil, flavor_id: 4, availability_zone_id: 1, cloud: true, retirement_state: nil, cloud_network_id: nil, cloud_subnet_id: nil, cloud_tenant_id: 1, raw_power_state: "ACTIVE", publicly_available: nil, orchestration_stack_id: nil, retirement_requester: nil, tenant_id: 1, resource_group_id: nil, deprecated: nil, storage_profile_id: nil, cpu_hot_add_enabled: nil, cpu_hot_remove_enabled: nil, memory_hot_add_enabled: nil, memory_hot_add_limit: nil, memory_hot_add_increment: nil, hostname: nil>

irb> Vm.where(:name => 'test-v2v-a-262h').length
=> 2

What I had already seen in the MigEng lab is that adding an OpenStack provider generates an event storm, because all events from the provider's history are processed by CloudForms.
The inventory refresh triggered by an event seems to be queued with all the other event actions, so it will only happen once the event storm is finished, while a manual refresh is placed in another queue and processed faster.

@marek, is my assumption correct? In the MigEng lab, I simply waited for the event storm to finish before launching my migrations. Could QE monitor the events queue to postpone the migration plan run?
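
For monitoring, a sketch from the appliance Rails console (assuming the standard MiqQueue roles; names per the upstream ManageIQ schema):

irb> MiqQueue.where(:role => "event", :state => "ready").count           # event-handler backlog (the storm)
irb> MiqQueue.where(:role => "ems_inventory", :state => "ready").count   # pending inventory refreshes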

Comment 22 Marek Aufart 2019-08-28 15:53:04 UTC
@Fabien, you are right! The events catch-up time on the current environment looks to take even more than an hour. I discussed with Loic that it would be okay to add an option to Settings that allows skipping events from before OpenStack was added as a provider to CF. QE can set this option as part of the scenario. Would that work?

Comment 23 Fabien Dupont 2019-08-28 16:47:23 UTC
@Marek, that would be a great option, also for demos ;)

Comment 28 Shveta 2019-08-28 18:00:28 UTC
(In reply to Marek Aufart from comment #22)
> @Fabien, you are right! The events catch-up time on the current environment
> looks to take even more than an hour. I discussed with Loic that it would be
> okay to add an option to Settings that allows skipping events from before
> OpenStack was added as a provider to CF. QE can set this option as part of
> the scenario. Would that work?

Marek, what if we set this option by default in settings, meaning CFME ships with this option set to true?

Comment 29 Marek Aufart 2019-08-30 09:45:12 UTC
This option is called :event_skip_history and, for consistency with the original behaviour, it defaults to false. Setting it to true will load only events generated after OpenStack was added as a provider to CF (i.e., skip all old events).

Note: this works only for providers added _after_ the :event_skip_history option was set to true. If a provider already existed with events, its event timeline will continue from the point where it stopped last time.
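
For illustration, enabling the option in the advanced settings might look like the following (a hypothetical key path; the exact location is defined in config/settings.yml of the commit below):

:ems:
  :ems_openstack:
    :event_handling:
      :event_skip_history: true    # hypothetical path; check the commit's settings.yml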

Comment 30 CFME Bot 2019-08-30 16:52:56 UTC
New commit detected on ManageIQ/manageiq-providers-openstack/master:

https://github.com/ManageIQ/manageiq-providers-openstack/commit/d2f8b96ee4d3fd2f2c411d0bba8d4ae4d6efc16d
commit d2f8b96ee4d3fd2f2c411d0bba8d4ae4d6efc16d
Author:     Marek Aufart <maufart>
AuthorDate: Wed Aug 28 13:10:06 2019 -0400
Commit:     Marek Aufart <maufart>
CommitDate: Wed Aug 28 13:10:06 2019 -0400

    Allow skip historical Events from OpenStack

    Multiple issues related to not processing of new events appeared, partially
    caused by slow processing of historical Events from OpenStack. Adding settings option
    :event_skip_history to allow skip all events generated in OpenStack before OpenStack
    was added as provider to ManageIQ.

    Event backread was decreased from 15 to 5 seconds to speedup catchup of OpenStack Events.

    Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1692892

 config/settings.yml | 3 +-
 lib/manageiq/providers/openstack/legacy/events/openstack_ceilometer_event_monitor.rb | 8 +-
 spec/legacy/events/openstack_ceilometer_event_monitor_spec.rb | 40 +
 3 files changed, 49 insertions(+), 2 deletions(-)

Comment 31 CFME Bot 2019-08-30 23:42:13 UTC
New commit detected on ManageIQ/manageiq-providers-openstack/ivanchuk:

https://github.com/ManageIQ/manageiq-providers-openstack/commit/b2b97186890f65509354105ea90c09f09eb55cdd
commit b2b97186890f65509354105ea90c09f09eb55cdd
Author:     Ladislav Smola <lsmola>
AuthorDate: Fri Aug 30 12:52:11 2019 -0400
Commit:     Ladislav Smola <lsmola>
CommitDate: Fri Aug 30 12:52:11 2019 -0400

    Merge pull request #503 from aufi/event_history_catchup

    Allow skip historical Events from OpenStack

    (cherry picked from commit 1a8bc1a679038e81f476e2d25c346c7a6f3cdf1d)

    https://bugzilla.redhat.com/show_bug.cgi?id=1692892

 config/settings.yml | 3 +-
 lib/manageiq/providers/openstack/legacy/events/openstack_ceilometer_event_monitor.rb | 8 +-
 spec/legacy/events/openstack_ceilometer_event_monitor_spec.rb | 40 +
 3 files changed, 49 insertions(+), 2 deletions(-)

Comment 32 Yadnyawalk Tale 2019-09-09 20:01:40 UTC
We initially tried event_skip_history=false, which did show the older events in the provider event history, and the migration got stuck at the refresh inventory step, as expected. Then we set event_skip_history=true: the provider no longer showed the older events, and the migration worked without the refresh inventory error.

This is going to be a really useful feature, at least for us, moving forward.
Thanks to @Shveta for her help in testing it and @Marek + @Fabien for their inputs.

Verified on - 5.11.0.23.20190904213640_d113674.

