Description of problem: Migration of 20 VMs from VMware to RHV, each with a 100 GB disk, was dispatched to 2 available conversion hosts (one host: 16 VMs, second host: 4 VMs). However, one VM failed the migration in the CFME UI. There was no indication of the failure cause; the v2v import log did not indicate a problem that I could find. This is the error in evm.log:

[----] I, [2019-06-11T11:48:26.938636 #33063:111ef58]  INFO -- : Q-task_id([job_dispatcher]) MIQ(ServiceTemplateTransformationPlanTask#get_conversion_state) InfraConversionJob get_conversion_state to update_options: {}
[----] E, [2019-06-11T11:48:27.014069 #33063:111ef58] ERROR -- : Q-task_id([job_dispatcher]) [RuntimeError]: Could not parse conversion state data from file '/tmp/v2v-import-20190611T143104-24138.state': Method:[block (2 levels) in <class:LogProxy>]
[----] E, [2019-06-11T11:48:27.014317 #33063:111ef58] ERROR -- : Q-task_id([job_dispatcher]) /var/www/miq/vmdb/app/models/conversion_host.rb:140:in `rescue in get_conversion_state'
/var/www/miq/vmdb/app/models/conversion_host.rb:134:in `get_conversion_state'
/var/www/miq/vmdb/app/models/service_template_transformation_plan_task.rb:210:in `get_conversion_state'
/var/www/miq/vmdb/app/models/infra_conversion_job.rb:94:in `poll_conversion'
/var/www/miq/vmdb/app/models/job/state_machine.rb:34:in `signal'
/var/www/miq/vmdb/app/models/miq_queue.rb:455:in `block in dispatch_method'
/usr/share/ruby/timeout.rb:93:in `block in timeout'
/usr/share/ruby/timeout.rb:33:in `block in catch'
/usr/share/ruby/timeout.rb:33:in `catch'
/usr/share/ruby/timeout.rb:33:in `catch'
/usr/share/ruby/timeout.rb:108:in `timeout'
/var/www/miq/vmdb/app/models/miq_queue.rb:453:in `dispatch_method'
/var/www/miq/vmdb/app/models/miq_queue.rb:430:in `block in deliver'
/var/www/miq/vmdb/app/models/user.rb:275:in `with_user_group'
/var/www/miq/vmdb/app/models/miq_queue.rb:430:in `deliver'
...
Version-Release number of selected component (if applicable):
CFME-5.10.6.0 / RHV-4.3.4

How reproducible:
Happened once so far, but I haven't done many 20-VM migration runs on CFME-5.10.6.0.

Expected results:
1. It seems that the VM migration should have been reported as successful on the CFME side.
2. In such a case there should be a hint in the UI as to why the migration failed.

Additional info - some input from developers:
* Fabien Dupont: "I have no idea why the appliance failed to read the state file. Given the error message, the file was empty. But looking at the file now, it's not."
* Dan Berger: I don't think the state file is empty; what I -think- is happening is that this line is actually failing: https://github.com/ManageIQ/manageiq/blob/master/app/models/conversion_host.rb#L179
Based on my experiments, this is an issue with the MiqSshUtil wrapper we're using. Instead of raising an error if the remote ssh command fails, it just returns an empty string. Then we try to parse that empty string and a different exception occurs. I've been working on splitting it out into its own repo, fixing it up and adding tests: https://github.com/djberg96/manageiq-ssh-util
Once ManageIQ takes it over, we can update the dependencies and code.
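To illustrate the diagnosis above: if the SSH wrapper swallows a remote command failure and returns an empty string instead of raising, the later attempt to parse that empty string is what actually raises, producing the misleading "Could not parse conversion state data" error even though the state file on the conversion host is fine. The sketch below is a minimal, hypothetical reproduction; `run_remote_command` and `read_conversion_state` are illustrative stand-ins, not the real MiqSshUtil/ConversionHost API, and the actual parsing code may differ.

```ruby
require 'json'

# Hypothetical SSH wrapper exhibiting the behavior attributed to MiqSshUtil:
# the remote command fails, but instead of raising, an empty string comes back.
def run_remote_command(_host, _command)
  '' # remote failure silently converted into an empty result
end

def read_conversion_state(host, path)
  raw = run_remote_command(host, "cat #{path}")
  JSON.parse(raw)
rescue JSON::ParserError => e
  # This re-raise produces the misleading message seen in evm.log: the state
  # file may be valid -- what was empty is the wrapper's return value.
  raise "Could not parse conversion state data from file '#{path}': #{e.message}"
end

begin
  read_conversion_state('conversion-host.example.com',
                        '/tmp/v2v-import-20190611T143104-24138.state')
rescue RuntimeError => e
  puts e.message
end
```

This is why fixing the wrapper to raise on a failed remote command (as in the manageiq-ssh-util rework) surfaces the real SSH error instead of a bogus parse error.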
Created attachment 1579807 [details] evm log
Created attachment 1579808 [details] automation log
Created attachment 1579810 [details] v2v import log for the VM failing migration (v2v_migration_vm_6). v2v_migration_vm_0-5 and 7-19 ended migration successfully in the CFME UI.
Created attachment 1579811 [details] v2v import wrapper log for v2v_migration_vm_6
This bug seems to be more critical, since it also fails the VM migration before the VM disk upload ends, that is, before the VM migration to RHV is complete. Therefore this is not only a matter of an error appearing on the CFME side showing a VM migration as failed though it passed; it actually fails the VM migration.
Comment #7 is based on a second 20-VM migration test I did, with a maximum of 5 concurrent VM migrations per host, using 2 conversion hosts and VDDK. One of the VMs failed on the "failed to read the state" error mentioned in this bug's description. The VM migration was cancelled on the CFME side; on the RHV side, the VM disk upload was paused and, of course, the VM itself was not created.
Could not reproduce with CFME 5.10.6.1 + ManageIQ/manageiq-gems-pending#437 applied. Moving it to 5.10.8, so that we can have ManageIQ/manageiq-gems-pending#437 backported to CFME 5.10.7 and test it again.
Moving it back to 5.10.7, but with status ON_QA. This will allow testing with CFME 5.10.7.
Tested on these versions:
CFME-5.11.0.18.20190806180636_1dd6378
RHV-4.3.5.3-0.1.el7
RHV hosts (2, serving as conversion hosts):
* Special packages of libguestfs, libguestfs-tools-c, virt-v2v, python-libguestfs: 1.40.2-5.el7.1.bz1680361.v3.1.x86_64
* OS Version: RHEL - 7.7 - 9.el7
* OS Description: Red Hat Enterprise Linux Server 7.7 Beta (Maipo)
* Kernel Version: 3.10.0 - 957.21.3.el7.x86_64
* KVM Version: 2.12.0 - 33.el7
* LIBVIRT Version: libvirt-4.5.0-23.el7
* VDSM Version: vdsm-4.30.19-1.el7ev

Ran 20 VMs with 20/100 GB disks. The error reproduces:

[----] I, [2019-08-12T08:16:32.184875 #15318:2ac513a565c4]  INFO -- : Q-task_id([job_dispatcher]) MIQ(InfraConversionJob#process_finished) job finished, Conversion error: Could not parse conversion state data from file '/tmp/v2v-import-20190812T115240-147382.state':

Fabien, I think this bug should not yet move to ON_QA status, as we do not have a merged fix for it. Its status should match that of its cloned bug 726439.

Tomas, would you please apply the fix mentioned in comment #18 on our hosts, same as you did before, so I can validate the fix? (I'll send you the host details by email.)
Hi, patched 2 hosts as requested.
Could not reproduce, therefore closing the bug. In case it is introduced again, I'll update this bug.