Bug 1719689 - [v2v][RHV][vddk] 1 out of 10 VMs failed migration, on CFME failed to read the state file.
Summary: [v2v][RHV][vddk] 1 out of 10 VMs failed migration, on CFME failed to read the...
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: V2V
Version: 5.10.6
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: GA
: 5.11.1
Assignee: Daniel Berger
QA Contact: Ilanit Stein
Red Hat CloudForms Documentation
URL:
Whiteboard:
Depends On:
Blocks: 1726439
TreeView+ depends on / blocked
 
Reported: 2019-06-12 11:00 UTC by Ilanit Stein
Modified: 2019-08-22 14:03 UTC (History)
6 users (show)

Fixed In Version: 5.11.0.13
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1726439 (view as bug list)
Environment:
Last Closed: 2019-08-21 18:10:59 UTC
Category: Bug
Cloudforms Team: V2V
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
evm log (17.28 MB, application/gzip)
2019-06-12 11:05 UTC, Ilanit Stein
no flags Details
automation log (3.99 MB, application/gzip)
2019-06-12 11:05 UTC, Ilanit Stein
no flags Details
v2v import log for the vm failing migration (10.99 MB, text/plain)
2019-06-12 11:16 UTC, Ilanit Stein
no flags Details
v2v import wrapper log (25.29 KB, text/plain)
2019-06-12 11:17 UTC, Ilanit Stein
no flags Details

Description Ilanit Stein 2019-06-12 11:00:56 UTC
Description of problem:
Migration of 20 VMs from VMware to RHV, each 100 GB disk,
were migrated to 2 available conversion hosts 
(one host: 16 VMs, second host: 4 VMs) successfully.

However, one VM failed the migration in CFME UI.

There was not indication on the failure cause.
v2v import log did not indicate a problem, that I could find.

This is the error in evm.log:
[----] I, [2019-06-11T11:48:26.938636 #33063:111ef58]  INFO -- : Q-task_id([job_dispatcher]) MIQ(ServiceTemplateTransformationPl
anTask#get_conversion_state) InfraConversionJob get_conversion_state to update_options: {} 
[----] E, [2019-06-11T11:48:27.014069 #33063:111ef58] ERROR -- : Q-task_id([job_dispatcher]) [RuntimeError]: Could not parse con
version state data from file '/tmp/v2v-import-20190611T143104-24138.state':   Method:[block (2 levels) in <class:LogProxy>] 
[----] E, [2019-06-11T11:48:27.014317 #33063:111ef58] ERROR -- : Q-task_id([job_dispatcher]) /var/www/miq/vmdb/app/models/conver
sion_host.rb:140:in `rescue in get_conversion_state' 
/var/www/miq/vmdb/app/models/conversion_host.rb:134:in `get_conversion_state' 
/var/www/miq/vmdb/app/models/service_template_transformation_plan_task.rb:210:in `get_conversion_state' 
/var/www/miq/vmdb/app/models/infra_conversion_job.rb:94:in `poll_conversion' 
/var/www/miq/vmdb/app/models/job/state_machine.rb:34:in `signal' 
/var/www/miq/vmdb/app/models/miq_queue.rb:455:in `block in dispatch_method' 
/usr/share/ruby/timeout.rb:93:in `block in timeout' 
/usr/share/ruby/timeout.rb:33:in `block in catch' 
/usr/share/ruby/timeout.rb:33:in `catch' 
/usr/share/ruby/timeout.rb:33:in `catch' 
/usr/share/ruby/timeout.rb:108:in `timeout' 
/var/www/miq/vmdb/app/models/miq_queue.rb:453:in `dispatch_method' 
/var/www/miq/vmdb/app/models/miq_queue.rb:430:in `block in deliver' 
/var/www/miq/vmdb/app/models/user.rb:275:in `with_user_group' 
/var/www/miq/vmdb/app/models/miq_queue.rb:430:in `deliver'
...

Version-Release number of selected component (if applicable):
CFME-5.10.6.0/RHV-4.3.4

How reproducible:
Happen once so far, but I didn't do many 20 VMs migration run on CFME-5.10.6.0.

Expected results:
1. Seems that VM  migration should have been reported as successful, on CFME side.
2. In such case there should be a hint in UI to what is the reason migration failed.

Additional info - Some Inputs, by developer:

* Fabien Dupont
"I have no idea why the appliance failed to read the state file. Given the error message, the file was empty. But looking at the file now, it's not."

* Dan Berger
I don't think the state file is empty, what I -think- is happening is that this line is actually failing:
https://github.com/ManageIQ/manageiq/blob/master/app/models/conversion_host.rb#L179
Based on my experiments, this is an issue with the MiqSshUtil wrapper we're using. Instead of raising an error if the remote ssh command fails, it just returns an empty string. Then we try to parse that empty string and a different exception occurs.
I've been working on splitting it out into its own repo, fixing it up and adding tests:
https://github.com/djberg96/manageiq-ssh-util
Once ManageIQ takes it over, we can update the dependencies and code.

Comment 2 Ilanit Stein 2019-06-12 11:05:26 UTC
Created attachment 1579807 [details]
evm log

Comment 3 Ilanit Stein 2019-06-12 11:05:56 UTC
Created attachment 1579808 [details]
automation log

Comment 4 Ilanit Stein 2019-06-12 11:16:35 UTC
Created attachment 1579810 [details]
v2v import log for the vm failing migration

import log v2v for v2v_migration_vm_6 
v2v_migration_vm_0-5, 7-19 ended migration successfully, in CFME UI.

Comment 5 Ilanit Stein 2019-06-12 11:17:23 UTC
Created attachment 1579811 [details]
v2v import wrapper log

v2v import wrapper log for v2v_migration_vm_6

Comment 7 Ilanit Stein 2019-06-27 07:41:02 UTC
This bug seem to be more critical, since it also fails VM migration, before the VM disk upload ends.
That is before the VM migration to RHV is complete.

There fore this is not only a matter of error appearing on CFME side, showing VM migration failed though it passed. It actually fails the VM migration.

Comment 8 Ilanit Stein 2019-06-27 07:45:55 UTC
Comment #7 is based on a second test I did for 20 VMs migration, with max concurrent VMs migration per host 5, using 2 conversion host. Using VDDK. 

One of the VMs, failed on the "failed to read the state" error, mentioned in this bug description.
The VM migration was cancelled from CFME side.
and on RHV side, the VM disk upload was paused, and of course the VM itself was not created.

Comment 9 Fabien Dupont 2019-07-02 09:50:25 UTC
Could not reproduce with CFME 5.10.6.1 + ManageIQ/manageiq-gems-pending#437 applied.
Moving it to 5.10.8, so that we can have ManageIQ/manageiq-gems-pending#437 backported in CFME 5.10.7 and test it again.

Comment 10 Fabien Dupont 2019-07-02 10:45:03 UTC
Moving it back to 5.10.7, but with status ON_QA. This will allow testing with CFME 5.10.7.

Comment 13 Ilanit Stein 2019-08-12 13:46:22 UTC
Tested on these versions:

CFME-5.11.0.18.20190806180636_1dd6378
RHV-4.3.5.3-0.1.el7
RHV-hosts (2, that serve as conversion hosts):
* Special packages of: libguestfs libguestfs-tools-c virt-v2v python-libguestfs: 1.40.2-5.el7.1.bz1680361.v3.1.x86_64.
* OS Version:RHEL - 7.7 - 9.el7
* OS Description: Red Hat Enterprise Linux Server 7.7 Beta (Maipo)
* Kernel Version: 3.10.0 - 957.21.3.el7.x86_64
* KVM Version:2.12.0 - 33.el7
* LIBVIRT Version: libvirt-4.5.0-23.el7
* VDSM Version: vdsm-4.30.19-1.el7ev


Run 20 VMs of 20/100GB disk.

Error reproduces:

[----] I, [2019-08-12T08:16:32.184875 #15318:2ac513a565c4]  INFO -- : Q-task_id([job_dispatcher]) MIQ(InfraConversionJob#process_finished) job finished, Conversion error: Could not parse conversion state data from file '/tmp/v2v-import-20190812T115240-147382.state':

Fabien,
I think this bug should yet move to ON_QA status, as we do not have a merged fix for it.
Should have it's status match it's cloned bug 726439.


Tomas,
Would you please apply the fix mentioned in comment #18, on our hosts, same as you did before, so I can validate the fix?
(I'll send your the host details by email)

Comment 14 Tomáš Golembiovský 2019-08-13 10:15:18 UTC
Hi, patched 2 hosts as requested.

Comment 15 Ilanit Stein 2019-08-21 18:10:59 UTC
Could not reproduce therefore closing the bug.
In case it will be introduced again, I'll update this bug.


Note You need to log in before you can comment on or make changes to this bug.