Bug 1740098

Summary: [v2v][Scale][RHV] Single VM migration failed, but related virt-v2v error is not logged.
Product: Red Hat Enterprise Linux Advanced Virtualization Reporter: Ilanit Stein <istein>
Component: libguestfsAssignee: Richard W.M. Jones <rjones>
Status: CLOSED CURRENTRELEASE QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 8.1CC: bthurber, fdupont, jsuchane, mxie, mzhan, nsoffer, ptoscano, rjones, tzheng, xiaodwan, zili
Target Milestone: rcFlags: istein: needinfo+
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: V2V
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-01-23 13:04:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ilanit Stein 2019-08-12 09:54:09 UTC
Description of problem:
When running a v2v migrating of a single VM, with 107GB disk (70% usage), VDDK, (VMware to RHV), ISCSI->FC, it failed, after 38 minutes. 
The disk was fully transferred. 
On CFME UI, it appeared as Cancelled migration.
In the v2v import log, the error causing this failure was missing.

Last messages in the v2v import log:
virtual copying rate: 611.6 M bits/sec
[1814.2] Creating output metadata
virt-v2v: error: transfer of disk 1/1 failed, see earlier error messages
rm -rf '/var/tmp/vddk.pi48dF'
rm -rf '/var/tmp/v2v.GEnF4O'
rm -rf '/var/tmp/v2v.1dO6A0'
rm -rf '/var/tmp/v2v.sg3z7b'
rm -rf '/var/tmp/rhvupload.zEm9Dn'
rm -rf '/var/tmp/null.oJiKaz'
libguestfs: trace: close
libguestfs: closing guestfs handle 0x560b451c35e0 (state 0)
nbdkit: debug: vddk: unload
nbdkit: debug: VDDK call: VixDiskLib_Exit ()
nbdkit: debug: VixDiskLib: VixDiskLib_Exit: Unmatched Init calls so far: 1.
nbdkit: python[1]: error: /var/tmp/v2v.1dO6A0/rhv-upload-plugin.py: close: error: ['Traceback (most recent call last):\n', '  File "/var/tmp/v2v.1dO6A0/rhv-upload-plugin.py", line 577, in close\n', '  File "/var/tmp/v2v.1dO6A0/rhv-upload-plugin.py", line 516, in delete_disk_on_failure\n', '  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line 37334, in remove\n    self._internal_remove(headers, query, wait)\n', '  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 271, in _internal_remove\n    return future.wait() if wait else future\n', '  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 55, in wait\n    return self._code(response)\n', '  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 268, in callback\n    self._check_fault(response)\n', '  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 132, in _check_fault\n    self._raise_error(response, body)\n', '  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 118, in _raise_error\n    raise error\n', 'Error: Fault reason is "Operation Failed". Fault detail is "[Cannot remove Virtual Disk. Related operation is currently in progress. Please try again later.]". HTTP response code is 409.\n']

Version-Release number of selected component (if applicable):
CFME-5.11.0.18.20190806180636_1dd6378
RHV-4.3.5.3-0.1.el7
RHV-host:
* Interim packages of: 
libguestfs libguestfs-tools-c virt-v2v python-libguestfs: 
1.40.2-5.el7.1.bz1680361.v3.1.x86_64.
* OS Version:RHEL - 7.7 - 9.el7
* OS Description: Red Hat Enterprise Linux Server 7.7 Beta (Maipo)
* Kernel Version: 3.10.0 - 957.21.3.el7.x86_64
* KVM Version:2.12.0 - 33.el7
* LIBVIRT Version: libvirt-4.5.0-23.el7
* VDSM Version: vdsm-4.30.19-1.el7ev

How reproducible:
Not sure. So far happen once.

Additional info:
nsoffer:
"This is an error to remove the disk after the import failed because of another error which is not logged here. If we don't have info on the real error this is virt-v2v bug, handing errors incorrectly.
We will not be able to support this tool in production if errors are not logged."

Logs:
https://drive.google.com/drive/u/0/folders/14gkmq4OMjnZhpfSzadZ-gCetC6IFs3Ek

Comment 2 Richard W.M. Jones 2019-08-12 10:05:26 UTC
Note for QE: 1.40.2-5.el7.1.bz1680361.v3.1.x86_64 is a package I built for Ilanit
which is the same as 1.40.2-5.el7 + a couple of patches attempting to fix bz1680361.

Comment 3 Richard W.M. Jones 2019-08-12 10:06:16 UTC
Nir, can you tell us what we need to do to capture the "true" Python error?

Comment 4 Nir Soffer 2019-08-12 13:10:25 UTC
(In reply to Richard W.M. Jones from comment #3)
> Nir, can you tell us what we need to do to capture the "true" Python error?

We need to ensure that error during cleanup flow are handled and not raised,
hiding the original error that lead to the cleanup flow.

I hope that this code makes the error clear.


$ cat example.py 
def import_vm():
    raise Exception("real error")

def cleanup():
    raise Exception("cleanup error")

try:
    import_vm()
except Exception:
    cleanup()
    raise


$ python example.py 
Traceback (most recent call last):
  File "example.py", line 10, in <module>
    cleanup()
  File "example.py", line 5, in cleanup
    raise Exception("cleanup error")
Exception: cleanup error


The real error is hidden by the cleanup error, so we cannot fix
the real issue.

This is mainly python 2 issue. With python 3 we see:


$ python3 example.py 
Traceback (most recent call last):
  File "example.py", line 8, in <module>
    import_vm()
  File "example.py", line 2, in import_vm
    raise Exception("real error")
Exception: real error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "example.py", line 10, in <module>
    cleanup()
  File "example.py", line 5, in cleanup
    raise Exception("cleanup error")
Exception: cleanup error



To fix this in python 2, we need to make sure cleanup never raises.
it must handle error cleanup errors by logging them.

$ cat example_fix.py 
import logging

def import_vm():
    raise Exception("real error")

def cleanup():
    try:
        raise Exception("cleanup error")
    except Exception:
        logging.exception("Error while cleaning up")

try:
    import_vm()
except Exception:
    cleanup()
    raise

Now we will see both errors:

$ python example_fix.py 
ERROR:root:Error while cleaning up
Traceback (most recent call last):
  File "example_fix.py", line 8, in cleanup
    raise Exception("cleanup error")
Exception: cleanup error
Traceback (most recent call last):
  File "example_fix.py", line 13, in <module>
    import_vm()
  File "example_fix.py", line 4, in import_vm
    raise Exception("real error")
Exception: real error

Comment 5 Ilanit Stein 2019-09-03 09:39:33 UTC
Richard,

Can this bug get fixed sooner please, for RHEL-7.7/8?

Comment 6 Richard W.M. Jones 2019-09-03 10:37:48 UTC
We're not doing anything for RHEL 7.8.  RHEL 8 has Python 3 so
apparently would not be affected.

As usual for changes to the rhv-upload backend we need patches
that have been fully tested.  I have no time or manpower to make
these changes right now.

Comment 7 Ilanit Stein 2019-09-05 12:54:19 UTC
This bug do not reproduce on a small scale RHV.
I'll try to reproduce it on the large scale RHV, where it was found, once it's free to use.

Comment 8 Ilanit Stein 2019-09-18 14:00:17 UTC
I tried to reproduce it on large scale RHV environment RHV-4.3.6, with CFME-5.11.0.24. however it did not reproduce so far.
if/once I'll see it again, I'll update this bug.

Comment 9 Fabien Dupont 2019-12-03 16:26:16 UTC
Ilanit. Have you reproduced this BZ since September ? If not, could you consider closing it ?

Comment 10 Jaroslav Suchanek 2019-12-04 15:30:30 UTC
This bug will be addressed in next major release.

Comment 11 Ilanit Stein 2019-12-11 08:26:24 UTC
In reply to comment #9:
I did not reproduce this bug so far.
This bug is about improving the code, so that hidden errors will be reported.
Therefore, I am not sure a reproduce in the case is required.

Comment 12 Jaroslav Suchanek 2020-01-23 13:04:41 UTC
This is not impacting RHEL8 as the default is python3.