Description of problem: When running a v2v migrating of a single VM, with 107GB disk (70% usage), VDDK, (VMware to RHV), ISCSI->FC, it failed, after 38 minutes. The disk was fully transferred. On CFME UI, it appeared as Cancelled migration. In the v2v import log, the error causing this failure was missing. Last messages in the v2v import log: virtual copying rate: 611.6 M bits/sec [1814.2] Creating output metadata virt-v2v: error: transfer of disk 1/1 failed, see earlier error messages rm -rf '/var/tmp/vddk.pi48dF' rm -rf '/var/tmp/v2v.GEnF4O' rm -rf '/var/tmp/v2v.1dO6A0' rm -rf '/var/tmp/v2v.sg3z7b' rm -rf '/var/tmp/rhvupload.zEm9Dn' rm -rf '/var/tmp/null.oJiKaz' libguestfs: trace: close libguestfs: closing guestfs handle 0x560b451c35e0 (state 0) nbdkit: debug: vddk: unload nbdkit: debug: VDDK call: VixDiskLib_Exit () nbdkit: debug: VixDiskLib: VixDiskLib_Exit: Unmatched Init calls so far: 1. nbdkit: python[1]: error: /var/tmp/v2v.1dO6A0/rhv-upload-plugin.py: close: error: ['Traceback (most recent call last):\n', ' File "/var/tmp/v2v.1dO6A0/rhv-upload-plugin.py", line 577, in close\n', ' File "/var/tmp/v2v.1dO6A0/rhv-upload-plugin.py", line 516, in delete_disk_on_failure\n', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line 37334, in remove\n self._internal_remove(headers, query, wait)\n', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 271, in _internal_remove\n return future.wait() if wait else future\n', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 55, in wait\n return self._code(response)\n', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 268, in callback\n self._check_fault(response)\n', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 132, in _check_fault\n self._raise_error(response, body)\n', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 118, in _raise_error\n raise error\n', 'Error: Fault reason is "Operation Failed". Fault detail is "[Cannot remove Virtual Disk. Related operation is currently in progress. Please try again later.]". HTTP response code is 409.\n'] Version-Release number of selected component (if applicable): CFME-5.11.0.18.20190806180636_1dd6378 RHV-4.3.5.3-0.1.el7 RHV-host: * Interim packages of: libguestfs libguestfs-tools-c virt-v2v python-libguestfs: 1.40.2-5.el7.1.bz1680361.v3.1.x86_64. * OS Version:RHEL - 7.7 - 9.el7 * OS Description: Red Hat Enterprise Linux Server 7.7 Beta (Maipo) * Kernel Version: 3.10.0 - 957.21.3.el7.x86_64 * KVM Version:2.12.0 - 33.el7 * LIBVIRT Version: libvirt-4.5.0-23.el7 * VDSM Version: vdsm-4.30.19-1.el7ev How reproducible: Not sure. So far happen once. Additional info: nsoffer: "This is an error to remove the disk after the import failed because of another error which is not logged here. If we don't have info on the real error this is virt-v2v bug, handing errors incorrectly. We will not be able to support this tool in production if errors are not logged." Logs: https://drive.google.com/drive/u/0/folders/14gkmq4OMjnZhpfSzadZ-gCetC6IFs3Ek
Note for QE: 1.40.2-5.el7.1.bz1680361.v3.1.x86_64 is a package I built for Ilanit which is the same as 1.40.2-5.el7 + a couple of patches attempting to fix bz1680361.
Nir, can you tell us what we need to do to capture the "true" Python error?
(In reply to Richard W.M. Jones from comment #3) > Nir, can you tell us what we need to do to capture the "true" Python error? We need to ensure that error during cleanup flow are handled and not raised, hiding the original error that lead to the cleanup flow. I hope that this code makes the error clear. $ cat example.py def import_vm(): raise Exception("real error") def cleanup(): raise Exception("cleanup error") try: import_vm() except Exception: cleanup() raise $ python example.py Traceback (most recent call last): File "example.py", line 10, in <module> cleanup() File "example.py", line 5, in cleanup raise Exception("cleanup error") Exception: cleanup error The real error is hidden by the cleanup error, so we cannot fix the real issue. This is mainly python 2 issue. With python 3 we see: $ python3 example.py Traceback (most recent call last): File "example.py", line 8, in <module> import_vm() File "example.py", line 2, in import_vm raise Exception("real error") Exception: real error During handling of the above exception, another exception occurred: Traceback (most recent call last): File "example.py", line 10, in <module> cleanup() File "example.py", line 5, in cleanup raise Exception("cleanup error") Exception: cleanup error To fix this in python 2, we need to make sure cleanup never raises. it must handle error cleanup errors by logging them. $ cat example_fix.py import logging def import_vm(): raise Exception("real error") def cleanup(): try: raise Exception("cleanup error") except Exception: logging.exception("Error while cleaning up") try: import_vm() except Exception: cleanup() raise Now we will see both errors: $ python example_fix.py ERROR:root:Error while cleaning up Traceback (most recent call last): File "example_fix.py", line 8, in cleanup raise Exception("cleanup error") Exception: cleanup error Traceback (most recent call last): File "example_fix.py", line 13, in <module> import_vm() File "example_fix.py", line 4, in import_vm raise Exception("real error") Exception: real error
Richard, Can this bug get fixed sooner please, for RHEL-7.7/8?
We're not doing anything for RHEL 7.8. RHEL 8 has Python 3 so apparently would not be affected. As usual for changes to the rhv-upload backend we need patches that have been fully tested. I have no time or manpower to make these changes right now.
This bug do not reproduce on a small scale RHV. I'll try to reproduce it on the large scale RHV, where it was found, once it's free to use.
I tried to reproduce it on large scale RHV environment RHV-4.3.6, with CFME-5.11.0.24. however it did not reproduce so far. if/once I'll see it again, I'll update this bug.
Ilanit. Have you reproduced this BZ since September ? If not, could you consider closing it ?
This bug will be addressed in next major release.
In reply to comment #9: I did not reproduce this bug so far. This bug is about improving the code, so that hidden errors will be reported. Therefore, I am not sure a reproduce in the case is required.
This is not impacting RHEL8 as the default is python3.