Bug 1616226 - The disk is not cleaned up in ovirt when image_transfer job fails in some condition
Summary: The disk is not cleaned up in ovirt when image_transfer job fails in some con...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: virt-v2v
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.1
Assignee: Richard W.M. Jones
QA Contact: liuzi
URL:
Whiteboard: V2V
Depends On:
Blocks: 1771318
 
Reported: 2018-08-15 10:49 UTC by Xiaodai Wang
Modified: 2020-11-17 17:45 UTC (History)
CC List: 8 users

Fixed In Version: virt-v2v-1.42.0-3.module+el8.3.0+6497+b190d2a5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-17 17:44:45 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
disk info (239.36 KB, image/png)
2020-05-29 02:42 UTC, liuzi

Description Xiaodai Wang 2018-08-15 10:49:15 UTC
Description of problem:
The disk is not cleaned up in ovirt when image_transfer job fails in some condition

Version-Release number of selected component (if applicable):
virt-v2v-1.38.2-10.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
Here is an example link for the failure.

https://libvirt-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/v2v/view/RHEL-7.6/job/v2v-RHEL-7.6-runtest-x86_64-matrix-esx6.7_rhv_upload/1/testReport/rhev/convert_vm_to_ovirt/esx_vm_6_7_windows_win10_arch_i386_raw_f_ISCSI_rhv_upload_rhv_direct_rhv_verifypeer_preallocated/

Actual results:
When the image_transfer job fails due to an incorrect HTTP response, the job is paused and the disk cannot be cleaned up successfully. The paused job can be seen in ovirt.

[stderr] nbdkit: python[1]: error: /var/tmp/rhvupload.HRyS4g/rhv-upload-plugin.py: close: error: Fault reason is "Operation Failed". Fault detail is "[Cannot remove Virtual Disk. Related operation is currently in progress. Please try again later.]". HTTP response code is 409.

Expected results:
The disk should be cleaned up successfully.

Additional info:
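For context, the failing cleanup is essentially an attempt to remove the disk while its image transfer is still active. A minimal sketch of that situation with the oVirt Python SDK (placeholder engine URL, credentials and disk UUID; this is not the virt-v2v code itself):

# Minimal sketch only: reproduce the 409 above by trying to remove a disk
# whose image transfer is still paused.  Engine URL, credentials and the
# disk UUID are placeholders.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)

disk_id = '00000000-0000-0000-0000-000000000000'  # placeholder disk UUID
disk_service = connection.system_service().disks_service().disk_service(disk_id)

try:
    # While the transfer is paused the engine refuses the removal with
    # "Cannot remove Virtual Disk. Related operation is currently in
    # progress." (HTTP 409), matching the nbdkit error above.
    disk_service.remove()
except sdk.Error as exc:
    print('disk removal rejected:', exc)
finally:
    connection.close()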

Comment 2 Richard W.M. Jones 2018-08-15 10:57:47 UTC
I had a helpful email from Daniel Erez explaining what we should do.
Quoting from him:

> The image transfer moves to a 'Paused' status on failure.
> So to remove the disk, 'cancel' action[1] should be invoked
> on the image transfer object,
> then the disk will be deleted automatically.
> 
> [1] http://ovirt.github.io/ovirt-engine-api-model/master/#services/image_transfer/methods/cancel
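In SDK terms, invoking that cancel action on a paused transfer looks roughly like the following. This is a minimal sketch with a placeholder engine URL, credentials and transfer UUID, not the rhv-upload plugin code:

# Minimal sketch only: cancel a paused image transfer so that the engine
# deletes the target disk automatically.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)

transfer_id = '00000000-0000-0000-0000-000000000000'  # placeholder transfer UUID
transfer_service = (connection.system_service()
                    .image_transfers_service()
                    .image_transfer_service(transfer_id))

transfer = transfer_service.get()
if transfer.phase in (types.ImageTransferPhase.PAUSED_SYSTEM,
                      types.ImageTransferPhase.PAUSED_USER):
    # Cancelling tells the engine to tear down the transfer; the partially
    # uploaded disk is then removed automatically.
    transfer_service.cancel()

connection.close()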

Comment 3 Pino Toscano 2019-09-24 08:15:51 UTC
Sadly the log is not available anymore... :-/ Do you still have a reliable way to trigger this issue?
I think I might have fixed it with a couple of recent commits:
https://github.com/libguestfs/libguestfs/commit/8118f28b6ff93c11f92fd65873285c2eba10ea0a
https://github.com/libguestfs/libguestfs/commit/0f3686e9ed420b039a8a332df95e3c39c1e2143b

Comment 6 Pino Toscano 2020-05-19 15:42:02 UTC
There have been a lot of upstream changes in this area by Nir Soffer; they are available in virt-v2v 1.42.0.
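The general shape of that cleanup logic in an rhv-upload style plugin is to finalize the transfer when the copy succeeded and to cancel it when the copy failed. An illustrative sketch only, not the actual upstream patch:

# Illustrative sketch only, not the upstream virt-v2v change: on close, the
# plugin should finalize the transfer on success and cancel it on failure,
# so the engine either commits the disk or deletes the partial one.
def close_transfer(transfer_service, succeeded):
    if succeeded:
        # Finalizing marks the upload as complete; the disk becomes usable
        # once the transfer finishes successfully.
        transfer_service.finalize()
    else:
        # Cancelling makes the engine tear the transfer down and remove the
        # partially uploaded disk automatically.
        transfer_service.cancel()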

Comment 9 liuzi 2020-05-26 04:35:49 UTC
Test bug with builds:
virt-v2v-1.42.0-3.module+el8.3.0+6497+b190d2a5.x86_64
libguestfs-1.42.0-1.module+el8.3.0+6496+d39ac712.x86_64

Steps:
Scenario 1: V2V conversion fails because it is cancelled manually
1. Use virt-v2v to convert a guest from VMware to RHV and cancel the conversion while the disks are being copied:
# virt-v2v  -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1  esx6.7-win2019-x86_64  -o rhv-upload -os nfs_data -of raw -b ovirtmgmt  -it vddk -io vddk-libdir=/home/vddk7.0/vmware-vix-disklib-distrib -io vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA  -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -op /home/rhvpasswd -oo rhv-cluster=Default -oo rhv-direct -ip /home/passwd -oo rhv-verifypeer=true -oo rhv-cafile=/home/ca.pem
[   1.0] Opening the source -i libvirt -ic vpx://root.73.141:443/data/10.73.75.219/?no_verify=1 esx6.7-win2019-x86_64 -it vddk  -io vddk-libdir=/home/vddk7.0/vmware-vix-disklib-distrib -io vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA
[   2.8] Creating an overlay to protect the source from being modified
[   3.7] Opening the overlay
[  11.3] Inspecting the overlay
[  18.8] Checking for sufficient free disk space in the guest
[  18.8] Estimating space required on target for each disk
[  18.8] Converting Windows Server 2019 Standard to run on KVM
virt-v2v: warning: /usr/share/virt-tools/pnp_wait.exe is missing.  
Firstboot scripts may conflict with PnP.
virt-v2v: warning: QEMU Guest Agent MSI not found on tools ISO/directory. 
You may want to install the guest agent manually after conversion.
virt-v2v: warning: there are no virtio drivers available for this version 
of Windows (10.0 x86_64 Server).  virt-v2v looks for drivers in 
/usr/share/virtio-win

The guest will be configured to use slower emulated devices.
virt-v2v: This guest does not have virtio drivers installed.
[  22.3] Mapping filesystem data to avoid copying unused and blank areas
[  23.3] Closing the overlay
[  23.5] Assigning disks to buses
[  23.5] Checking if the guest needs BIOS or UEFI to boot
[  23.5] Initializing the target -o rhv-upload -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -op /home/rhvpasswd -os nfs_data
[  24.9] Copying disk 1/1 to qemu URI json:{ "file.driver": "nbd", "file.path": "/tmp/v2vnbdkit.rFzSdX/nbdkit4.sock", "file.export": "/" } (raw)
^Cnbdkit: python[1]: error: write reply: NBD_CMD_WRITE: Broken pipe

1.2 Log in to RHV and check the disk info under Storage -> Storage Domains -> nfs_data -> Disks:
1> The guest's disk remains.
2> The disk's status is 'send 600 of 2048MB'.

Scenario 2: V2V conversion fails because of a bug:
2.1 Use VDDK 6.5 to convert a guest to RHV:
# virt-v2v  -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1  esx6.7-rhel8.2-x86_64  -o rhv-upload -os nfs_data -of raw -b ovirtmgmt  -it vddk -io vddk-libdir=/home/vmware-vix-disklib-distrib -io vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA  -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -op /home/rhvpasswd -oo rhv-cluster=Default -oo rhv-direct -ip /home/passwd -oo rhv-verifypeer=true -oo rhv-cafile=/home/ca.pem
[   1.0] Opening the source -i libvirt -ic vpx://root.73.141:443/data/10.73.75.219/?no_verify=1 esx6.7-rhel8.2-x86_64 -it vddk  -io vddk-libdir=/home/vmware-vix-disklib-distrib -io vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA
[   2.6] Creating an overlay to protect the source from being modified
[   5.6] Opening the overlay
[  15.1] Inspecting the overlay
[  38.2] Checking for sufficient free disk space in the guest
[  38.2] Estimating space required on target for each disk
[  38.2] Converting Red Hat Enterprise Linux 8.2 (Ootpa) to run on KVM
virt-v2v: This guest has virtio drivers installed.
[ 190.5] Mapping filesystem data to avoid copying unused and blank areas
[ 191.6] Closing the overlay
[ 191.9] Assigning disks to buses
[ 191.9] Checking if the guest needs BIOS or UEFI to boot
[ 191.9] Initializing the target -o rhv-upload -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -op /home/rhvpasswd -os nfs_data
[ 193.3] Copying disk 1/1 to qemu URI json:{ "file.driver": "nbd", "file.path": "/tmp/v2vnbdkit.SEBoAh/nbdkit4.sock", "file.export": "/" } (raw)
nbdkit: vddk[3]: error: [NFC ERROR] NfcFssrvrProcessErrorMsg: received NFC error 5 from server: Failed to allocate the requested 33554456 bytes
nbdkit: vddk[3]: error: VixDiskLib_Read: Memory allocation failed. Out of memory.
nbdkit: vddk[3]: error: [NFC ERROR] NfcFssrvrProcessErrorMsg: received NFC error 5 from server: Failed to allocate the requested 33554456 bytes
nbdkit: vddk[3]: error: VixDiskLib_Read: Memory allocation failed. Out of memory.
nbdkit: vddk[3]: error: [NFC ERROR] NfcFssrvrProcessErrorMsg: received NFC error 5 from server: Failed to allocate the requested 33554456 bytes
nbdkit: vddk[3]: error: VixDiskLib_Read: Memory allocation failed. Out of memory.
nbdkit: vddk[3]: error: [NFC ERROR] NfcFssrvrProcessErrorMsg: received NFC error 5 from server: Failed to allocate the requested 33554456 bytes
nbdkit: vddk[3]: error: VixDiskLib_Read: Memory allocation failed. Out of memory.
nbdkit: vddk[3]: error: [NFC ERROR] NfcFssrvrProcessErrorMsg: received NFC error 5 from server: Failed to allocate the requested 33554456 bytes
nbdkit: vddk[3]: error: VixDiskLib_Read: Memory allocation failed. Out of memory.
nbdkit: vddk[3]: error: [NFC ERROR] NfcFssrvrProcessErrorMsg: received NFC error 5 from server: Failed to allocate the requested 33554456 bytes
nbdkit: vddk[3]: error: VixDiskLib_Read: Memory allocation failed. Out of memory.
qemu-img: error while reading at byte 44302336: Input/output error

virt-v2v: error: qemu-img command failed, see earlier errors

If reporting bugs, run virt-v2v with debugging enabled and include the 
complete output:

  virt-v2v -v -x [...]

2.2 Log in to RHV and check the disk info under Storage -> Storage Domains -> nfs_data -> Disks:
1> The guest's disk remains.
2> The disk's status is 'OK'.


Hi Pino,
Based on the testing above, I think the bug is not fixed. In addition, the state of the leftover disk differs between the failure scenarios.
For details, please refer to the attachment named 'disk info'.

Comment 10 Pino Toscano 2020-05-26 06:23:23 UTC
(In reply to liuzi from comment #9)
> Based on the testing above, I think the bug is not fixed. In addition, the
> state of the leftover disk differs between the failure scenarios.

Manually cancelling (e.g. with Ctrl+C) is not well supported in virt-v2v in general; you can easily see that there are other issues and leftovers when doing so.
It is an entirely separate issue, so please do not mix it into this bug, otherwise this one will never be fixed...

This bug is about cleaning up disks when a disk transfer fails during the conversion.

> For details, please refer to the attachment named 'disk info'.

(still missing)

Comment 11 liuzi 2020-05-28 11:39:27 UTC
Test bug with builds:
virt-v2v-1.42.0-3.module+el8.3.0+6497+b190d2a5.x86_64
libguestfs-1.42.0-1.module+el8.3.0+6496+d39ac712.x86_64

Steps:
Scenario 1
1. Convert a guest to RHV, wait until the transfer starts, then let the transfer ticket expire.
#  virt-v2v  -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1  esx6.7-win2016-x86_64-vmware-tools     -o rhv-upload -os nfs_data -of raw -b ovirtmgmt  -it vddk -io vddk-libdir=/home/vddk7.0/vmware-vix-disklib-distrib -io vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA  -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -op /home/rhvpasswd -oo rhv-cluster=Default -oo rhv-direct -ip /home/passwd -oo rhv-verifypeer=true -oo rhv-cafile=/home/ca.pem
[   1.0] Opening the source -i libvirt -ic vpx://root.73.141:443/data/10.73.75.219/?no_verify=1 esx6.7-win2016-x86_64-vmware-tools -it vddk  -io vddk-libdir=/home/vddk7.0/vmware-vix-disklib-distrib -io vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA
[   2.7] Creating an overlay to protect the source from being modified
[   3.6] Opening the overlay
[  11.0] Inspecting the overlay
[  19.7] Checking for sufficient free disk space in the guest
[  19.7] Estimating space required on target for each disk
[  19.7] Converting Windows Server 2016 Standard to run on KVM
virt-v2v: warning: /usr/share/virt-tools/pnp_wait.exe is missing.  
Firstboot scripts may conflict with PnP.
virt-v2v: warning: QEMU Guest Agent MSI not found on tools ISO/directory. 
You may want to install the guest agent manually after conversion.
virt-v2v: warning: there are no virtio drivers available for this version 
of Windows (10.0 x86_64 Server).  virt-v2v looks for drivers in 
/usr/share/virtio-win

The guest will be configured to use slower emulated devices.
virt-v2v: This guest does not have virtio drivers installed.
[  24.8] Mapping filesystem data to avoid copying unused and blank areas
[  26.6] Closing the overlay
[  26.9] Assigning disks to buses
[  26.9] Checking if the guest needs BIOS or UEFI to boot
[  26.9] Initializing the target -o rhv-upload -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -op /home/rhvpasswd -os nfs_data
[  28.3] Copying disk 1/1 to qemu URI json:{ "file.driver": "nbd", "file.path": "/tmp/v2vnbdkit.4Bqtfn/nbdkit4.sock", "file.export": "/" } (raw)
nbdkit: python[1]: error: /var/tmp/rhvupload.FDjiNs/rhv-upload-plugin.py: pwrite: error: Traceback (most recent call last):
   File "/var/tmp/rhvupload.FDjiNs/rhv-upload-plugin.py", line 94, in wrapper
    return func(h, *args)
   File "/var/tmp/rhvupload.FDjiNs/rhv-upload-plugin.py", line 234, in pwrite
    (offset, count))
   File "/var/tmp/rhvupload.FDjiNs/rhv-upload-plugin.py", line 178, in request_failed
    raise RuntimeError("%s: %d %s: %r" % (msg, status, reason, body[:200]))
 RuntimeError: could not write sector offset 3372220416 size 2097152: 403 Forbidden: b'You are not allowed to access this resource: Ticket 14f3df50-faae-4f08-8c1b-f60d1a3c224b expired'

qemu-img: error while writing at byte 3372220416: Input/output error

nbdkit: python[1]: error: /var/tmp/rhvupload.FDjiNs/rhv-upload-plugin.py: flush: error: Traceback (most recent call last):
   File "/var/tmp/rhvupload.FDjiNs/rhv-upload-plugin.py", line 94, in wrapper
    return func(h, *args)
   File "/var/tmp/rhvupload.FDjiNs/rhv-upload-plugin.py", line 343, in flush
    request_failed(r, "could not flush")
   File "/var/tmp/rhvupload.FDjiNs/rhv-upload-plugin.py", line 178, in request_failed
    raise RuntimeError("%s: %d %s: %r" % (msg, status, reason, body[:200]))
 RuntimeError: could not flush: 403 Forbidden: b'You are not allowed to access this resource: Ticket 14f3df50-faae-4f08-8c1b-f60d1a3c224b expired'

virt-v2v: error: qemu-img command failed, see earlier errors

If reporting bugs, run virt-v2v with debugging enabled and include the 
complete output:

  virt-v2v -v -x [...]

1.2 Log in to RHV and check the disk info under Storage -> Storage Domains -> nfs_data -> Disks:
a> The disk is cleaned up in ovirt.

Scenario 2:
2. Convert a guest to RHV, wait until the transfer starts, then restart the ovirt host's network.
#  virt-v2v  -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1  esx6.7-win2016-x86_64-vmware-tools     -o rhv-upload -os nfs_data -of raw -b ovirtmgmt  -it vddk -io vddk-libdir=/home/vddk7.0/vmware-vix-disklib-distrib -io vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA  -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -op /home/rhvpasswd -oo rhv-cluster=Default -oo rhv-direct -ip /home/passwd -oo rhv-verifypeer=true -oo rhv-cafile=/home/ca.pem
[   1.0] Opening the source -i libvirt -ic vpx://root.73.141:443/data/10.73.75.219/?no_verify=1 esx6.7-win2016-x86_64-vmware-tools -it vddk  -io vddk-libdir=/home/vddk7.0/vmware-vix-disklib-distrib -io vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA
[   2.7] Creating an overlay to protect the source from being modified
[   3.5] Opening the overlay
[  10.7] Inspecting the overlay
[  16.7] Checking for sufficient free disk space in the guest
[  16.7] Estimating space required on target for each disk
[  16.7] Converting Windows Server 2016 Standard to run on KVM
virt-v2v: warning: /usr/share/virt-tools/pnp_wait.exe is missing.  
Firstboot scripts may conflict with PnP.
virt-v2v: warning: QEMU Guest Agent MSI not found on tools ISO/directory. 
You may want to install the guest agent manually after conversion.
virt-v2v: warning: there are no virtio drivers available for this version 
of Windows (10.0 x86_64 Server).  virt-v2v looks for drivers in 
/usr/share/virtio-win

The guest will be configured to use slower emulated devices.
virt-v2v: This guest does not have virtio drivers installed.
[  20.5] Mapping filesystem data to avoid copying unused and blank areas
[  22.3] Closing the overlay
[  22.6] Assigning disks to buses
[  22.6] Checking if the guest needs BIOS or UEFI to boot
[  22.6] Initializing the target -o rhv-upload -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -op /home/rhvpasswd -os nfs_data
[  23.9] Copying disk 1/1 to qemu URI json:{ "file.driver": "nbd", "file.path": "/tmp/v2vnbdkit.lAo1WE/nbdkit4.sock", "file.export": "/" } (raw)
    (1.01/100%)

nbdkit: python[1]: error: /var/tmp/rhvupload.dl0JJk/rhv-upload-plugin.py: pwrite: error: Traceback (most recent call last):
   File "/var/tmp/rhvupload.dl0JJk/rhv-upload-plugin.py", line 94, in wrapper
    return func(h, *args)
   File "/var/tmp/rhvupload.dl0JJk/rhv-upload-plugin.py", line 234, in pwrite
    (offset, count))
   File "/var/tmp/rhvupload.dl0JJk/rhv-upload-plugin.py", line 178, in request_failed
    raise RuntimeError("%s: %d %s: %r" % (msg, status, reason, body[:200]))
 RuntimeError: could not write sector offset 350093312 size 2097152: 500 Internal Server Error: b'Server failed to perform the request, check logs'

qemu-img: error while writing at byte 350093312: Input/output error

virt-v2v: error: qemu-img command failed, see earlier errors

If reporting bugs, run virt-v2v with debugging enabled and include the 
complete output:

  virt-v2v -v -x [...]

2.2 Log in to RHV and check the disk info under Storage -> Storage Domains -> nfs_data -> Disks:
a> The disk is cleaned up in ovirt.
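For both scenarios, the same check can also be scripted against the API instead of the admin portal. A minimal sketch with the oVirt Python SDK (placeholder engine URL and credentials; the nfs_data domain name is taken from the commands above):

# Minimal sketch only: list the disks left in the nfs_data storage domain
# after a failed conversion.  Engine URL and credentials are placeholders.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)

sds_service = connection.system_service().storage_domains_service()
sd = sds_service.list(search='name=nfs_data')[0]
disks = sds_service.storage_domain_service(sd.id).disks_service().list()

# After a failed transfer, no leftover v2v disk should show up here.
for disk in disks:
    print(disk.name, disk.status)

connection.close()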

Result: Based on the testing above, virt-v2v cleans up the disks on the ovirt host when a disk transfer fails during the conversion, so the bug is moved from ON_QA to VERIFIED.

Comment 12 liuzi 2020-05-29 02:42:53 UTC
Created attachment 1693206 [details]
disk info

Comment 15 errata-xmlrpc 2020-11-17 17:44:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5137

