Bug 1976020 - [RFE][v2v] [upload/download disk/CBT] Failed to attach disk to the VM - disk is OK but image transfer still holds a lock on the disk
Summary: [RFE][v2v] [upload/download disk/CBT] Failed to attach disk to the VM - disk ...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: virt-v2v
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
: ---
Assignee: Virtualization Maintenance
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 1849861
Blocks: 1923178 1976024
 
Reported: 2021-06-25 01:51 UTC by Xiaodai Wang
Modified: 2022-04-20 11:59 UTC
CC List: 15 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of: 1849861
: 1976024 (view as bug list)
Environment:
Last Closed: 2022-04-20 11:59:14 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:


Attachments

Description Xiaodai Wang 2021-06-25 01:51:31 UTC
This bug should also be fixed on the v2v side.

> Pavel is working on image transfer examples, so I hope he can also fix virt-v2v.

+++ This bug was initially created as a clone of Bug #1849861 +++

Description of problem:
Failed to attach disk to the VM

Version-Release number of selected component (if applicable):
virt-v2v-1.40.2-23.module+el8.2.1+6797+0db00a40.x86_64
ovirt-engine-4.3.10-0.2.el7.noarch

How reproducible:
50%

Steps to Reproduce:
1. Run auto script by 'avocado run --vt-type v2v --vt-machine-type q35 convert_from_file.positive_test.linux.input_mode.ova.parse.invalid_line_in_manifest.output_mode.rhev.rhv_upload --vt-connect-uri qemu:///system'.  
   1) The real v2v command executed in the script was '/bin/virt-v2v -i ova -ic qemu:///system -o rhv-upload -os nfs_data -of raw -b ovirtmgmt -n ovirtmgmt /var/lib/libvirt/images/v2v/ova-images/case84967/invalid_line_in_manifest.ova -on ova_vm_toT -oc https://hp-dl360eg8-03.lab.eng.pek2.redhat.com/ovirt-engine/api -op /tmp/rhv_upload_passwd_file -oo rhv-cafile=/tmp/rhv_upload_ca.pem -oo rhv-cluster=NFS -oo rhv-direct -v -x'.
   2) The OVA files used in the script are listed below.
# ll -h /var/lib/libvirt/images/v2v/ova-images/case84967/
total 20M
-rwxr-xr-x. 1 nobody nobody 3.9M Apr 17  2017 invalid_line_in_manifest-missing_space.ova
-rwxr-xr-x. 1 nobody nobody 3.9M Apr 14  2017 invalid_line_in_manifest.ova
-rw-r--r--. 1 nobody nobody 3.9M Mar 29  2019 SHA1.ova
-rwxr-xr-x. 1 nobody nobody 3.9M Apr 14  2017 SHA1_SHA256.ova
-rw-r--r--. 1 nobody nobody 3.9M Mar 29  2019 SHA256.ova

Actual results:
2020-06-22 11:41:37,183+08 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1295) [bd848c1] EVENT_ID: VM_IMPORT_FROM_CONFIGURATION_ATTACH_DISKS_FAILED(175), VM ova_vm_toT has been imported from the given configuration but the following disk(s) failed to attach: a49ea084-1cc6-4f80-8ea8-79f8fdd3c0dc.

Expected results:
The disk should be attached to the VM successfully.

Additional info:
1) I can only reproduce it by running the script. I tried to reproduce it by running the v2v
command directly in the terminal but failed.
2) The ova disk is not bootable.
3) Please see attachment for the v2v debug log and ovirt-engine log.

--- Additional comment from Xiaodai Wang on 2020-06-23 03:31:37 UTC ---

Here is the discussion:

http://post-office.corp.redhat.com/archives/v2v-devel/2020-June/msg00035.html

--- Additional comment from Xiaodai Wang on 2020-06-23 03:36:09 UTC ---



--- Additional comment from Nir Soffer on 2020-06-23 09:49:57 UTC ---

Just to make the command more clear:

/bin/virt-v2v \
    -i ova \
    -ic qemu:///system \
    -o rhv-upload \
    -os nfs_data \
    -of raw \
    -b ovirtmgmt \
    -n ovirtmgmt /var/lib/libvirt/images/v2v/ova-images/case84967/invalid_line_in_manifest.ova \
    -on ova_vm_toT \
    -oc https://hp-dl360eg8-03.lab.eng.pek2.redhat.com/ovirt-engine/api \
    -op /tmp/rhv_upload_passwd_file \
    -oo rhv-cafile=/tmp/rhv_upload_ca.pem \
    -oo rhv-cluster=NFS \
    -oo rhv-direct \
    -v \
    -x

The command looks broken, see the -n argument.

Why do you use ova? Are you testing ova support?

Can you reproduce using local disk?

virt-v2v \
        -i disk disk.img \
        -o rhv-upload \
        -oc https://my.engine/ovirt-engine/api \
        -op password-file \
        -on v2v \
        -os my-domain \
        -of raw \
        -oa sparse \
        -oo rhv-cafile=/etc/pki/vdsm/certs/cacert.pem \
        -oo rhv-cluster=my-cluster \
        -oo rhv-direct=true

The issue described sounds like an engine bug, so it should be reproducible
using local disk.

--- Additional comment from Xiaodai Wang on 2020-06-23 10:22:58 UTC ---

(In reply to Nir Soffer from comment #3)
> Just to make the command more clear:
> 
> /bin/virt-v2v \
>     -i ova \
>     -ic qemu:///system \
>     -o rhv-upload \
>     -os nfs_data \
>     -of raw \
>     -b ovirtmgmt \
>     -n ovirtmgmt
> /var/lib/libvirt/images/v2v/ova-images/case84967/invalid_line_in_manifest.
> ova \
>     -on ova_vm_toT \
>     -oc https://hp-dl360eg8-03.lab.eng.pek2.redhat.com/ovirt-engine/api \
>     -op /tmp/rhv_upload_passwd_file \
>     -oo rhv-cafile=/tmp/rhv_upload_ca.pem \
>     -oo rhv-cluster=NFS \
>     -oo rhv-direct \
>     -v \
>     -x
> 
> The command looks broken, see the -n argument.

It's not a broken argument; '-n' duplicates '-b'.

> 
> Why do you use ova? Are you testing ova support?

Yes, we also have ova testing cases for v2v.

> 
> Can you reproduce using local disk?
> 
> virt-v2v \
>         -i disk disk.img \
>         -o rhv-upload \
>         -oc https://my.engine/ovirt-engine/api \
>         -op password-file \
>         -on v2v \
>         -os my-domain \
>         -of raw \
>         -oa sparse \
>         -oo rhv-cafile=/etc/pki/vdsm/certs/cacert.pem \
>         -oo rhv-cluster=my-cluster \
>         -oo rhv-direct=true
> 
> The issue described sounds like an engine bug, so it should be reproducible
> using local disk.

Yes, I reproduced it with a local disk.

Steps:
1) Convert the OVA file to a qcow2 file.
qemu-img 'create' '-q' '-f' 'qcow2' '-b' 'json:{ "file": { "driver": "raw", "offset": 6656, "size": 4063232, "file": { "driver": "file", "filename": "/var/lib/libvirt/images/v2v/ova-images/case84967/invalid_line_in_manifest.ova" } } }' '-o' 'compat=1.1,backing_fmt=vmdk' '/var/tmp/v2v.qcow2'
2) export LIBGUESTFS_BACKEND=direct
3) Run the v2v command (I copied '/var/tmp/v2v.qcow2' to '/root/' in case it's deleted by mistake).
# virt-v2v -i disk -ic qemu:///system -o rhv-upload -os nfs_data -of raw -b ovirtmgmt -n ovirtmgmt /root/v2v.qcow2 -on testvm -oc https://hp-dl360eg8-03.lab.eng.pek2.redhat.com/ovirt-engine/api -op /root/rhv_upload_passwd_file -oo rhv-cafile=/root/rhv_upload_ca.pem -oo rhv-cluster=NFS -oo rhv-direct -v -x

I'm going to upload the v2v and ovirt-engine logs and also the local qcow2 disk if you'd like to reproduce it on your environment.

--- Additional comment from Xiaodai Wang on 2020-06-23 10:30:52 UTC ---



--- Additional comment from Tal Nisan on 2020-06-29 14:14:40 UTC ---

Richard, any insights about what is the issue here?

--- Additional comment from Richard W.M. Jones on 2020-07-01 11:24:29 UTC ---

(In reply to Tal Nisan from comment #6)
> Richard, any insights about what is the issue here?

From the virt-v2v point of view, the conversion is completely successful in
both cases.

So the possibilities can only be:

* It's a RHV engine problem of some kind.

* virt-v2v's -o rhv-upload mode should wait a bit longer or
check the disk status in a different way from how we're doing
it at the moment to give time for the disk to "become available"
(whatever that means).  For reference here's what we're doing now
to wait for the disk to appear:

https://github.com/libguestfs/virt-v2v/blob/c1caf7132000a4560c3e20c2753978e8dd10036a/v2v/rhv-upload-plugin.py#L593

and the general VM creation code here which runs after the above:

https://github.com/libguestfs/virt-v2v/blob/master/v2v/rhv-upload-createvm.py
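
For illustration, here is a minimal sketch of the "wait for the disk to appear" pattern described above, using the oVirt Python SDK (ovirtsdk4). The connection details and disk_id are placeholders, and this is not the actual rhv-upload-plugin.py code:

    import time

    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url="https://engine.example.com/ovirt-engine/api",  # placeholder
        username="admin@internal",
        password="password",
        ca_file="ca.pem",
    )

    disk_id = "700b798a-4434-44c7-b500-a03ff9363faf"  # placeholder
    disk_service = connection.system_service().disks_service().disk_service(disk_id)

    # Poll the disk until the engine reports it as OK (give up after 5 minutes).
    deadline = time.monotonic() + 300
    while True:
        disk = disk_service.get()
        if disk.status == types.DiskStatus.OK:
            break
        if time.monotonic() > deadline:
            raise RuntimeError("timed out waiting for disk %s to become OK" % disk_id)
        time.sleep(1)

Note that, as the rest of this bug shows, the disk status becoming OK is not by itself enough: the engine may still hold an in-memory lock on the disk at this point.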

--- Additional comment from Richard W.M. Jones on 2020-07-01 11:27:21 UTC ---

To add: in both cases in this bug we print this message:

  2020-06-21 23:35:41,381 process          L0458 DEBUG| [stderr] finalized after 1.02907133102417 seconds

  finalized after 1.0110740661621094 seconds

which given the code above seems to mean that the disk status did
change to types.DiskStatus.OK.

--- Additional comment from Nir Soffer on 2020-07-01 11:58:56 UTC ---

(In reply to Richard W.M. Jones from comment #7)
> (In reply to Tal Nisan from comment #6)
> > Richard, any insights about what is the issue here?
> 
> From the virt-v2v point the conversion is completely successful in
> both cases.
> 
> So the possibilities can only be:
> 
> * It's a RHV engine problem of some kind.

Very likely. I have seen cases in the past where finalizing the transfer
just hangs forever. This is not reproducible, and the next transfer
succeeds.

> * virt-v2v's -o rhv-upload mode should wait a bit longer or
> check the disk status in a different way from how we're doing
> it at the moment to give time for the disk to "become available"
> (whatever that means).  For reference here's what we're doing now
> to wait for the disk to appear:
> 
> https://github.com/libguestfs/virt-v2v/blob/
> c1caf7132000a4560c3e20c2753978e8dd10036a/v2v/rhv-upload-plugin.py#L593

This should be the correct way to wait for the disk, but I don't 
know engine internals.

> and the general VM creation code here which runs after the above:
> 
> https://github.com/libguestfs/virt-v2v/blob/master/v2v/rhv-upload-createvm.py

This does not attach the disk but creates a VM from an OVF; maybe there is
an engine bug handling this OVF, or some issue with the OVF.

--- Additional comment from Nir Soffer on 2020-07-01 12:04:26 UTC ---

(In reply to Richard W.M. Jones from comment #8)
> To add that in the both cases in this bug we print this message:
> 
>   2020-06-21 23:35:41,381 process          L0458 DEBUG| [stderr] finalized
> after 1.02907133102417 seconds
> 
>   finalized after 1.0110740661621094 seconds
> 
> which given the code above seems to mean that the disk status did
> change to types.DiskStatus.OK.

This is not the same message as in:
https://github.com/libguestfs/virt-v2v/blob/c1caf7132000a4560c3e20c2753978e8dd10036a/v2v/rhv-upload-plugin.py#L593

The relevant logs from finalize_transfer are:

    transfer %s finalized in %.3f seconds

On success, and in case of timeout (60 seconds?), we raise:

   RuntimeError("timed out waiting for transfer %s to finalize)

Can you point me to the virt-v2v code used here?
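
For reference, here is a minimal sketch of the finalize/poll pattern those messages come from, using the oVirt Python SDK. This is a simplified, hypothetical version, not the exact helper from the SDK examples or the virt-v2v plugin:

    import time

    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    def finalize_transfer(connection, transfer, disk, timeout=60):
        start = time.monotonic()
        transfer_service = (connection.system_service()
                            .image_transfers_service()
                            .image_transfer_service(transfer.id))
        disk_service = (connection.system_service()
                        .disks_service()
                        .disk_service(disk.id))

        transfer_service.finalize()

        while True:
            time.sleep(1)
            try:
                current = transfer_service.get()
            except sdk.NotFoundError:
                # Some engine versions remove the transfer when it completes;
                # fall back to checking the disk status.
                if disk_service.get().status == types.DiskStatus.OK:
                    print("transfer %s finalized in %.3f seconds"
                          % (transfer.id, time.monotonic() - start))
                    return
            else:
                if current.phase == types.ImageTransferPhase.FINISHED_SUCCESS:
                    print("transfer %s finalized in %.3f seconds"
                          % (transfer.id, time.monotonic() - start))
                    return
                if current.phase == types.ImageTransferPhase.FINISHED_FAILURE:
                    raise RuntimeError("transfer %s failed" % transfer.id)
            if time.monotonic() - start > timeout:
                raise RuntimeError(
                    "timed out waiting for transfer %s to finalize" % transfer.id)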

--- Additional comment from Richard W.M. Jones on 2020-07-01 12:14:51 UTC ---

Oh you're right (and I can't read ...).  I note also this is the old version
of virt-v2v (1.40.2, new version is 1.42.0).

This message changed in the following upstream commit:

  https://github.com/libguestfs/virt-v2v/commit/75fcf46379ec8cad31684f3ca3dd24caa7aaaa46

which means (as expected) we're not carrying all of the patches
needed in this branch.

Is there a chance to move to RHEL AV 8.3.0?

--- Additional comment from Nir Soffer on 2020-07-01 13:04:29 UTC ---

(In reply to Richard W.M. Jones from comment #11)
> Oh you're right (and I can't read ...).  I note also this is the old version
> of virt-v2v (1.40.2, new version is 1.42.0).
> 
> This message changed in the following upstream commit:
>  
> https://github.com/libguestfs/virt-v2v/commit/
> 75fcf46379ec8cad31684f3ca3dd24caa7aaaa46
> 
> which means (as expected) we're not carrying all of the patches
> needed in this branch.

I think the issue with the old code was not handling the ILLEGAL state,
which could end in an endless loop instead of a fast failure. Otherwise
we check for the LOCKED and OK states, so based on the logs, we detected
that the disk was OK.

Looking in the engine log:

1. Transfer takes an exclusive lock on the disk

2020-06-23 18:18:27,005+08 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-1340) [539a982e-8973-4783-a56e-298622c811ec] Lock Acqu
ired to object 'EngineLock:{exclusiveLocks='[700b798a-4434-44c7-b500-a03ff9363faf=DISK]', sharedLocks='[]'}'

2. Transfer was finalized

2020-06-23 18:18:29,638+08 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (default task-1340) [1ff2ecec-eb40-46f3-9868-bf903f8f9db8] Updating image transfer 9996fe27-2881-41a0-99c1-80d0005ed3ea (image 700b798a-4434-44c7-b500-a03ff9363faf) phase to Finalizing Success

3. Starting to import vm from configuration

2020-06-23 18:18:33,098+08 INFO  [org.ovirt.engine.core.bll.exportimport.ImportVmFromConfigurationCommand] (default task-1340) [b313fbb8-c1c7-45c1-b919-2da431bb94ca] Lock Ac
quired to object 'EngineLock:{exclusiveLocks='[a65fddb4-48e1-46e1-a080-074d156cab8b=VM, testvm=VM_NAME]', sharedLocks='[a65fddb4-48e1-46e1-a080-074d156cab8b=REMOTE_VM]'}'

4. Import VM command fails to acquire the lock on the disk

2020-06-23 18:18:33,349+08 INFO  [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-1340) [57cc33d7] Failed to Acquire Lock to object 'EngineLock:{exclusiveLocks='[a65fddb4-48e1-46e1-a080-074d156cab8b=VM_DISK_BOOT, 700b798a-4434-44c7-b500-a03ff9363faf=DISK]', sharedLocks=''}'

2020-06-23 18:18:33,349+08 WARN  [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-1340) [57cc33d7] Validation of action 'AttachDiskToVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__ATTACH_ACTION_TO,VAR__TYPE__DISK,ACTION_TYPE_FAILED_DISK_IS_LOCKED


5. Transfer marked as finished

2020-06-23 18:18:35,322+08 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedThreadFactory-engineScheduled-Thread-12) [539a982e-8973-4783-a56e-298622c811ec] Updating image transfer 9996fe27-2881-41a0-99c1-80d0005ed3ea (image 700b798a-4434-44c7-b500-a03ff9363faf) phase to Finished Success


6. Transfer releases the exclusive lock on the disk

2020-06-23 18:18:35,398+08 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-12) [539a982e-8973-4783-a56e-298622c811ec] Lock freed to object 'EngineLock:{exclusiveLocks='[700b798a-4434-44c7-b500-a03ff9363faf=DISK]', sharedLocks='[]'}'


We have 2 ways to lock disks:
- set imagestatus to LOCKED (2) in the database
- take in-memory locks (e.g. EngineLock:{exclusiveLocks='[700b798a-4434-44c7-b500-a03ff9363faf=DISK]')

When starting a transfer, the transfer command takes an exclusive lock on the disk
and changes the disk status to LOCKED.

When ending the transfer, the transfer command releases the exclusive lock and
changes the disk status to OK.

Based on the logs, we change the disk to OK before releasing the exclusive
lock, so when v2v continues the import the disk is still locked.

This is not the first time we had such issues.
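
To make the race concrete: a client that only polls imagestatus can see the disk as OK while the transfer command still holds its in-memory EngineLock, and any operation that needs that lock (such as attaching the disk) fails with ACTION_TYPE_FAILED_DISK_IS_LOCKED. A hypothetical client-side mitigation, not the engine fix and not what virt-v2v does, is simply to retry the attach for a while:

    import time

    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    def attach_disk_with_retry(connection, vm_id, disk_id, retries=30, delay=2):
        attachments_service = (connection.system_service()
                               .vms_service()
                               .vm_service(vm_id)
                               .disk_attachments_service())
        last_error = None
        for _ in range(retries):
            try:
                return attachments_service.add(
                    types.DiskAttachment(
                        disk=types.Disk(id=disk_id),
                        interface=types.DiskInterface.VIRTIO,
                        bootable=True,
                        active=True,
                    )
                )
            except sdk.Error as e:
                # Most likely "disk is locked"; wait for the transfer command
                # to release its exclusive lock and try again.
                last_error = e
                time.sleep(delay)
        raise RuntimeError("disk %s is still locked after %d attempts: %s"
                           % (disk_id, retries, last_error))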

--- Additional comment from Nir Soffer on 2020-07-01 13:10:24 UTC ---

Moving to RHV and changing severity to high since this fails basic
functionality.

--- Additional comment from Avihai on 2020-07-08 09:39:47 UTC ---

(In reply to Nir Soffer from comment #13)
> Moving to RHV and changing severity to high since fails basic 
> functionallity.

Tal, this is a severe bug that will affect customers as well; please retarget to 4.4.1 (if possible) or at least 4.4.2.

--- Additional comment from Ilan Zuckerman on 2020-07-08 09:52:04 UTC ---

Encountered a similar issue when downloading a disk with the help of python3-ovirt-engine-sdk4:

When using the SDK to download a qcow2 disk, the transfer times out while finalizing, with the following ERROR from the SDK [1],
the following error from the engine [2],
and the following error from the vdsm host [3]:


[1]:
2020-07-05 10:37:18,495 - MainThread - RemoteExecutor - DEBUG - [root.16.37/qum5net] Results of command: python3 /usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py download fac3911d-95b6-41be-a538-dad345691502 --password-file /root/pass.txt --engine-url https://hosted-engine-06.lab.eng.tlv2.redhat.com --username admin@internal --cafile /root/ca.crt --backup-dir /root/full_backup/ --backup-uuid 44ae937c-8c30-41a1-bb68-d4daf3177d7e
2020-07-05 10:37:18,496 - MainThread - RemoteExecutor - DEBUG - [root.16.37/qum5net]   OUT: [   0.0 ] Downloading VM fac3911d-95b6-41be-a538-dad345691502 disks
[   0.2 ] Creating image transfer for disk 2d075957-7481-4af3-9920-e63d7759af83
[   2.9 ] Image transfer 391d9654-0e59-4e24-9524-e59a37cb8b53 is ready
[ ------- ] 0 bytes, 0.00 seconds, 0 bytes/s                                   
Formatting '/root/full_backup/2d075957-7481-4af3-9920-e63d7759af83.202007051036.full.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
[  99.89% ] 1022.88 MiB, 0.22 seconds, 4.56 GiB/s                              
[  99.92% ] 1023.19 MiB, 0.33 seconds, 3.00 GiB/s                              
[  99.99% ] 1023.88 MiB, 0.44 seconds, 2.29 GiB/s                              
[ 100.00% ] 1.00 GiB, 0.46 seconds, 2.17 GiB/s                                 
[   3.3 ] Finalizing image transfer

2020-07-05 10:37:18,496 - MainThread - RemoteExecutor - DEBUG - [root.16.37/qum5net]   ERR: Traceback (most recent call last):
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py", line 476, in <module>
    main()
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py", line 173, in main
    args.command(args)
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py", line 244, in cmd_download
    connection, args.backup_uuid, args, incremental=args.incremental)
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py", line 385, in download_backup
    download_disk(connection, backup_uuid, disk, disk_path, args, incremental=incremental)
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py", line 430, in download_disk
    imagetransfer.finalize_transfer(connection, transfer, disk)
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/helpers/imagetransfer.py", line 296, in finalize_transfer
    .format(transfer.id))
RuntimeError: Timed out waiting for transfer 391d9654-0e59-4e24-9524-e59a37cb8b53 to finalize

---------------------------------------------------------------------------------------------

[2]:
2020-07-05 10:37:20,801+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-48) [b434516f-9c7f-46fc-9f0c-50ee4ffc3775] START, GetImageTicketVDSCommand(HostName = host_mixed_1, GetImageTicketVDSCommandParameters:{hostId='9fb37721-708b-4327-9148-d3f302e51436', ticketId='0287b970-6b73-4b9e-b683-47c1f44dbe86', timeout='null'}), log id: 382fd9d6
2020-07-05 10:37:20,807+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-48) [b434516f-9c7f-46fc-9f0c-50ee4ffc3775] FINISH, GetImageTicketVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.ImageTicketInformation@dc80e36, log id: 382fd9d6
2020-07-05 10:37:20,825+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-48) [b434516f-9c7f-46fc-9f0c-50ee4ffc3775] EVENT_ID: DOWNLOAD_IMAGE_CANCELED_TIMEOUT(1,072), Download was canceled by system. Reason: timeout due to transfer inactivity.
2020-07-05 10:37:20,826+03 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-48) [b434516f-9c7f-46fc-9f0c-50ee4ffc3775] Updating image transfer 391d9654-0e59-4e24-9524-e59a37cb8b53 (image 2d075957-7481-4af3-9920-e63d7759af83) phase to Cancelled

---------------------------------------------------------------------------------------------

[3]:
2020-07-05 10:37:19,686+0300 INFO  (jsonrpc/7) [api.virt] FINISH stop_backup return={'status': {'code': 0, 'message': 'Done'}} from=::ffff:10.46.16.252,60486, flow_id=backups_syncAction_9850ddba-f8b8-4b8, vmId=fac3911d-95b6-41be-a538-dad345691502 (api:54)
2020-07-05 10:37:19,686+0300 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call VM.stop_backup succeeded in 0.01 seconds (__init__:312)
2020-07-05 10:37:19,694+0300 ERROR (libvirt/events) [virt.vm] (vmId='fac3911d-95b6-41be-a538-dad345691502') Block job (untracked) type BACKUP for drive sda was canceled (vm:5306)
2020-07-05 10:37:19,710+0300 ERROR (libvirt/events) [virt.vm] (vmId='fac3911d-95b6-41be-a538-dad345691502') Block job (untracked) type BACKUP for drive vdb was canceled (vm:5306)
2020-07-05 10:37:19,732+0300 ERROR (libvirt/events) [virt.vm] (vmId='fac3911d-95b6-41be-a538-dad345691502') Block job (untracked) type BACKUP for drive sdb was canceled (vm:5306)
2020-07-05 10:37:19,783+0300 ERROR (libvirt/events) [virt.vm] (vmId='fac3911d-95b6-41be-a538-dad345691502') Block job (untracked) type BACKUP for drive vdc was canceled (vm:5306)

--- Additional comment from Tomáš Golembiovský on 2021-01-12 12:49:53 UTC ---



--- Additional comment from Ilan Zuckerman on 2021-02-02 07:11:17 UTC ---

I recently opened a new bug [1] which is apparently a duplicate of this one.
Just to mention that the issue Nir is explaining in his comments might affect not only v2v, but also any automated tests which involve uploading/downloading disks, CBT test cases (as they also involve downloading and uploading), and other manual efforts which involve the use of the SDK.

[1]:
https://bugzilla.redhat.com/1923178

--- Additional comment from RHEL Program Management on 2021-02-02 07:29:25 UTC ---

This bug report has Keywords: Regression or TestBlocker.

Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release.

Please resolve ASAP.

--- Additional comment from Tal Nisan on 2021-02-04 10:15:19 UTC ---



--- Additional comment from Ilan Zuckerman on 2021-02-15 09:05:54 UTC ---

(In reply to Ilan Zuckerman from comment #17)
> I recently open a new bug [1] which is apperantly a duplicate of this one.
> Just to mention that the issue Nir is explaining in his comments might
> affect not only v2v, but also any Automated tests which involve upload /
> download disks, and also CBT test cases (as they also involve downloading
> and uploading), and also other manual efforts which involve the usage of SDK.
> 
> [1]:
> https://bugzilla.redhat.com/1923178

Just to make this clearer:

This issue (reported in 1923178) was first seen in rhv-4.4.5-4, and was reproduced once more (by automated regression runs) in rhv-4.4.5-5.
I noticed this only on NFS/Gluster flavors.

--- Additional comment from Pavel Bar on 2021-02-19 12:02:03 UTC ---

Adding additional details just in case (might be helpful) - there was a PR that was closed:
https://github.com/oVirt/ovirt-ansible-collection/pull/215

--- Additional comment from Eyal Shenitzky on 2021-02-21 07:43:31 UTC ---

(In reply to Pavel Bar from comment #21)
> Adding additional details just in case (might be helpful) - there was a PR
> that was closed:
> https://github.com/oVirt/ovirt-ansible-collection/pull/215

The PR was closed because it waits for the fix in the engine code instead of fixing it in the Ansible collection.

Comment 3 Nir Soffer 2021-07-12 00:28:38 UTC
I posted a patch upstream for discussion:
https://listman.redhat.com/archives/libguestfs/2021-July/msg00009.html

Comment 5 Xiaodai Wang 2021-08-10 03:11:45 UTC
The issue was reproduced again in virt-v2v-1.45.3-1.el9.x86_64 and 4.4.8.3-0.10.el8ev.

https://libvirt-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/v2v-RHEL-9.0-runtest-x86_64-function-convert_from_file-rhel/3/testReport/rhel/convert_from_file/positive_test_linux_input_mode_ova_parse_invalid_line_in_manifest_missing_space_output_mode_rhev_rhv_upload/

The engine log:
2021-08-09 21:20:30,728+08 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-66) [5c0e1348] EVENT_ID: VM_IMPORT_FROM_CONFIGURATION_ATTACH_DISKS_FAILED(175), VM ova_vm_1Qr has been imported from the given configuration but the following disk(s) failed to attach: 1a060938-b201-4a51-95af-6b2ad1f0130b.
2021-08-09 21:20:33,253+08 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-66) [9ed25544-50c6-44c8-a16a-404bca5d8b4b] EVENT_ID: USER_FAILED_RUN_VM(54), Failed to run VM ova_vm_1Qr due to a failed validation: [Cannot run VM without at least one bootable disk.

Comment 7 Nir Soffer 2021-08-10 12:08:55 UTC
(In reply to Xiaodai Wang from comment #5)
> The issue was reproduced again in virt-v2v-1.45.3-1.el9.x86_64 and
> 4.4.8.3-0.10.el8ev.

The engine reports the FINISHED_SUCCESS status before it releases the lock on the disk;
this is bug 1923178, which is targeted to RHV 4.4.9.

virt-v2v is doing the right thing, waiting on the image transfer status. We
cannot do anything if the engine reports an incorrect status.

Here are the relevant events from engine log:

1. Image transfer finishes (but it actually did not finish yet)

2021-08-09 21:20:29,411+08 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-64) [a3786a76-07da-4bd8-8927-2d9ebceb4f70] Updating image transfer cd785714-07ca-465
6-bbe4-8542506535b8 (image 1a060938-b201-4a51-95af-6b2ad1f0130b) phase to Finished Success

2. virt-v2v attempts to create the VM

2021-08-09 21:20:30,670+08 INFO  [org.ovirt.engine.core.bll.exportimport.ImportVmFromConfigurationCommand] (default task-66) [b2c13365-4b9f-4172-962a-77bc9b106035] Running command: ImportVmFromConfigurationCommand internal: false. Entities affected :  ID
: f1388cb6-dc42-4e07-afe5-5a84b754c140 Type: ClusterAction group CREATE_VM with role type USER,  ID: 00000000-0000-0000-0000-000000000000 Type: StorageAction group IMPORT_EXPORT_VM with role type ADMIN
...
2021-08-09 21:20:30,725+08 WARN  [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-66) [5c0e1348] Validation of action 'AttachDiskToVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__ATTACH_ACTION_TO,VAR__TYPE__DISK,ACTION_TYPE_FAILED_DISK_IS_LOCKED

3. Image transfer finishes and releases the lock

2021-08-09 21:20:38,623+08 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-55) [a3786a76-07da-4bd8-8927-2d9ebceb4f70] Lock freed to object 'EngineLock:{exclusiveLocks='[1a060938-b201-4a51-95af-6b2ad1f0130b=DISK]', sharedLocks='[]'}'

2021-08-09 21:20:38,632+08 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-55) [a3786a76-07da-4bd8-8927-2d9ebceb4f70] EVENT_ID: TRANSFER_IMAGE_SUCCEEDED(1,032), Image Upload with disk ova_vm_1Qr-000 succeeded.

I could not reproduce this in my setup (tried about 100 imports). Is it possible that
your engine is installed on very old hardware and/or has very low memory, so the engine
is extremely slow?

Comment 8 Xiaodai Wang 2021-08-10 13:27:47 UTC
The server was not under heavy load, and only one v2v job was running when it happened.
The OVA file is also very small, only a few megabytes.

-rwxr-xr-x. 1 nobody nobody 3.9M Apr 17  2017 ova-images/case84967/invalid_line_in_manifest-missing_space.ova

Comment 9 Nir Soffer 2021-08-10 13:56:08 UTC
(In reply to Xiaodai Wang from comment #8)
> The server didn't have heavy consumption, and only one v2v job was running
> when it happened.
> The ova file is also very small, only several megabytes.
> 
> -rwxr-xr-x. 1 nobody nobody 3.9M Apr 17  2017
> ova-images/case84967/invalid_line_in_manifest-missing_space.ova

Can you reproduce this with a real OVA? I can easily reproduce this
issue by uploading and downloading an empty image (it takes a few milliseconds
per upload/download), but I could never reproduce the issue with real
images (e.g. a 6G Fedora image).

Comment 10 Xiaodai Wang 2021-08-10 15:51:30 UTC
No, I never saw this issue happen with other real OVA files.

Comment 11 John Ferlan 2021-11-17 16:18:58 UTC
Rich - I'll move this to your capable hands to resolve. The bug is currently in POST w/o an ITR, but since you know the v2v release process better than me, I'll let you handle it.

Comment 13 Richard W.M. Jones 2022-04-12 11:26:31 UTC
Moving this back to backlog since although the bug was thought to be fixed
earlier, there is evidence that it's not really fixed and will require
more investigation and a reproducer.

Comment 14 Klaus Heinrich Kiwi 2022-04-13 14:44:47 UTC
(In reply to Richard W.M. Jones from comment #13)
> Moving this back to backlog since although the bug was thought to be fixed
> earlier, there is evidence that it's not really fixed and will require
> more investigation and a reproducer.

Thanks. Should we also do anything about the RHEL8.5 bug 1976024 which is CLOSED ERRATA? If RHEL 8 is fixed, we might need to flag this as a regression and get it fixed asap.

Comment 15 Nir Soffer 2022-04-13 15:00:18 UTC
(In reply to Richard W.M. Jones from comment #13)
> Moving this back to backlog since although the bug was thought to be fixed
> earlier, there is evidence that it's not really fixed and will require
> more investigation and a reproducer.

Where is the evidence that this is not fixed?

I think the original issue in RHV was fixed, and current virt-v2v monitors
the image transfer properly so this issue cannot happen now.

Comment 16 Richard W.M. Jones 2022-04-14 07:50:05 UTC
(In reply to Klaus Heinrich Kiwi from comment #14)
> Thanks. Should we also do anything about the RHEL8.5 bug 1976024 which is
> CLOSED ERRATA? If RHEL 8 is fixed, we might need to flag this as a
> regression and get it fixed asap.

I wouldn't do anything unless a customer reports a problem.  There
is no customer case attached to either this bug or 1976024.

(In reply to Nir Soffer from comment #15)
> Where is the evidence that this is not fixed?
> 
> I think the original issue in RHV was fixed, and current virt-v2v monitors
> the image transfer properly so this issue cannot happen now.

I have not tried to reproduce the bug myself; I'm just going off the
preceding comments.  If it's either fixed -- or not a problem
because it somehow only affects tiny OVAs which we'd never see in real
life -- then let's close the bug.

Comment 17 Klaus Heinrich Kiwi 2022-04-20 11:59:14 UTC
(In reply to Richard W.M. Jones from comment #16)
> (In reply to Klaus Heinrich Kiwi from comment #14)
> > Thanks. Should we also do anything about the RHEL8.5 bug 1976024 which is
> > CLOSED ERRATA? If RHEL 8 is fixed, we might need to flag this as a
> > regression and get it fixed asap.
> 
> I wouldn't do anything unless a customer reports a problem.  There
> is no customer case attached to either this bug or 1976024.
> 
> (In reply to Nir Soffer from comment #15)
> > Where is the evidence that this is not fixed?
> > 
> > I think the original issue in RHV was fixed, and current virt-v2v monitors
> > the image transfer properly so this issue cannot happen now.
> 
> I have not tried to reproduce the bug myself, I'm just going off the
> preceeding comments.  If it's either fixed -- or not a problem
> because it somehow only affects tiny OVAs which we'd never see in real
> life -- then let's close the bug.

Let's take the latter approach unless there's evidence to do otherwise.

