Bug 1974741

Summary: Disk images remain in a locked state if the HE VM is rebooted during an image transfer
Product: Red Hat Enterprise Virtualization Manager
Component: ovirt-engine
Version: 4.4.6
Reporter: Juan Orti <jortialc>
Assignee: Benny Zlotnik <bzlotnik>
QA Contact: Ilia Markelov <imarkelo>
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
CC: ahadas, chhu, emarcus, juzhou, mavital, mxie
Target Milestone: ovirt-4.5.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ovirt-engine-4.5.0.1
Doc Type: Bug Fix
Doc Text:
Previously, a bug in the finalization mechanism left the disk locked in the database. In this release, the finalization mechanism works correctly, and the disk remains unlocked in all scenarios.
Type: Bug
oVirt Team: Storage
Bug Blocks: 1985906
Last Closed: 2022-05-26 16:22:29 UTC
Attachments: engine.log

Description Juan Orti 2021-06-22 12:32:23 UTC
Created attachment 1793017 [details]
engine.log

Description of problem:
If we are downloading a snapshot disk and the HE VM is rebooted, the image transfer fails but the disk remains in a locked state in the database.

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.6.8-0.1.el8ev.noarch
ovirt-imageio-daemon-2.1.1-1.el8ev.x86_64
RHVH host 4.4.6.1-0.20210527.0

How reproducible:
I've reproduced it twice (2/2, 100%).

Steps to Reproduce:
1. Create a snapshot of a VM
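
For example, the snapshot can be created through the API (a sketch; <token> and <vm_id> are placeholders for the bearer token obtained below and the VM's ID):

# curl -k -X POST \
    -H 'Version: 4' \
    -H 'Content-Type: application/xml' \
    -H 'Authorization: Bearer <token>' \
    -d '<snapshot><description>transfer-test</description></snapshot>' \
    https://jorti-rhvm44.lab.sbr-virt.gsslab.brq.redhat.com/ovirt-engine/api/vms/<vm_id>/snapshots
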
2. Download the snapshot from an external machine using the proxy_url:

# curl -k -X POST \
    -H 'Version: 4' \
    -H 'Accept: application/json' \
    -d 'grant_type=password&scope=ovirt-app-api&username=admin%40internal&password=password' \
    https://jorti-rhvm44.lab.sbr-virt.gsslab.brq.redhat.com/ovirt-engine/sso/oauth/token
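
The access_token field of the JSON response is what goes into the Authorization header of the following requests; for convenience it can be captured in a shell variable (a sketch, assuming jq is installed):

# TOKEN=$(curl -ks -X POST \
    -H 'Accept: application/json' \
    -d 'grant_type=password&scope=ovirt-app-api&username=admin%40internal&password=password' \
    https://jorti-rhvm44.lab.sbr-virt.gsslab.brq.redhat.com/ovirt-engine/sso/oauth/token \
    | jq -r .access_token)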

# curl -k -X POST \
    -H 'Version: 4' \
    -H 'Accept: application/xml' \
    -H 'Content-Type: application/xml' \
    -H 'Authorization: Bearer ISXD5wwMtWjBEFKQivXc_LHRyjTHFsojTGNnvIxWIXabxkCZK0TFzhqPQ0gXXtR63hz8SHAd3yuayBj0tk0OgQ' \
    -d '<image_transfer><snapshot id="595574ec-5b73-4407-978a-d986a21a735f"/><direction>download</direction></image_transfer>' \
    https://jorti-rhvm44.lab.sbr-virt.gsslab.brq.redhat.com/ovirt-engine/api/imagetransfers

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<image_transfer href="/ovirt-engine/api/imagetransfers/ccfef028-9541-466c-ab60-96e88b5ead00" id="ccfef028-9541-466c-ab60-96e88b5ead00">
    <actions>
        <link href="/ovirt-engine/api/imagetransfers/ccfef028-9541-466c-ab60-96e88b5ead00/resume" rel="resume"/>
        <link href="/ovirt-engine/api/imagetransfers/ccfef028-9541-466c-ab60-96e88b5ead00/cancel" rel="cancel"/>
        <link href="/ovirt-engine/api/imagetransfers/ccfef028-9541-466c-ab60-96e88b5ead00/extend" rel="extend"/>
        <link href="/ovirt-engine/api/imagetransfers/ccfef028-9541-466c-ab60-96e88b5ead00/pause" rel="pause"/>
        <link href="/ovirt-engine/api/imagetransfers/ccfef028-9541-466c-ab60-96e88b5ead00/finalize" rel="finalize"/>
    </actions>
    <active>false</active>
    <direction>download</direction>
    <format>cow</format>
    <inactivity_timeout>60</inactivity_timeout>
    <phase>transferring</phase>
    <proxy_url>https://jorti-rhvm44.lab.sbr-virt.gsslab.brq.redhat.com:54323/images/a3a3cbac-12df-4bf0-8e20-3707ce40fe8b</proxy_url>
    <shallow>false</shallow>
    <timeout_policy>legacy</timeout_policy>
    <transfer_url>https://jorti-rhvh44-01.lab.sbr-virt.gsslab.brq.redhat.com:54322/images/a3a3cbac-12df-4bf0-8e20-3707ce40fe8b</transfer_url>
    <transferred>0</transferred>
    <host href="/ovirt-engine/api/hosts/fd89aed2-fdae-4e90-bec6-8a6f8e16d5e9" id="fd89aed2-fdae-4e90-bec6-8a6f8e16d5e9"/>
</image_transfer>
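
While the transfer is running, its phase can be polled with a GET on the transfer resource returned above (a sketch, reusing the same bearer token):

# curl -ks \
    -H 'Version: 4' \
    -H 'Accept: application/xml' \
    -H "Authorization: Bearer $TOKEN" \
    https://jorti-rhvm44.lab.sbr-virt.gsslab.brq.redhat.com/ovirt-engine/api/imagetransfers/ccfef028-9541-466c-ab60-96e88b5ead00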

# curl -vk \
    -H 'Version: 4' \
    -H 'Authorization: Bearer ISXD5wwMtWjBEFKQivXc_LHRyjTHFsojTGNnvIxWIXabxkCZK0TFzhqPQ0gXXtR63hz8SHAd3yuayBj0tk0OgQ' \
    https://jorti-rhvm44.lab.sbr-virt.gsslab.brq.redhat.com:54323/images/a3a3cbac-12df-4bf0-8e20-3707ce40fe8b -o /var/tmp/image_transfer.qcow2    
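
In the normal flow, the client finalizes the transfer once the download completes, and finalization is what releases the disk lock (a sketch, using the finalize action link returned above):

# curl -ks -X POST \
    -H 'Version: 4' \
    -H 'Content-Type: application/xml' \
    -H "Authorization: Bearer $TOKEN" \
    -d '<action/>' \
    https://jorti-rhvm44.lab.sbr-virt.gsslab.brq.redhat.com/ovirt-engine/api/imagetransfers/ccfef028-9541-466c-ab60-96e88b5ead00/finalize

In this reproducer the reboot happens before finalization, which is what leaves the disk locked.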
    

3. While the download is in progress, reboot the HostedEngine VM.

Actual results:
The image transfer is marked as failed and the engine logs that the lock on the disk has been freed; however, the disk status is still locked in the database.

2021-06-22 12:54:52,350+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-3) [a8a0f626-642b-4aca-9333-3c31b8f04aab] Failed to transfer disk '595574ec-5b73-4407-978a-d986a21a735f' (command id 'ccfef028-9541-466c-ab60-96e88b5ead00')
2021-06-22 12:54:52,351+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-3) [a8a0f626-642b-4aca-9333-3c31b8f04aab] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand' successfully.
2021-06-22 12:54:52,352+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-3) [a8a0f626-642b-4aca-9333-3c31b8f04aab] Lock freed to object 'EngineLock:{exclusiveLocks='[116646a2-de9d-4339-bff2-1b4ea9ede5a5=DISK]', sharedLocks='[75465d8e-681f-4258-81b5-d0e9e9d092f6=VM]'}' <----
2021-06-22 12:54:52,457+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-3) [a8a0f626-642b-4aca-9333-3c31b8f04aab] Error during log command: org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand. Exception null


engine=> select image_guid, parentid, image_group_id, imagestatus, active from images where image_group_id = '116646a2-de9d-4339-bff2-1b4ea9ede5a5';
              image_guid              |               parentid               |            image_group_id            | imagestatus | active 
--------------------------------------+--------------------------------------+--------------------------------------+-------------+--------
 ebb3b181-272d-4ebe-a1c2-b0e075c57ea2 | 595574ec-5b73-4407-978a-d986a21a735f | 116646a2-de9d-4339-bff2-1b4ea9ede5a5 |           1 | t
 595574ec-5b73-4407-978a-d986a21a735f | 00000000-0000-0000-0000-000000000000 | 116646a2-de9d-4339-bff2-1b4ea9ede5a5 |           2 | f <----
(2 rows)
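
For reference, in the engine schema imagestatus 1 corresponds to OK and 2 to LOCKED, so any stale locked images can be listed directly (same query pattern as above):

engine=> select image_guid, image_group_id, imagestatus from images where imagestatus = 2;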


Expected results:
The image is unlocked after the transfer is marked as failed.

Additional info:
In the customer's environment this happened after restarting several oVirt services (ovirt-engine, ovirt-imageio, ovirt-vmconsole-proxy-sshd, ovirt-provider-ovn and ovirt-websocket-proxy) rather than after a reboot of the whole VM.
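
Until a fix is available, the stale lock can typically be cleared with the unlock_entity.sh dbutils script shipped with ovirt-engine (a sketch; assumes the default installation path and uses the disk ID from the query above):

# /usr/share/ovirt-engine/setup/dbutils/unlock_entity.sh -t disk 116646a2-de9d-4339-bff2-1b4ea9ede5a5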

Comment 2 mxie@redhat.com 2022-01-17 04:15:06 UTC
I hit a similar problem on rhv-4.4.10.2-0.1.el8ev: the RHV node cannot be switched to Maintenance mode because an image transfer is in progress, even though all images have been cleaned up in Storage > Disks (please refer to the screenshot). Stopping the ovirt-imageio service does not resolve the problem either.

Comment 4 Arik 2022-02-01 20:53:05 UTC
(In reply to mxie from comment #2)
> I hit a similar problem on rhv-4.4.10.2-0.1.el8ev: the RHV node cannot be
> switched to Maintenance mode because an image transfer is in progress,
> even though all images have been cleaned up in Storage > Disks (please
> refer to the screenshot). Stopping the ovirt-imageio service does not
> resolve the problem either.

This sounds like a different issue - we allow switching a host to maintenance when an image transfer is in status FAILED.
However, the issue you are facing might be the same as the one reported in bz 2037057 - so if you can provide logs that cover the lifecycle of the image transfer(s) that block the move to maintenance (in a separate bug), that would be appreciated.

Comment 5 Benny Zlotnik 2022-02-09 15:08:37 UTC
A simple reproducer:
1. Start disk download
2. While transfer is in progress, stop ovirt-engine
3. Restart the ovirt-imageio service
4. Start ovirt-engine
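
In shell terms, on the engine host (a sketch; assumes a download transfer is already in progress):

# systemctl stop ovirt-engine
# systemctl restart ovirt-imageio
# systemctl start ovirt-engine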

Comment 7 Benny Zlotnik 2022-03-30 11:52:16 UTC
This should be fixed by the recent changes in image transfer; moving to ON_QA.

Comment 12 mxie@redhat.com 2022-03-31 09:27:03 UTC
(In reply to Arik from comment #4)
> (In reply to mxie from comment #2)
> > I hit a similar problem on rhv-4.4.10.2-0.1.el8ev: the RHV node cannot
> > be switched to Maintenance mode because an image transfer is in
> > progress, even though all images have been cleaned up in Storage >
> > Disks (please refer to the screenshot). Stopping the ovirt-imageio
> > service does not resolve the problem either.
> 
> This sounds like a different issue - we allow switching a host to
> maintenance when an image transfer is in status FAILED.
> However, the issue you are facing might be the same as the one reported
> in bz 2037057 - so if you can provide logs that cover the lifecycle of
> the image transfer(s) that block the move to maintenance (in a separate
> bug), that would be appreciated.

Just saw the comment; filed bug 2070491 to track the problem, thanks!

Comment 13 Benny Zlotnik 2022-04-27 13:39:13 UTC
The fixes for the following bugs likely resolved this issue:
https://bugzilla.redhat.com/show_bug.cgi?id=2043984
https://bugzilla.redhat.com/show_bug.cgi?id=2057445 (this one was backported to 4.4.10)

Comment 14 Ilia Markelov 2022-04-28 15:46:42 UTC
Verified.

The disk is not left locked after running the reproduction steps.

Versions:
ovirt-engine-4.5.0.5-0.7.el8ev 
ovirt-imageio-daemon-2.4.3-1.el8ev.x86_64

Comment 19 errata-xmlrpc 2022-05-26 16:22:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.0] security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4711

Comment 20 meital avital 2022-08-08 19:54:30 UTC
Due to QE capacity, we are not going to cover this issue in our automation.