Bug 1974741 - Disk images remain in locked state if the HE VM is rebooted during an image transfer
Summary: Disk images remain in locked state if the HE VM is rebooted during an image transfer
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.5.0
Assignee: Benny Zlotnik
QA Contact: Ilia Markelov
URL:
Whiteboard:
Depends On:
Blocks: 1985906
 
Reported: 2021-06-22 12:32 UTC by Juan Orti
Modified: 2024-10-01 18:44 UTC (History)
CC List: 6 users

Fixed In Version: ovirt-engine-4.5.0.1
Doc Type: Bug Fix
Doc Text:
Previously, a bug in the finalization mechanism left the disk locked in the database. In this release, the finalization mechanism works correctly, and the disk remains unlocked in all scenarios.
Clone Of:
Environment:
Last Closed: 2022-05-26 16:22:29 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
engine.log (11.02 MB, text/plain)
2021-06-22 12:32 UTC, Juan Orti


Links
Red Hat Issue Tracker RHV-42599 - 2022-01-19 15:18:03 UTC
Red Hat Knowledge Base (Solution) 6133221 - 2021-06-22 13:36:00 UTC
Red Hat Product Errata RHSA-2022:4711 - 2022-05-26 16:22:55 UTC

Description Juan Orti 2021-06-22 12:32:23 UTC
Created attachment 1793017 [details]
engine.log

Description of problem:
If we are downloading a snapshot disk and the HE VM is rebooted, the image transfer fails, but the disk remains locked in the database.

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.6.8-0.1.el8ev.noarch
ovirt-imageio-daemon-2.1.1-1.el8ev.x86_64
RHVH host 4.4.6.1-0.20210527.0

How reproducible:
I've reproduced it two times (100%)

Steps to Reproduce:
1. Create a snapshot of a VM
2. Download the snapshot from an external machine using the proxy_url:

# curl -k -X POST \
    -H 'Version: 4' \
    -H 'Accept: application/json' \
    -d 'grant_type=password&scope=ovirt-app-api&username=admin%40internal&password=password' \
    https://jorti-rhvm44.lab.sbr-virt.gsslab.brq.redhat.com/ovirt-engine/sso/oauth/token

# curl -k -X POST \
    -H 'Version: 4' \
    -H 'Accept: application/xml' \
    -H 'Content-Type: application/xml' \
    -H 'Authorization: Bearer ISXD5wwMtWjBEFKQivXc_LHRyjTHFsojTGNnvIxWIXabxkCZK0TFzhqPQ0gXXtR63hz8SHAd3yuayBj0tk0OgQ' \
    -d '<image_transfer><snapshot id="595574ec-5b73-4407-978a-d986a21a735f"/><direction>download</direction></image_transfer>' \
    https://jorti-rhvm44.lab.sbr-virt.gsslab.brq.redhat.com/ovirt-engine/api/imagetransfers

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<image_transfer href="/ovirt-engine/api/imagetransfers/ccfef028-9541-466c-ab60-96e88b5ead00" id="ccfef028-9541-466c-ab60-96e88b5ead00">
    <actions>
        <link href="/ovirt-engine/api/imagetransfers/ccfef028-9541-466c-ab60-96e88b5ead00/resume" rel="resume"/>
        <link href="/ovirt-engine/api/imagetransfers/ccfef028-9541-466c-ab60-96e88b5ead00/cancel" rel="cancel"/>
        <link href="/ovirt-engine/api/imagetransfers/ccfef028-9541-466c-ab60-96e88b5ead00/extend" rel="extend"/>
        <link href="/ovirt-engine/api/imagetransfers/ccfef028-9541-466c-ab60-96e88b5ead00/pause" rel="pause"/>
        <link href="/ovirt-engine/api/imagetransfers/ccfef028-9541-466c-ab60-96e88b5ead00/finalize" rel="finalize"/>
    </actions>
    <active>false</active>
    <direction>download</direction>
    <format>cow</format>
    <inactivity_timeout>60</inactivity_timeout>
    <phase>transferring</phase>
    <proxy_url>https://jorti-rhvm44.lab.sbr-virt.gsslab.brq.redhat.com:54323/images/a3a3cbac-12df-4bf0-8e20-3707ce40fe8b</proxy_url>
    <shallow>false</shallow>
    <timeout_policy>legacy</timeout_policy>
    <transfer_url>https://jorti-rhvh44-01.lab.sbr-virt.gsslab.brq.redhat.com:54322/images/a3a3cbac-12df-4bf0-8e20-3707ce40fe8b</transfer_url>
    <transferred>0</transferred>
    <host href="/ovirt-engine/api/hosts/fd89aed2-fdae-4e90-bec6-8a6f8e16d5e9" id="fd89aed2-fdae-4e90-bec6-8a6f8e16d5e9"/>
</image_transfer>

# curl -vk \
    -H 'Version: 4' \
    -H 'Authorization: Bearer ISXD5wwMtWjBEFKQivXc_LHRyjTHFsojTGNnvIxWIXabxkCZK0TFzhqPQ0gXXtR63hz8SHAd3yuayBj0tk0OgQ' \
    https://jorti-rhvm44.lab.sbr-virt.gsslab.brq.redhat.com:54323/images/a3a3cbac-12df-4bf0-8e20-3707ce40fe8b -o /var/tmp/image_transfer.qcow2    
    

3. While the download is in progress, reboot the HostedEngine VM.
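
In a successful run, the client would finalize the transfer once the download completes, using the finalize action advertised in the image_transfer response above (a sketch; the Bearer token is a placeholder):

# curl -k -X POST \
    -H 'Version: 4' \
    -H 'Accept: application/xml' \
    -H 'Content-Type: application/xml' \
    -H 'Authorization: Bearer <token>' \
    -d '<action/>' \
    https://jorti-rhvm44.lab.sbr-virt.gsslab.brq.redhat.com/ovirt-engine/api/imagetransfers/ccfef028-9541-466c-ab60-96e88b5ead00/finalize

Because of the reboot in step 3, this finalization never happens and cleanup is left to the engine.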

Actual results:
The image transfer is marked as failed and the log reports that the lock on the disk is freed; however, the disk status is still locked in the database.

2021-06-22 12:54:52,350+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-3) [a8a0f626-642b-4aca-9333-3c31b8f04aab] Failed to transfer disk '595574ec-5b73-4407-978a-d986a21a735f' (command id 'ccfef028-9541-466c-ab60-96e88b5ead00')
2021-06-22 12:54:52,351+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-3) [a8a0f626-642b-4aca-9333-3c31b8f04aab] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand' successfully.
2021-06-22 12:54:52,352+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-3) [a8a0f626-642b-4aca-9333-3c31b8f04aab] Lock freed to object 'EngineLock:{exclusiveLocks='[116646a2-de9d-4339-bff2-1b4ea9ede5a5=DISK]', sharedLocks='[75465d8e-681f-4258-81b5-d0e9e9d092f6=VM]'}' <----
2021-06-22 12:54:52,457+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-3) [a8a0f626-642b-4aca-9333-3c31b8f04aab] Error during log command: org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand. Exception null


engine=> select image_guid, parentid, image_group_id, imagestatus, active from images where image_group_id = '116646a2-de9d-4339-bff2-1b4ea9ede5a5';
              image_guid              |               parentid               |            image_group_id            | imagestatus | active 
--------------------------------------+--------------------------------------+--------------------------------------+-------------+--------
 ebb3b181-272d-4ebe-a1c2-b0e075c57ea2 | 595574ec-5b73-4407-978a-d986a21a735f | 116646a2-de9d-4339-bff2-1b4ea9ede5a5 |           1 | t
 595574ec-5b73-4407-978a-d986a21a735f | 00000000-0000-0000-0000-000000000000 | 116646a2-de9d-4339-bff2-1b4ea9ede5a5 |           2 | f <----
(2 rows)


Expected results:
The image should be unlocked after the transfer is marked as failed.

Additional info:
In the customer's environment, this happened after restarting several oVirt services (ovirt-engine, ovirt-imageio, ovirt-vmconsole-proxy-sshd, ovirt-provider-ovn and ovirt-websocket-proxy) rather than rebooting the full VM.
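
As a temporary workaround, the disk status in the database can typically be reset with the unlock_entity.sh utility shipped with ovirt-engine (a sketch run on the engine host; the exact flags may vary between versions, and the ID is the image_group_id from the query above):

# /usr/share/ovirt-engine/setup/dbutils/unlock_entity.sh -q -t disk
# /usr/share/ovirt-engine/setup/dbutils/unlock_entity.sh -t disk 116646a2-de9d-4339-bff2-1b4ea9ede5a5

The first invocation only queries which disks are locked; the second clears the locked status for the affected disk group.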

Comment 2 mxie@redhat.com 2022-01-17 04:15:06 UTC
I met a similar problem on rhv-4.4.10.2-0.1.el8ev: the RHV node cannot be switched to Maintenance mode because an image transfer is in progress, even though all the images have been cleaned up in Storage > Disks (please refer to the screenshot). Stopping the ovirt-imageio service does not resolve the problem either.

Comment 4 Arik 2022-02-01 20:53:05 UTC
(In reply to mxie from comment #2)
> I met a similar problem on rhv-4.4.10.2-0.1.el8ev: the RHV node cannot be
> switched to Maintenance mode because an image transfer is in progress, even
> though all the images have been cleaned up in Storage > Disks (please refer
> to the screenshot). Stopping the ovirt-imageio service does not resolve the
> problem either.

This sounds like a different issue - we do allow switching a host to maintenance when an image transfer is in status FAILED.
However, the issue you are facing might be the same as the one reported in bz 2037057 - so if you can provide logs that cover the lifecycle of the image transfer(s) that block the move to maintenance (in a separate bug), that would be appreciated.

Comment 5 Benny Zlotnik 2022-02-09 15:08:37 UTC
A simple reproducer:
1. Start disk download
2. While transfer is in progress, stop ovirt-engine
3. Restart the ovirt-imageio service
4. Start ovirt-engine
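
For reference, steps 2-4 map to the following commands on the engine host (a sketch; service names assume a standard 4.4 engine deployment where ovirt-imageio runs alongside the engine):

# systemctl stop ovirt-engine
# systemctl restart ovirt-imageio
# systemctl start ovirt-engine

After the engine comes back up, the transfer is marked as failed but the disk stays locked, as described above.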

Comment 7 Benny Zlotnik 2022-03-30 11:52:16 UTC
This should be fixed by the recent changes in image transfer; moving to ON_QA.

Comment 12 mxie@redhat.com 2022-03-31 09:27:03 UTC
(In reply to Arik from comment #4)
> (In reply to mxie from comment #2)
> > I met a similar problem on rhv-4.4.10.2-0.1.el8ev: the RHV node cannot be
> > switched to Maintenance mode because an image transfer is in progress,
> > even though all the images have been cleaned up in Storage > Disks
> > (please refer to the screenshot). Stopping the ovirt-imageio service does
> > not resolve the problem either.
> 
> This sounds like a different issue - we do allow switching a host to
> maintenance when an image transfer is in status FAILED.
> However, the issue you are facing might be the same as the one reported in
> bz 2037057 - so if you can provide logs that cover the lifecycle of the
> image transfer(s) that block the move to maintenance (in a separate bug),
> that would be appreciated.

Just saw the comment; I filed bug 2070491 to track the problem, thanks!

Comment 13 Benny Zlotnik 2022-04-27 13:39:13 UTC
It is likely that the fixes for the following bugs resolved this issue:
https://bugzilla.redhat.com/show_bug.cgi?id=2043984
https://bugzilla.redhat.com/show_bug.cgi?id=2057445 (this one was backported to 4.4.10)

Comment 14 Ilia Markelov 2022-04-28 15:46:42 UTC
Verified.

The disk is not locked after reproducing the steps.

Versions:
ovirt-engine-4.5.0.5-0.7.el8ev 
ovirt-imageio-daemon-2.4.3-1.el8ev.x86_64

Comment 19 errata-xmlrpc 2022-05-26 16:22:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.0] security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4711

Comment 20 meital avital 2022-08-08 19:54:30 UTC
Due to QE capacity, we are not going to cover this issue in our automation

