Bug 2057445
Summary: | VM disk remains in locked state if image transfer (image download) times out | | |
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Raul Aldaz <raldaz> |
Component: | ovirt-engine | Assignee: | Benny Zlotnik <bzlotnik> |
Status: | CLOSED ERRATA | QA Contact: | Evelina Shames <eshames> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | | |
Version: | 4.4.9 | CC: | aefrat, ahadas, bcholler, bzlotnik, emarcus, jortialc, mgokhool, mtessun, pelauter |
Target Milestone: | ovirt-4.4.10-3 | Keywords: | ZStream |
Target Release: | 4.4.10 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | ovirt-engine-4.4.10.7 | Doc Type: | Bug Fix |
Doc Text: | Previously, when an image transfer failed during cleanup, the cleanup was not retried as it should have been, and the disk remained locked. In this release, the retry mechanism for the transfer cleanup phase works correctly, and the disk is unlocked once the transfer is complete. | | |
Story Points: | --- | | |
Clone Of: | | Environment: | |
Last Closed: | 2022-03-24 13:30:22 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description (Raul Aldaz, 2022-02-23 12:07:33 UTC)

Based on the following comment in the ovirt-imageio source, the "409 Conflict" error should be retried by the client, but it looks like there is no attempt to retry the removal of the ticket:

https://github.com/oVirt/ovirt-imageio/blob/d5e9c757e0a44fdd88d9e54b77283e718407b9cc/ovirt_imageio/_internal/config.py#L166

```
# Number of seconds to wait when removing a ticket. If ticket cannot be
# removed within this timeout, the request will fail with "409 Conflict",
# and the user need to retry the request again. A ticket can be removed
# only when the number of connections using the ticket is zero.
remove_timeout = 60
```
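As an illustration of the retry the config comment calls for, here is a minimal Python sketch of removing a ticket and retrying on "409 Conflict". The host, port, and `/tickets/` path are assumptions (modeled on how vdsm's imagetickets module issues a DELETE to the imageio daemon), not a documented public API:

```python
# Illustrative sketch only: retry a ticket DELETE while the imageio daemon
# answers "409 Conflict" (the ticket still has open connections).
# Host, port, and the /tickets/ path are assumptions, not a documented API.
import time
import http.client

def remove_ticket_with_retry(host, port, ticket_id, attempts=3, delay=10):
    for _ in range(attempts):
        conn = http.client.HTTPConnection(host, port)
        try:
            conn.request("DELETE", "/tickets/" + ticket_id)
            resp = conn.getresponse()
            resp.read()
            if resp.status != 409:
                return resp.status  # removed, or a non-retryable error
        finally:
            conn.close()
        time.sleep(delay)  # 409: wait for connections to drain, then retry
    raise RuntimeError("ticket %s still in use after %d attempts"
                       % (ticket_id, attempts))
```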
Comment 9 (Benny Zlotnik)

Reproducer with a vdsm code change:

Edit /usr/lib/python3.6/site-packages/vdsm/storage/imagetickets.py +95:

```python
failed_once = False

@requires_image_daemon
def remove_ticket(uuid):
    # Simulate a single removal failure so the engine's cleanup retry kicks in;
    # the message below matches the "failed remove ticked" string in the logs.
    global failed_once

    if not failed_once:
        failed_once = True
        raise Exception("failed remove ticked")

    failed_once = False
    _request("DELETE", uuid)
```

After editing, restart vdsm:

```
$ systemctl restart vdsmd
```

Downloading a disk should show an exception like this:

```
2022-03-07 04:28:29,683-05 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-13) [d002a3d7-d9c1-41f7-a420-9f5ae48fcaad] Failed to stop image transfer 'a82a9461-da19-477e-8db5-b29370462ed9' for ticket '4a4f5a28-a53f-4dd7-a9a6-9ad4ab280b2d': {}: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to RemoveImageTicketVDS, error = failed remove ticked, code = 100 (Failed with error GeneralException and code 100)
```

However, it will be retried and should succeed:

```
2022-03-07 04:28:33,720-05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-54) [d002a3d7-d9c1-41f7-a420-9f5ae48fcaad] START, RemoveImageTicketVDSCommand(HostName = hosto72, RemoveImageTicketVDSCommandParameters:{hostId='7ee44999-d01b-43c2-93dc-499f6345a9fe', ticketId='4a4f5a28-a53f-4dd7-a9a6-9ad4ab280b2d', timeout='null'}), log id: 5789d81c

2022-03-07 04:28:33,735-05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-54) [d002a3d7-d9c1-41f7-a420-9f5ae48fcaad] FINISH, RemoveImageTicketVDSCommand, return: StatusOnlyReturn [status=Status [code=0, message=Done]], log id: 5789d81c
```

The disk should be unlocked and subsequent transfers on the same disk should succeed.
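For completeness, one way to confirm that the disk actually leaves the locked state after the transfer is to poll its status through the Python SDK (ovirtsdk4). This is a hedged sketch, not part of the reproducer; the engine URL, credentials, and disk id are placeholders, not values from this bug:

```python
# Hedged sketch: poll a disk's status with ovirt-engine-sdk-python (ovirtsdk4).
# The URL, credentials, and disk id are placeholders.
import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder engine URL
    username='admin@internal',
    password='secret',
    insecure=True,  # prefer ca_file='ca.pem' in a real deployment
)
try:
    disk_service = (connection.system_service()
                    .disks_service()
                    .disk_service('disk-uuid-placeholder'))
    for _ in range(30):  # poll for up to ~5 minutes
        disk = disk_service.get()
        if disk.status == types.DiskStatus.OK:
            print('disk is unlocked')
            break
        time.sleep(10)
    else:
        print('disk still in status:', disk.status)
finally:
    connection.close()
```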
Verification (Evelina Shames)

(In reply to Benny Zlotnik from comment #9)
> [reproducer steps quoted above in comment 9]

Verified with the above steps on ovirt-engine-4.4.10.7:

Downloading a disk showed an exception:

```
2022-03-14 12:10:43,211+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-28) [dfd772a6-671f-42f5-abc4-627d2963923f] Failed to stop image transfer session for ticket 'a5c2a518-9e9a-4690-93eb-0af5cd4b8f7e': {}: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to RemoveImageTicketVDS, error = failed remove ticked, code = 100 (Failed with error GeneralException and code 100)
```

Retried:

```
2022-03-14 12:10:53,234+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-75) [dfd772a6-671f-42f5-abc4-627d2963923f] START, RemoveImageTicketVDSCommand(HostName = host_mixed_1, RemoveImageTicketVDSCommandParameters:{hostId='15614953-8aef-415c-bc8f-de97af12ee6f', ticketId='a5c2a518-9e9a-4690-93eb-0af5cd4b8f7e', timeout='null'}), log id: 5e75230d

2022-03-14 12:10:53,244+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-75) [dfd772a6-671f-42f5-abc4-627d2963923f] FINISH, RemoveImageTicketVDSCommand, return: StatusOnlyReturn [status=Status [code=0, message=Done]], log id: 5e75230d
```

Operation succeeded:

```
2022-03-14 12:10:54,559+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-78) [dfd772a6-671f-42f5-abc4-627d2963923f] EVENT_ID: TRANSFER_IMAGE_SUCCEEDED(1,032), Image Download with disk test succeeded.
```

The disk is in 'OK' state. Moving to 'Verified'.

Closing

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (RHV Manager (ovirt-engine) [ovirt-4.4.10-3]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1052