Description of problem:

During a NetBackup job, the image of a VM snapshot is downloaded successfully, but it is left in a locked state after the client connection times out and the ticket is closed on ovirt-imageio. Looks similar to BZ #1796231.

Engine logs showing the timeout in ticket cancellation and the failure to remove the locked image:

~~~
2022-02-15 20:45:49,697+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-47) [f09f0e1c-4123-4778-917c-a7c823b42dc6] Failed in 'RemoveImageTicketVDS' method
2022-02-15 20:45:49,703+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-47) [f09f0e1c-4123-4778-917c-a7c823b42dc6] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM <hostname> command RemoveImageTicketVDS failed: Image daemon request failed: 'status=409, reason=Conflict, error=Timeout cancelling ticket 07fe38f2-c130-45f0-84b3-51c399936899\n'
2022-02-15 20:45:49,703+01 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-47) [f09f0e1c-4123-4778-917c-a7c823b42dc6] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand' return value 'StatusOnlyReturn [status=Status [code=482, message=Image daemon request failed: 'status=409, reason=Conflict, error=Timeout cancelling ticket 07fe38f2-c130-45f0-84b3-51c399936899\n']]'
2022-02-15 20:45:49,703+01 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-47) [f09f0e1c-4123-4778-917c-a7c823b42dc6] HostName = <hostname>
2022-02-15 20:45:49,703+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-47) [f09f0e1c-4123-4778-917c-a7c823b42dc6] Command 'RemoveImageTicketVDSCommand(HostName = <hostname>, RemoveImageTicketVDSCommandParameters:{hostId='86ba23d1-8f2d-4852-93d8-c323ef879f76', ticketId='07fe38f2-c130-45f0-84b3-51c399936899', timeout='null'})' execution failed: VDSGenericException: VDSErrorException: Failed to RemoveImageTicketVDS, error = Image daemon request failed: 'status=409, reason=Conflict, error=Timeout cancelling ticket 07fe38f2-c130-45f0-84b3-51c399936899\n', code = 482
2022-02-15 20:45:49,703+01 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-47) [f09f0e1c-4123-4778-917c-a7c823b42dc6] FINISH, RemoveImageTicketVDSCommand, return: , log id: 5f8b94d0
2022-02-15 20:45:49,703+01 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-47) [f09f0e1c-4123-4778-917c-a7c823b42dc6] Failed to stop image transfer session for ticket '07fe38f2-c130-45f0-84b3-51c399936899': {}: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to RemoveImageTicketVDS, error = Image daemon request failed: 'status=409, reason=Conflict, error=Timeout cancelling ticket 07fe38f2-c130-45f0-84b3-51c399936899\n', code = 482 (Failed with error ImageDeamonError and code 482)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:114)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.removeImageTicketFromDaemon(TransferDiskImageCommand.java:1270)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.stopImageTransferSession(TransferDiskImageCommand.java:1244)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.handleFinalizingSuccess(TransferDiskImageCommand.java:811)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.executeStateHandler(TransferDiskImageCommand.java:526)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.proceedCommandExecution(TransferDiskImageCommand.java:492)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferImageCommandCallback.doPolling(TransferImageCommandCallback.java:21)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:175)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
	at org.glassfish.javax.enterprise.concurrent.1.redhat-00001//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:360)
	at org.glassfish.javax.enterprise.concurrent.1.redhat-00001//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:511)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
	at org.glassfish.javax.enterprise.concurrent.1.redhat-00001//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:227)
Caused by: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to RemoveImageTicketVDS, error = Image daemon request failed: 'status=409, reason=Conflict, error=Timeout cancelling ticket 07fe38f2-c130-45f0-84b3-51c399936899\n', code = 482
2022-02-15 20:47:18,057+01 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-248) [30219dfb-71f3-4537-97e6-21ccd0618719] Lock Acquired to object 'EngineLock:{exclusiveLocks='[421cd54e-9f41-d987-976e-2e259c5e218f=VM]', sharedLocks=''}'
2022-02-15 20:47:18,089+01 WARN  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-248) [30219dfb-71f3-4537-97e6-21ccd0618719] Validation of action 'RemoveSnapshot' failed for user <username>. Reasons: VAR__TYPE__SNAPSHOT,VAR__ACTION__REMOVE,ACTION_TYPE_FAILED_DISKS_LOCKED,$diskAliases <VM name>
2022-02-15 20:47:18,089+01 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-248) [30219dfb-71f3-4537-97e6-21ccd0618719] Lock freed to object 'EngineLock:{exclusiveLocks='[421cd54e-9f41-d987-976e-2e259c5e218f=VM]', sharedLocks=''}'
~~~

ovirt-imageio logs showing that the timeout is reached after the default 60 seconds of the remove_timeout config option:

~~~
2022-02-15 20:44:49,693 DEBUG   (Thread-518) [auth] Cancelling ticket 07fe38f2-c130-45f0-84b3-51c399936899
2022-02-15 20:44:49,693 INFO    (Thread-518) [auth] Waiting until ticket 07fe38f2-c130-45f0-84b3-51c399936899 is unused
2022-02-15 20:45:49,693 ERROR   (Thread-518) [http] Client error: Timeout cancelling ticket 07fe38f2-c130-45f0-84b3-51c399936899
2022-02-15 20:46:59,153 DEBUG   (Thread-455) [auth] Removing connection 455 context from ticket 07fe38f2-c130-45f0-84b3-51c399936899
2022-02-15 20:46:59,155 DEBUG   (Thread-456) [auth] Removing connection 456 context from ticket 07fe38f2-c130-45f0-84b3-51c399936899
2022-02-15 20:46:59,158 DEBUG   (Thread-456) [auth] Removing last connection for ticket 07fe38f2-c130-45f0-84b3-51c399936899
~~~

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.9.5-0.1.el8ev.noarch
ovirt-imageio-daemon-2.3.0-1.el8ev.x86_64

How reproducible:
Not reproducible so far.

Actual results:
The snapshot image is left in a locked state, which prevents its removal and blocks further backup jobs.

Expected results:
The snapshot image state should be cleared so that the snapshot can be removed.

Additional info:
According to the associated case, this behavior appeared after an upgrade from 4.4.7 to 4.4.9.
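The imageio log above shows the daemon marking the ticket canceled, waiting for it to become unused, and giving up after 60 seconds while the client connections drained only later. A minimal sketch of that "wait until unused" semantics follows; the `Ticket` class and exception name are hypothetical illustrations, not the actual ovirt-imageio code:

```python
import threading


class TicketCancelTimeout(Exception):
    """Raised when a ticket still has active connections after the timeout."""


class Ticket:
    # Hypothetical model of the documented semantics: a ticket can be
    # removed only when the number of connections using it is zero.
    def __init__(self, uuid):
        self.uuid = uuid
        self._connections = 0
        self._cond = threading.Condition()
        self.canceled = False

    def add_connection(self):
        with self._cond:
            self._connections += 1

    def remove_connection(self):
        with self._cond:
            self._connections -= 1
            if self._connections == 0:
                self._cond.notify_all()

    def cancel(self, remove_timeout=60):
        # Mirrors the logged behavior: mark the ticket canceled, then wait
        # until it is unused; after remove_timeout seconds the daemon
        # reports "Timeout cancelling ticket" (surfaced as 409 Conflict).
        with self._cond:
            self.canceled = True
            if not self._cond.wait_for(lambda: self._connections == 0,
                                       timeout=remove_timeout):
                raise TicketCancelTimeout(
                    "Timeout cancelling ticket %s" % self.uuid)
```

In this model, a `cancel()` issued while a stuck client connection is still attached fails with the timeout error, exactly as in the logs; once the last connection context is removed, a subsequent `cancel()` succeeds immediately.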
Based on the following comment, the "409 Conflict" error should be retried by the client, but it looks like there is no attempt to retry the removal of the ticket:

https://github.com/oVirt/ovirt-imageio/blob/d5e9c757e0a44fdd88d9e54b77283e718407b9cc/ovirt_imageio/_internal/config.py#L166

~~~
# Number of seconds to wait when removing a ticket. If ticket cannot be
# removed within this timeout, the request will fail with "409 Conflict",
# and the user need to retry the request again. A ticket can be removed
# only when the number of connections using the ticket is zero.
remove_timeout = 60
~~~
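The retry that the config comment calls for can be sketched as a small client-side helper; the `Conflict` exception and `remove_ticket_with_retry` helper below are hypothetical names for illustration, not part of any oVirt API:

```python
import time


class Conflict(Exception):
    """Stand-in for an HTTP "409 Conflict" response from the image daemon."""


def remove_ticket_with_retry(remove, attempts=3, delay=1.0):
    # 'remove' is any callable that deletes the ticket and raises Conflict
    # on 409. Retry until it succeeds or attempts are exhausted, sleeping
    # between tries so lingering connections have time to drain.
    for attempt in range(1, attempts + 1):
        try:
            remove()
            return attempt  # number of the attempt that succeeded
        except Conflict:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

With a `remove()` that conflicts once and then succeeds, the helper returns on the second attempt instead of leaving the ticket (and the disk) locked.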
Reproducer with a vdsm code change:

Edit /usr/lib/python3.6/site-packages/vdsm/storage/imagetickets.py (line 95):

~~~
failed_once = False

@requires_image_daemon
def remove_ticket(uuid):
    global failed_once

    if not failed_once:
        failed_once = True
        raise Exception("failed remove ticked")

    failed_once = False
    _request("DELETE", uuid)
~~~

After editing:

$ systemctl restart vdsmd

Downloading a disk should show an exception like this:

~~~
2022-03-07 04:28:29,683-05 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-13) [d002a3d7-d9c1-41f7-a420-9f5ae48fcaad] Failed to stop image transfer 'a82a9461-da19-477e-8db5-b29370462ed9' for ticket '4a4f5a28-a53f-4dd7-a9a6-9ad4ab280b2d': {}: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to RemoveImageTicketVDS, error = failed remove ticked, code = 100 (Failed with error GeneralException and code 100)
~~~

However, it will be retried and should succeed:

~~~
2022-03-07 04:28:33,720-05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-54) [d002a3d7-d9c1-41f7-a420-9f5ae48fcaad] START, RemoveImageTicketVDSCommand(HostName = hosto72, RemoveImageTicketVDSCommandParameters:{hostId='7ee44999-d01b-43c2-93dc-499f6345a9fe', ticketId='4a4f5a28-a53f-4dd7-a9a6-9ad4ab280b2d', timeout='null'}), log id: 5789d81c
2022-03-07 04:28:33,735-05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-54) [d002a3d7-d9c1-41f7-a420-9f5ae48fcaad] FINISH, RemoveImageTicketVDSCommand, return: StatusOnlyReturn [status=Status [code=0, message=Done]], log id: 5789d81c
~~~

The disk should be unlocked and subsequent transfers on the same disk should succeed.
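The reproducer above injects a one-shot failure by editing vdsm in place. The same "fail exactly once" behavior can be written as a reusable decorator when testing retry paths; this is a generic sketch (`fail_once` is a hypothetical helper, not vdsm code):

```python
import functools


def fail_once(exc_factory):
    # Decorator: the wrapped function raises on its first call and behaves
    # normally afterwards -- the same pattern the reproducer patches into
    # vdsm's remove_ticket().
    def decorator(func):
        state = {"failed": False}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not state["failed"]:
                state["failed"] = True
                raise exc_factory()
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

Applying `@fail_once(lambda: Exception("failed remove ticked"))` to a removal function reproduces the scenario the fix was verified against: the first removal attempt fails, and only a caller that retries observes success.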
(In reply to Benny Zlotnik from comment #9)

Verified with the above steps on ovirt-engine-4.4.10.7:

Downloading a disk showed an exception:

~~~
2022-03-14 12:10:43,211+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-28) [dfd772a6-671f-42f5-abc4-627d2963923f] Failed to stop image transfer session for ticket 'a5c2a518-9e9a-4690-93eb-0af5cd4b8f7e': {}: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to RemoveImageTicketVDS, error = failed remove ticked, code = 100 (Failed with error GeneralException and code 100)
~~~

Retried:

~~~
2022-03-14 12:10:53,234+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-75) [dfd772a6-671f-42f5-abc4-627d2963923f] START, RemoveImageTicketVDSCommand(HostName = host_mixed_1, RemoveImageTicketVDSCommandParameters:{hostId='15614953-8aef-415c-bc8f-de97af12ee6f', ticketId='a5c2a518-9e9a-4690-93eb-0af5cd4b8f7e', timeout='null'}), log id: 5e75230d
2022-03-14 12:10:53,244+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-75) [dfd772a6-671f-42f5-abc4-627d2963923f] FINISH, RemoveImageTicketVDSCommand, return: StatusOnlyReturn [status=Status [code=0, message=Done]], log id: 5e75230d
~~~

Operation succeeded:

~~~
2022-03-14 12:10:54,559+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-78) [dfd772a6-671f-42f5-abc4-627d2963923f] EVENT_ID: TRANSFER_IMAGE_SUCCEEDED(1,032), Image Download with disk test succeeded.
~~~

The disk is in 'OK' state. Moving to 'Verified'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV Manager (ovirt-engine) [ovirt-4.4.10-3]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:1052