Description of problem: Open ImageIO transfer and wait until timeout expired After expiration, the transfer became inactive, but not closed VM disk left in a locked state until the user manually not finalizing the transfer. Version-Release number of selected component (if applicable): Two RHEV 4.4 nodes with layer rhvh-4.4.1.1-0.20200722.0+1 Ovirt-manager 4.4.1 NFS 4.1 storage domain ImageIO version: # rpm -qa | grep image ovirt-imageio-client-2.0.9-1.el8ev.x86_64 genisoimage-1.1.11-39.el8.x86_64 ovirt-imageio-common-2.0.9-1.el8ev.x86_64 ovirt-imageio-daemon-2.0.9-1.el8ev.x86_64 redhat-virtualization-host-image-update-placeholder-4.4.1-1.el8ev.noarch How reproducible: Steps to Reproduce: 1. Start an Image Transfer for upload. 2. Wait until the timeout expired. Actual results: After expiration, the transfer became inactive, but not closed VM disk left in a locked state until the user manually not finalizing the transfer. Expected results: After expiration, the transfer became inactive and closed. VM disk should be unlocked. Additional info: Probably this is: https://bugzilla.redhat.com/show_bug.cgi?id=1524184
(In reply to Yuriy Khokhlov (Veeam) from comment #0) > Steps to Reproduce: > 1. Start an Image Transfer for upload. > 2. Wait until the timeout expired. Do you mean inactivity_timeout? > Actual results: > After expiration, the transfer became inactive, but not closed > VM disk left in a locked state until the user manually not finalizing the > transfer. This is the expected behavior in the current system. We cancel transfer only for download and only if the the entire disk was downloaded. This heuristics is required when downloading disk from the UI, when the download is performed by the browser, and the browser has no way to tell the system when the operation is finished. When using the SDK, the SDK user is responsible for finalizing or cancelling transfers. > Expected results: > After expiration, the transfer became inactive and closed. > VM disk should be unlocked. It is possible to change the behavior so the system cancels image transfers after a timeout, but this may cause trouble to exiting applications that depend on the current behavior. I think this should be closed as not a bug. The backup application must ensure that image transfers are canceled or finalized in all cases.
Hello, Nir. > It is possible to change the behavior so the system cancels image transfers > after a timeout, but this may cause trouble to exiting applications that > depend on the current behavior. I understood your point. Its correct for UI download case. But, i don't agreed that backup software can always close transfer session. For example, backup server can crushed during backup, in that case vm will be locked in backup state, until system administrators fix backup server. Second thing - lost network connection for backup server to hypervisor. Third thing - just lost backup server database (ransomware attack?). Of course this issues can be fixed in some time, and after that backup software can finish session (but only if we have normal backup database with save sessions lock). But, all this time customer cant poweroff or restart vm and cant make changing in configuration, i think this behavior isn't good for customer. Is it possible to add special flag for UI, which preserve current behavior, but for regular API usage make this timeout working? Thanks.
(In reply to Yury.Panchenko from comment #2) Yuriy, what if we add a flag so you can control timeout policy? Example usage: <imagetransfer> <inactivity_timeout>60</inactivity_timeout> <timeout_policy>cancel</timeout_policy> ... </imagetransfer> With this if the system does not detect any activity for 60 seconds the system will cancel the transfer and unlock the disk. So if your backup application crashes or cannot access the engine the system will recover automatically. With the flag existing users not specifying the flag will use the old policy.
> Yuriy, what if we add a flag so you can control timeout policy? It will be great. We can use that flag in our software, and problem is gone.
Great!
Could NOT verify on rhv-release-4.4.5-7-001.noarch See below the following cases: Set timeout_policy to 'cancel' and Initiate Upload transfer set the timeout_policy to 'pause' and Initiate Download transfer Attaching engine + vdsm logs Setup: Modify the helpers/imagetransfer.py 'inactivity_timeout' to 10 sec: Leave the timeout policy as 'legacy' def create_transfer( connection, disk=None, direction=types.ImageTransferDirection.UPLOAD, host=None, backup=None, inactivity_timeout=10, timeout=60, disk_snapshot=None, shallow=None, timeout_policy=types.ImageTransferTimeoutPolicy.LEGACY): initiate Upload transfer. Expected: Upload will be paused due to inactivity. Actual: As expected [root@storage-ge13-vdsm1 examples]# python3 image_transfer.py -c engine upload 0826a12c-d05c-4331-bf4e-85153d2c3973 Engine.log: 2021-03-08 09:28:01,462+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [f29865ec-1a5a-4ff2-92a4-0343964e8780] EVENT_ID: UPLOAD_IMAGE_PAUSED_BY_SYSTEM_TIMEOUT(1,071), Upload was paused by system. Reason: timeout due to transfer inactivity. ------------------------------------- Initiate Download transfer Expected: The transfer is cancelled Actual: As expected [root@storage-ge13-vdsm1 examples]# python3 image_transfer.py -c engine download 0826a12c-d05c-4331-bf4e-85153d2c3973 engine.log: 2021-03-08 09:26:06,120+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-61) [79e18f93-0c9d-490b-baf4-397554c42eab] EVENT_ID: DOWNLOAD_IMAGE_CANCELED_TIMEOUT(1,072), Download was canceled by system. Reason: timeout due to transfer inactivity. --------------------------------------- Edit the file again, and set the timeout_policy to 'cancel': def create_transfer( connection, disk=None, direction=types.ImageTransferDirection.UPLOAD, host=None, backup=None, inactivity_timeout=10, timeout=60, disk_snapshot=None, shallow=None, timeout_policy=types.ImageTransferTimeoutPolicy.CANCEL): Initiate Download transfer Expected: The transfer is cancelled Actual: As expected 2021-03-08 09:36:16,778+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [288c0127-33b9-4945-8889-ffb2ad9ea9a6] EVENT_ID: DOWNLOAD_IMAGE_CANCELED_TIMEOUT(1,072), Download was canceled by system. Reason: timeout due to transfer inactivity. Initiate Upload transfer Expected: The transfer is cancelled !!!!! Actual: THE TRANSFER IS PAUSED !!!!! [root@storage-ge13-vdsm1 examples]# python3 image_transfer.py -c engine upload 0826a12c-d05c-4331-bf4e-85153d2c3973 engine.log: 2021-03-08 09:37:46,076+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-3) [e4c7aa5f-654e-4982-9b8d-a628e4bd91e8] EVENT_ID: UPLOAD_IMAGE_PAUSED_BY_SYSTEM_TIMEOUT(1,071), Upload was paused by system. Reason: timeout due to transfer inactivity. from image transfer log: 2021-03-08 09:37:31,407 INFO (MainThread) [helpers] Creating image transfer for disk=0826a12c-d05c-4331-bf4e-85153d2c3973, direction=upload host=None backup=None shallow=False timeout_policy=cancel 2021-03-08 09:37:32,295 INFO (MainThread) [helpers] Transfer --------------------------------------- Edit the file again, and set the timeout_policy to 'pause': def create_transfer( connection, disk=None, direction=types.ImageTransferDirection.UPLOAD, host=None, backup=None, inactivity_timeout=10, timeout=60, disk_snapshot=None, shallow=None, timeout_policy=types.ImageTransferTimeoutPolicy.PAUSE): Initiate Download transfer Expected: The transfer is PAUSED !!!! Actual: The transfer was CANCELLED !!!! [root@storage-ge13-vdsm1 examples]# python3 image_transfer.py -c engine download 0826a12c-d05c-4331-bf4e-85153d2c3973 engine.log: 2021-03-08 09:49:15,356+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-77) [bb5964fb-e4f1-436e-8c15-5cc7834a343f] EVENT_ID: DOWNLOAD_IMAGE_CANCELED_TIMEOUT(1,072), Download was canceled by system. Reason: timeout due to transfer inactivity. from image transfer log: 2021-03-08 09:49:00,268 INFO (MainThread) [helpers] Creating image transfer for disk=0826a12c-d05c-4331-bf4e-85153d2c3973, direction=download host=None backup=None shallow=False timeout_policy=pause Initiate Upload transfer Expected: The transfer is PAUSED Actual: As expected [root@storage-ge13-vdsm1 examples]# python3 image_transfer.py -c engine upload 0826a12c-d05c-4331-bf4e-85153d2c3973 engine.log: 2021-03-08 09:53:59,717+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-52) [cb58c751-ce7a-4c7d-8a14-4aa604a2a735] EVENT_ID: UPLOAD_IMAGE_PAUSED_BY_SYSTEM_TIMEOUT(1,071), Upload was paused by system. Reason: timeout due to transfer inactivity. from image transfer log: 2021-03-08 09:53:44,802 INFO (MainThread) [helpers] Creating image transfer for disk=0826a12c-d05c-4331-bf4e-85153d2c3973, direction=upload host=None backup=None shallow=False timeout_policy=pause
Created attachment 1761507 [details] logs
Hi Ilan, After a long debugging session, and forcing each transfer to be a timed-out transfer by disabling the time, and forcing each time to perform a timeout policy, I see that all the described is working in my environment. can you please provide more information, about image type, image size, or maybe share the same image. I see that if the image size is too small the transfer always will succeed. let's meet tomorrow.
After further checking with Ilan we found that the patch https://gerrit.ovirt.org/c/ovirt-engine/+/113126 is not included in realse: rhv-release-4.4.5-7-001.noarch, and will be included in the next release.
*build
Moving to verified. Both of the problematic cases from comment #8 were also verified on rhv-4.4.5-9 Set timeout_policy to 'cancel' and Initiate Upload transfer Expected: The transfer is cancelled Actual: the transfer is cancelled set the timeout_policy to 'pause' and Initiate Download transfer Expected: The transfer is Paused Actual: the transfer is paused
This bugzilla is included in oVirt 4.4.5 release, published on March 18th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.