Description of problem:

* Unable to switch an RHV host into maintenance mode.
~~~
Error while executing action: Cannot switch Host rhv_host.example.com to Maintenance mode.
Image transfer is in progress for the following (1) disks:

9ea7ee44-5590-4785-86b2-652f1c0120e0

Please wait for the operations to complete and try again.
~~~

* Checking the image_transfer table in the engine database, there is an image transfer stuck in phase 7:
~~~
command_id                | ab628d47-9472-4db7-911b-cb1c97ba2ef2
command_type              | 1024
phase                     | 7
last_updated              | 2022-08-31 17:37:29.365+00
message                   |
vds_id                    | 2a45acb5-e998-41b1-8a20-dbbf45bd8f2f
disk_id                   | 9ea7ee44-5590-4785-86b2-652f1c0120e0
imaged_ticket_id          |
proxy_uri                 | https://rhv_m.example.com:54323/images
bytes_sent                | 4096
bytes_total               | 10737418240
type                      | 1
active                    | f
daemon_uri                | https://rhv_host.example.com:54322/images
client_inactivity_timeout | 5
image_format              | 5
backend                   | 1
backup_id                 |
client_type               | 2
shallow                   | f
timeout_policy            | legacy
~~~

* There are no active jobs in the command_entities table either.

* The RHV-M UI/Webadmin (Storage -> Disks) shows the disk in "Finalizing" status.

Version-Release number of selected component (if applicable):
ovirt-engine-4.5.1.3-0.28.el8ev.noarch

How reproducible:
100%

Steps to Reproduce:
* Using image_transfer.py, download a disk image with a short --inactivity-timeout:
~~~
# cd /usr/share/doc/python3-ovirt-engine-sdk4/examples
# python3 image_transfer.py -c engine1 download 9ea7ee44-5590-4785-86b2-652f1c0120e0 --inactivity-timeout=5
[ 0.0 ] Connecting to engine...
[ 0.0 ] Looking up disk 9ea7ee44-5590-4785-86b2-652f1c0120e0
[ 0.7 ] Creating image transfer for download
[ 2.5 ] Transfer ID: ab628d47-9472-4db7-911b-cb1c97ba2ef2
[ 2.5 ] Transfer host name: rhv_host.example.com
[ 2.5 ] Transfer URL: https://rhv_host.example.com:54322/images/00a10313-6d91-45bb-801c-70aca2f6eb25
[ 2.5 ] Proxy URL: https://rhv_m.example.com:54323/images/00a10313-6d91-45bb-801c-70aca2f6eb25
[ 2.5 ] Conneted to imageio server
[ 2.5 ] Reading from server...
[ 52.5 ] Reading from server...
[ 52.5 ] Finalizing image transfer...
Traceback (most recent call last):
  File "image_transfer.py", line 177, in <module>
    client.read(0, buf)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/client/_api.py", line 393, in read
    return self._backend.readinto(buffer)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/http.py", line 237, in readinto
    res = self._get(length)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/http.py", line 433, in _get
    res = self._con.getresponse()
  File "/usr/lib64/python3.6/http/client.py", line 1361, in getresponse
    response.begin()
  File "/usr/lib64/python3.6/http/client.py", line 311, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python3.6/http/client.py", line 280, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "image_transfer.py", line 183, in <module>
    imagetransfer.finalize_transfer(connection, transfer, disk)
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/helpers/imagetransfer.py", line 350, in finalize_transfer
    .format(transfer.id, transfer.phase))
RuntimeError: Timed out waiting for transfer ab628d47-9472-4db7-911b-cb1c97ba2ef2 to finalize, phase: finalizing_success
~~~

Actual results:
The image_transfer entry still exists in the image_transfer table, and the host cannot be moved into maintenance.
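For reference, the RuntimeError at the end of the reproducer comes from the finalize_transfer() helper in /usr/share/doc/python3-ovirt-engine-sdk4/examples/helpers/imagetransfer.py, which polls the transfer phase until it reaches a terminal state. Below is a minimal, simplified sketch of that polling pattern using the Python SDK; the engine URL and credentials are placeholders, and the real helper also handles the case where the transfer entity has already been removed. With this bug, the phase stays at finalizing_success, so the loop times out:
~~~
import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details -- adjust for your environment.
connection = sdk.Connection(
    url="https://rhv_m.example.com/ovirt-engine/api",
    username="admin@internal",
    password="password",
    ca_file="ca.pem",
)

transfer_service = (
    connection.system_service()
    .image_transfers_service()
    .image_transfer_service("ab628d47-9472-4db7-911b-cb1c97ba2ef2")
)

# Ask the engine to finalize the transfer, then poll its phase until it
# reaches a terminal state or the deadline expires.
transfer_service.finalize()

deadline = time.monotonic() + 60
while True:
    phase = transfer_service.get().phase
    if phase in (types.ImageTransferPhase.FINISHED_SUCCESS,
                 types.ImageTransferPhase.FINISHED_FAILURE):
        break
    if time.monotonic() > deadline:
        # This is the situation hit above: the transfer is stuck in
        # finalizing_success and never reaches a finished phase.
        raise RuntimeError(
            "Timed out waiting for transfer to finalize, phase: {}".format(phase))
    time.sleep(1)

connection.close()
~~~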
Expected results:
The failed image_transfer should be removed, allowing the host to enter maintenance mode.

Additional info:
Per the implementation for BZ https://bugzilla.redhat.com/show_bug.cgi?id=1981297, image_transfer entries are cleaned up only in phases 9 and 10.
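As a diagnostic aid, a transfer stuck in a non-terminal phase can also be spotted through the REST API instead of querying the database directly. A minimal sketch using the Python SDK (connection details are placeholders):
~~~
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details -- adjust for your environment.
connection = sdk.Connection(
    url="https://rhv_m.example.com/ovirt-engine/api",
    username="admin@internal",
    password="password",
    ca_file="ca.pem",
)

transfers_service = connection.system_service().image_transfers_service()

terminal_phases = {
    types.ImageTransferPhase.FINISHED_SUCCESS,
    types.ImageTransferPhase.FINISHED_FAILURE,
}

# Any transfer that is not in a finished phase is the kind of entry that
# blocks switching the host to maintenance.
for transfer in transfers_service.list():
    if transfer.phase not in terminal_phases:
        disk_id = transfer.disk.id if transfer.disk else None
        print(transfer.id, transfer.phase, disk_id)

connection.close()
~~~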
* From the engine logs, the image transfer failed and was cancelled:
~~~
2022-08-31 17:36:38,082Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.AddImageTicketVDSCommand] (default task-1) [b13f5a25-f264-44ba-85e9-7939157057e3] START, AddImageTicketVDSCommand(HostName = rhv_host.example.com, AddImageTicketVDSCommandParameters:{hostId='2a45acb5-e998-41b1-8a20-dbbf45bd8f2f', ticketId='00a10313-6d91-45bb-801c-70aca2f6eb25', timeout='300', operations='[read]', size='10737418240', url='nbd:unix:/run/vdsm/nbd/ab628d47-9472-4db7-911b-cb1c97ba2ef2.sock', filename='null', sparse='false', transferId='ab628d47-9472-4db7-911b-cb1c97ba2ef2', dirty='false'}), log id: 5e8530d4
2022-08-31 17:36:38,134Z INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1) [b13f5a25-f264-44ba-85e9-7939157057e3] EVENT_ID: TRANSFER_IMAGE_INITIATED(1,031), Image Download with disk XXXXXX_7.9_Disk1 was initiated by admin@internal-authz.
2022-08-31 17:36:52,694Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-16) [b13f5a25-f264-44ba-85e9-7939157057e3] EVENT_ID: DOWNLOAD_IMAGE_CANCELED_TIMEOUT(1,072), Download was canceled by system. Reason: timeout due to transfer inactivity.
2022-08-31 17:37:12,849Z ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-16) [b13f5a25-f264-44ba-85e9-7939157057e3] Finalizing failed image transfer 'ab628d47-9472-4db7-911b-cb1c97ba2ef2' for Download disk 'XXXXX-7.9_Disk1' (disk id: '9ea7ee44-5590-4785-86b2-652f1c0120e0', image id: 'e49f3ce7-7182-4d3b-b724-198324e0cb91')
2022-08-31 17:37:12,853Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-16) [b13f5a25-f264-44ba-85e9-7939157057e3] START, RemoveImageTicketVDSCommand(HostName = rhv_host.example.com, RemoveImageTicketVDSCommandParameters:{hostId='2a45acb5-e998-41b1-8a20-dbbf45bd8f2f', ticketId='00a10313-6d91-45bb-801c-70aca2f6eb25', timeout='null'}), log id: 47ac1b3
2022-08-31 17:37:12,857Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-16) [b13f5a25-f264-44ba-85e9-7939157057e3] FINISH, RemoveImageTicketVDSCommand, return: StatusOnlyReturn [status=Status [code=0, message=Done]], log id: 47ac1b3
2022-08-31 17:37:12,858Z INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-16) [b13f5a25-f264-44ba-85e9-7939157057e3] Successfully stopped image transfer 'ab628d47-9472-4db7-911b-cb1c97ba2ef2' for ticket '00a10313-6d91-45bb-801c-70aca2f6eb25'
2022-08-31 17:37:13,993Z INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-62) [b13f5a25-f264-44ba-85e9-7939157057e3] EVENT_ID: TRANSFER_IMAGE_CANCELLED(1,033), Image Download with disk XXXXX-7.9_Disk1 was cancelled.
2022-08-31 17:37:29,346Z INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferImageStatusCommand] (default task-1) [34ae66bb-9fcd-4dbf-9701-f60fc8ba2ebd] Running command: TransferImageStatusCommand internal: false. Entities affected : ID: aaa00000-0000-0000-0000-123456789aaa Type: SystemAction group CREATE_DISK with role type USER
2022-08-31 17:37:29,365Z INFO [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (default task-1) [34ae66bb-9fcd-4dbf-9701-f60fc8ba2ebd] Updating image transfer 'ab628d47-9472-4db7-911b-cb1c97ba2ef2' phase from 'Finished Failure' to 'Finalizing Success'
2022-08-31 17:40:43,904Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1) [adbc55a5-c142-40f6-b112-2560b93e54d8] EVENT_ID: GENERIC_ERROR_MESSAGE(14,001), Cannot switch Host rhv_host.example.com to Maintenance mode. Image transfer is in progress for the following (1) disks: 9ea7ee44-5590-4785-86b2-652f1c0120e0 Please wait for the operations to complete and try again.
2022-08-31 17:40:43,904Z WARN [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-1) [adbc55a5-c142-40f6-b112-2560b93e54d8] Validation of action 'MaintenanceNumberOfVdss' failed for user admin@internal-authz. Reasons: VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_HOST_WITH_RUNNING_IMAGE_TRANSFERS,$host rhv_host.example.com,$disks 9ea7ee44-5590-4785-86b2-652f1c0120e0,$disks_COUNTER 1

--- AFTER ABOUT 1 HOUR -----

2022-08-31 18:39:48,145Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-3) [c7832fb3-2f80-47a3-902f-d413d2cad9e5] EVENT_ID: GENERIC_ERROR_MESSAGE(14,001), Cannot switch Host rhv_host.example.com to Maintenance mode. Image transfer is in progress for the following (1) disks: 9ea7ee44-5590-4785-86b2-652f1c0120e0 Please wait for the operations to complete and try again.
2022-08-31 18:39:48,146Z WARN [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-3) [c7832fb3-2f80-47a3-902f-d413d2cad9e5] Validation of action 'MaintenanceNumberOfVdss' failed for user admin@internal-authz. Reasons: VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_HOST_WITH_RUNNING_IMAGE_TRANSFERS,$host rhv_host.example.com,$disks 9ea7ee44-5590-4785-86b2-652f1c0120e0,$disks_COUNTER 1
~~~
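Until the engine-side fix is available, a client script can avoid triggering the bad 'Finished Failure' -> 'Finalizing Success' transition by checking the transfer phase before calling finalize. The helper below is a hypothetical client-side guard (not part of the SDK examples) and is still subject to races with the engine's inactivity cancellation; it is only a sketch of the idea:
~~~
import ovirtsdk4.types as types


def finalize_if_not_finished(transfer_service):
    """Call finalize() only if the transfer is not already in a terminal phase.

    Hypothetical helper: the engine log above shows that finalizing a
    transfer that is already 'Finished Failure' moves it back to
    'Finalizing Success', where it gets stuck.
    """
    phase = transfer_service.get().phase
    if phase in (types.ImageTransferPhase.FINISHED_SUCCESS,
                 types.ImageTransferPhase.FINISHED_FAILURE):
        return phase
    transfer_service.finalize()
    return transfer_service.get().phase
~~~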
This bug has low overall severity and is not going to be further verified by QE. If you believe special care is required, feel free to properly align the relevant severity, flags and keywords to raise PM_Score, or use one of the Bumps ('PrioBumpField', 'PrioBumpGSS', 'PrioBumpPM', 'PrioBumpQA') in Keywords to raise its PM_Score above the verification threshold (1000).
Updating the version.
Benny, looking at this one, it reminded me of the issues you handled in https://github.com/oVirt/ovirt-engine/pull/159, but here the failure seems to happen later on due to inactivity - is it a different case that is not being handled properly and results in a transition from 'Finished Failure' to 'Finalizing Success'?
(In reply to Arik from comment #5)
> Benny, looking at this one, it reminded me of the issues you handled in
> https://github.com/oVirt/ovirt-engine/pull/159, but here the failure seems
> to happen later on due to inactivity - is it a different case that is not
> being handled properly and results in a transition from 'Finished Failure'
> to 'Finalizing Success'?

I think it's the same issue we have with allowing an incorrect phase transition: it seems the script calls finalize, which changes the phase, but this transfer has already finished.
(In reply to Benny Zlotnik from comment #6)
> I think it's the same issue we have with allowing an incorrect phase
> transition: it seems the script calls finalize, which changes the phase,
> but this transfer has already finished.

Ah right, which was fixed for backups (https://github.com/oVirt/ovirt-engine/commit/b7353bdec83f883e1cfd5fc299b7c625fd96247e) but not yet for image transfers (https://github.com/oVirt/ovirt-engine/pull/92)? If so, do you remember what the context (bug/thread) of https://github.com/oVirt/ovirt-engine/pull/92 was?
(In reply to Arik from comment #7)
> (In reply to Benny Zlotnik from comment #6)
> > I think it's the same issue we have with allowing an incorrect phase
> > transition: it seems the script calls finalize, which changes the phase,
> > but this transfer has already finished.
>
> Ah right, which was fixed for backups
> (https://github.com/oVirt/ovirt-engine/commit/b7353bdec83f883e1cfd5fc299b7c625fd96247e)
> but not yet for image transfers (https://github.com/oVirt/ovirt-engine/pull/92)?
> If so, do you remember what the context (bug/thread) of
> https://github.com/oVirt/ovirt-engine/pull/92 was?

I created this PR because I was aware of an issue, but we didn't have any concrete bug that was caused by it. The current plan is to use it as the fix for https://bugzilla.redhat.com/show_bug.cgi?id=2092816.
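To illustrate the direction of the fix being discussed (the actual change lives in the engine's Java code, e.g. around the ImageTransferUpdater seen in the log above, not in Python), the idea is simply that a transfer already in a terminal phase must not accept further phase updates. A conceptual sketch only:
~~~
# Conceptual sketch -- not the engine's actual implementation.
TERMINAL_PHASES = {"Finished Success", "Finished Failure"}


def is_allowed_phase_update(current_phase, new_phase):
    # Once a transfer has finished, no further phase change is allowed,
    # so 'Finished Failure' can never move back to 'Finalizing Success'.
    return current_phase not in TERMINAL_PHASES
~~~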
Should be resolved by the fix for bz 2092816
Due to QE capacity, we are not going to cover this issue in our automation. If you think we should, please contact the QA Contact.
Verified. The failed image_transfer was removed without any leftovers in the DB table, and the UI shows the correct state of the disk.

Versions:
engine-4.5.3-0.2.el8ev
vdsm-4.50.3.1-1.el8ev.x86_64
*** Bug 2131470 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.3] bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:8502