Bug 2123141
Summary: | Unable to switch RHV host into maintenance mode as there are image transfer in progress | ||
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Bimal Chollera <bcholler> |
Component: | ovirt-engine | Assignee: | Benny Zlotnik <bzlotnik> |
Status: | CLOSED ERRATA | QA Contact: | Ilia Markelov <imarkelo> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.5.1 | CC: | ahadas, aperotti, bzlotnik, dfodor, emarcus, gveitmic, jortialc, mavital, michal.skrivanek, mwest, nashok |
Target Milestone: | ovirt-4.5.3 | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | ovirt-engine-4.5.3 | Doc Type: | Bug Fix |
Doc Text: |
In this release, image transfers cannot move from the final state (finished successfully or finished with failure) back to the non-final state which could lead to hanging image transfers that block moving hosts to maintenance.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2022-11-16 12:17:27 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 2092816 | ||
Bug Blocks: | 1541529 |
Description
Bimal Chollera
2022-08-31 18:57:52 UTC
* From the engine logs, the image transfer failed and cancelled. ~~~ 2022-08-31 17:36:38,082Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.AddImageTicketVDSCommand] (default task-1) [b13f5a25-f264-44ba-85e9-7939157057e3] START, AddImageTicketVDSCommand(HostName = rhv_host.example.com, AddImageTicketVDSCommandParameters:{hostId='2a45acb5-e998-41b1-8a20-dbbf45bd8f2f', ticketId='00a10313-6d91-45bb-801c-70aca2f6eb25', timeout='300', operations='[read]', size='10737418240', url='nbd:unix:/run/vdsm/nbd/ab628d47-9472-4db7-911b-cb1c97ba2ef2.sock', filename='null', sparse='false', transferId='ab628d47-9472-4db7-911b-cb1c97ba2ef2', dirty='false'}), log id: 5e8530d4 2022-08-31 17:36:38,134Z INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1) [b13f5a25-f264-44ba-85e9-7939157057e3] EVENT_ID: TRANSFER_IMAGE_INITIATED(1,031), Image Download with disk XXXXXX_7.9_Disk1 was initiated by admin@internal-authz. 2022-08-31 17:36:52,694Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-16) [b13f5a25-f264-44ba-85e9-7939157057e3] EVENT_ID: DOWNLOAD_IMAGE_CANCELED_TIMEOUT(1,072), Download was canceled by system. Reason: timeout due to transfer inactivity. 2022-08-31 17:37:12,849Z ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-16) [b13f5a25-f264-44ba-85e9-7939157057e3] Finalizing failed image transfer 'ab628d47-9472-4db7-911b-cb1c97ba2ef2' for Download disk 'XXXXX-7.9_Disk1' (disk id: '9ea7ee44-5590-4785-86b2-652f1c0120e0', image id: 'e49f3ce7-7182-4d3b-b724-198324e0cb91') 2022-08-31 17:37:12,853Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-16) [b13f5a25-f264-44ba-85e9-7939157057e3] START, RemoveImageTicketVDSCommand(HostName = rhv_host.example.com, RemoveImageTicketVDSCommandParameters:{hostId='2a45acb5-e998-41b1-8a20-dbbf45bd8f2f', ticketId='00a10313-6d91-45bb-801c-70aca2f6eb25', timeout='null'}), log id: 47ac1b3 2022-08-31 17:37:12,857Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RemoveImageTicketVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-16) [b13f5a25-f264-44ba-85e9-7939157057e3] FINISH, RemoveImageTicketVDSCommand, return: StatusOnlyReturn [status=Status [code=0, message=Done]], log id: 47ac1b3 2022-08-31 17:37:12,858Z INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-16) [b13f5a25-f264-44ba-85e9-7939157057e3] Successfully stopped image transfer 'ab628d47-9472-4db7-911b-cb1c97ba2ef2' for ticket '00a10313-6d91-45bb-801c-70aca2f6eb25' 2022-08-31 17:37:13,993Z INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-62) [b13f5a25-f264-44ba-85e9-7939157057e3] EVENT_ID: TRANSFER_IMAGE_CANCELLED(1,033), Image Download with disk XXXXX-7.9_Disk1 was cancelled. 2022-08-31 17:37:29,346Z INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferImageStatusCommand] (default task-1) [34ae66bb-9fcd-4dbf-9701-f60fc8ba2ebd] Running command: TransferImageStatusCommand internal: false. Entities affected : ID: aaa00000-0000-0000-0000-123456789aaa Type: SystemAction group CREATE_DISK with role type USER 2022-08-31 17:37:29,365Z INFO [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (default task-1) [34ae66bb-9fcd-4dbf-9701-f60fc8ba2ebd] Updating image transfer 'ab628d47-9472-4db7-911b-cb1c97ba2ef2' phase from 'Finished Failure' to 'Finalizing Success' 2022-08-31 17:40:43,904Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1) [adbc55a5-c142-40f6-b112-2560b93e54d8] EVENT_ID: GENERIC_ERROR_MESSAGE(14,001), Cannot switch Host rhv_host.example.com to Maintenance mode. Image transfer is in progress for the following (1) disks: 9ea7ee44-5590-4785-86b2-652f1c0120e0 Please wait for the operations to complete and try again. 2022-08-31 17:40:43,904Z WARN [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-1) [adbc55a5-c142-40f6-b112-2560b93e54d8] Validation of action 'MaintenanceNumberOfVdss' failed for user admin@internal-authz. Reasons: VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_HOST_WITH_RUNNING_IMAGE_TRANSFERS,$host rhv_host.example.com,$disks 9ea7ee44-5590-4785-86b2-652f1c0120e0,$disks_COUNTER 1 --- AFTER ABOUT 1 HOUR ----- 2022-08-31 18:39:48,145Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-3) [c7832fb3-2f80-47a3-902f-d413d2cad9e5] EVENT_ID: GENERIC_ERROR_MESSAGE(14,001), Cannot switch Host rhv_host.example.com to Maintenance mode. Image transfer is in progress for the following (1) disks: 9ea7ee44-5590-4785-86b2-652f1c0120e0 Please wait for the operations to complete and try again. 2022-08-31 18:39:48,146Z WARN [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-3) [c7832fb3-2f80-47a3-902f-d413d2cad9e5] Validation of action 'MaintenanceNumberOfVdss' failed for user admin@internal-authz. Reasons: VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_HOST_WITH_RUNNING_IMAGE_TRANSFERS,$host rhv_host.example.com,$disks 9ea7ee44-5590-4785-86b2-652f1c0120e0,$disks_COUNTER 1 ~~~ This bug has low overall severity and is not going to be further verified by QE. If you believe special care is required, feel free to properly align relevant severity, flags and keywords to raise PM_Score or use one of the Bumps ('PrioBumpField', 'PrioBumpGSS', 'PrioBumpPM', 'PrioBumpQA') in Keywords to raise it's PM_Score above verification threashold (1000). updating the version Benny, looking at this one it reminded me the issues you handled in https://github.com/oVirt/ovirt-engine/pull/159 but here the failure seems to happen later on due to inactivity - is it a different case that has is not being handled properly and results in a transition from 'Finished Failure' to 'Finalizing Success'? (In reply to Arik from comment #5) > Benny, looking at this one it reminded me the issues you handled in > https://github.com/oVirt/ovirt-engine/pull/159 > but here the failure seems to happen later on due to inactivity - is it a > different case that has is not being handled properly and results in a > transition from 'Finished Failure' to 'Finalizing Success'? I think it's the same issue that we have with allowing an incorrect phase transition, it seems like the script calls finalize which changes the phase, but this transfer has already finished (In reply to Benny Zlotnik from comment #6) > I think it's the same issue that we have with allowing an incorrect phase > transition, it seems like the script calls finalize which changes the phase, > but this transfer has already finished Ah right, which was fixed for backups (https://github.com/oVirt/ovirt-engine/commit/b7353bdec83f883e1cfd5fc299b7c625fd96247e) but not yet for image transfers (https://github.com/oVirt/ovirt-engine/pull/92)? if so, do you remember what was the context (bug/thread) of https://github.com/oVirt/ovirt-engine/pull/92? (In reply to Arik from comment #7) > (In reply to Benny Zlotnik from comment #6) > > I think it's the same issue that we have with allowing an incorrect phase > > transition, it seems like the script calls finalize which changes the phase, > > but this transfer has already finished > > Ah right, which was fixed for backups > (https://github.com/oVirt/ovirt-engine/commit/ > b7353bdec83f883e1cfd5fc299b7c625fd96247e) but not yet for image transfers > (https://github.com/oVirt/ovirt-engine/pull/92)? if so, do you remember what > was the context (bug/thread) of > https://github.com/oVirt/ovirt-engine/pull/92? I created this PR because I was aware of an issue, but we didn't have any concrete bug that was caused by it, but the current plan is to use this as a fix for https://bugzilla.redhat.com/show_bug.cgi?id=2092816 Should be resolved by the fix for bz 2092816 Due to QE capacity, we are not going to cover this issue in our automation, if you think we should please contact the QA Contact. Verified. The failed image_transfer was removed without any leftovers in the DB table. UI shows correct state of the disk. Versions: engine-4.5.3-0.2.el8ev vdsm-4.50.3.1-1.el8ev.x86_64 *** Bug 2131470 has been marked as a duplicate of this bug. *** Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.3] bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:8502 |