Description of problem:

Currently the SPM fails to switch to another host if the SPM has uncleared tasks.

2018-09-12 14:23:04,978+10 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-9) [5ca4484a-3205-4de0-9a56-13589f75a8bc] SpmStopVDSCommand::Not stopping SPM on vds 'host2.rhvlab', pool id '35e30250-a68b-11e8-91a3-52540015c1ff' as there are uncleared tasks 'Task '0b357849-a61f-41c6-b88f-e2676d519d98', status 'finished''

It is not unusual, especially when troubleshooting other problems, for the SPM to contain uncleared tasks. For example, this happens when GSS asks a customer to run a specific vdsm-cli command (e.g. for a snapshot issue), or after some other failure that was cleared (e.g. cleaned up on the engine side). Usually this requires additional back and forth to clear the tasks and continue the problem solving; I've seen a few cases already where this happened.

The request here is rather simple: to make everyone's life easier, please consider allowing StopSpm to proceed if all uncleared tasks have finished, or make the engine clear them. I assume there isn't much left to be done for a finished task, so hopefully this is correct and possible.

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.6.4-1.el7.noarch
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
Perhaps it would be better to add a button to clear finished tasks from the SPM. How complicated would this be, Benny?
It shouldn't get too complicated: we can get the list of tasks using SPMGetAllTasksStatusesVDSCommand and clear finished tasks manually using SPMClearTaskVDSCommand. However, there might be a complication if a task is cleared manually while the command that initiated the task is about to clear it itself.
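The flow described above (list the task statuses, then clear only the finished ones) could look roughly like the sketch below. This is not engine code: `get_all_task_statuses` and `clear_task` are hypothetical callables standing in for SPMGetAllTasksStatusesVDSCommand and SPMClearTaskVDSCommand, and the task IDs are made up.

```python
from typing import Callable, Dict, List


def clear_finished_tasks(
    get_all_task_statuses: Callable[[], Dict[str, str]],
    clear_task: Callable[[str], None],
) -> List[str]:
    """Clear every task reported as 'finished' and return the cleared IDs.

    Both callables are hypothetical stand-ins for the engine's
    SPMGetAllTasksStatusesVDSCommand and SPMClearTaskVDSCommand.
    """
    cleared = []
    for task_id, status in get_all_task_statuses().items():
        # Tasks in any other state are left alone.
        if status == "finished":
            clear_task(task_id)
            cleared.append(task_id)
    return cleared


# In-memory stand-in for the SPM host, for illustration only.
tasks = {"task-a": "finished", "task-b": "running"}


def fake_clear(task_id: str) -> None:
    del tasks[task_id]


# Pass a copy of the statuses so clearing does not mutate the dict being iterated.
cleared = clear_finished_tasks(lambda: dict(tasks), fake_clear)
```

Only the finished task is removed here; the running one stays on the fake SPM. The race mentioned above (a manual clear colliding with the initiating command's own cleanup) is not handled by this sketch.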
This is happening in 4.3.5. It's not possible to put the SPM host into maintenance due to finished but uncleared tasks.

Package versions:
ovirt-engine-4.3.5.5-0.1.el7.noarch
vdsm-4.30.17-1.el7ev.x86_64

audit_log
~~~
2019-09-27 06:44:58.162+00 | | Not stopping SPM on vds rhvh-01, pool id ca46e326-07cc-4c49-b7de-8475875b5d72 as there are uncleared tasks Task '77ee449f-91d3-4669-9ef8-289a504aed23', status 'finished' +
 | | Task '1e8236dd-09e7-400a-ba9f-20e3ed1883af', status 'finished' +
 | | Task 'b381109b-36d8-4f8f-91e1-734bd0d773ba', status 'finished' +
 | | Task 'f497db40-e182-45b7-a998-030e49b25134', status 'finished' +
 | | Task '5121fbaa-1eb0-4e58-ab5c-297976969682', status 'finished' +
 | | Task 'f2d47d50-19b5-4b06-b541-81aa58eb8373', status 'finished'
2019-09-27 06:44:58.169+00 | 6a2d3db3-f4ac-4b5f-b30e-ab2a3c0e1c39 | Failed to force select sleme as the SPM due to a failure to stop the current SPM.
~~~

engine.log
~~~
2019-09-27 08:43:57,227+02 INFO [org.ovirt.engine.core.bll.storage.pool.ForceSelectSPMCommand] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] Running command: ForceSelectSPMCommand internal: false. Entities affected : ID: 63eec750-85d5-43a2-9c91-8689cbf9da2f Type: VDSAction group MANIPULATE_HOST with role type ADMIN
2019-09-27 08:43:57,229+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SpmStopOnIrsVDSCommand] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] START, SpmStopOnIrsVDSCommand( SpmStopOnIrsVDSCommandParameters:{storagePoolId='ca46e326-07cc-4c49-b7de-8475875b5d72', ignoreFailoverLimit='false'}), log id: 1886c98c
2019-09-27 08:43:57,229+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] START, ResetIrsVDSCommand( ResetIrsVDSCommandParameters:{storagePoolId='ca46e326-07cc-4c49-b7de-8475875b5d72', ignoreFailoverLimit='false', vdsId='0010e09f-058f-4d26-b4aa-a52efb6d32e0', ignoreStopFailed='false'}), log id: 26ecc74d
2019-09-27 08:43:57,234+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] START, SpmStopVDSCommand(HostName = rhvh-01, SpmStopVDSCommandParameters:{hostId='0010e09f-058f-4d26-b4aa-a52efb6d32e0', storagePoolId='ca46e326-07cc-4c49-b7de-8475875b5d72'}), log id: 17c79803
2019-09-27 08:43:57,241+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] SpmStopVDSCommand::Not stopping SPM on vds 'rhvh-01', pool id 'ca46e326-07cc-4c49-b7de-8475875b5d72' as there are uncleared tasks 'Task '77ee449f-91d3-4669-9ef8-289a504aed23', status 'finished'
Task '1e8236dd-09e7-400a-ba9f-20e3ed1883af', status 'finished'
Task 'b381109b-36d8-4f8f-91e1-734bd0d773ba', status 'finished'
Task 'f497db40-e182-45b7-a998-030e49b25134', status 'finished'
Task '5121fbaa-1eb0-4e58-ab5c-297976969682', status 'finished'
Task 'f2d47d50-19b5-4b06-b541-81aa58eb8373', status 'finished''
2019-09-27 08:43:57,247+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] EVENT_ID: VDS_ALERT_NOT_STOPPING_SPM_UNCLEARED_TASKS(9,030), Not stopping SPM on vds rhvh-01, pool id ca46e326-07cc-4c49-b7de-8475875b5d72 as there are uncleared tasks Task '77ee449f-91d3-4669-9ef8-289a504aed23', status 'finished'
Task '1e8236dd-09e7-400a-ba9f-20e3ed1883af', status 'finished'
Task 'b381109b-36d8-4f8f-91e1-734bd0d773ba', status 'finished'
Task 'f497db40-e182-45b7-a998-030e49b25134', status 'finished'
Task '5121fbaa-1eb0-4e58-ab5c-297976969682', status 'finished'
Task 'f2d47d50-19b5-4b06-b541-81aa58eb8373', status 'finished'
2019-09-27 08:43:57,247+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] FINISH, SpmStopVDSCommand, return: , log id: 17c79803
2019-09-27 08:43:57,247+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] FINISH, ResetIrsVDSCommand, return: , log id: 26ecc74d
2019-09-27 08:43:57,247+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SpmStopOnIrsVDSCommand] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] FINISH, SpmStopOnIrsVDSCommand, return: , log id: 1886c98c
2019-09-27 08:43:57,252+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] EVENT_ID: USER_FORCE_SELECTED_SPM_STOP_FAILED(4,096), Failed to force select sleme as the SPM due to a failure to stop the current SPM.
~~~
Benny, is there a downside to making the "Set as SPM" button also clear finished tasks, so that it does both? Can anything go wrong if we clear the tasks through this button?
(In reply to Tal Nisan from comment #9)
> Benny, is there a downside to add the clear finished task to the "Set as
> SPM" button which will do both? Can something wrong happen if we clear the
> task by this button?

The only thing that comes to mind is if we attempt to clean up a task that is already in the process of being cleaned up automatically, but I guess this can be avoided by placing appropriate locks.
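As a rough illustration of the locking idea, the sketch below lets only one caller claim a given task for cleanup, so a manual clear and an automatic clear cannot both act on it. It is a sketch under assumptions, not how the engine implements it: `TaskCleaner`, `try_clear` and the `clear_task` callable are hypothetical names.

```python
import threading
from typing import Callable, Set


class TaskCleaner:
    """Hypothetical guard that lets each task be cleared by one caller only."""

    def __init__(self, clear_task: Callable[[str], None]) -> None:
        self._clear_task = clear_task      # stand-in for the real clear call
        self._lock = threading.Lock()      # protects the set of claimed tasks
        self._claimed: Set[str] = set()

    def try_clear(self, task_id: str) -> bool:
        """Return True if this call cleared the task, False if another caller owns it."""
        with self._lock:
            if task_id in self._claimed:
                return False
            self._claimed.add(task_id)
        try:
            self._clear_task(task_id)
        except Exception:
            # Release the claim so the cleanup can be retried later.
            with self._lock:
                self._claimed.discard(task_id)
            raise
        return True


calls = []
cleaner = TaskCleaner(calls.append)
first = cleaner.try_clear("task-a")   # e.g. the automatic cleanup path
second = cleaner.try_clear("task-a")  # e.g. the manual button, arriving later
```

The claim is taken under the lock, but the clear call itself runs outside it, so a slow clear on one task does not block cleanup of other tasks.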
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Low: RHV-M(ovirt-engine) 4.4.z security, bug fix, enhancement update [ovirt-4.4.4]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:0381