Bug 1627997

Summary: [RFE] Allow SPM switching if all tasks have finished via REST-API
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-engine
Assignee: Pavel Bar <pbar>
Status: CLOSED ERRATA
QA Contact: Evelina Shames <eshames>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.2.6
CC: abpatil, aefrat, bzlotnik, jortialc, mike.hodgkinson, mkalinin, mtessun, nashok, pbar, pelauter, schandle, sfishbai, tnisan, usurse, vpagar
Target Milestone: ovirt-4.4.4
Keywords: FutureFeature
Target Release: 4.4.4
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ovirt-engine-4.4.4.5
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-02 13:57:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 902971, 1547336, 1939548

Description Germano Veit Michel 2018-09-12 05:05:54 UTC
Description of problem:

Currently, the SPM role fails to switch to another host if the current SPM host has uncleared tasks:

2018-09-12 14:23:04,978+10 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-9) [5ca4484a-3205-4de0-9a56-13589f75a8bc] SpmStopVDSCommand::Not stopping SPM on vds 'host2.rhvlab', pool id '35e30250-a68b-11e8-91a3-52540015c1ff' as there are uncleared tasks 'Task '0b357849-a61f-41c6-b88f-e2676d519d98', status 'finished''

It is not unusual, especially when troubleshooting other problems, for the SPM to have uncleared tasks. This can happen, for example, when GSS asks a customer to run a specific vdsm-cli command (e.g. for a snapshot issue), or after some other failure that was cleaned up on the engine side. It usually takes additional back and forth to clear the tasks before troubleshooting can continue, and I have already seen a few cases where this happened.

The request here is simple and would make everyone's life easier: please consider allowing StopSpm to proceed if all uncleared tasks have finished, or make the engine clear them.

I assume there isn't much left to be done for a finished task, so hopefully this is correct and possible.
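The proposed rule can be sketched as a simple predicate. This sketch assumes each uncleared task is reported as an ID plus a status string such as 'finished', as in the log above. The `Task` type and `may_stop_spm` function are hypothetical illustrations, not ovirt-engine or vdsm code. Today any uncleared task blocks SpmStop. Under the proposal, only tasks that have not yet finished would block it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Task:
    """Hypothetical view of an uncleared SPM task: its ID and reported status."""
    task_id: str
    status: str


def may_stop_spm(uncleared: List[Task]) -> Tuple[bool, List[Task]]:
    """Return (allowed, blocking_tasks) under the proposed rule.

    Only tasks whose status is not 'finished' block stopping the SPM;
    finished-but-uncleared tasks no longer do.
    """
    blocking = [t for t in uncleared if t.status != "finished"]
    return (not blocking, blocking)
```

With the single finished task from the log above, `may_stop_spm([Task("0b357849-a61f-41c6-b88f-e2676d519d98", "finished")])` would allow the switch. Returning the blocking tasks as well keeps enough information to build an error message like the current one.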

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.6.4-1.el7.noarch

Comment 2 Sandro Bonazzola 2019-01-28 09:40:11 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 5 Tal Nisan 2019-07-10 13:47:36 UTC
Perhaps it would be better to add a button to clear finished tasks from the SPM. How complicated would this be, Benny?

Comment 6 Benny Zlotnik 2019-07-10 13:59:11 UTC
It shouldn't be too complicated: we can get the list of tasks using SPMGetAllTasksStatusesVDSCommand and clear the finished tasks manually using SPMClearTaskVDSCommand.
However, there might be a complication if a task is cleared manually while the command that initiated it is about to clear it itself.
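The approach in the previous comment can be sketched as follows. Two injected callables stand in for SPMGetAllTasksStatusesVDSCommand and SPMClearTaskVDSCommand. Their shapes here (a `{task_id: status}` dict and a one-argument clear function) are assumptions for illustration, not the real engine signatures. `owned_by_running_command` is a hypothetical input listing tasks whose initiating command may still clear them, which the sketch skips to sidestep the race mentioned above.

```python
from __future__ import annotations

from typing import Callable, Dict, Iterable, List


def clear_finished_tasks(
    get_all_task_statuses: Callable[[], Dict[str, str]],
    clear_task: Callable[[str], None],
    owned_by_running_command: Iterable[str] = (),
) -> List[str]:
    """Clear every finished SPM task not owned by a running command.

    get_all_task_statuses: hypothetical stand-in for
        SPMGetAllTasksStatusesVDSCommand, returning {task_id: status}.
    clear_task: hypothetical stand-in for SPMClearTaskVDSCommand.
    Returns the IDs that were cleared, in sorted order.
    """
    skip = set(owned_by_running_command)
    cleared = []
    # sorted() materializes a list, so clear_task may mutate the source safely
    for task_id, status in sorted(get_all_task_statuses().items()):
        if status == "finished" and task_id not in skip:
            clear_task(task_id)
            cleared.append(task_id)
    return cleared
```

Injecting the two operations keeps the selection logic testable against a fake SPM. A real implementation would also have to decide how to learn which tasks are still owned by an in-flight command.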

Comment 8 Juan Orti 2019-10-01 06:48:08 UTC
This is happening in 4.3.5. It's not possible to put the SPM host into maintenance due to finished but uncleared tasks.

Package versions:
ovirt-engine-4.3.5.5-0.1.el7.noarch
vdsm-4.30.17-1.el7ev.x86_64


audit_log

~~~
 2019-09-27 06:44:58.162+00    |                                      | Not stopping SPM on vds rhvh-01, pool id ca46e326-07cc-4c49-b7de-8475875b5d72 as there are uncleared tasks Task '77ee449f-91d3-4669-9ef8-289a504aed23', status 'finished' +
                               |                                      | Task '1e8236dd-09e7-400a-ba9f-20e3ed1883af', status 'finished' +
                               |                                      | Task 'b381109b-36d8-4f8f-91e1-734bd0d773ba', status 'finished' +
                               |                                      | Task 'f497db40-e182-45b7-a998-030e49b25134', status 'finished' +
                               |                                      | Task '5121fbaa-1eb0-4e58-ab5c-297976969682', status 'finished' +
                               |                                      | Task 'f2d47d50-19b5-4b06-b541-81aa58eb8373', status 'finished'
 2019-09-27 06:44:58.169+00    | 6a2d3db3-f4ac-4b5f-b30e-ab2a3c0e1c39 | Failed to force select sleme as the SPM due to a failure to stop the current SPM.
~~~


engine.log

~~~
2019-09-27 08:43:57,227+02 INFO  [org.ovirt.engine.core.bll.storage.pool.ForceSelectSPMCommand] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] Running command: ForceSelectSPMCommand internal: false. Entities affected :  ID: 63eec750-85d5-43a2-9c91-8689cbf9da2f Type: VDSAction group MANIPULATE_HOST with role type ADMIN
2019-09-27 08:43:57,229+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SpmStopOnIrsVDSCommand] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] START, SpmStopOnIrsVDSCommand( SpmStopOnIrsVDSCommandParameters:{storagePoolId='ca46e326-07cc-4c49-b7de-8475875b5d72', ignoreFailoverLimit='false'}), log id: 1886c98c
2019-09-27 08:43:57,229+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] START, ResetIrsVDSCommand( ResetIrsVDSCommandParameters:{storagePoolId='ca46e326-07cc-4c49-b7de-8475875b5d72', ignoreFailoverLimit='false', vdsId='0010e09f-058f-4d26-b4aa-a52efb6d32e0', ignoreStopFailed='false'}), log id: 26ecc74d
2019-09-27 08:43:57,234+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] START, SpmStopVDSCommand(HostName = rhvh-01, SpmStopVDSCommandParameters:{hostId='0010e09f-058f-4d26-b4aa-a52efb6d32e0', storagePoolId='ca46e326-07cc-4c49-b7de-8475875b5d72'}), log id: 17c79803
2019-09-27 08:43:57,241+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] SpmStopVDSCommand::Not stopping SPM on vds 'rhvh-01', pool id 'ca46e326-07cc-4c49-b7de-8475875b5d72' as there are uncleared tasks 'Task '77ee449f-91d3-4669-9ef8-289a504aed23', status 'finished'
Task '1e8236dd-09e7-400a-ba9f-20e3ed1883af', status 'finished'
Task 'b381109b-36d8-4f8f-91e1-734bd0d773ba', status 'finished'
Task 'f497db40-e182-45b7-a998-030e49b25134', status 'finished'
Task '5121fbaa-1eb0-4e58-ab5c-297976969682', status 'finished'
Task 'f2d47d50-19b5-4b06-b541-81aa58eb8373', status 'finished''
2019-09-27 08:43:57,247+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] EVENT_ID: VDS_ALERT_NOT_STOPPING_SPM_UNCLEARED_TASKS(9,030), Not stopping SPM on vds rhvh-01, pool id ca46e326-07cc-4c49-b7de-8475875b5d72 as there are uncleared tasks Task '77ee449f-91d3-4669-9ef8-289a504aed23', status 'finished'
Task '1e8236dd-09e7-400a-ba9f-20e3ed1883af', status 'finished'
Task 'b381109b-36d8-4f8f-91e1-734bd0d773ba', status 'finished'
Task 'f497db40-e182-45b7-a998-030e49b25134', status 'finished'
Task '5121fbaa-1eb0-4e58-ab5c-297976969682', status 'finished'
Task 'f2d47d50-19b5-4b06-b541-81aa58eb8373', status 'finished'
2019-09-27 08:43:57,247+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] FINISH, SpmStopVDSCommand, return: , log id: 17c79803
2019-09-27 08:43:57,247+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] FINISH, ResetIrsVDSCommand, return: , log id: 26ecc74d
2019-09-27 08:43:57,247+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SpmStopOnIrsVDSCommand] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] FINISH, SpmStopOnIrsVDSCommand, return: , log id: 1886c98c
2019-09-27 08:43:57,252+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-3143) [b4824152-7848-4f44-aee7-ce66730d5e0f] EVENT_ID: USER_FORCE_SELECTED_SPM_STOP_FAILED(4,096), Failed to force select sleme as the SPM due to a failure to stop the current SPM.
~~~

Comment 9 Tal Nisan 2019-12-17 16:26:12 UTC
Benny, is there a downside to adding the clearing of finished tasks to the "Set as SPM" button, so that it does both? Can anything go wrong if we clear the tasks via this button?

Comment 10 Benny Zlotnik 2019-12-19 08:00:53 UTC
(In reply to Tal Nisan from comment #9)
> Benny, is there a downside to adding the clearing of finished tasks to the
> "Set as SPM" button, so that it does both? Can anything go wrong if we clear
> the tasks via this button?
The only thing that comes to mind is an attempt to clean up a task that is already being cleaned up automatically, but I guess this can be avoided by taking appropriate locks.
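The locking idea in the previous comment can be sketched with a per-task lock shared by the manual path (the button) and the automatic cleanup path. Plain `threading` locks are an assumption standing in for whatever locking mechanism the engine would actually use. `TaskLocks` and `try_clear` are hypothetical names. Whichever path gets the lock first clears the task. The other path backs off, either immediately if the lock is held or after seeing that the task was already cleared.

```python
from __future__ import annotations

import threading
from typing import Callable, Dict, Set


class TaskLocks:
    """Hypothetical per-task lock registry shared by manual and automatic cleanup."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the two collections below
        self._locks: Dict[str, threading.Lock] = {}
        self._cleared: Set[str] = set()

    def lock_for(self, task_id: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(task_id, threading.Lock())

    def is_cleared(self, task_id: str) -> bool:
        with self._guard:
            return task_id in self._cleared

    def mark_cleared(self, task_id: str) -> None:
        with self._guard:
            self._cleared.add(task_id)


def try_clear(locks: TaskLocks, task_id: str, clear_task: Callable[[str], None]) -> bool:
    """Clear task_id unless another path is clearing it or already has.

    Returns True if this caller cleared the task, False if it backed off.
    """
    lock = locks.lock_for(task_id)
    if not lock.acquire(blocking=False):
        return False  # another path is clearing this task right now
    try:
        if locks.is_cleared(task_id):
            return False  # the other path already cleared it earlier
        clear_task(task_id)
        locks.mark_cleared(task_id)
        return True
    finally:
        lock.release()
```

The non-blocking acquire means the button never waits on automatic cleanup. It simply skips tasks that are being handled elsewhere.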

Comment 20 errata-xmlrpc 2021-02-02 13:57:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: RHV-M(ovirt-engine) 4.4.z security, bug fix, enhancement update [ovirt-4.4.4]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0381