Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1592180

Summary: [RFE] Add 'force' option to clear aborted task using vdsm-client
Product: Red Hat Enterprise Virtualization Manager
Reporter: nijin ashok <nashok>
Component: vdsm
Assignee: Nir Soffer <nsoffer>
Status: CLOSED WONTFIX
QA Contact: Evelina Shames <eshames>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.2.3
CC: abpatil, aefrat, ahadas, eshames, fdelorey, guillaume.pavese, gveitmic, jortialc, lsurette, mperina, nashok, nsoffer, omachace, pkliczew, schandle, smaudet, srevivo, tnisan, ycui
Target Milestone: ---
Keywords: EasyFix, FutureFeature, ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-03-23 16:15:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 902971

Description nijin ashok 2018-06-18 04:47:43 UTC
Description of problem:

Currently, the "force" option for "stopTask" is not exposed through vdsm-client/vdsClient, so there is no way to stop a task that is in the aborted state other than manual steps. The call fails with the error below.

===
2018-06-01 15:39:37,703+1200 ERROR (Thread-1450) [storage.TaskManager.Task] (Task='ecc67ee5-7b53-421f-89a6-cd1cfcf10845') Unexpected error (task:872)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 879, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in stopTask
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2243, in stopTask
    return self.taskMng.stopTask(taskID=taskID, force=force)
  File "/usr/share/vdsm/storage/taskManager.py", line 158, in stopTask
    t.stop(force=force)
  File "/usr/share/vdsm/storage/task.py", line 1253, in stop
    self._incref(force)
  File "/usr/share/vdsm/storage/task.py", line 985, in _incref
    raise se.TaskAborted(unicode(self))
TaskAborted: Task is aborted: u'e5ad2e3a-5506-4305-b0d5-9f738a04c2bf' - code 411
===
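The guard that produces this error can be illustrated with a minimal Python sketch. This is not the actual vdsm code; `Task` and `TaskAborted` below are simplified stand-ins for the classes in vdsm's storage/task.py, reconstructed from the traceback above: `_incref` refuses to touch an aborted task unless `force` is set, which is exactly the flag the RFE asks to expose through vdsm-client.

```python
# Simplified stand-ins for vdsm's storage/task.py -- illustrative only.

class TaskAborted(Exception):
    """Mirrors se.TaskAborted (code 411 in the logs)."""

ABORTED = "aborted"

class Task:
    def __init__(self, task_id):
        self.id = task_id
        self.state = ABORTED  # the problematic state from this bug
        self.ref = 0

    def _incref(self, force=False):
        # Without force=True, an aborted task cannot be referenced,
        # so stop() fails with TaskAborted -- as in the traceback above.
        if self.state == ABORTED and not force:
            raise TaskAborted(self.id)
        self.ref += 1

    def stop(self, force=False):
        self._incref(force)
        return "stopping"

t = Task("e5ad2e3a-5506-4305-b0d5-9f738a04c2bf")
try:
    t.stop()               # mirrors the failing vdsm-client call
except TaskAborted as e:
    print("TaskAborted:", e)

print(t.stop(force=True))  # the code path this RFE asks to expose
```

The storage layer already accepts the flag (`hsm.py` passes `force=force` down to `taskManager.stopTask`, per the traceback); the gap is only in the jsonrpc schema and the client.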


Version-Release number of selected component (if applicable):

rhvm-4.2.3

How reproducible:

100%

Steps to Reproduce:


Actual results:

No "force" option for stopTask via vdsm-client

Expected results:

Provide "force" option for stopTask via vdsm-client

Additional info:

Comment 2 Piotr Kliczewski 2018-06-22 13:19:52 UTC
This parameter has never been supported by jsonrpc (since the beginning, 3.5). It is not defined in the schema [1] or in API.py [2]. We do have this parameter at the storage API level [3]. This BZ seems to be an RFE to change the API.


[1] https://github.com/oVirt/vdsm/blob/master/lib/vdsm/api/vdsm-api.yml#L9929
[2] https://github.com/oVirt/vdsm/blob/master/lib/vdsm/API.py#L118
[3] https://github.com/oVirt/vdsm/blob/master/lib/vdsm/storage/hsm.py#L2232
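For illustration only, a hypothetical fragment showing roughly what declaring the parameter in vdsm-api.yml might look like. The layout and field names here are assumptions based on the general style of the schema, not a proposed patch; the flag would also need to be threaded through API.py.

```yaml
# Hypothetical vdsm-api.yml fragment -- illustrative only.
Task.stop:
    description: Stop a currently running Task.
    params:
    -   name: taskID
        type: string    # UUID of the task
        description: The task to stop.
    -   name: force
        type: boolean
        defaultvalue: false
        description: Stop the task even if it is already aborted.
```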

Comment 3 Nir Soffer 2018-06-22 14:29:30 UTC
(In reply to nijin ashok from comment #0)
> Description of problem:
> 
> Currently, "force" option for "stopTask" is not exposed to
> vdsm-client/vdsClient. So it's impossible to stop the task which is in the
> aborted stage other than manual steps. It will fail with the error below.
...
> TaskAborted: Task is aborted: u'e5ad2e3a-5506-4305-b0d5-9f738a04c2bf' - code
> 411

Again, a bug asking for a specific solution instead of describing the problem and
how it affects users.

If the task is aborted, why do we need to stop it? Can we use clearTask?

nijin, can you explain:
- what the user is trying to do, and why
- how the system responds
- how the system should respond
- which flows are affected (engine? vdsm-client?)

I'm changing the title to reflect my current understanding of this bug.

Also, this is a storage issue. Even if the problem is a missing argument in the vdsm API
schema, it must be handled by the storage team.

Comment 4 Guillaume Pavese 2018-06-29 16:51:01 UTC
Hi,

As a data point: I tried to modify a bonding mode (mode 2 to mode 1) on a host, and something seems to have gone wrong.

Since then I have tried rebooting the host and all argument variants of taskcleaner.sh (which never listed anything).

I still have two tasks listed as running in the manager for hours:

"Invoking Activate Host" and "Configuring networks on host"

That seems to prevent me from reinstalling the host or removing it:
"Cannot remove Host. Related operation is currently in progress. Please try again later."

Which is unfortunate, as I am now stuck.

So, without knowing whether this is the right solution, I too would like a way to cancel/stop running tasks to get out of this sort of situation.

Comment 5 Nir Soffer 2018-06-29 17:17:36 UTC
(In reply to Guillaume Pavese from comment #4)
Thanks for adding more info on this. It seems that the issue is how to abort 
tasks in the system.

Is this info about one of the cases attached to this bug?

I'm moving this to ovirt-engine since engine is responsible for managing running
tasks. The fact that you have "stuck" tasks means that engine failed to manage
them.

Now, even if we have a bug in engine, it may be possible to stop running storage
tasks and clear them on the host, as a way to work around a possible engine bug.
Note that activating hosts or configuring host networks are not storage tasks.

If you still have the host in this state, it would be useful to check if there are
running tasks on the host using this command:

    vdsm-client Host getAllTasksInfo

If there are running tasks, you can stop them:

    vdsm-client Task stop taskID=xxx-yyy

If there are finished tasks, you can clear them:

    vdsm-client Task clear taskID=xxx-yyy

Please check the online help for available commands:

$ sudo vdsm-client Task -h
usage: vdsm-client Task [-h] method [arg=value] ...

optional arguments:
  -h, --help          show this help message and exit

Task methods:
  method [arg=value]
    getStatus         Get Task status information.
    revert            Rollback a Task to restore the previous system state.
    clear             Discard information about a finished Task.
    getInfo           Get information about a Task.
    stop              Stop a currently running Task.

If you cannot stop and clear tasks using vdsm-client, please file a separate vdsm
bug.
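The output of getAllTasksInfo is a JSON object keyed by task UUID (see the samples later in this bug). As a sketch, a small helper that turns such output into the stop/clear commands above might look like this; `build_commands` and the sample payload are illustrative, not part of vdsm.

```python
import json

# Sample payload in the shape returned by
# `vdsm-client Host getAllTasksInfo` (see the sos output in comment 21).
SAMPLE = '''
{
    "ae230537-6e02-42f1-9bc1-f29f6563230b": {
        "verb": "deleteVolume",
        "id": "ae230537-6e02-42f1-9bc1-f29f6563230b"
    }
}
'''

def build_commands(payload, action="stop"):
    """Turn getAllTasksInfo JSON into vdsm-client Task command lines.

    action is "stop" for running tasks or "clear" for finished ones.
    """
    tasks = json.loads(payload)
    return ["vdsm-client Task %s taskID=%s" % (action, task_id)
            for task_id in sorted(tasks)]

for cmd in build_commands(SAMPLE):
    print(cmd)
# -> vdsm-client Task stop taskID=ae230537-6e02-42f1-9bc1-f29f6563230b
```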

Comment 6 Guillaume Pavese 2018-06-29 18:03:23 UTC
I only got empty results (on the concerned host, and also on the host running the engine):

vdsm-client Host getAllTasksInfo
{}

I managed to "recover" by powering off the engine, rebooting the host holding it, and then running hosted-engine --vm-start

The two phantom tasks are now gone and my two hosts are up and running...

That was not optimal.
Maybe there was a supported way of doing it, but seeing the tasks in the GUI while not with vdsm-client seems to call for a way to manage (cancel/stop/clear) them from the GUI.

Comment 11 Nir Soffer 2018-07-02 16:23:26 UTC
Moving back to vdsm, Piotr will file a bug for ovirt-engine.

nijin, please see comment 3.

Comment 12 Piotr Kliczewski 2018-07-03 08:25:20 UTC
(In reply to Nir Soffer from comment #11)
> Moving back to vdsm, Piotr will file a bug for ovirt-engine.
> 

Here is the engine bug: BZ #1597554.

Comment 14 Nir Soffer 2018-07-09 15:45:06 UTC
(In reply to nijin ashok from comment #13)
Thanks for clarifying this issue.

Do you know how to reproduce these stuck tasks?

Also, do you have the output of getAllTasksInfo or similar apis showing info about
the stuck tasks?

Comment 15 nijin ashok 2018-07-09 15:50:34 UTC
(In reply to Nir Soffer from comment #14)
> (In reply to nijin ashok from comment #13)
> Thanks for clarifying this issue.
> 
> Do you know how to reproduce these stuck tasks?
> 

Nope, I was not able to reproduce this.

> Also, do you have the output of getAllTasksInfo or similar apis showing info
> about
> the stuck tasks?

cat sos_strings/vdsm/Host.getAllTasksInfo
{
    "e5ad2e3a-5506-4305-b0d5-9f738a04c2bf": {"verb": "createVolume", "id": "e5ad2e3a-5506-4305-b0d5-9f738a04c2bf"},
    "773ae23d-fd76-48ba-a1f7-7d425e86cdb1": {"verb": "createVolume", "id": "773ae23d-fd76-48ba-a1f7-7d425e86cdb1"},
    "ab896f90-e87b-4acb-a728-9c59b009d9cc": {"verb": "createVolume", "id": "ab896f90-e87b-4acb-a728-9c59b009d9cc"},
    "bf7257df-47e3-46ef-aaba-a81228d0f330": {"verb": "createVolume", "id": "bf7257df-47e3-46ef-aaba-a81228d0f330"},
    "c43d42a3-b329-4581-9443-71de80015c5c": {"verb": "createVolume", "id": "c43d42a3-b329-4581-9443-71de80015c5c"},
    "84d7336b-d262-4a2a-8c70-a89e18f236ad": {"verb": "createVolume", "id": "84d7336b-d262-4a2a-8c70-a89e18f236ad"},
    "30116e4a-1b70-4b6f-8c58-cc9067ea918b": {"verb": "createVolume", "id": "30116e4a-1b70-4b6f-8c58-cc9067ea918b"}
}

Comment 16 Sandro Bonazzola 2019-01-28 09:41:41 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 19 Daniel Gur 2019-08-28 13:14:40 UTC
sync2jira

Comment 20 Daniel Gur 2019-08-28 13:19:43 UTC
sync2jira

Comment 21 Juan Orti 2020-03-25 10:41:06 UTC
This has happened to a customer when trying to cancel a task that was stuck for several days.

Versions:
RHV-H 4.3.7.1
vdsm-4.30.38-1.el7ev.x86_64

ovirt-engine-4.3.7.2-0.1.el7.noarch

## engine.log
## The merge command fails, causing the task to become stuck:
2020-03-22 04:47:31,010Z INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-6445) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Lock Acquired to object 'EngineLock:{exclusiveLocks='[9a405876-2660-4490-afa8-38f557b7594d=VM]', sharedLocks=''}'
2020-03-22 04:47:31,080Z INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-6445) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Running command: RemoveSnapshotCommand internal: false. Entities affected :  ID: 9a405876-2660-4490-afa8-38f557b7594d Type: VMAction group MANIPULATE_VM_SNAPSHOTS with role type USER
2020-03-22 04:47:31,082Z INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-6445) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Lock freed to object 'EngineLock:{exclusiveLocks='[9a405876-2660-4490-afa8-38f557b7594d=VM]', sharedLocks=''}'
2020-03-22 04:47:31,119Z INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-6445) [] EVENT_ID: USER_REMOVE_SNAPSHOT(342), Snapshot '_GX_BACKUP_vm01_335513_29318_server01-bkp' deletion for VM 'vm01' was initiated by backup.
2020-03-22 04:47:31,123Z INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-8) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Running command: RemoveSnapshotSingleDiskLiveCommand internal: true. Entities affected :  ID: 59e7b50b-6fc4-4db0-910b-f32e52da0a40 Type: Storage
2020-03-22 04:47:31,350Z INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-22) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Executing Live Merge command step 'EXTEND'
2020-03-22 04:47:31,369Z INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-22) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'RemoveSnapshot' (id: 'f050a543-622a-4b93-8ef9-3fadc418d9fa') waiting on child command id: 'ed3508db-2269-4d82-a803-93608e56f91e' type:'RemoveSnapshotSingleDiskLive' to complete
2020-03-22 04:47:31,369Z INFO  [org.ovirt.engine.core.bll.MergeExtendCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-7) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Running command: MergeExtendCommand internal: true. Entities affected :  ID: 59e7b50b-6fc4-4db0-910b-f32e52da0a40 Type: Storage
2020-03-22 04:47:31,369Z INFO  [org.ovirt.engine.core.bll.MergeExtendCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-7) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Refreshing volume 63c60c27-86b3-4b3c-98f3-a28935fdad2e on host d3987c96-d6f3-4c81-8099-1bbbdd1aec71
2020-03-22 04:47:31,375Z INFO  [org.ovirt.engine.core.bll.RefreshVolumeCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-7) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Running command: RefreshVolumeCommand internal: true.
2020-03-22 04:47:31,376Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshVolumeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-7) [8848ef2e-90e2-482f-83c5-7a5906cbb145] START, RefreshVolumeVDSCommand(HostName = host01, RefreshVolumeVDSCommandParameters:{hostId='d3987c96-d6f3-4c81-8099-1bbbdd1aec71', storagePoolId='bf44d153-5113-467d-9232-0384452a39c9', storageDomainId='59e7b50b-6fc4-4db0-910b-f32e52da0a40', imageGroupId='c497a0a3-25bd-4868-871f-85e086f2b6c9', imageId='63c60c27-86b3-4b3c-98f3-a28935fdad2e'}), log id: 4165c76d
2020-03-22 04:47:31,907Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshVolumeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-7) [8848ef2e-90e2-482f-83c5-7a5906cbb145] FINISH, RefreshVolumeVDSCommand, return: , log id: 4165c76d
2020-03-22 04:47:31,907Z INFO  [org.ovirt.engine.core.bll.RefreshVolumeCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-7) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Successfully refreshed volume '63c60c27-86b3-4b3c-98f3-a28935fdad2e' on host 'd3987c96-d6f3-4c81-8099-1bbbdd1aec71'
2020-03-22 04:47:33,381Z INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-65) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Executing Live Merge command step 'MERGE'
2020-03-22 04:47:33,393Z INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-65) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'RemoveSnapshot' (id: 'f050a543-622a-4b93-8ef9-3fadc418d9fa') waiting on child command id: 'ed3508db-2269-4d82-a803-93608e56f91e' type:'RemoveSnapshotSingleDiskLive' to complete
2020-03-22 04:47:33,397Z INFO  [org.ovirt.engine.core.bll.MergeCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Running command: MergeCommand internal: true. Entities affected :  ID: 59e7b50b-6fc4-4db0-910b-f32e52da0a40 Type: Storage
2020-03-22 04:47:33,398Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] START, MergeVDSCommand(HostName = host01, MergeVDSCommandParameters:{hostId='d3987c96-d6f3-4c81-8099-1bbbdd1aec71', vmId='9a405876-2660-4490-afa8-38f557b7594d', storagePoolId='bf44d153-5113-467d-9232-0384452a39c9', storageDomainId='59e7b50b-6fc4-4db0-910b-f32e52da0a40', imageGroupId='c497a0a3-25bd-4868-871f-85e086f2b6c9', imageId='3bea2515-420e-403f-9571-3589bb103f6a', baseImageId='63c60c27-86b3-4b3c-98f3-a28935fdad2e', topImageId='3bea2515-420e-403f-9571-3589bb103f6a', bandwidth='0'}), log id: 74a21a9
2020-03-22 04:47:33,402Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Failed in 'MergeVDS' method
2020-03-22 04:47:33,404Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host01 command MergeVDS failed: Drive image file could not be found
2020-03-22 04:47:33,404Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand' return value 'StatusOnlyReturn [status=Status [code=13, message=Drive image file could not be found]]'
2020-03-22 04:47:33,404Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] HostName = host01
2020-03-22 04:47:33,404Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'MergeVDSCommand(HostName = host01, MergeVDSCommandParameters:{hostId='d3987c96-d6f3-4c81-8099-1bbbdd1aec71', vmId='9a405876-2660-4490-afa8-38f557b7594d', storagePoolId='bf44d153-5113-467d-9232-0384452a39c9', storageDomainId='59e7b50b-6fc4-4db0-910b-f32e52da0a40', imageGroupId='c497a0a3-25bd-4868-871f-85e086f2b6c9', imageId='3bea2515-420e-403f-9571-3589bb103f6a', baseImageId='63c60c27-86b3-4b3c-98f3-a28935fdad2e', topImageId='3bea2515-420e-403f-9571-3589bb103f6a', bandwidth='0'})' execution failed: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Drive image file could not be found, code = 13
2020-03-22 04:47:33,404Z INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] FINISH, MergeVDSCommand, return: , log id: 74a21a9
2020-03-22 04:47:33,404Z ERROR [org.ovirt.engine.core.bll.MergeCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Engine exception thrown while sending merge command: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Drive image file could not be found, code = 13 (Failed with error imageErr and code 13)
        at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
        at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
[...]
2020-03-22 04:47:34,400Z INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-58) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'RemoveSnapshotSingleDiskLive' (id: 'ed3508db-2269-4d82-a803-93608e56f91e') waiting on child command id: '6126d5fd-ef2a-4c63-92d1-efb969c19246' type:'Merge' to complete
2020-03-22 04:47:35,407Z INFO  [org.ovirt.engine.core.bll.MergeCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-26) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Merge command (jobId = null) has completed for images '63c60c27-86b3-4b3c-98f3-a28935fdad2e'..'3bea2515-420e-403f-9571-3589bb103f6a'
2020-03-22 04:47:36,412Z INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-56) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Executing Live Merge command step 'MERGE_STATUS'
2020-03-22 04:47:36,427Z INFO  [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-10) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Running command: MergeStatusCommand internal: true. Entities affected :  ID: 59e7b50b-6fc4-4db0-910b-f32e52da0a40 Type: Storage
2020-03-22 04:47:36,482Z INFO  [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-10) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Successfully removed volume 3bea2515-420e-403f-9571-3589bb103f6a from the chain
2020-03-22 04:47:36,483Z INFO  [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-10) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Volume merge type 'COMMIT'
2020-03-22 04:47:37,430Z INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Executing Live Merge command step 'DESTROY_IMAGE'
2020-03-22 04:47:37,445Z INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DestroyImageVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-9) [8848ef2e-90e2-482f-83c5-7a5906cbb145] START, DestroyImageVDSCommand( DestroyImageVDSCommandParameters:{storagePoolId='bf44d153-5113-467d-9232-0384452a39c9', ignoreFailoverLimit='false', storageDomainId='59e7b50b-6fc4-4db0-910b-f32e52da0a40', imageGroupId='c497a0a3-25bd-4868-871f-85e086f2b6c9', imageId='00000000-0000-0000-0000-000000000000', imageList='[3bea2515-420e-403f-9571-3589bb103f6a]', postZero='false', force='false'}), log id: 5721b62c
2020-03-22 04:47:37,503Z INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedExecutorService-commandCoordinator-Thread-9) [8848ef2e-90e2-482f-83c5-7a5906cbb145] CommandAsyncTask::Adding CommandMultiAsyncTasks object for command '62e5b18e-0130-413b-a51b-d37ea09438b7'
2020-03-22 04:47:37,503Z INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (EE-ManagedExecutorService-commandCoordinator-Thread-9) [8848ef2e-90e2-482f-83c5-7a5906cbb145] CommandMultiAsyncTasks::attachTask: Attaching task 'ae230537-6e02-42f1-9bc1-f29f6563230b' to command '62e5b18e-0130-413b-a51b-d37ea09438b7'.
2020-03-22 04:47:37,507Z INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedExecutorService-commandCoordinator-Thread-9) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Adding task 'ae230537-6e02-42f1-9bc1-f29f6563230b' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..
2020-03-22 04:47:37,507Z INFO  [org.ovirt.engine.core.bll.storage.disk.image.DestroyImageCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-9) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Successfully started task to remove orphaned volumes
2020-03-22 04:47:37,509Z INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedExecutorService-commandCoordinator-Thread-9) [8848ef2e-90e2-482f-83c5-7a5906cbb145] BaseAsyncTask::startPollingTask: Starting to poll task 'ae230537-6e02-42f1-9bc1-f29f6563230b'.
2020-03-22 04:47:37,509Z INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedExecutorService-commandCoordinator-Thread-9) [8848ef2e-90e2-482f-83c5-7a5906cbb145] BaseAsyncTask::startPollingTask: Starting to poll task 'ae230537-6e02-42f1-9bc1-f29f6563230b'.

## Task stuck for 3 days:
2020-03-22 04:47:39,445Z INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-8) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'RemoveSnapshotSingleDiskLive' (id: 'ed3508db-2269-4d82-a803-93608e56f91e') waiting on child command id: '62e5b18e-0130-413b-a51b-d37ea09438b7' type:'DestroyImage' to complete
2020-03-22 04:47:39,446Z INFO  [org.ovirt.engine.core.bll.storage.disk.image.DestroyImageCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-8) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Waiting on destroy image command to complete the task (taskId = ae230537-6e02-42f1-9bc1-f29f6563230b)
2020-03-22 04:47:43,456Z INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'RemoveSnapshotSingleDiskLive' (id: 'ed3508db-2269-4d82-a803-93608e56f91e') waiting on child command id: '62e5b18e-0130-413b-a51b-d37ea09438b7' type:'DestroyImage' to complete
2020-03-22 04:47:43,457Z INFO  [org.ovirt.engine.core.bll.storage.disk.image.DestroyImageCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Waiting on destroy image command to complete the task (taskId = ae230537-6e02-42f1-9bc1-f29f6563230b)
2020-03-22 04:47:45,463Z INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-5) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'RemoveSnapshot' (id: 'f050a543-622a-4b93-8ef9-3fadc418d9fa') waiting on child command id: 'ed3508db-2269-4d82-a803-93608e56f91e' type:'RemoveSnapshotSingleDiskLive' to complete
[...]
2020-03-24 03:09:19,645Z INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-11) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'RemoveSnapshot' (id: 'f050a543-622a-4b93-8ef9-3fadc418d9fa') waiting on child command id: 'ed3508db-2269-4d82-a803-93608e56f91e' type:'RemoveSnapshotSingleDiskLive' to complete
2020-03-24 03:09:25,661Z INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-78) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'RemoveSnapshotSingleDiskLive' (id: 'ed3508db-2269-4d82-a803-93608e56f91e') waiting on child command id: '62e5b18e-0130-413b-a51b-d37ea09438b7' type:'DestroyImage' to complete
2020-03-24 03:09:25,663Z INFO  [org.ovirt.engine.core.bll.storage.disk.image.DestroyImageCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-78) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Waiting on destroy image command to complete the task (taskId = ae230537-6e02-42f1-9bc1-f29f6563230b)
[...]
2020-03-25 00:32:01,561Z INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-18) [] Task id 'ae230537-6e02-42f1-9bc1-f29f6563230b' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis.


## SPM host
sos_commands/vdsm/vdsm-client_Host_getAllTasksInfo
{
    "ae230537-6e02-42f1-9bc1-f29f6563230b": {
        "verb": "deleteVolume", 
        "id": "ae230537-6e02-42f1-9bc1-f29f6563230b"
    }
}


## Trying to cancel the task:

# vdsm-client Task stop taskID=ae230537-6e02-42f1-9bc1-f29f6563230b
vdsm-client: Command Task.stop with args {'taskID': 'ae230537-6e02-42f1-9bc1-f29f6563230b'} failed:
(code=411, message=Task is aborted: u'ae230537-6e02-42f1-9bc1-f29f6563230b' - code 411)

Comment 22 Marina Kalinin 2021-05-24 19:41:25 UTC
This impacts the ability to troubleshoot and recover customer environments.
Multiple customer cases attached.
Please consider fixing.

Comment 26 Arik 2022-03-23 16:15:05 UTC
The issues attached to this BZ are on unsupported versions (4.2/4.3, which are based on RHEL 7), and we don't know of new issues with 4.4 that would require this ability to force-stop a task.
If this still happens with 4.4, it would be better to investigate the root cause rather than force-stop a task, which might lead to future problems that would then be difficult to explain.
That said, the ability to force-stop a task as described in the attached KCS should still work for the extreme cases in which it is needed.