Bug 1592180
| Summary: | [RFE] Add 'force' option to clear aborted task using vdsm-client | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | nijin ashok <nashok> |
| Component: | vdsm | Assignee: | Nir Soffer <nsoffer> |
| Status: | CLOSED WONTFIX | QA Contact: | Evelina Shames <eshames> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.2.3 | CC: | abpatil, aefrat, ahadas, eshames, fdelorey, guillaume.pavese, gveitmic, jortialc, lsurette, mperina, nashok, nsoffer, omachace, pkliczew, schandle, smaudet, srevivo, tnisan, ycui |
| Target Milestone: | --- | Keywords: | EasyFix, FutureFeature, ZStream |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-23 16:15:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 902971 | | |
This parameter was not supported by jsonrpc since the beginning (3.5). It is not defined in the schema [1] or in API.py [2]. We do have this parameter at the storage API level [3]. This BZ seems to be an RFE to change the API.

[1] https://github.com/oVirt/vdsm/blob/master/lib/vdsm/api/vdsm-api.yml#L9929
[2] https://github.com/oVirt/vdsm/blob/master/lib/vdsm/API.py#L118
[3] https://github.com/oVirt/vdsm/blob/master/lib/vdsm/storage/hsm.py#L2232

(In reply to nijin ashok from comment #0)
> Description of problem:
>
> Currently, "force" option for "stopTask" is not exposed to
> vdsm-client/vdsClient. So it's impossible to stop the task which is in the
> aborted stage other than manual steps. It will fail with the error below.
...
> TaskAborted: Task is aborted: u'e5ad2e3a-5506-4305-b0d5-9f738a04c2bf' - code
> 411

Again, a bug asking for a specific solution instead of describing the problem and how it affects users. If the task is aborted, why do we need to stop it? Can we use clearTask?

nijin, can you explain:
- what the user is trying to do, and why
- how the system responds
- how the system should respond
- which flows are affected (engine? vdsm-client?)

I'm changing the title to reflect my current understanding of this bug. Also, this is a storage issue. Even if the issue is a missing argument in the vdsm API schema, it must be handled by the storage team.

Hi, as a data point, I tried to modify a bonding mode (mode 2 to mode 1) on a host and something seems to have gone wrong. I have since tried rebooting the host and all argument variants of taskcleaner.sh (never got anything listed by this command), but I still have two tasks listed as running in the manager for hours: "Invoking Activate Host" and "Configuring networks on host".

That seems to prevent me from reinstalling the host or removing it: "Cannot remove Host. Related operation is currently in progress. Please try again later." Which is unfortunate, as I am now stuck.
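As a rough illustration of the storage-level behavior under discussion, here is a toy Python model. This is not vdsm's actual code; the class and method names only loosely mirror what the traceback in this report shows (task.py's `stop` calling `_incref`, which raises `TaskAborted` for an aborted task unless forced):

```python
# Toy model of the force-gated stop behavior -- a sketch only, not vdsm code.
# It mirrors the failure mode shown in the traceback: stopping a task that
# is already in the aborted stage raises TaskAborted (code 411) unless a
# force flag is propagated down from the API layer.

class TaskAborted(Exception):
    code = 411

class Task:
    def __init__(self, task_id):
        self.id = task_id
        self.aborted = False

    def abort(self):
        self.aborted = True

    def _incref(self, force=False):
        # An aborted task refuses further operations unless forced
        if self.aborted and not force:
            raise TaskAborted("Task is aborted: %r - code 411" % self.id)

    def stop(self, force=False):
        self._incref(force)
        # ... real stop logic would go here ...
        return "stopped"

task = Task("e5ad2e3a-5506-4305-b0d5-9f738a04c2bf")
task.abort()
try:
    task.stop()               # fails, like the vdsm-client call in this bug
except TaskAborted as e:
    print("stop failed:", e)
print(task.stop(force=True))  # the force flag bypasses the aborted check
```

The RFE is essentially about letting vdsm-client reach the `force=True` path, which today is only reachable at the internal storage API level.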
So, without knowing if that's the real solution, I too would like to see a way to cancel/stop running tasks to get out of that sort of situation.

(In reply to Guillaume Pavese from comment #4)

Thanks for adding more info on this. It seems that the issue is how to abort tasks in the system. Is this info about one of the cases attached to this bug?

I'm moving this to ovirt-engine since the engine is responsible for managing running tasks. The fact that you have "stuck" tasks means that the engine failed to manage them.

Now, even if we have a bug in the engine, it may be possible to stop running storage tasks and clear them on the host, as a way to work around a possible engine bug. Note that activating hosts or configuring host networks are not storage tasks.

If you still have the host in this state, it would be useful to check if there are running tasks on the host using this command:

    vdsm-client Host getAllTasksInfo

If there are running tasks, you can stop them:

    vdsm-client Task stop taskID=xxx-yyy

If there are finished tasks, you can clear them:

    vdsm-client Task clear taskID=xxx-yyy

Please check the online help for available commands:

    $ sudo vdsm-client Task -h
    usage: vdsm-client Task [-h] method [arg=value] ...

    optional arguments:
      -h, --help          show this help message and exit

    Task methods:
      method [arg=value]
        getStatus         Get Task status information.
        revert            Rollback a Task to restore the previous system state.
        clear             Discard information about a finished Task.
        getInfo           Get information about a Task.
        stop              Stop a currently running Task.

If you cannot stop and clear tasks using vdsm-client, please file a separate vdsm bug.

I only got empty results by doing (on the concerned host, but also on the other host, which runs the engine):
vdsm-client Host getAllTasksInfo
{}
I managed to "recover" by powering off the engine, rebooting the host holding it, and then running hosted-engine --vm-start.
The two phantom tasks are now gone and my two hosts are up and running...
That was not optimal.
Maybe there was a supported way of doing it, but seeing the tasks in the GUI while they are invisible to vdsm-client seems to call for a way to manage (cancel/stop/clean) them from the GUI.
Moving back to vdsm, Piotr will file a bug for ovirt-engine. nijin, please see comment 3.

(In reply to Nir Soffer from comment #11)
> Moving back to vdsm, Piotr will file a bug for ovirt-engine.

Here is the engine bug: BZ #1597554.

(In reply to nijin ashok from comment #13)

Thanks for clarifying this issue.

Do you know how to reproduce these stuck tasks?

Also, do you have the output of getAllTasksInfo or similar APIs showing info about the stuck tasks?

(In reply to Nir Soffer from comment #14)
> (In reply to nijin ashok from comment #13)
> Thanks for clarifying this issue.
>
> Do you know how to reproduce these stuck tasks?

Nope, I was not able to reproduce this.

> Also, do you have the output of getAllTasksInfo or similar APIs showing
> info about the stuck tasks?

cat sos_strings/vdsm/Host.getAllTasksInfo

    {"e5ad2e3a-5506-4305-b0d5-9f738a04c2bf": {"verb": "createVolume", "id": "e5ad2e3a-5506-4305-b0d5-9f738a04c2bf"},
     "773ae23d-fd76-48ba-a1f7-7d425e86cdb1": {"verb": "createVolume", "id": "773ae23d-fd76-48ba-a1f7-7d425e86cdb1"},
     "ab896f90-e87b-4acb-a728-9c59b009d9cc": {"verb": "createVolume", "id": "ab896f90-e87b-4acb-a728-9c59b009d9cc"},
     "bf7257df-47e3-46ef-aaba-a81228d0f330": {"verb": "createVolume", "id": "bf7257df-47e3-46ef-aaba-a81228d0f330"},
     "c43d42a3-b329-4581-9443-71de80015c5c": {"verb": "createVolume", "id": "c43d42a3-b329-4581-9443-71de80015c5c"},
     "84d7336b-d262-4a2a-8c70-a89e18f236ad": {"verb": "createVolume", "id": "84d7336b-d262-4a2a-8c70-a89e18f236ad"},
     "30116e4a-1b70-4b6f-8c58-cc9067ea918b": {"verb": "createVolume", "id": "30116e4a-1b70-4b6f-8c58-cc9067ea918b"}}

This bug has not been marked as a blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

sync2jira

This has happened to a customer when trying to cancel a task that was stuck for several days.
Versions:
RHV-H 4.3.7.1
vdsm-4.30.38-1.el7ev.x86_64
ovirt-engine-4.3.7.2-0.1.el7.noarch
## engine.log
## The merge command fails, causing the task to become stuck:
2020-03-22 04:47:31,010Z INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-6445) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Lock Acquired to object 'EngineLock:{exclusiveLocks='[9a405876-2660-4490-afa8-38f557b7594d=VM]', sharedLocks=''}'
2020-03-22 04:47:31,080Z INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-6445) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Running command: RemoveSnapshotCommand internal: false. Entities affected : ID: 9a405876-2660-4490-afa8-38f557b7594d Type: VMAction group MANIPULATE_VM_SNAPSHOTS with role type USER
2020-03-22 04:47:31,082Z INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-6445) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Lock freed to object 'EngineLock:{exclusiveLocks='[9a405876-2660-4490-afa8-38f557b7594d=VM]', sharedLocks=''}'
2020-03-22 04:47:31,119Z INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-6445) [] EVENT_ID: USER_REMOVE_SNAPSHOT(342), Snapshot '_GX_BACKUP_vm01_335513_29318_server01-bkp' deletion for VM 'vm01' was initiated by backup.
2020-03-22 04:47:31,123Z INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-8) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Running command: RemoveSnapshotSingleDiskLiveCommand internal: true. Entities affected : ID: 59e7b50b-6fc4-4db0-910b-f32e52da0a40 Type: Storage
2020-03-22 04:47:31,350Z INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-22) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Executing Live Merge command step 'EXTEND'
2020-03-22 04:47:31,369Z INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-22) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'RemoveSnapshot' (id: 'f050a543-622a-4b93-8ef9-3fadc418d9fa') waiting on child command id: 'ed3508db-2269-4d82-a803-93608e56f91e' type:'RemoveSnapshotSingleDiskLive' to complete
2020-03-22 04:47:31,369Z INFO [org.ovirt.engine.core.bll.MergeExtendCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-7) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Running command: MergeExtendCommand internal: true. Entities affected : ID: 59e7b50b-6fc4-4db0-910b-f32e52da0a40 Type: Storage
2020-03-22 04:47:31,369Z INFO [org.ovirt.engine.core.bll.MergeExtendCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-7) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Refreshing volume 63c60c27-86b3-4b3c-98f3-a28935fdad2e on host d3987c96-d6f3-4c81-8099-1bbbdd1aec71
2020-03-22 04:47:31,375Z INFO [org.ovirt.engine.core.bll.RefreshVolumeCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-7) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Running command: RefreshVolumeCommand internal: true.
2020-03-22 04:47:31,376Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshVolumeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-7) [8848ef2e-90e2-482f-83c5-7a5906cbb145] START, RefreshVolumeVDSCommand(HostName = host01, RefreshVolumeVDSCommandParameters:{hostId='d3987c96-d6f3-4c81-8099-1bbbdd1aec71', storagePoolId='bf44d153-5113-467d-9232-0384452a39c9', storageDomainId='59e7b50b-6fc4-4db0-910b-f32e52da0a40', imageGroupId='c497a0a3-25bd-4868-871f-85e086f2b6c9', imageId='63c60c27-86b3-4b3c-98f3-a28935fdad2e'}), log id: 4165c76d
2020-03-22 04:47:31,907Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshVolumeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-7) [8848ef2e-90e2-482f-83c5-7a5906cbb145] FINISH, RefreshVolumeVDSCommand, return: , log id: 4165c76d
2020-03-22 04:47:31,907Z INFO [org.ovirt.engine.core.bll.RefreshVolumeCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-7) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Successfully refreshed volume '63c60c27-86b3-4b3c-98f3-a28935fdad2e' on host 'd3987c96-d6f3-4c81-8099-1bbbdd1aec71'
2020-03-22 04:47:33,381Z INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-65) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Executing Live Merge command step 'MERGE'
2020-03-22 04:47:33,393Z INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-65) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'RemoveSnapshot' (id: 'f050a543-622a-4b93-8ef9-3fadc418d9fa') waiting on child command id: 'ed3508db-2269-4d82-a803-93608e56f91e' type:'RemoveSnapshotSingleDiskLive' to complete
2020-03-22 04:47:33,397Z INFO [org.ovirt.engine.core.bll.MergeCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Running command: MergeCommand internal: true. Entities affected : ID: 59e7b50b-6fc4-4db0-910b-f32e52da0a40 Type: Storage
2020-03-22 04:47:33,398Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] START, MergeVDSCommand(HostName = host01, MergeVDSCommandParameters:{hostId='d3987c96-d6f3-4c81-8099-1bbbdd1aec71', vmId='9a405876-2660-4490-afa8-38f557b7594d', storagePoolId='bf44d153-5113-467d-9232-0384452a39c9', storageDomainId='59e7b50b-6fc4-4db0-910b-f32e52da0a40', imageGroupId='c497a0a3-25bd-4868-871f-85e086f2b6c9', imageId='3bea2515-420e-403f-9571-3589bb103f6a', baseImageId='63c60c27-86b3-4b3c-98f3-a28935fdad2e', topImageId='3bea2515-420e-403f-9571-3589bb103f6a', bandwidth='0'}), log id: 74a21a9
2020-03-22 04:47:33,402Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Failed in 'MergeVDS' method
2020-03-22 04:47:33,404Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host01 command MergeVDS failed: Drive image file could not be found
2020-03-22 04:47:33,404Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand' return value 'StatusOnlyReturn [status=Status [code=13, message=Drive image file could not be found]]'
2020-03-22 04:47:33,404Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] HostName = host01
2020-03-22 04:47:33,404Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'MergeVDSCommand(HostName = host01, MergeVDSCommandParameters:{hostId='d3987c96-d6f3-4c81-8099-1bbbdd1aec71', vmId='9a405876-2660-4490-afa8-38f557b7594d', storagePoolId='bf44d153-5113-467d-9232-0384452a39c9', storageDomainId='59e7b50b-6fc4-4db0-910b-f32e52da0a40', imageGroupId='c497a0a3-25bd-4868-871f-85e086f2b6c9', imageId='3bea2515-420e-403f-9571-3589bb103f6a', baseImageId='63c60c27-86b3-4b3c-98f3-a28935fdad2e', topImageId='3bea2515-420e-403f-9571-3589bb103f6a', bandwidth='0'})' execution failed: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Drive image file could not be found, code = 13
2020-03-22 04:47:33,404Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] FINISH, MergeVDSCommand, return: , log id: 74a21a9
2020-03-22 04:47:33,404Z ERROR [org.ovirt.engine.core.bll.MergeCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-1) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Engine exception thrown while sending merge command: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Drive image file could not be found, code = 13 (Failed with error imageErr and code 13)
at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
[...]
2020-03-22 04:47:34,400Z INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-58) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'RemoveSnapshotSingleDiskLive' (id: 'ed3508db-2269-4d82-a803-93608e56f91e') waiting on child command id: '6126d5fd-ef2a-4c63-92d1-efb969c19246' type:'Merge' to complete
2020-03-22 04:47:35,407Z INFO [org.ovirt.engine.core.bll.MergeCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-26) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Merge command (jobId = null) has completed for images '63c60c27-86b3-4b3c-98f3-a28935fdad2e'..'3bea2515-420e-403f-9571-3589bb103f6a'
2020-03-22 04:47:36,412Z INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-56) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Executing Live Merge command step 'MERGE_STATUS'
2020-03-22 04:47:36,427Z INFO [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-10) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Running command: MergeStatusCommand internal: true. Entities affected : ID: 59e7b50b-6fc4-4db0-910b-f32e52da0a40 Type: Storage
2020-03-22 04:47:36,482Z INFO [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-10) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Successfully removed volume 3bea2515-420e-403f-9571-3589bb103f6a from the chain
2020-03-22 04:47:36,483Z INFO [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-10) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Volume merge type 'COMMIT'
2020-03-22 04:47:37,430Z INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Executing Live Merge command step 'DESTROY_IMAGE'
2020-03-22 04:47:37,445Z INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DestroyImageVDSCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-9) [8848ef2e-90e2-482f-83c5-7a5906cbb145] START, DestroyImageVDSCommand( DestroyImageVDSCommandParameters:{storagePoolId='bf44d153-5113-467d-9232-0384452a39c9', ignoreFailoverLimit='false', storageDomainId='59e7b50b-6fc4-4db0-910b-f32e52da0a40', imageGroupId='c497a0a3-25bd-4868-871f-85e086f2b6c9', imageId='00000000-0000-0000-0000-000000000000', imageList='[3bea2515-420e-403f-9571-3589bb103f6a]', postZero='false', force='false'}), log id: 5721b62c
2020-03-22 04:47:37,503Z INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedExecutorService-commandCoordinator-Thread-9) [8848ef2e-90e2-482f-83c5-7a5906cbb145] CommandAsyncTask::Adding CommandMultiAsyncTasks object for command '62e5b18e-0130-413b-a51b-d37ea09438b7'
2020-03-22 04:47:37,503Z INFO [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (EE-ManagedExecutorService-commandCoordinator-Thread-9) [8848ef2e-90e2-482f-83c5-7a5906cbb145] CommandMultiAsyncTasks::attachTask: Attaching task 'ae230537-6e02-42f1-9bc1-f29f6563230b' to command '62e5b18e-0130-413b-a51b-d37ea09438b7'.
2020-03-22 04:47:37,507Z INFO [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedExecutorService-commandCoordinator-Thread-9) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Adding task 'ae230537-6e02-42f1-9bc1-f29f6563230b' (Parent Command 'DestroyImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..
2020-03-22 04:47:37,507Z INFO [org.ovirt.engine.core.bll.storage.disk.image.DestroyImageCommand] (EE-ManagedExecutorService-commandCoordinator-Thread-9) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Successfully started task to remove orphaned volumes
2020-03-22 04:47:37,509Z INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedExecutorService-commandCoordinator-Thread-9) [8848ef2e-90e2-482f-83c5-7a5906cbb145] BaseAsyncTask::startPollingTask: Starting to poll task 'ae230537-6e02-42f1-9bc1-f29f6563230b'.
2020-03-22 04:47:37,509Z INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedExecutorService-commandCoordinator-Thread-9) [8848ef2e-90e2-482f-83c5-7a5906cbb145] BaseAsyncTask::startPollingTask: Starting to poll task 'ae230537-6e02-42f1-9bc1-f29f6563230b'.
## Task stuck for 3 days:
2020-03-22 04:47:39,445Z INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-8) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'RemoveSnapshotSingleDiskLive' (id: 'ed3508db-2269-4d82-a803-93608e56f91e') waiting on child command id: '62e5b18e-0130-413b-a51b-d37ea09438b7' type:'DestroyImage' to complete
2020-03-22 04:47:39,446Z INFO [org.ovirt.engine.core.bll.storage.disk.image.DestroyImageCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-8) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Waiting on destroy image command to complete the task (taskId = ae230537-6e02-42f1-9bc1-f29f6563230b)
2020-03-22 04:47:43,456Z INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'RemoveSnapshotSingleDiskLive' (id: 'ed3508db-2269-4d82-a803-93608e56f91e') waiting on child command id: '62e5b18e-0130-413b-a51b-d37ea09438b7' type:'DestroyImage' to complete
2020-03-22 04:47:43,457Z INFO [org.ovirt.engine.core.bll.storage.disk.image.DestroyImageCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Waiting on destroy image command to complete the task (taskId = ae230537-6e02-42f1-9bc1-f29f6563230b)
2020-03-22 04:47:45,463Z INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-5) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'RemoveSnapshot' (id: 'f050a543-622a-4b93-8ef9-3fadc418d9fa') waiting on child command id: 'ed3508db-2269-4d82-a803-93608e56f91e' type:'RemoveSnapshotSingleDiskLive' to complete
[...]
2020-03-24 03:09:19,645Z INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-11) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'RemoveSnapshot' (id: 'f050a543-622a-4b93-8ef9-3fadc418d9fa') waiting on child command id: 'ed3508db-2269-4d82-a803-93608e56f91e' type:'RemoveSnapshotSingleDiskLive' to complete
2020-03-24 03:09:25,661Z INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-78) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Command 'RemoveSnapshotSingleDiskLive' (id: 'ed3508db-2269-4d82-a803-93608e56f91e') waiting on child command id: '62e5b18e-0130-413b-a51b-d37ea09438b7' type:'DestroyImage' to complete
2020-03-24 03:09:25,663Z INFO [org.ovirt.engine.core.bll.storage.disk.image.DestroyImageCommandCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-78) [8848ef2e-90e2-482f-83c5-7a5906cbb145] Waiting on destroy image command to complete the task (taskId = ae230537-6e02-42f1-9bc1-f29f6563230b)
[...]
2020-03-25 00:32:01,561Z INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-18) [] Task id 'ae230537-6e02-42f1-9bc1-f29f6563230b' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis.
## SPM host
sos_commands/vdsm/vdsm-client_Host_getAllTasksInfo
{
"ae230537-6e02-42f1-9bc1-f29f6563230b": {
"verb": "deleteVolume",
"id": "ae230537-6e02-42f1-9bc1-f29f6563230b"
}
}
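For reference, output like the getAllTasksInfo JSON above can be inspected programmatically. A minimal sketch (the mapping shape, `{taskID: {"verb": ..., "id": ...}}`, is taken from the output shown in this report; nothing else is assumed about the API):

```python
# Parse a getAllTasksInfo-style JSON mapping and group task IDs by verb.
import json

output = """
{
    "ae230537-6e02-42f1-9bc1-f29f6563230b": {
        "verb": "deleteVolume",
        "id": "ae230537-6e02-42f1-9bc1-f29f6563230b"
    }
}
"""

tasks = json.loads(output)
by_verb = {}
for task_id, info in tasks.items():
    by_verb.setdefault(info["verb"], []).append(task_id)

print(by_verb)  # {'deleteVolume': ['ae230537-6e02-42f1-9bc1-f29f6563230b']}
```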
## Trying to cancel the task:
# vdsm-client Task stop taskID=ae230537-6e02-42f1-9bc1-f29f6563230b
vdsm-client: Command Task.stop with args {'taskID': 'ae230537-6e02-42f1-9bc1-f29f6563230b'} failed:
(code=411, message=Task is aborted: u'ae230537-6e02-42f1-9bc1-f29f6563230b' - code 411)
This impacts the ability to troubleshoot and recover customer environments. Multiple customer cases are attached. Please consider fixing.

The issues attached to this BZ are on unsupported versions (4.2/4.3, which means they are based on RHEL 7), and we don't know of new issues on 4.4 that require this ability to force-stop a task.

If this still happens with 4.4, it would be better to investigate the root cause rather than force-stop a task, which might lead to future problems that would then be difficult to explain.

That said, the ability to force-stop a task as described in the attached KCS should still work for extreme cases in which it is needed.
Description of problem:

Currently, "force" option for "stopTask" is not exposed to vdsm-client/vdsClient. So it's impossible to stop the task which is in the aborted stage other than manual steps. It will fail with the error below.

===
2018-06-01 15:39:37,703+1200 ERROR (Thread-1450) [storage.TaskManager.Task] (Task='ecc67ee5-7b53-421f-89a6-cd1cfcf10845') Unexpected error (task:872)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 879, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in stopTask
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2243, in stopTask
    return self.taskMng.stopTask(taskID=taskID, force=force)
  File "/usr/share/vdsm/storage/taskManager.py", line 158, in stopTask
    t.stop(force=force)
  File "/usr/share/vdsm/storage/task.py", line 1253, in stop
    self._incref(force)
  File "/usr/share/vdsm/storage/task.py", line 985, in _incref
    raise se.TaskAborted(unicode(self))
TaskAborted: Task is aborted: u'e5ad2e3a-5506-4305-b0d5-9f738a04c2bf' - code 411
===

Version-Release number of selected component (if applicable):
rhvm-4.2.3

How reproducible:
100%

Steps to Reproduce:

Actual results:
No "force" option for stopTask via vdsm-client

Expected results:
Provide "force" option for stopTask via vdsm-client

Additional info:
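The traceback shows that the storage layer (hsm.py) already accepts a `force` argument; the gap is that the public API layer does not expose it. A hypothetical sketch of the wiring this RFE asks for, using stand-in classes (only the `stopTask(taskID, force)` signature is taken from the traceback; everything else here is illustrative, not vdsm's real code):

```python
# Sketch: threading the existing storage-level `force` flag through the
# public Task API so that vdsm-client could pass it on the command line.

class FakeTaskManager:
    """Stand-in for vdsm's task manager; records what it was asked to do."""
    def __init__(self):
        self.calls = []

    def stopTask(self, taskID, force=False):
        self.calls.append((taskID, force))
        return {"status": {"code": 0}}

class HSM:
    """Storage layer: already takes `force` (see hsm.py in the traceback)."""
    def __init__(self, task_mng):
        self.taskMng = task_mng

    def stopTask(self, taskID, force=False):
        return self.taskMng.stopTask(taskID=taskID, force=force)

class TaskAPI:
    """Public API layer: the RFE is to accept `force` here and forward it,
    instead of dropping it as the current schema does."""
    def __init__(self, irs, task_id):
        self._irs = irs
        self._task_id = task_id

    def stop(self, force=False):
        return self._irs.stopTask(self._task_id, force=force)

mng = FakeTaskManager()
api = TaskAPI(HSM(mng), "ae230537-6e02-42f1-9bc1-f29f6563230b")
api.stop(force=True)
print(mng.calls)
```

With such a change, the schema (vdsm-api.yml) would also need to declare the new parameter so jsonrpc clients could pass it.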