Description of problem:
The SPM cannot be relocated via right-click on a node -> "Select as SPM". oVirt shows the error "Error while executing action: Cannot force select SPM. The Storage Pool has running tasks".

Version-Release number of selected component (if applicable):
oVirt 3.6.7
VDSM 4.17.28-1

How reproducible:
100%

Steps to Reproduce:
1. Right-click on a non-SPM node
2. Select as SPM

Actual results:
Popup "Error while executing action: Cannot force select SPM. The Storage Pool has running tasks" appears

Expected results:
SPM should be relocated

Additional info:
Running "vdsClient -s 0 getAllTasks" on the SPM gives no output.
Restarting the engine does not resolve the error.
Restarting vdsm on the SPM node forces an SPM relocation; nevertheless the behaviour stays the same. After the datacenter has recovered, the user still cannot relocate the SPM using the oVirt GUI.

Engine log shows:

2016-09-14 21:00:05,781 INFO [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-40) [] Fetched 8 VMs from VDS 'd6e30607-3712-4052-b7b4-40a6f28b21d6'
2016-09-14 21:00:10,162 INFO [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-36) [] Fetched 21 VMs from VDS '880512c6-5bc3-41f3-85fb-e16d62842ace'
2016-09-14 21:00:12,981 INFO [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-16) [] Fetched 12 VMs from VDS '5f32a62f-f2a6-42ae-ba90-f22776668dcb'
2016-09-14 21:00:13,033 INFO [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-15) [] Fetched 17 VMs from VDS '30f2b140-06d5-4968-bf71-ce4042558680'
2016-09-14 21:00:13,087 INFO [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-75) [] Fetched 8 VMs from VDS 'ede50b24-2100-4178-9522-dbae6df741e5'
2016-09-14 21:00:13,098 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM job 'e019808d-7358-48f2-8792-a7b1daf7dd95': In progress (no change)
2016-09-14 21:00:13,122 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-15) [] VM job '859f5241-db87-476a-bca2-5a800fa6d194': In progress (no change)
2016-09-14 21:00:13,132 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM job '7e84c4a4-82a3-4542-a5ef-e7dfc3eb2a43': In progress (no change)
2016-09-14 21:00:13,132 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM job 'e01b705a-52b5-46e5-819d-40fe72fcafa1': In progress (no change)
2016-09-14 21:00:15,447 INFO [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-7) [] Fetched 21 VMs from VDS 'e149d50d-3f26-472d-b61e-462b198fd4c1'
2016-09-14 21:00:20,938 INFO [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-4) [] Fetched 15 VMs from VDS '8771ba2d-e8d4-4a4b-8079-3a65eedd9d5c'
2016-09-14 21:00:21,024 INFO [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-62) [] Fetched 8 VMs from VDS 'd6e30607-3712-4052-b7b4-40a6f28b21d6'
2016-09-14 21:00:25,283 WARN [org.ovirt.engine.core.bll.ForceSelectSPMCommand] (default task-6) [6b0d2a1f] CanDoAction of action 'ForceSelectSPM' failed for user stockhausen@collogia.de. Reasons: VAR__ACTION__FORCE_SELECT,VAR__TYPE__SPM,$VdsName colovn06,CANNOT_FORCE_SELECT_SPM_STORAGE_POOL_HAS_RUNNING_TASKS
2016-09-14 21:00:25,976 INFO [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-40) [] Fetched 21 VMs from VDS '880512c6-5bc3-41f3-85fb-e16d62842ace'
Can you share the output of:

vdsClient -s 0 getAllTasks

If you see any tasks on the SPM, please provide vdsm and engine logs since these tasks started. If there are no tasks in vdsm, you probably have some task ids in the engine logs; we will need logs since these tasks were created.
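For reference, this is roughly how to cross-check both sides (the 'engine' database name and the postgres access below are assumptions based on a default setup; adjust to your environment):

# on the SPM host: tasks VDSM itself knows about
vdsClient -s 0 getAllTasks
vdsClient -s 0 getAllTasksStatuses

# on the engine host: async tasks the engine still tracks
su - postgres -c "psql -d engine -c 'select * from async_tasks;'"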
Created attachment 1200945 [details] SPM
No running tasks in VDSM on the SPM; see the attached screenshot.
See also http://lists.ovirt.org/pipermail/users/2016-September/042721.html
We cannot do anything without logs, please see comment 1, thanks!
I can provide more logs if you like. The only problem is that we do not know when all of this started. The engine ran for months without obvious issues, so it is hard to send you gigabytes of logs. First of all you could help me identify the blocker: why does the engine think there are still running tasks? From looking at the DB there should be nothing left.

engine=# select * from job order by start_time desc;
 job_id | action_type | description | status | owner_id | visible | ...
--------+-------------+-------------+--------+----------+---------+-----
(0 rows)
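Note that the job table only holds the UI-level job entries; as far as I understand, the task bookkeeping that can block ForceSelectSPM (and that taskcleaner.sh works against) lives in the async_tasks table, so an empty job table does not rule out a stale task. A query along these lines (credentials assumed to match the engine setup) shows what the engine still considers running:

engine=# select * from async_tasks;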
OK, here is the output from the taskcleaner:

./share/ovirt-engine/setup/dbutils/taskcleaner.sh -u engine -d engine -z t
92ff6bd7-ae6b-4078-84c3-9f65972ef339 | 8 | 2 | 2016-08-18 19:36:38.805+02 | 0 | 1010 | 5153efbf-26a8-44c4-b4cf-cda962587707 | fd9617b0-e830-42cf-9bc3-26601904f365 | 94ed7a19-fade-4bd6-83f2-2cbb2f730b95
I'm sorry to say that we have no more logs from that date. Is there anything more I can provide before I clean up this task?
From reading different BZs that means we have a running (status=2) disk migration task (type 1010). Regarding the VM:

engine=# select * from async_tasks_entities;
            async_task_id             |              entity_id               | entity_type
--------------------------------------+--------------------------------------+-------------
 92ff6bd7-ae6b-4078-84c3-9f65972ef339 | 2ee5fd27-73d6-4efc-bbbd-0bcfac0715bf | VM

engine=# select vm_guid,status,vm_host from vm_dynamic where vm_guid = '2ee5fd27-73d6-4efc-bbbd-0bcfac0715bf';
               vm_guid                | status |       vm_host
--------------------------------------+--------+---------------------
 2ee5fd27-73d6-4efc-bbbd-0bcfac0715bf |      1 | colvm01.collogia.de

Why doesn't the engine clean up this task?
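One way to confirm the engine-side row is simply a leftover is to take the vdsm_task_id column from that async_tasks row and ask the SPM about that specific task (column name from the stock engine schema; <vdsm_task_id> is a placeholder):

vdsClient -s 0 getTaskStatus <vdsm_task_id>

If VDSM does not know that task id while the engine still has the row, the two are out of sync and the engine entry is stale.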
We want to fix the error state. Can we do the task cleanup or do you need other data?
Hi Markus, what is the current status of the disk that was being migrated? Is it LOCKED? I'd suggest not clearing the task yourself, but rather updating its status so that the migration operation converges (and the task is cleared); otherwise the disk will remain locked.
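If it helps, a quick way to see whether the disk is still LOCKED is to look at the images table directly (imagestatus 2 is LOCKED in the engine schema, to the best of my knowledge):

engine=# select image_guid, image_group_id, imagestatus from images where imagestatus = 2;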
We fixed the state with the taskcleaner. The locked disk state was reset manually.
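For anyone landing here with the same problem: instead of resetting the disk state by hand in the DB, the engine ships an unlock helper next to the taskcleaner that is intended for this (options sketched from memory, please verify with --help before use; <disk_id> is a placeholder):

/usr/share/ovirt-engine/setup/dbutils/unlock_entity.sh -t disk -u engine <disk_id>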
This is by design: you cannot relocate the SPM until the tasks are done or cleaned. We are making an effort to move some tasks to HSM, which will help here.