Bug 1376156 - Cannot relocate SPM: The Storage Pool has running tasks.
Summary: Cannot relocate SPM: The Storage Pool has running tasks.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 3.6.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.0.6
Assignee: Liron Aravot
QA Contact: Aharon Canan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-14 19:01 UTC by Markus Stockhausen
Modified: 2016-10-31 12:43 UTC (History)
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-31 12:43:46 UTC
oVirt Team: Storage
tnisan: ovirt-4.0.z?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments (Terms of Use)
SPM (52.74 KB, image/png)
2016-09-14 19:19 UTC, Markus Stockhausen
no flags Details

Description Markus Stockhausen 2016-09-14 19:01:30 UTC
Description of problem:

The SPM cannot be relocated via right-click on a node -> "Select as SPM". oVirt gives the error "Error while executing action: Cannot force select SPM. The Storage Pool has running tasks".


Version-Release number of selected component (if applicable):

oVirt 3.6.7
VDSM 4.17.28-1

How reproducible:

100%

Steps to Reproduce:
1. Right-click on a non-SPM node
2. Choose "Select as SPM"

Actual results:

The popup "Error while executing action: Cannot force select SPM. The Storage Pool has running tasks" appears.

Expected results:

SPM should be relocated

Additional info:

Running "vdsClient -s 0 getAllTasks" on the SPM gives no output.

Restarting the engine does not solve the error.

Restarting VDSM on the SPM node forces an SPM relocation. Nevertheless the behaviour stays the same: after the datacenter has recovered, the user cannot relocate the SPM using the oVirt GUI.

The engine log shows:

2016-09-14 21:00:05,781 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-40) [] Fetched 8 VMs from VDS 'd6e30607-3712-4052-b7b4-40a6f28b21d6'
2016-09-14 21:00:10,162 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-36) [] Fetched 21 VMs from VDS '880512c6-5bc3-41f3-85fb-e16d62842ace'
2016-09-14 21:00:12,981 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-16) [] Fetched 12 VMs from VDS '5f32a62f-f2a6-42ae-ba90-f22776668dcb'
2016-09-14 21:00:13,033 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-15) [] Fetched 17 VMs from VDS '30f2b140-06d5-4968-bf71-ce4042558680'
2016-09-14 21:00:13,087 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-75) [] Fetched 8 VMs from VDS 'ede50b24-2100-4178-9522-dbae6df741e5'
2016-09-14 21:00:13,098 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM job 'e019808d-7358-48f2-8792-a7b1daf7dd95': In progress (no change)
2016-09-14 21:00:13,122 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-15) [] VM job '859f5241-db87-476a-bca2-5a800fa6d194': In progress (no change)
2016-09-14 21:00:13,132 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM job '7e84c4a4-82a3-4542-a5ef-e7dfc3eb2a43': In progress (no change)
2016-09-14 21:00:13,132 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-75) [] VM job 'e01b705a-52b5-46e5-819d-40fe72fcafa1': In progress (no change)
2016-09-14 21:00:15,447 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-7) [] Fetched 21 VMs from VDS 'e149d50d-3f26-472d-b61e-462b198fd4c1'
2016-09-14 21:00:20,938 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-4) [] Fetched 15 VMs from VDS '8771ba2d-e8d4-4a4b-8079-3a65eedd9d5c'
2016-09-14 21:00:21,024 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-62) [] Fetched 8 VMs from VDS 'd6e30607-3712-4052-b7b4-40a6f28b21d6'
2016-09-14 21:00:25,283 WARN  [org.ovirt.engine.core.bll.ForceSelectSPMCommand] (default task-6) [6b0d2a1f] CanDoAction of action 'ForceSelectSPM' failed for user stockhausen@collogia.de. Reasons: VAR__ACTION__FORCE_SELECT,VAR__TYPE__SPM,$VdsName colovn06,CANNOT_FORCE_SELECT_SPM_STORAGE_POOL_HAS_RUNNING_TASKS
2016-09-14 21:00:25,976 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-40) [] Fetched 21 VMs from VDS '880512c6-5bc3-41f3-85fb-e16d62842ace'

Comment 1 Nir Soffer 2016-09-14 19:15:47 UTC
Can you share the output of:

    vdsClient -s 0 getAllTasks

If you see any tasks on the SPM, please provide VDSM and engine logs since these
tasks started.

If there are no tasks in VDSM, you probably have some task ids in the engine logs; we
will need logs since these tasks were created.
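
Something along these lines should do it (default log locations assumed; replace
the hypothetical <task-id> with the id you find):

    # where the task shows up in the engine log (default path assumed)
    grep '<task-id>' /var/log/ovirt-engine/engine.log*

    # and the matching VDSM-side entries on the SPM host
    grep '<task-id>' /var/log/vdsm/vdsm.log*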

Comment 2 Markus Stockhausen 2016-09-14 19:19:30 UTC
Created attachment 1200945 [details]
SPM

Comment 3 Markus Stockhausen 2016-09-14 19:20:01 UTC
No running tasks in VDSM on the SPM; see the attached screenshot.

Comment 5 Nir Soffer 2016-09-14 19:30:24 UTC
We cannot do anything without logs, please see comment 1, thanks!

Comment 6 Markus Stockhausen 2016-09-15 05:36:54 UTC
I can provide more logs if you like. The only problem is that we do not know when all of this started. The engine ran for months without obvious issues, so it is hard to send you gigabytes of logs.

First of all, you could help identify the blocker. Why does the engine think there are still running tasks? From looking at the DB there should be nothing left.

engine=# select * from job order by start_time desc;
 job_id | action_type | description | status | owner_id | visible | ...
--------+-------------+-------------+--------+----------+---------+----
(0 rows)
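
The async task table itself (the one the taskcleaner output below comes from) can
also be queried directly; the table and column names here are assumed and may
differ between engine versions:

engine=# -- assumed schema: async_tasks(task_id, action_type, status, started_at)
engine=# select task_id, action_type, status, started_at from async_tasks;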

Comment 7 Markus Stockhausen 2016-09-15 05:50:12 UTC
OK, here is the output from the taskcleaner:

./share/ovirt-engine/setup/dbutils/taskcleaner.sh -u engine -d engine -z
 t
 92ff6bd7-ae6b-4078-84c3-9f65972ef339 |         8 |      2 | 2016-08-18 19:36:38.805+02 |      0 |         1010 | 5153efbf-26a8-44c4-b4cf-cda962587707 | fd9617b0-e830-42cf-9bc3-26601904f365 | 94ed7a19-fade-4bd6-83f2-2cbb2f730b95

Comment 8 Markus Stockhausen 2016-09-15 05:53:50 UTC
I'm sorry to say that we have no more logs from that date. Is there anything more I can provide before I clean up this task?

Comment 9 Markus Stockhausen 2016-09-15 06:12:56 UTC
From reading different BZs, this means we have a running (status=2) disk migration task (type 1010).

Regarding the VM:

engine=# select * from async_tasks_entities;
           async_task_id            |             entity_id              |tp 
------------------------------------+------------------------------------+--
92ff6bd7-ae6b-4078-84c3-9f65972ef339|2ee5fd27-73d6-4efc-bbbd-0bcfac0715bf|VM

engine=# select vm_guid,status,vm_host from 
         vm_dynamic where vm_guid = '2ee5fd27-73d6-4efc-bbbd-0bcfac0715bf';
               vm_guid                | status |       vm_host
--------------------------------------+--------+---------------------
 2ee5fd27-73d6-4efc-bbbd-0bcfac0715bf |      1 | colvm01.collogia.de


Why doesn't the engine clean up this task?

Comment 10 Markus Stockhausen 2016-09-15 19:26:36 UTC
We want to fix the error state. Can we do the task cleanup, or do you need any other data?

Comment 11 Liron Aravot 2016-09-15 21:09:18 UTC
Hi Markus,
What is the current status of the disk that was being migrated? Is it LOCKED?

I'd suggest not clearing the task yourself, but rather updating its status so that the migration operation converges (and the task is cleared); otherwise the disk will remain locked.
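
One way to check the disk status is against the engine DB; a sketch, assuming the
images table layout shown here and that a numeric status of 2 means LOCKED (verify
both against your version):

engine=# -- imagestatus = 2 is assumed to mean LOCKED
engine=# select image_guid, image_group_id, imagestatus from images where imagestatus = 2;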

Comment 12 Markus Stockhausen 2016-09-29 05:33:55 UTC
We fixed the state with the taskcleaner. The disk state (LOCKED) was reset manually.
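
For anyone hitting the same state, this was roughly the shape of the cleanup (a
sketch only; the option names below are assumptions and vary by version, so confirm
with -h before running anything):

    # remove the stale async task found in comment 7 (option name for removing a
    # single task by id is assumed; confirm with -h)
    ./share/ovirt-engine/setup/dbutils/taskcleaner.sh -u engine -d engine -t 92ff6bd7-ae6b-4078-84c3-9f65972ef339

    # dbutils also ships unlock_entity.sh for entities left LOCKED; -t/-q are
    # assumed option names, confirm with -h
    ./share/ovirt-engine/setup/dbutils/unlock_entity.sh -u engine -d engine -t disk -q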

Comment 13 Yaniv Lavi 2016-10-31 12:43:46 UTC
This is by design; you cannot relocate the SPM until the tasks are done or cleaned up.
We are making an effort to move some tasks to HSM, which will help here.

