Created attachment 923871
Description of problem:
Removing VMs is slow because the task stays stuck in the 'finish' state for about 70 sec.
I tested several areas to rule out slowness or latency on the NFS storage.
- up to 6500 VMs
- 2 storage domains
VM disk template:
- On the SPM side, it looks like the call for '11:05:05,267::logUtils::44::dispatcher::(wrapper) Run and protect: deleteImage' came in after 2 min.
According to the logs, the removal takes ~70 sec (the task is stuck in the 'finish' state for this period, which means the actual removal is very quick):
Thread-138::INFO::2014-08-04 11:03:49,011::logUtils::44::dispatcher::(wrapper) Run and protect: deleteImage(sdUUID='68957f61-33a4-47ea-9b7d-4e0a84639841', spUUID='5d43076e-f0b2-48ca-9984-c6788e9adb31', imgUUID='d5f9fa80-ec63-4c49-86cc-840d212a2ae5', postZero='false', force='false')
Thread-138::INFO::2014-08-04 11:05:04,726::logUtils::47::dispatcher::(wrapper) Run and protect: deleteImage, Return response: None
About one more minute later, the SPM aborts the action because the files do not exist:
Thread-138::INFO::2014-08-04 11:06:13,665::task::1168::TaskManager.Task::(prepare) Task=`b9e8b131-057a-405e-9247-606293cbd8bb`::aborting: Task is aborted: 'Image does not exist in domain' - code 268
That explains why the engine waits so long for locks.
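The gaps between the three log lines quoted above can be checked with a short script. This is only a sketch: it assumes the timestamp is the third `::`-separated field, as in the lines in this report. The `elapsed_seconds` helper is a hypothetical name, and the log lines are truncated after the message name.

```python
from datetime import datetime

# Log lines from this report, truncated after the message name.
LOG_LINES = [
    "Thread-138::INFO::2014-08-04 11:03:49,011::logUtils::44::dispatcher::(wrapper) Run and protect: deleteImage",
    "Thread-138::INFO::2014-08-04 11:05:04,726::logUtils::47::dispatcher::(wrapper) Run and protect: deleteImage, Return response",
    "Thread-138::INFO::2014-08-04 11:06:13,665::task::1168::TaskManager.Task::(prepare) aborting",
]


def parse_timestamp(line):
    """Return the timestamp held in the third '::'-separated field."""
    stamp = line.split("::")[2]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def elapsed_seconds(lines):
    """Return the seconds elapsed between each pair of consecutive lines."""
    stamps = [parse_timestamp(line) for line in lines]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for gap in elapsed_seconds(LOG_LINES):
        print(f"{gap:.3f} sec")
```

For the lines in this report, it gives about 75.7 sec between the deleteImage call and its return, and about 68.9 sec between the return and the abort.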
- Looking at vdsClient -s 0 getAllTasks, the 'deleteImage' task is stuck in the 'finish' state for ~70 sec, as described in the log.
- It looks like the removal is always done by the same thread, 'Thread-138'. Does the remove action use multiple threads?
Running rm -dfr from the SPM host on 'master/vms/<vm>/<vm>.ovf' and 'images/<image>/*' (a 20 GB disk) was very quick: less than ~2 sec.
After everything was cleaned from the mount (no latency was found there),
I tried to remove the VM from the engine. This action should now be very quick, since it has "no files to remove from the mount".
Both machines are running well; no overload was found.
This issue probably reproduces for VM creation too (since that action also runs slowly).
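To repeat the manual removal timing described above in a scripted way, something like the following could be used. It is a minimal sketch, not what was run for this report: `timed_rmtree` is a hypothetical helper, and the path in the usage comment is a placeholder to be replaced with the real image directory on the mount.

```python
import shutil
import time


def timed_rmtree(path):
    """Remove a directory tree and return the elapsed wall-clock seconds."""
    start = time.monotonic()
    shutil.rmtree(path)
    return time.monotonic() - start


# Example usage (placeholder path, destructive, run only on a test image):
#   print(timed_rmtree("/path/to/mount/images/<image>"))
```

Comparing this figure with the deleteImage duration in the vdsm log shows how much of the time is spent outside the file removal itself.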
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Up to ~6000 VMs
Removing VMs takes more than 2 min each; in parallel it is much more critical.
Removing 20 GB of files from the NFS takes less than 2 sec, so removing VMs should take a similar 2-3 sec.
First order of business - see what consumes the time there - whether it's in the storage subsystem or the tasks infra.
(In reply to Allon Mureinik from comment #1)
> First order of business - see what consumes the time there - whether it's in
> the storage subsystem or the tasks infra.
We should revisit in 3.6.0 after the "tasks" rehaul.
Closing old bugs, as per Itamar's guidelines.
If you think this bug is worth fixing, please feel free to reopen.