Bug 1126450
Summary: | [Scale] - remove vms running too long due vdsm stuck on state finish | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Eldad Marciano <emarcian> | ||||
Component: | vdsm | Assignee: | Nir Soffer <nsoffer> | ||||
Status: | CLOSED WONTFIX | QA Contact: | Aharon Canan <acanan> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 3.4.1-1 | CC: | amureini, bazulay, ecohen, gklein, iheim, lpeer, michal.skrivanek, oourfali, scohen, yeylon | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | storage | ||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2015-04-19 15:44:15 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
First order of business: see what consumes the time there, whether it's in the storage subsystem or the tasks infra.

(In reply to Allon Mureinik from comment #1)
> First order of business - see what consumes the time there - whether it's in
> the storage subsystem or the tasks infra.

We should revisit in 3.6.0 after the "tasks" overhaul.

Closing old bugs, as per Itamar's guidelines. If you think this bug is worth fixing, please feel free to reopen.
Created attachment 923871 [details]
vdsm.zip

Description of problem:
Slowness was found in the remove VMs action, because the task is stuck in the 'finish' state for ~70 sec [1]. I tested a few things to rule out slowness or latency in the NFS storage [2].

Setup distribution:
- up to 6500 VMs
- 2 storage domains
- 37 hosts
- NFS storage

VM disk template:
- thin provisioned
- postZero false
- 20GB

- On the SPM side, the call for '11:05:05,267::logUtils::44::dispatcher::(wrapper) Run and protect: deleteImage' appears to arrive only after 2 min.

According to the logs, the removal takes ~70 sec (the task is stuck in the 'finish' state for this period, which means the actual removal is very quick):

    Thread-138::INFO::2014-08-04 11:03:49,011::logUtils::44::dispatcher::(wrapper) Run and protect: deleteImage(sdUUID='68957f61-33a4-47ea-9b7d-4e0a84639841', spUUID='5d43076e-f0b2-48ca-9984-c6788e9adb31', imgUUID='d5f9fa80-ec63-4c49-86cc-840d212a2ae5', postZero='false', force='false')
    Thread-138::INFO::2014-08-04 11:05:04,726::logUtils::47::dispatcher::(wrapper) Run and protect: deleteImage, Return response: None

About one minute later, the SPM aborts the action because the files no longer exist:

    Thread-138::INFO::2014-08-04 11:06:13,665::task::1168::TaskManager.Task::(prepare) Task=`b9e8b131-057a-405e-9247-606293cbd8bb`::aborting: Task is aborted: 'Image does not exist in domain' - code 268

That explains why the engine waits on locks for so long.

- Looking at 'vdsClient -s 0 getAllTasks', the 'deleteImage' task stays in the finish state for ~70 sec, as described in the log.
- The removal always seems to be performed by the same thread, 'Thread-138'. Does the remove action use multiple threads?

[2] Running rm -dfr from the SPM host on 'master/vms/<vm>/<vm>.ovf' and 'images/<image>/*' (a 20GB disk) was very quick, less than ~2 sec, and no latency was found on the mount.
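The intervals quoted above can be recomputed directly from the vdsm log excerpts. The following is a minimal Python sketch. It assumes the "<thread>::<level>::<YYYY-MM-DD HH:MM:SS,mmm>::" prefix seen in these lines, and the helper names (parse_line, seconds_between, threads_running) are hypothetical, not part of vdsm. It measures the window between the dispatcher's "Run and protect" entry and return lines, plus the gap to the abort. It does not observe the task's 'finish' state directly. For these three lines, the intervals come out to about 75.7 sec and 68.9 sec, and every deleteImage call is on Thread-138.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed vdsm log prefix (taken from the excerpts in this bug):
# <thread>::<level>::<YYYY-MM-DD HH:MM:SS,mmm>::<rest of line>
LOG_PREFIX = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
)

START_LINE = "Thread-138::INFO::2014-08-04 11:03:49,011::logUtils::44::dispatcher::(wrapper) Run and protect: deleteImage(sdUUID='68957f61-33a4-47ea-9b7d-4e0a84639841', spUUID='5d43076e-f0b2-48ca-9984-c6788e9adb31', imgUUID='d5f9fa80-ec63-4c49-86cc-840d212a2ae5', postZero='false', force='false')"
RETURN_LINE = "Thread-138::INFO::2014-08-04 11:05:04,726::logUtils::47::dispatcher::(wrapper) Run and protect: deleteImage, Return response: None"
ABORT_LINE = "Thread-138::INFO::2014-08-04 11:06:13,665::task::1168::TaskManager.Task::(prepare) Task=`b9e8b131-057a-405e-9247-606293cbd8bb`::aborting: Task is aborted: 'Image does not exist in domain' - code 268"


def parse_line(line):
    """Return (thread, datetime) for a log line, or None if the prefix does not match."""
    m = LOG_PREFIX.match(line)
    if m is None:
        return None
    # %f right-pads the 3-digit millisecond field to microseconds.
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return m.group("thread"), ts


def seconds_between(first, second):
    """Elapsed seconds between the timestamps of two log lines."""
    return (parse_line(second)[1] - parse_line(first)[1]).total_seconds()


def threads_running(lines, verb):
    """Count, per thread, the lines that log an incoming call to `verb`."""
    counts = Counter()
    marker = "Run and protect: %s(" % verb  # call lines only, not "Return response"
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and marker in line:
            counts[parsed[0]] += 1
    return counts


if __name__ == "__main__":
    print("deleteImage entry -> return:", seconds_between(START_LINE, RETURN_LINE), "s")
    print("return -> abort:", seconds_between(RETURN_LINE, ABORT_LINE), "s")
    print("threads:", dict(threads_running([START_LINE, RETURN_LINE, ABORT_LINE], "deleteImage")))
```

The full vdsm.log can be fed to threads_running() to check whether the 'Thread-138' observation holds across all removals. If every deleteImage call lands on one thread, that points to serialization on the SPM rather than storage latency.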
After everything was cleaned from the mount, I tried to remove the VM from the engine. This action should now be very quick, since there are no files left to remove from the mount. Both machines were running well and no overload was found. This issue probably also affects VM creation, since that action is slow as well.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Up to ~6000 VMs
2. Remove VMs

Actual results:
Removing a VM takes more than 2 min each; in parallel it is much worse.

Expected results:
Removing 20GB of files from the NFS storage takes less than 2 sec, so removing a VM should take a similar 2-3 sec.

Additional info:
Logs attached.
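To compare step 2 against the expected 2-3 sec per VM, the per-VM latency can be timed both serially and in parallel. The following is a minimal Python sketch under stated assumptions. remove_vm is a hypothetical callable standing in for whatever client call removes one VM; it is not an oVirt/RHEV API. workers=1 gives serial removal, and a larger value runs removals concurrently. The 3.0 sec budget is taken from the expected results above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed(remove_vm, vm_id):
    """Run one removal via the hypothetical remove_vm callable; return (vm_id, elapsed seconds)."""
    start = time.monotonic()
    remove_vm(vm_id)
    return vm_id, time.monotonic() - start


def measure_removals(remove_vm, vm_ids, workers=1):
    """Remove the given VMs using `workers` threads.

    Returns (per-VM latencies keyed by vm_id, total wall-clock seconds).
    """
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = dict(pool.map(lambda vm: timed(remove_vm, vm), vm_ids))
    return latencies, time.monotonic() - start


def over_budget(latencies, budget_s=3.0):
    """VM ids whose removal exceeded the expected per-VM budget, sorted."""
    return sorted(vm for vm, secs in latencies.items() if secs > budget_s)


if __name__ == "__main__":
    # Stand-in removal that just sleeps; replace with a real client call.
    def fake_remove(vm_id):
        time.sleep(0.05)

    vms = ["vm-%d" % i for i in range(4)]
    for workers in (1, 4):
        lat, wall = measure_removals(fake_remove, vms, workers=workers)
        print(workers, "worker(s): wall %.2fs, over budget: %s" % (wall, over_budget(lat)))
```

With the real removal plugged in, a per-VM latency that grows with the number of workers would match the report that parallel removal is much worse. That pattern is consistent with removals being serialized behind a single thread or lock on the SPM.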