Bug 1308651

Summary: VM startup takes several minutes
Product: Red Hat Enterprise Virtualization Manager Reporter: Gordon Watson <gwatson>
Component: vdsm Assignee: Nir Soffer <nsoffer>
Status: CLOSED CURRENTRELEASE QA Contact: Aharon Canan <acanan>
Severity: high Docs Contact:
Priority: high    
Version: 3.5.3 CC: amureini, bazulay, gklein, lsurette, nsoffer, tnisan, ycui, yeylon, ykaul, ylavi
Target Milestone: ovirt-3.6.3   
Target Release: 3.6.0   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-02-22 14:51:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Gordon Watson 2016-02-15 16:57:18 UTC
Description of problem:

VMs can take several minutes (e.g. 7 minutes in the example provided below, but some take longer) to start up. The delay appears to be in VDSM in the 'prepareImage' sequence.


Version-Release number of selected component (if applicable):

RHEV 3.5.3
RHEL 6.6 hosts w/vdsm-4.16.20-1


How reproducible:

Not reproducible.


Steps to Reproduce:
1.
2.
3.

Actual results:

Starting VMs can take several minutes. This happens across different hosts. The same VMs can later start up much more quickly.

One such VM, for example, consists of a single disk with a single volume on an NFS storage domain.


Expected results:

VMs start up in a timely fashion.


Additional info:

Comment 8 Allon Mureinik 2016-02-16 15:41:02 UTC
Nir, this sounds awfully familiar - does this ring any bells?

Comment 9 Nir Soffer 2016-02-16 17:35:35 UTC
(In reply to Allon Mureinik from comment #8)
> Nir, this sounds awfully familiar - does this ring any bells?

vdsm 4.16.20 does not include this fix:

46a328b fileSD: Optimize getAllVolumes on file storage

Without it, prepareImage can suffer long delays; combined with many cores and
many concurrent prepareImage calls, this may be related.

I will need to look in the logs to understand if there is another issue.

So the best advice for now is to upgrade to 4.16.21, or better, to a version
that supports the cpu_affinity option.
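
For readers unfamiliar with why listing volumes on file storage can be slow, here is a rough sketch of the general idea behind an optimization like the one named in comment 9. It is not the vdsm code. It assumes a file storage domain laid out as images/<image UUID>/<volume UUID>.meta, and the function names list_volumes_by_glob and make_demo_domain are hypothetical. The point is only that volumes can be enumerated from file names with one glob, without opening and parsing each metadata file over NFS.

```python
import glob
import os
import tempfile


def list_volumes_by_glob(domain_dir):
    """Map each volume UUID to the image UUIDs whose directory contains it.

    Hypothetical helper: relies only on file names matching
    <domain_dir>/images/<image UUID>/<volume UUID>.meta (assumed layout),
    so no metadata file is opened or read.
    """
    pattern = os.path.join(domain_dir, "images", "*", "*.meta")
    volumes = {}
    for meta_path in glob.iglob(pattern):
        image_dir, meta_name = os.path.split(meta_path)
        image_uuid = os.path.basename(image_dir)
        volume_uuid = meta_name[:-len(".meta")]
        volumes.setdefault(volume_uuid, []).append(image_uuid)
    return volumes


def make_demo_domain(entries):
    """Create a throwaway directory tree for experimentation.

    entries is an iterable of (image_uuid, volume_uuid) pairs; an empty
    .meta file is created for each pair. Returns the domain directory.
    """
    domain_dir = tempfile.mkdtemp(prefix="demo-domain-")
    for image_uuid, volume_uuid in entries:
        image_dir = os.path.join(domain_dir, "images", image_uuid)
        os.makedirs(image_dir, exist_ok=True)
        with open(os.path.join(image_dir, volume_uuid + ".meta"), "w"):
            pass
    return domain_dir


if __name__ == "__main__":
    demo = make_demo_domain([("img-a", "vol-1"), ("img-b", "vol-2")])
    for volume, images in sorted(list_volumes_by_glob(demo).items()):
        print(volume, images)
```

A volume that appears under more than one image directory (as a shared template volume might) shows up with several image UUIDs in its list. Whether this matches what commit 46a328b actually does should be checked against the vdsm source.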