Bug 783920

Summary: [ovirt] [vdsm] service restart is not immediate when there are running VMs
Product: [Retired] oVirt
Reporter: Haim <hateya>
Component: vdsm
Assignee: Dan Kenigsberg <danken>
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
Version: unspecified
CC: abaron, acathrow, bazulay, iheim, mgoldboi, yeylon, ykaul
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Doc Type: Bug Fix
Last Closed: 2013-03-12 15:55:28 UTC

Description Haim 2012-01-23 09:05:25 UTC
Description of problem:

case: 

- host runs VMs
- restart the vdsm service

The restart takes a while (depending on the number of VMs) before the service is back up, i.e. before the "I am the actual vdsm" entry appears in the log. During that time (before the restart completes), nothing is written to vdsm.log.

This is what I see in the log when 40 VMs are running:

Thread-1630::DEBUG::2012-01-23 03:47:47,921::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1630::DEBUG::2012-01-23 03:47:47,923::resourceManager::538::ResourceManager::(releaseResource) Trying to release resource 'Storage.b1fe6ae5-d7e7-4847-96a5-7119f0cde67a'
Thread-1630::DEBUG::2012-01-23 03:47:47,923::resourceManager::553::ResourceManager::(releaseResource) Released resource 'Storage.b1fe6ae5-d7e7-4847-96a5-7119f0cde67a' (6 active users)
Thread-1630::DEBUG::2012-01-23 03:47:47,923::task::980::TaskManager.Task::(_decref) Task=`2c9ee487-e161-4196-9fcb-a90a0e5a2a33`::ref 0 aborting False
Thread-1630::ERROR::2012-01-23 03:47:47,965::utils::399::vm.Vm::(collect) vmId=`80303818-82a5-4b46-9a0d-33134235c7ad`::Stats function failed: <AdvancedStatsFunction _sampleDiskLatency at 0x12563e8>
Traceback (most recent call last):
  File "/usr/share/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/share/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 155, in _sampleDiskLatency
    stats = _blockstatsParses(out)
  File "/usr/share/vdsm/libvirtvm.py", line 142, in _blockstatsParses
    'flush_op':devStats['flush_operations'],
KeyError: 'flush_operations'
MainThread::INFO::2012-01-23 03:49:20,795::vdsm::71::vds::(run) I am the actual vdsm 4.9-0

You can see a delay of roughly 1.5 minutes (from 03:47:47 to 03:49:20) during which nothing is written to the log.
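To measure such a silent period without eyeballing timestamps, a small script can scan vdsm.log for the longest gap between consecutive timestamped entries. This is a minimal sketch, not part of vdsm. It assumes the line format seen in the excerpt above, where the timestamp is the third `::`-separated field (e.g. `Thread-1630::DEBUG::2012-01-23 03:47:47,921::...`). The functions `parse_timestamp` and `largest_gap` are hypothetical helpers. Lines without a timestamp, such as traceback continuation lines, are skipped.

```python
import re
from datetime import datetime

# Assumed vdsm log format: "<thread>::<LEVEL>::<YYYY-MM-DD HH:MM:SS,mmm>::<module>::..."
TIMESTAMP_RE = re.compile(r"::(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::")


def parse_timestamp(line):
    """Return the datetime of a log line, or None if the line has no timestamp
    (for example, a traceback continuation line)."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        return None
    # %f accepts the 3-digit millisecond field and right-pads it to microseconds.
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the longest silence
    between two consecutive timestamped lines; (0.0, None, None) if none."""
    best = (0.0, None, None)
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is None:
            continue  # not a log entry of its own; keep the previous timestamp
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best


if __name__ == "__main__":
    # Abbreviated lines taken from the excerpt in this report.
    sample = [
        "Thread-1630::DEBUG::2012-01-23 03:47:47,923::task::980::TaskManager.Task::(_decref) ...",
        "Thread-1630::ERROR::2012-01-23 03:47:47,965::utils::399::vm.Vm::(collect) ...",
        "Traceback (most recent call last):",
        "MainThread::INFO::2012-01-23 03:49:20,795::vdsm::71::vds::(run) I am the actual vdsm 4.9-0",
    ]
    gap, before, after = largest_gap(sample)
    print(f"longest silence: {gap:.3f} s")
    print(f"  before: {before}")
    print(f"  after:  {after}")
```

Applied to the excerpt above, the script reports a gap of about 93 seconds, from the last pre-restart entry (03:47:47,965) to the "I am the actual vdsm" line (03:49:20,795). In practice, you would pass it the lines of vdsm.log itself (e.g. `open(path)`).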

Note that restarting vdsm with no running VMs does not have this delay; it happens immediately.

git commit hash: 5a0b2c912fb0ea5a305f191e9b558385ef249caa

Comment 1 Haim 2012-01-23 09:16:21 UTC
Please ignore the KeyError when evaluating this bug; it is a known, separate issue with our sampling method.

Comment 2 Itamar Heim 2013-03-12 15:55:28 UTC
Closing old bugs. If this issue is still relevant/important in current version, please re-open the bug.