Created attachment 917633 [details]
vdsm+engine logs

Description of problem:
when having a running vm with one virtio disk and one or more disks that have a different interface, vdsm throws _getDiskStats and _getDiskLatency errors every several seconds.

GuestMonitor-vm_1::ERROR::2014-07-13 18:37:32,025::vm::533::vm.Vm::(_getDiskLatency) vmId=`8be16854-2bc9-49dd-a5c5-8ea38536be99`::Disk sda latency not available
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 531, in _getDiskLatency
    dLatency = _avgLatencyCalc(sInfo[dName], eInfo[dName])
KeyError: u'sda'
GuestMonitor-vm_1::DEBUG::2014-07-13 18:37:32,025::vm::423::vm.Vm::(_getUserCpuTuneInfo) vmId=`8be16854-2bc9-49dd-a5c5-8ea38536be99`::Domain Metadata is not set
GuestMonitor-vm_1::ERROR::2014-07-13 18:37:32,026::vm::491::vm.Vm::(_getDiskStats) vmId=`8be16854-2bc9-49dd-a5c5-8ea38536be99`::Disk sda stats not available
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 487, in _getDiskStats
    (eInfo[dName][1] - sInfo[dName][1]) / sampleInterval)
KeyError: u'sda'

Version-Release number of selected component (if applicable):
beta

How reproducible:
100%

Steps to Reproduce:
1. add vm + virtio disk + virtio-iscsi disk
2. run vm

Actual results:
vdsm fails on vdsm's logs and floods them with errors

Expected results:
no errors should be reported

Additional info:
The issue seems to be triggered by hot-plugging a single disk.
Reproduced on today's VDSM master.

Steps to reproduce:
1. boot a VM
2. attach a virtIO disk (VDSM verb hotplugDisk triggered)

What happens here is that the new disk is added to the list of VM drives. When stats are requested, VDSM iterates over that list and looks up the disk samples in order to build the stats. But the stats are collected (by default) every 60s, and VDSM considers the oldest and the newest samples; so, until the oldest retained sample includes values for the new disk, we'll see this behaviour.

We have a vulnerability window of up to (sampling_window * sampling_interval) in the worst case. With default values that is 2 * 60s = 120s. After that, everything should go back to normal: it worked here, stats for the new disk appear and the errors go away.

I believe the best way to fix this is just to ignore missing samples while building disk stats.
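To illustrate the idea of ignoring missing samples, here is a minimal sketch, not the actual VDSM patch. The function name build_disk_rates, the result keys, and the tuple layout (rd_req, rd_bytes, wr_req, wr_bytes) are assumptions made for the example; only the "skip the disk if either sample lacks it" logic reflects the proposal above.

```python
def build_disk_rates(drive_names, start_info, end_info, sample_interval):
    """Build per-disk byte rates from the oldest and newest samples.

    start_info / end_info map a disk name to a counters tuple, assumed
    here to be (rd_req, rd_bytes, wr_req, wr_bytes). A disk missing from
    either sample (e.g. hot-plugged after the oldest sample was taken)
    is skipped instead of raising KeyError.
    """
    stats = {}
    for name in drive_names:
        if name not in start_info or name not in end_info:
            # No full pair of samples yet: report nothing for this disk.
            continue
        start, end = start_info[name], end_info[name]
        stats[name] = {
            'readRate': (end[1] - start[1]) / sample_interval,
            'writeRate': (end[3] - start[3]) / sample_interval,
        }
    return stats


# 'sda' was hot-plugged after the oldest sample, so only the newest has it.
oldest = {'vda': (10, 1000, 5, 500)}
newest = {'vda': (20, 7000, 8, 3500), 'sda': (1, 100, 1, 100)}
rates = build_disk_rates(['vda', 'sda'], oldest, newest, 60)
```

With this approach the new disk simply has no stats until the oldest retained sample includes it, which matches the self-resolving behaviour observed after the window elapses.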
Patch available, and the issue resolves itself once VDSM gathers enough stats, so decreasing severity.
oVirt 3.5 has been released and should include the fix for this issue.