Bug 1280223 - VmStats - Errors in vdsm.log for (_getRunningVmStats) from time to time
VmStats - Errors in vdsm.log for (_getRunningVmStats) from time to time
Status: CLOSED CURRENTRELEASE
Product: vdsm
Classification: oVirt
Component: Core (Show other bugs)
4.17.10
x86_64 Linux
unspecified Severity medium (vote)
: ovirt-3.6.2
: 4.17.12
Assigned To: Francesco Romani
Michael Burman
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2015-11-11 04:20 EST by Michael Burman
Modified: 2016-02-18 06:15 EST (History)
7 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-18 06:15:20 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑3.6.z+
mgoldboi: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+


Attachments (Terms of Use)
logs and xmls (1.71 MB, application/x-gzip)
2015-11-11 04:20 EST, Michael Burman
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 48478 ovirt-3.6 MERGED vmstats: network: avoid ZeroDivisionError Never

  None (edit)
Description Michael Burman 2015-11-11 04:20:25 EST
Created attachment 1092620 [details]
logs and xmls

Description of problem:
VmStats - Errors in vdsm.log for (_getRunningVmStats) from time to time .

There are some errors in vdsm log while trying to collect running VMs stats -->
  
- I'm not sure what invoking this or what the reason for those errors.
For example i have 2 different servers, on each i have a running VM for a while(more then a week) and each vdsm.log shows same and different errors for (_getRunningVmStats) --> 

Every hour or so i see this errors in the logs.

- Thread-148270::WARNING::2015-11-11 08:24:05,444::vmstats::108::root::(cpu) invalid interval 0 when computing CPU stats
Thread-148270::ERROR::2015-11-11 08:24:05,444::vm::1390::virt.vm::(_getRunningVmStats) vmId=`02b265ee-51c8-4feb-bcb4-a9c9b033c329`::Error fetching vm stats
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 1387, in _getRunningVmStats
    vm_sample.interval)
  File "/usr/share/vdsm/virt/vmstats.py", line 42, in produce
    networks(vm, stats, first_sample, last_sample, interval)
  File "/usr/share/vdsm/virt/vmstats.py", line 228, in networks
    interval)
  File "/usr/share/vdsm/virt/vmstats.py", line 188, in nic_traffic
    interval / ifSpeed / _MBPS_TO_BPS)
ZeroDivisionError: float division by zero


- Thread-19808::ERROR::2015-11-11 08:24:02,400::vm::1390::virt.vm::(_getRunningVmStats) vmId=`0d211260-2c0b-4626-a438-4fc8fc8c76a3`::Error fetching vm stats
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 1387, in _getRunningVmStats
    vm_sample.interval)
  File "/usr/share/vdsm/virt/vmstats.py", line 43, in produce
    disks(vm, stats, first_sample, last_sample, interval)
  File "/usr/share/vdsm/virt/vmstats.py", line 262, in disks
    interval))
  File "/usr/share/vdsm/virt/vmstats.py", line 292, in _disk_rate
    first_sample['block.%d.rd.bytes' % first_index])
KeyError: 'block.0.rd.bytes'

Version-Release number of selected component (if applicable):
3.6.0.3-0.1.el6
vdsm-4.17.10.1-0.el7ev.noarch
libvirt-1.2.17-13.el7.x86_64
qemu-kvm-common-rhev-2.3.0-31.el7.x86_64
Comment 1 Yaniv Kaul 2015-11-12 03:54:07 EST
Michael, those are two unrelated errors (I'd open two bugs)
BTW - what is the NIC type in the first case? What happens when you use SR-IOV?
Comment 2 Michael Burman 2015-11-12 04:07:47 EST
Hi Yaniv)

So you think i should open another BZ for second error? although i can't really reproduce it.. 

It's not related to SR-IOV. This servers have no capable sr-iov NICs. 
The vNIC types are VirtIO .
Comment 3 Yaniv Kaul 2015-11-12 04:40:26 EST
(In reply to Michael Burman from comment #2)
> Hi Yaniv)
> 
> So you think i should open another BZ for second error? although i can't
> really reproduce it.. 

Yes please. We can always close it later as WORKSFORME, but we want it documented.

> 
> It's not related to SR-IOV. This servers have no capable sr-iov NICs. 
> The vNIC types are VirtIO .

I know it isn't, just wondering what will happen in case of SR-IOV.
(Reason: if you look at https://github.com/oVirt/vdsm/blob/master/vdsm/virt/vmstats.py#L178 :
ifSpeed = [100, 1000][model in ('e1000', 'virtio')]

I was wondering what will happen if the NIC is neither e1000 nor virtio.
Did not bother to look any further - perhaps the code isn't even run when it's SR-IOV, etc.)
Comment 4 Michael Burman 2015-11-12 04:48:38 EST
For SR-IOV i reported this one - BZ 1273837
Comment 5 Michal Skrivanek 2015-11-12 05:27:04 EST
I'd like to eliminate as much noise from the logs as possible by 3.6.3
Comment 6 Francesco Romani 2015-11-12 05:33:47 EST
(In reply to Michael Burman from comment #0)
> Every hour or so i see this errors in the logs.
> 
> - Thread-148270::WARNING::2015-11-11 08:24:05,444::vmstats::108::root::(cpu)
> invalid interval 0 when computing CPU stats
> Thread-148270::ERROR::2015-11-11
> 08:24:05,444::vm::1390::virt.vm::(_getRunningVmStats)
> vmId=`02b265ee-51c8-4feb-bcb4-a9c9b033c329`::Error fetching vm stats
> Traceback (most recent call last):
>   File "/usr/share/vdsm/virt/vm.py", line 1387, in _getRunningVmStats
>     vm_sample.interval)
>   File "/usr/share/vdsm/virt/vmstats.py", line 42, in produce
>     networks(vm, stats, first_sample, last_sample, interval)
>   File "/usr/share/vdsm/virt/vmstats.py", line 228, in networks
>     interval)
>   File "/usr/share/vdsm/virt/vmstats.py", line 188, in nic_traffic
>     interval / ifSpeed / _MBPS_TO_BPS)
> ZeroDivisionError: float division by zero

Fixed in master. Backport pending on 3.6.x branch:
https://gerrit.ovirt.org/48478

> 
> - Thread-19808::ERROR::2015-11-11
> 08:24:02,400::vm::1390::virt.vm::(_getRunningVmStats)
> vmId=`0d211260-2c0b-4626-a438-4fc8fc8c76a3`::Error fetching vm stats
> Traceback (most recent call last):
>   File "/usr/share/vdsm/virt/vm.py", line 1387, in _getRunningVmStats
>     vm_sample.interval)
>   File "/usr/share/vdsm/virt/vmstats.py", line 43, in produce
>     disks(vm, stats, first_sample, last_sample, interval)
>   File "/usr/share/vdsm/virt/vmstats.py", line 262, in disks
>     interval))
>   File "/usr/share/vdsm/virt/vmstats.py", line 292, in _disk_rate
>     first_sample['block.%d.rd.bytes' % first_index])
> KeyError: 'block.0.rd.bytes'

Checking this one
Comment 7 Michael Burman 2015-11-12 05:46:47 EST
Hi, 

Note that i reported new bug for the second error - BZ 1281337
Comment 8 Francesco Romani 2015-11-12 06:03:01 EST
(In reply to Michael Burman from comment #7)
> Hi, 
> 
> Note that i reported new bug for the second error - BZ 1281337

OK, let's continue there, then.
Comment 9 Sandro Bonazzola 2015-12-01 10:23:14 EST
This bug is targeted to 3.6.3 milestone and is in Modified state and referenced by vdsm 4.17.12 git log.
Please check.
Comment 10 Michael Burman 2015-12-29 04:31:30 EST
Verified on - vdsm-4.17.14-0.el7ev.noarch

Note You need to log in before you can comment on or make changes to this bug.