Bug 1551521 - KeyError exception in the VDSM when accessing stats['cpuUsage']
Summary: KeyError exception in the VDSM when accessing stats['cpuUsage']
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.20.19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.2.3
Target Release: ---
Assignee: Michal Skrivanek
QA Contact: Liran Rotenberg
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-03-05 10:37 UTC by Steven Rosenberg
Modified: 2018-05-10 06:32 UTC
CC: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: stats['cpuUsage'] was not initialized before being accessed.
Consequence: intermittent KeyError exceptions during VM metrics collection.
Fix: initialize cpuUsage in vmstats.
Result: the KeyError exception no longer occurs.
Clone Of:
Environment:
Last Closed: 2018-05-10 06:32:11 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.2+




Links
System        ID     Private  Priority   Status  Summary                       Last Updated
oVirt gerrit  88471  0        None       MERGED  vmstats: initialize cpuUsage  2020-08-11 15:11:10 UTC
oVirt gerrit  90348  0        ovirt-4.2  MERGED  vmstats: initialize cpuUsage  2020-08-11 15:11:11 UTC

Description Steven Rosenberg 2018-03-05 10:37:32 UTC
Description of problem:

QA reported an error in which the traceback shows a KeyError exception when accessing the stats['cpuUsage'] entry:

-----------------------------------------------------------------
2018-02-27 10:54:06,544+0200 ERROR (periodic/12) [virt.vmstats] VM metrics collection failed (vmstats:264)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 197, in send_metrics
    data[prefix + '.cpu.usage'] = stat['cpuUsage']
KeyError: 'cpuUsage'
-------------------------------------------------------------------

When tracing the code, it was found that the stats['cpuUsage'] entry is not initialized in two places where other entries are (a minimal sketch of the missing initialization is shown after the list below):


---------------------------------------------------------------

1. 
Module: ./lib/vdsm/virt/vmstats.py:

Function:   def cpu(stats, first_sample, last_sample, interval):

Bug: stats['cpuUsage'] is not initialized before the if statements whereas cpuSys and cpuUser are.


2. 
Module: ./lib/vdsm/host/stats.py:

Function: def produce(first_sample, last_sample):

Possible bug: stats['cpuUsage'] is not initialized here whereas cpuSys and cpuUser are.

--------------------------------------------------------------
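
To illustrate the first case, a rough sketch of the missing initialization follows. This is NOT the actual vdsm source: the function name and signature come from the report above, while the 0.0 default values and the early return are assumptions made only for illustration.

    # Illustrative sketch only, not the real lib/vdsm/virt/vmstats.py code.
    def cpu(stats, first_sample, last_sample, interval):
        # Initialize all CPU keys up front so consumers such as send_metrics()
        # never hit KeyError: 'cpuUsage' when a sample is missing.
        stats['cpuUser'] = 0.0
        stats['cpuSys'] = 0.0
        stats['cpuUsage'] = 0.0   # the key the report says was left uninitialized

        if first_sample is None or last_sample is None:
            return stats

        # ... compute the real values from both samples here ...
        return stats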


Version-Release number of selected component (if applicable): ovirt 4.2 and 4.1


How reproducible:

I did not reproduce this myself, but Liran simulated it by disabling memory ballooning on the VM.


Possible Steps to Reproduce:

This error showed up in a memory hot-plug test. In that scenario, the test does the following:

1. Create a VM (memory ballooning is disabled on the VM).
2. Start the VM.
3. Add 256 MB of memory five times.

Actual results:

When it occurs, the KeyError traceback above is printed to the vdsm log.

Expected results:

stats['cpuUsage'] should be initialized so that the exception is avoided.


Additional info:
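
For completeness, the consuming side shown in the traceback could also be made defensive. The assignment line below is copied from the traceback above; the .get() fallback, the skip-on-missing behaviour, and the placeholder values are hypothetical and are not the merged fix (which initializes cpuUsage on the producer side instead):

    # Hypothetical defensive variant of the access shown in the traceback;
    # for illustration only. 'stat', 'data' and 'prefix' stand in for the
    # objects send_metrics() works with.
    stat = {}                  # a per-VM stats dict that may lack 'cpuUsage'
    data = {}
    prefix = 'hosts.vm0'       # hypothetical metric prefix

    cpu_usage = stat.get('cpuUsage')
    if cpu_usage is not None:
        data[prefix + '.cpu.usage'] = cpu_usage
    # else: skip the datapoint instead of raising KeyError: 'cpuUsage'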

Comment 1 Michal Skrivanek 2018-04-16 15:10:42 UTC
Depends on the fate of vdsm's collectd stats gathering - https://bugzilla.redhat.com/show_bug.cgi?id=1532692#c11

Nevertheless, it can be fixed easily; the fix has already been merged to master for some time.

Comment 2 Liran Rotenberg 2018-04-24 08:37:51 UTC
Tried to verify on:
vdsm-4.20.25-1.el7ev.x86_64
ovirt-engine-4.2.3-0.1.el7.noarch

Steps:
1. Started a VM.
2. Waited to see whether the above traceback is shown.

Result:
The traceback is still found in the vdsm log.

for example:
2018-04-24 10:18:13,104+0300 ERROR (periodic/2) [virt.vmstats] VM metrics collection failed (vmstats:264)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 197, in send_metrics
    data[prefix + '.cpu.usage'] = stat['cpuUsage']
KeyError: 'cpuUsage'


2018-04-24 10:25:42,995+0300 ERROR (periodic/4) [virt.vmstats] VM metrics collection failed (vmstats:264)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 197, in send_metrics
    data[prefix + '.cpu.usage'] = stat['cpuUsage']
KeyError: 'cpuUsage'


2018-04-24 10:26:57,982+0300 WARN  (periodic/0) [virt.periodic.VmDispatcher] could not run <class 'vdsm.virt.periodic.BlockjobMonitor'> on ['3e685cf8-ea27-4a87-9c51-0f7edb550e64'] (periodic:323)
2018-04-24 10:26:58,037+0300 ERROR (periodic/4) [virt.vmstats] VM metrics collection failed (vmstats:264)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 197, in send_metrics
    data[prefix + '.cpu.usage'] = stat['cpuUsage']
KeyError: 'cpuUsage'


2018-04-24 10:26:58,037+0300 ERROR (periodic/4) [virt.vmstats] VM metrics collection failed (vmstats:264)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 197, in send_metrics
    data[prefix + '.cpu.usage'] = stat['cpuUsage']
KeyError: 'cpuUsage'


Additional Info:
I did not see these tracebacks once the VM was up, neither when idle nor when doing the steps in comment #0.
I saw the cpuUsage/vmstats errors only when powering VMs on and shutting them down.
Therefore, to reproduce, check the logs during or right after starting a VM and shutting it down.

2018-04-24 11:24:30,268+0300 ERROR (libvirt/events) [vds] Error running VM callback (clientIF:683)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 646, in dispatchLibvirtEvents
    v.onLibvirtLifecycleEvent(event, detail, None)

2018-04-24 10:25:35,503+0300 WARN  (vdsm.Scheduler) [Executor] executor state: count=5 workers=set([<Worker name=periodic/3 running <Task <Operation action=<vdsm.virt.sampling.HostMonitor object at 0x7f41dc151b90> at 0x7f41dc151c10> timeout=15, duration=3 at 0x7f41b81f3610> task#=309 at 0x7f41dc151210>, <Worker name=periodic/2 waiting task#=319 at 0x7f41dc0c6e90>, <Worker name=periodic/6 waiting task#=0 at 0x7f41b86cbd50>, <Worker name=periodic/0 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f41dc1519d0> at 0x7f41dc151b50> timeout=7.5, duration=7 at 0x7f41dc0c6490> discarded task#=314 at 0x7f41b83d8bd0>, <Worker name=periodic/5 waiting task#=266 at 0x7f41dc0c6b50>]) (executor:213)
2018-04-24 10:25:35,504+0300 INFO  (vdsm.Scheduler) [Executor] Worker discarded: <Worker name=periodic/0 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f41dc1519d0> at 0x7f41dc151b50> timeout=7.5, duration=7 at 0x7f41dc0c6490> discarded task#=314 at 0x7f41b83d8bd0> (executor:355)
2018-04-24 10:25:38,088+0300 WARN  (periodic/0) [virt.vmstats] Missing stat: 'net.0.rx.errs' for vm db72df1c-4155-4135-8b6e-703ccc0b2d29 (vmstats:558)
2018-04-24 10:25:38,088+0300 WARN  (periodic/0) [virt.vmstats] Missing stat: 'net.0.rx.bytes' for vm db72df1c-4155-4135-8b6e-703ccc0b2d29 (vmstats:558)
2018-04-24 10:25:38,089+0300 WARN  (periodic/0) [virt.vmstats] Missing stat: 'net.0.rx.errs' for vm a2d3cf99-c095-438f-8030-3a58e7775cb0 (vmstats:558)
2018-04-24 10:25:38,089+0300 WARN  (periodic/0) [virt.vmstats] Missing stat: 'net.0.rx.bytes' for vm a2d3cf99-c095-438f-8030-3a58e7775cb0 (vmstats:558)
2018-04-24 10:25:38,090+0300 ERROR (periodic/0) [virt.vmstats] VM metrics collection failed (vmstats:264)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 248, in send_metrics
    data[netprefix + '.rx_bytes'] = if_info['rx']
KeyError: 'rx'

Comment 3 Michal Skrivanek 2018-04-24 08:42:52 UTC
You should verify with vdsm-4.20.26, not 4.20.25.

Comment 4 Liran Rotenberg 2018-04-24 10:58:30 UTC
Verified on:
ovirt-engine-4.2.3.2-0.1.el7.noarch
vdsm-4.20.26-1.el7ev.x86_64

I did the steps from comment #2.

Result:
The traceback from comment #0 did not show up in the vdsm logs.

Comment 5 Sandro Bonazzola 2018-05-10 06:32:11 UTC
This bugzilla is included in oVirt 4.2.3 release, published on May 4th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

