Description of problem: QA reported an error where the trace reports a KeyError exception when accessing the stats['cpuUsage'] entry: ----------------------------------------------------------------- lrotenbe__: 2018-02-27 10:54:06,544+0200 ERROR (periodic/12) [virt.vmstats] VM metrics collection failed (vmstats:264) [2:33pm] lrotenbe__: Traceback (most recent call last): [2:33pm] lrotenbe__: File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 197, in send_metrics [2:33pm] lrotenbe__: data[prefix + '.cpu.usage'] = stat['cpuUsage'] [2:33pm] lrotenbe__: KeyError: 'cpuUsage' ------------------------------------------------------------------- When tracing the code it was found that the stats['cpuUsage'] entry was not being initialized in two places where other entries were: --------------------------------------------------------------- 1. Module: ./lib/vdsm/virt/vmstats.py: Function: def cpu(stats, first_sample, last_sample, interval): Bug: stats['cpuUsage'] is not initialized before the if statements whereas cpuSys and cpuUser are. 2. Module: ./lib/vdsm/host/stats.py: Function: def produce(first_sample, last_sample): Possible bug: stats['cpuUsage'] is not initialized here whereas cpuSys and cpuUser are. -------------------------------------------------------------- Version-Release number of selected component (if applicable): ovirt 4.2 and 4.1 How reproducible: I did not reproduce this, but Liran simulated this when disabling Ballooning on the VM. Possible Steps to Reproduce: This error showed up in hot-plug memory test. In this scenario i saw it is in the test that do the next things: 1. create a VM (memory ballooning is disabled on the VM) 2. start the vm 2. Add 5 times a 256MB of memory Actual results: When it occurs it prints the KeyError after throwing an exception. Expected results: The stats['cpuUsage'] should be initialized so that the exception can be avoided. Additional info:
depends on the fate of vdsm's collectd stats gathering - https://bugzilla.redhat.com/show_bug.cgi?id=1532692#c11 nevertheless, it can be fixed easily, it's already merged to master for some time
Tried to verify on: vdsm-4.20.25-1.el7ev.x86_64 ovirt-engine-4.2.3-0.1.el7.noarch Steps: 1. Started VM 2. Wait to see if the above traceback is shown. Result: The Traceback is still found in the vdsm log for example: 2018-04-24 10:18:13,104+0300 ERROR (periodic/2) [virt.vmstats] VM metrics collection failed (vmstats:264) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 197, in send_metrics data[prefix + '.cpu.usage'] = stat['cpuUsage'] KeyError: 'cpuUsage' 2018-04-24 10:25:42,995+0300 ERROR (periodic/4) [virt.vmstats] VM metrics collection failed (vmstats:264) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 197, in send_metrics data[prefix + '.cpu.usage'] = stat['cpuUsage'] KeyError: 'cpuUsage' 2018-04-24 10:26:57,982+0300 WARN (periodic/0) [virt.periodic.VmDispatcher] could not run <class 'vdsm.virt.periodic.BlockjobMonitor'> on ['3e685cf8-ea27-4a87-9c51-0f7edb550e64'] (periodic:323) 2018-04-24 10:26:58,037+0300 ERROR (periodic/4) [virt.vmstats] VM metrics collection failed (vmstats:264) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 197, in send_metrics data[prefix + '.cpu.usage'] = stat['cpuUsage'] KeyError: 'cpuUsage' 2018-04-24 10:26:58,037+0300 ERROR (periodic/4) [virt.vmstats] VM metrics collection failed (vmstats:264) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 197, in send_metrics data[prefix + '.cpu.usage'] = stat['cpuUsage'] KeyError: 'cpuUsage' Additional Info: I didn't see these tracebacks after the vm is on, not when idle and not when doing the steps in comment #0. I saw errors regarding cpuUsage and vmstats when powering on and shutting down vms. Therefore, for reproducing try to see the logs when/right after starting a vm and shutting it down. 2018-04-24 11:24:30,268+0300 ERROR (libvirt/events) [vds] Error running VM callback (clientIF:683) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 646, in dispatchLibvirtEvents v.onLibvirtLifecycleEvent(event, detail, None) 2018-04-24 10:25:35,503+0300 WARN (vdsm.Scheduler) [Executor] executor state: count=5 workers=set([<Worker name=periodic/3 running <Task <Operation action=<vdsm.virt.sampling.HostMonitor object at 0x7f41dc151b9 0> at 0x7f41dc151c10> timeout=15, duration=3 at 0x7f41b81f3610> task#=309 at 0x7f41dc151210>, <Worker name=periodic/2 waiting task#=319 at 0x7f41dc0c6e90>, <Worker name=periodic/6 waiting task#=0 at 0x7f41b86cbd 50>, <Worker name=periodic/0 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f41dc1519d0> at 0x7f41dc151b50> timeout=7.5, duration=7 at 0x7f41dc0c6490> discarded t ask#=314 at 0x7f41b83d8bd0>, <Worker name=periodic/5 waiting task#=266 at 0x7f41dc0c6b50>]) (executor:213) 2018-04-24 10:25:35,504+0300 INFO (vdsm.Scheduler) [Executor] Worker discarded: <Worker name=periodic/0 running <Task discardable <Operation action=<vdsm.virt.sampling.VMBulkstatsMonitor object at 0x7f41dc1519d 0> at 0x7f41dc151b50> timeout=7.5, duration=7 at 0x7f41dc0c6490> discarded task#=314 at 0x7f41b83d8bd0> (executor:355) 2018-04-24 10:25:38,088+0300 WARN (periodic/0) [virt.vmstats] Missing stat: 'net.0.rx.errs' for vm db72df1c-4155-4135-8b6e-703ccc0b2d29 (vmstats:558) 2018-04-24 10:25:38,088+0300 WARN (periodic/0) [virt.vmstats] Missing stat: 'net.0.rx.bytes' for vm db72df1c-4155-4135-8b6e-703ccc0b2d29 (vmstats:558) 2018-04-24 10:25:38,089+0300 WARN (periodic/0) [virt.vmstats] Missing stat: 'net.0.rx.errs' for vm a2d3cf99-c095-438f-8030-3a58e7775cb0 (vmstats:558) 2018-04-24 10:25:38,089+0300 WARN (periodic/0) [virt.vmstats] Missing stat: 'net.0.rx.bytes' for vm a2d3cf99-c095-438f-8030-3a58e7775cb0 (vmstats:558) 2018-04-24 10:25:38,090+0300 ERROR (periodic/0) [virt.vmstats] VM metrics collection failed (vmstats:264) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 248, in send_metrics data[netprefix + '.rx_bytes'] = if_info['rx'] KeyError: 'rx'
you should verify with vdsm-4.20.26, not 25
Verified on: ovirt-engine-4.2.3.2-0.1.el7.noarch vdsm-4.20.26-1.el7ev.x86_64 I did the steps on comment #2. Result: The traceback from comment #0 did not showed in vdsm logs.
This bugzilla is included in oVirt 4.2.3 release, published on May 4th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.3 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.