Bug 1119775 - mom error parsing vdsm stats if cpu tune information is missing
Summary: mom error parsing vdsm stats if cpu tune information is missing
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: mom
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Martin Sivák
QA Contact: meital avital
URL:
Whiteboard: sla
Depends On:
Blocks:
Reported: 2014-07-15 13:26 UTC by Francesco Romani
Modified: 2016-02-10 19:42 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-10-17 12:39:17 UTC
oVirt Team: SLA



Links
System ID Priority Status Summary Last Updated
oVirt gerrit 28977 master ABANDONED vcpuCount is optional in VDSM API, this fixes KeyError caused by that Never
oVirt gerrit 30181 master MERGED Fixing vcpuCount periodic error. Never
oVirt gerrit 30486 mom-0.4.1 MERGED Fixing vcpuCount periodic error. Never

Description Francesco Romani 2014-07-15 13:26:07 UTC
Description of problem:

While verifying http://gerrit.ovirt.org/#/c/12820/11 I observed the functional
test failing. The MOM log shows this:

2014-07-15 09:25:26,879 - mom - INFO - MOM starting
2014-07-15 09:25:26,952 - mom - INFO - hypervisor interface vdsm
2014-07-15 09:25:26,952 - mom.HostMonitor - INFO - Host Monitor starting
2014-07-15 09:25:26,958 - mom.GuestManager - INFO - Guest Manager starting
2014-07-15 09:25:26,972 - mom.Policy - INFO - Loaded policy '00-defines'
2014-07-15 09:25:26,995 - mom.HostMonitor - INFO - HostMonitor is ready
2014-07-15 09:25:27,010 - mom.Policy - INFO - Loaded policy '02-balloon'
2014-07-15 09:25:27,047 - mom.Policy - INFO - Loaded policy '03-ksm'
2014-07-15 09:25:27,114 - mom.Policy - INFO - Loaded policy '04-cputune'
2014-07-15 09:25:27,115 - mom.PolicyEngine - INFO - Policy Engine starting
2014-07-15 09:25:27,116 - mom.RPCServer - INFO - RPC Server is disabled
2014-07-15 09:25:37,181 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:0 run:0 sleep_millisecs:0
2014-07-15 09:26:02,011 - mom.Monitor - INFO - GuestMonitor-vdsm_testBalloonVM starting
2014-07-15 09:26:02,012 - mom.Collectors.GuestMemory - WARNING - getVmMemoryStats() error: The ovirt-guest-agent is not active
2014-07-15 09:26:02,013 - mom.Monitor - ERROR - GuestMonitor-vdsm_testBalloonVM crashed
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/mom/GuestMonitor.py", line 56, in run
self.collect()
File "/usr/lib/python2.6/site-packages/mom/Monitor.py", line 91, in collect
collected = c.collect()
File "/usr/lib/python2.6/site-packages/mom/Collectors/GuestCpuTune.py", line 44, in collect
stat = self.hypervisor_iface.getVmCpuTuneInfo(self.uuid)
File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 184, in getVmCpuTuneInfo
vcpuCount = response['statsList'][0]['vcpuCount']
KeyError: 'vcpuCount'
2014-07-15 09:26:12,023 - mom.Monitor - INFO - GuestMonitor-vdsm_testBalloonVM starting
2014-07-15 09:26:12,023 - mom.Collectors.GuestMemory - WARNING - getVmMemoryStats() error: The ovirt-guest-agent is not active
2014-07-15 09:26:12,024 - mom.Monitor - ERROR - GuestMonitor-vdsm_testBalloonVM crashed
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/mom/GuestMonitor.py", line 56, in run
self.collect()
File "/usr/lib/python2.6/site-packages/mom/Monitor.py", line 91, in collect
collected = c.collect()
File "/usr/lib/python2.6/site-packages/mom/Collectors/GuestCpuTune.py", line 44, in collect
stat = self.hypervisor_iface.getVmCpuTuneInfo(self.uuid)
File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 184, in getVmCpuTuneInfo
vcpuCount = response['statsList'][0]['vcpuCount']
KeyError: 'vcpuCount'
2014-07-15 09:26:22,024 - mom.Monitor - INFO - GuestMonitor-vdsm_testBalloonVM starting
2014-07-15 09:26:22,025 - mom.Collectors.GuestMemory - WARNING - getVmMemoryStats() error: The ovirt-guest-agent is not active
2014-07-15 09:26:30,924 - mom.RPCServer - INFO - setPolicy()
2014-07-15 09:26:32,032 - mom.vdsmInterface - ERROR - {'status': {'message': 'Virtual machine does not exist', 'code': 1}}
2014-07-15 09:26:32,033 - mom.vdsmInterface - ERROR - Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 146, in getVmBalloonInfo
self._check_status(response)
File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 46, in _check_status
raise vdsmException(response, self.logger)
vdsmException

2014-07-15 09:26:32,033 - mom.vdsmInterface - ERROR - {'status': {'message': 'Virtual machine does not exist', 'code': 1}}
2014-07-15 09:26:32,033 - mom.vdsmInterface - ERROR - Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 171, in getVmCpuTuneInfo
self._check_status(response)
File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 46, in _check_status
raise vdsmException(response, self.logger)
vdsmException

2014-07-15 09:26:37,039 - mom.Monitor - INFO - GuestMonitor-vdsm_testBalloonVM ending


Version-Release number of selected component (if applicable):
platform: RHEL 6.5 with updates
VDSM from today's master (2014-07-15)
mom version: mom-0.4.1-2.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Run VDSM + mom on a (supported) host which does not provide cpu tune information, like RHEL 6.5.
2. Run the tests as per http://gerrit.ovirt.org/#/c/12820/11
3. It is probably enough to just run VDSM + mom on such a host.

Actual results:
MOM's GuestMonitor ends prematurely


Expected results:
MOM's GuestMonitor continues to run


Additional info:

Comment 1 Francesco Romani 2014-07-15 13:27:02 UTC
According to Adam Litke, a possible solution could be:
"[...] The KeyError should be caught in the vdsmInterface and treated as a CollectionError.[...]"
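A minimal sketch of that suggestion, not the merged fix. It assumes a response dict shaped like the one in the traceback above; `CollectionError` here is a local stand-in for MOM's collector error type, and `get_vm_cpu_tune_info` and `collect_once` are hypothetical names used only for illustration.

```python
class CollectionError(Exception):
    """Local stand-in for MOM's non-fatal collector error (assumption)."""


def get_vm_cpu_tune_info(response):
    """Read vcpuCount from a VM stats response (hypothetical helper).

    A missing 'vcpuCount' key is reported as a CollectionError instead of
    letting the raw KeyError propagate to the caller.
    """
    try:
        stats = response['statsList'][0]
        return {'vcpuCount': stats['vcpuCount']}
    except (KeyError, IndexError) as e:
        raise CollectionError("cpu tune information missing: %r" % (e,))


def collect_once(response):
    """Simulate one monitor cycle (hypothetical).

    A CollectionError skips this cycle; the monitor is expected to try
    again on the next interval rather than crash.
    """
    try:
        return get_vm_cpu_tune_info(response)
    except CollectionError:
        return None
```

With this shape, a host that reports no cpu tune information produces a skipped collection cycle rather than a crashed GuestMonitor, which matches the expected results stated above.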

Comment 2 Adam Litke 2014-07-15 13:39:56 UTC
Kobi, please take a look.

Comment 3 Michal Skrivanek 2014-07-16 12:02:02 UTC
Likely a 3.5 blocker.

Comment 5 Sandro Bonazzola 2014-10-17 12:39:17 UTC
oVirt 3.5 has been released and should include the fix for this issue.

