Bug 1119775

Summary: mom error parsing vdsm stats if cpu tune information is missing
Product: [Retired] oVirt
Reporter: Francesco Romani <fromani>
Component: mom
Assignee: Martin Sivák <msivak>
Status: CLOSED CURRENTRELEASE
QA Contact: meital avital <mavital>
Severity: high
Docs Contact:
Priority: high
Version: unspecified
CC: dfediuck, gklein, iheim, michal.skrivanek, yeylon
Target Milestone: ---   
Target Release: 3.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: sla
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-17 12:39:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Francesco Romani 2014-07-15 13:26:07 UTC
Description of problem:

While verifying http://gerrit.ovirt.org/#/c/12820/11 I observed the functional
test failing. The MOM log shows this:

2014-07-15 09:25:26,879 - mom - INFO - MOM starting
2014-07-15 09:25:26,952 - mom - INFO - hypervisor interface vdsm
2014-07-15 09:25:26,952 - mom.HostMonitor - INFO - Host Monitor starting
2014-07-15 09:25:26,958 - mom.GuestManager - INFO - Guest Manager starting
2014-07-15 09:25:26,972 - mom.Policy - INFO - Loaded policy '00-defines'
2014-07-15 09:25:26,995 - mom.HostMonitor - INFO - HostMonitor is ready
2014-07-15 09:25:27,010 - mom.Policy - INFO - Loaded policy '02-balloon'
2014-07-15 09:25:27,047 - mom.Policy - INFO - Loaded policy '03-ksm'
2014-07-15 09:25:27,114 - mom.Policy - INFO - Loaded policy '04-cputune'
2014-07-15 09:25:27,115 - mom.PolicyEngine - INFO - Policy Engine starting
2014-07-15 09:25:27,116 - mom.RPCServer - INFO - RPC Server is disabled
2014-07-15 09:25:37,181 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:0 run:0 sleep_millisecs:0
2014-07-15 09:26:02,011 - mom.Monitor - INFO - GuestMonitor-vdsm_testBalloonVM starting
2014-07-15 09:26:02,012 - mom.Collectors.GuestMemory - WARNING - getVmMemoryStats() error: The ovirt-guest-agent is not active
2014-07-15 09:26:02,013 - mom.Monitor - ERROR - GuestMonitor-vdsm_testBalloonVM crashed
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/mom/GuestMonitor.py", line 56, in run
    self.collect()
  File "/usr/lib/python2.6/site-packages/mom/Monitor.py", line 91, in collect
    collected = c.collect()
  File "/usr/lib/python2.6/site-packages/mom/Collectors/GuestCpuTune.py", line 44, in collect
    stat = self.hypervisor_iface.getVmCpuTuneInfo(self.uuid)
  File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 184, in getVmCpuTuneInfo
    vcpuCount = response['statsList'][0]['vcpuCount']
KeyError: 'vcpuCount'
2014-07-15 09:26:12,023 - mom.Monitor - INFO - GuestMonitor-vdsm_testBalloonVM starting
2014-07-15 09:26:12,023 - mom.Collectors.GuestMemory - WARNING - getVmMemoryStats() error: The ovirt-guest-agent is not active
2014-07-15 09:26:12,024 - mom.Monitor - ERROR - GuestMonitor-vdsm_testBalloonVM crashed
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/mom/GuestMonitor.py", line 56, in run
    self.collect()
  File "/usr/lib/python2.6/site-packages/mom/Monitor.py", line 91, in collect
    collected = c.collect()
  File "/usr/lib/python2.6/site-packages/mom/Collectors/GuestCpuTune.py", line 44, in collect
    stat = self.hypervisor_iface.getVmCpuTuneInfo(self.uuid)
  File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 184, in getVmCpuTuneInfo
    vcpuCount = response['statsList'][0]['vcpuCount']
KeyError: 'vcpuCount'
2014-07-15 09:26:22,024 - mom.Monitor - INFO - GuestMonitor-vdsm_testBalloonVM starting
2014-07-15 09:26:22,025 - mom.Collectors.GuestMemory - WARNING - getVmMemoryStats() error: The ovirt-guest-agent is not active
2014-07-15 09:26:30,924 - mom.RPCServer - INFO - setPolicy()
2014-07-15 09:26:32,032 - mom.vdsmInterface - ERROR - {'status': {'message': 'Virtual machine does not exist', 'code': 1}}
2014-07-15 09:26:32,033 - mom.vdsmInterface - ERROR - Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 146, in getVmBalloonInfo
    self._check_status(response)
  File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 46, in _check_status
    raise vdsmException(response, self.logger)
vdsmException

2014-07-15 09:26:32,033 - mom.vdsmInterface - ERROR - {'status': {'message': 'Virtual machine does not exist', 'code': 1}}
2014-07-15 09:26:32,033 - mom.vdsmInterface - ERROR - Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 171, in getVmCpuTuneInfo
    self._check_status(response)
  File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 46, in _check_status
    raise vdsmException(response, self.logger)
vdsmException

2014-07-15 09:26:37,039 - mom.Monitor - INFO - GuestMonitor-vdsm_testBalloonVM ending


Version-Release number of selected component (if applicable):
platform: RHEL 6.5 with updates
VDSM from today's master (2014-07-15)
mom version: mom-0.4.1-2.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Run VDSM + mom on a (supported) host that does not provide CPU tune information, such as RHEL 6.5.
2. Run the tests as per http://gerrit.ovirt.org/#/c/12820/11
3. It is probably enough to just run VDSM + mom on such a host.

Actual results:
MOM's GuestMonitor ends prematurely


Expected results:
MOM's GuestMonitor continues to run


Additional info:

Comment 1 Francesco Romani 2014-07-15 13:27:02 UTC
According to Adam Litke, a possible solution could be:
"[...] The KeyError should be caught in the vdsmInterface and treated as a CollectionError.[...]"
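
A minimal sketch of that suggestion, for illustration only and not the actual MOM patch. It assumes a recoverable error type that the monitor loop can catch; the CollectionError class below is a local stand-in, and get_vm_cpu_tune_info and collect_once are hypothetical names. Only the 'statsList' and 'vcpuCount' keys are taken from the traceback above.

```python
class CollectionError(Exception):
    """Local stand-in for a recoverable collection failure (assumption)."""


def get_vm_cpu_tune_info(response):
    """Extract CPU tune data from a VDSM-style stats response.

    A missing key or an empty stats list is reported as CollectionError
    instead of leaking a KeyError/IndexError to the caller.
    """
    try:
        stats = response['statsList'][0]
        return {'vcpu_count': stats['vcpuCount']}
    except (KeyError, IndexError) as e:
        raise CollectionError("missing CPU tune information: %s" % e)


def collect_once(response):
    """One monitor cycle: return the collected data, or None on failure.

    Catching CollectionError here is what lets the monitor skip a bad
    cycle and keep running rather than crash.
    """
    try:
        return get_vm_cpu_tune_info(response)
    except CollectionError:
        return None
```

With this shape, a response such as {'statsList': [{}]} (no 'vcpuCount', as on the RHEL 6.5 host described above) makes a single collection cycle return None, while a later response that does carry the field is still collected normally.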

Comment 2 Adam Litke 2014-07-15 13:39:56 UTC
Kobi, please take a look.

Comment 3 Michal Skrivanek 2014-07-16 12:02:02 UTC
likely a 3.5 blocker

Comment 5 Sandro Bonazzola 2014-10-17 12:39:17 UTC
oVirt 3.5 has been released and should include the fix for this issue.