
Bug 1092373

Summary: agent dies while monitoring the engine
Product: Red Hat Enterprise Virtualization Manager
Reporter: Jiri Moskovcak <jmoskovc>
Component: ovirt-hosted-engine-ha
Assignee: Jiri Moskovcak <jmoskovc>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.4.0
CC: acathrow, adahms, dfediuck, gklein, iheim, jmoskovc, mavital, sbonazzo, sherold, yeylon
Target Milestone: ---
Target Release: 3.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: sla
Fixed In Version: ovirt-hosted-engine-ha-1.1.2-2.el6ev
Doc Type: Bug Fix
Doc Text:
Previously, the ovirt-ha-agent would fail under certain circumstances. This was caused by the error handling logic used by the agent, whereby attempts to convert a null value into a floating point number would result in an uncaught exception. Now, the error handling logic has been revised so that this exception is caught correctly, preventing the agent from failing under these circumstances.
Story Points: ---
Clone Of: 1091360
Environment:
Last Closed: 2014-06-09 14:26:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1091360
Bug Blocks:

Description Jiri Moskovcak 2014-04-29 07:56:38 UTC
+++ This bug was initially created as a clone of Bug #1091360 +++

Description of problem:
This bugzilla is based on a report from ovirt-users ml.

Actual results:
MainThread::WARNING::2014-04-02 17:46:15,463::hosted_engine::334::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: float() argument must be a string or a number
MainThread::WARNING::2014-04-02 17:46:15,464::hosted_engine::337::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 323, in start_monitoring
    state.score(self._log))
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py", line 160, in score
    lm, logger, score, score_cfg)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py", line 61, in _penalize_memory
    if self._float_or_default(lm['mem-free'], 0) < vm_mem:
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py", line 51, in _float_or_default
    return float(value)
TypeError: float() argument must be a string or a number
MainThread::ERROR::2014-04-02 17:46:15,464::hosted_engine::350::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
MainThread::INFO::2014-04-02 17:46:15,466::agent::116::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
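
Illustrative note: the traceback shows float() raising TypeError inside _float_or_default, and the Doc Text attributes this to a null value being converted to a floating point number. The sketch below is not the actual patch from ovirt-hosted-engine-ha. It only shows one way such a helper could tolerate a missing value. The standalone function name float_or_default, the sample dictionary lm, and the assumption that 'mem-free' was None are hypothetical and chosen to mirror the names in the traceback.

```python
def float_or_default(value, default):
    """Return value converted to float, or default if conversion fails.

    Hypothetical stand-in for the _float_or_default helper named in the
    traceback; the real implementation is not shown in this report.
    """
    try:
        return float(value)
    except (TypeError, ValueError):
        # TypeError: value is None (for example, a metric not reported yet).
        # ValueError: value is a string that does not parse as a number.
        return default


# Assumed shape of the local-monitor data: 'mem-free' may be None.
lm = {'mem-free': None}
vm_mem = 4096

# With the TypeError caught, the comparison falls back to the default
# instead of propagating an exception to the monitoring loop.
if float_or_default(lm['mem-free'], 0) < vm_mem:
    print("not enough free memory reported; a penalty would apply")
```

Catching both exception types means a missing or malformed reading falls back to the default instead of counting as one of the consecutive failures that shut the agent down.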

Comment 2 Nikolai Sednev 2014-05-01 15:14:57 UTC
Hi,
We need clear reproduction steps and expected results; please supply them.

Comment 3 Jiri Moskovcak 2014-05-15 12:34:25 UTC
Running the ovirt-ha-agent when the hosted engine VM is down should result in the exception.

Comment 4 Nikolai Sednev 2014-05-15 12:41:54 UTC
(In reply to Jiri Moskovcak from comment #3)
> Running the ovirt-ha-agent when the hosted engine VM is down should result
> in the exception.

This looks very similar to this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1097767

Comment 5 Nikolai Sednev 2014-05-15 12:59:35 UTC
Tested on 
libvirt-0.10.2-29.el6_5.7.x86_64
sanlock-2.8-1.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.9.x86_64
ovirt-hosted-engine-ha-1.1.2-3.el6ev.noarch
vdsm-4.14.7-2.el6ev.x86_64


[root@rose05 subsys]# service ovirt-ha-agent status
ovirt-ha-agent (pid 13255) is running...



[root@master-vds10 subsys]# hosted-engine --vm-status                                                                             


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : 10.35.64.85
Host ID                            : 1          
Engine status                      : {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
Score                              : 0                                                                                            
Local maintenance                  : False                                                                                        
Host timestamp                     : 1400158653                                                                                   
Extra metadata (valid at timestamp):                                                                                              
        metadata_parse_version=1                                                                                                  
        metadata_feature_version=1                                                                                                
        timestamp=1400158653 (Thu May 15 15:57:33 2014)                                                                           
        host-id=1                                                                                                                 
        score=0                                                                                                                   
        maintenance=False                                                                                                         
        state=EngineUnexpectedlyDown                                                                                              
        timeout=Thu May 15 16:02:29 2014                                                                                          


--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : 10.35.97.36
Host ID                            : 2          
Engine status                      : {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
Score                              : 0                                                                                            
Local maintenance                  : False                                                                                        
Host timestamp                     : 1400158644                                                                                   
Extra metadata (valid at timestamp):                                                                                              
        metadata_parse_version=1                                                                                                  
        metadata_feature_version=1                                                                                                
        timestamp=1400158644 (Thu May 15 15:57:24 2014)                                                                           
        host-id=2                                                                                                                 
        score=0                                                                                                                   
        maintenance=False                                                                                                         
        state=EngineUnexpectedlyDown                                                                                              
        timeout=Thu May 15 15:59:40 2014

Comment 6 errata-xmlrpc 2014-06-09 14:26:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0671.html