Bug 1786458

Summary: Python3: broker fails to update engine health status
Product: [oVirt] ovirt-hosted-engine-ha Reporter: Yedidyah Bar David <didi>
Component: Broker Assignee: Yedidyah Bar David <didi>
Status: CLOSED CURRENTRELEASE QA Contact: Polina <pagranat>
Severity: urgent Docs Contact:
Priority: high    
Version: 2.4.0 CC: bugs
Target Milestone: ovirt-4.4.0 Flags: sbonazzo: ovirt-4.4?
sbonazzo: planning_ack?
sbonazzo: devel_ack+
mavital: testing_ack+
Target Release: 2.4.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-hosted-engine-ha-2.4.2 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-20 20:02:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1665138    

Description Yedidyah Bar David 2019-12-25 11:03:17 UTC
Description of problem:

After deploy, shutdown and startup, 'hosted-engine --vm-status' keeps showing 'Engine status                      : null'. broker.log has:
                                                   
Thread-209::ERROR::2019-12-25 12:56:39,241::submonitor_base::119::ovirt_hosted_engine_ha.broker.submonitor_base.SubmonitorBase::(_worker) Error executing submonitor engine-health, args {'address': '0', 'use_ssl': 'true', 'vm_uuid': 'b3bc7f7b-2b88-4758-8192-05242f61ba21'}
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitor_base.py", line 115, in _worker
    self.action(self._options)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors/engine_health.py", line 117, in action
    self._update_stats(stats, vdsm_ts, local_ts)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors/engine_health.py", line 122, in _update_stats
    if not self._newer_timestamp(vdsm_ts, local_ts):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors/engine_health.py", line 160, in _newer_timestamp
    return local_ts > self._stats_local_timestamp
TypeError: '>' not supported between instances of 'int' and 'NoneType'
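The traceback shows `_newer_timestamp` comparing an int `local_ts` with `self._stats_local_timestamp` while the latter is still `None`. Python 2 allowed this (None compares less than any int, so the comparison simply returned True); Python 3 raises TypeError instead. A minimal sketch of the failure and of a None-safe comparison follows. The class and method names are hypothetical stand-ins, not the actual code or fix shipped in ovirt-hosted-engine-ha-2.4.2.

```python
from typing import Optional


class EngineHealthSketch:
    """Hypothetical stand-in for the engine-health submonitor's timestamp check."""

    def __init__(self) -> None:
        # Nothing recorded yet, e.g. right after the broker (re)starts.
        self._stats_local_timestamp: Optional[int] = None

    def newer_timestamp_unsafe(self, local_ts: int) -> bool:
        # Same comparison as in the traceback: raises TypeError on
        # Python 3 while self._stats_local_timestamp is still None.
        return local_ts > self._stats_local_timestamp

    def newer_timestamp(self, local_ts: int) -> bool:
        # None-safe variant (assumption): treat any timestamp as newer
        # when nothing has been recorded, matching Python 2's behavior.
        if self._stats_local_timestamp is None:
            return True
        return local_ts > self._stats_local_timestamp


monitor = EngineHealthSketch()
try:
    monitor.newer_timestamp_unsafe(360)
except TypeError as exc:
    print("unsafe:", exc)
print("safe:", monitor.newer_timestamp(360))
```

The unsafe call reproduces the exact TypeError message from broker.log, while the guarded version returns True until a first timestamp has been stored.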

Version-Release number of selected component (if applicable):
Current master

How reproducible:
Always, I think. I am not sure why it does not happen right after deploy, but it does happen after a reboot.

Steps to Reproduce:
1. deploy hosted-engine
2. set global maintenance, shut down the engine machine, shut down the hosts
3. start the hosts
4. disable maintenance

Actual results:
See above

Expected results:
Should show correct engine status

Additional info:

Comment 1 Polina 2020-04-07 17:00:29 UTC
verified on http://bob-dr.lab.eng.brq.redhat.com/builds/4.4/rhv-4.4.0-29

scenario1:
hosted-engine --set-maintenance --mode=global

hosted-engine --vm-status

hosted-engine --vm-poweroff
status after poweroff http://pastebin.test.redhat.com/852935

hosted-engine --vm-start
Command VM.getStats with args {'vmID': '9862d825-5d39-493b-b692-597dcb8496be'} failed:
(code=1, message=Virtual machine does not exist: {'vmId': '9862d825-5d39-493b-b692-597dcb8496be'})
VM in WaitForLaunch
status after start http://pastebin.test.redhat.com/852934

hosted-engine --set-maintenance --mode=none

scenario2:

hosted-engine --set-maintenance --mode=global
hosted-engine --vm-status
hosted-engine --vm-poweroff
power off all three hosts in the setup
start the hosts
hosted-engine --vm-start
hosted-engine --set-maintenance --mode=none
hosted-engine --vm-status

correct status, no error

...

--== Host ocelot03.qa.lab.tlv.redhat.com (id: 3) status ==--

Host ID                            : 3
Host timestamp                     : 360
Score                              : 3400
Engine status                      : {"vm": "up", "health": "good", "detail": "Up"}
Hostname                           : ocelot03.qa.lab.tlv.redhat.com
Local maintenance                  : False
stopped                            : False
crc32                              : c841b4e0
conf_on_shared_storage             : True
local_conf_timestamp               : 360
Status up-to-date                  : True
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=360 (Tue Apr  7 19:59:07 2020)
	host-id=3
	score=3400
	vm_conf_refresh_time=360 (Tue Apr  7 19:59:07 2020)
	conf_on_shared_storage=True
	maintenance=False
	state=EngineUp
	stopped=False
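The "Engine status" field is printed as JSON: `null` in the failing case from the description, and an object such as `{"vm": "up", "health": "good", "detail": "Up"}` once the broker updates it correctly. A small sketch for checking that field from `hosted-engine --vm-status` output follows; the helper names are hypothetical, and it assumes a single `Engine status : <json>` line as shown above.

```python
import json
from typing import Optional


def parse_engine_status(line: str) -> Optional[dict]:
    """Parse one 'Engine status : <json>' line; 'null' becomes None."""
    # Split at the first colon only; the JSON value may contain more colons.
    _, _, value = line.partition(":")
    return json.loads(value.strip())


def is_engine_healthy(status: Optional[dict]) -> bool:
    """True only when the broker reported a status with health 'good'."""
    return status is not None and status.get("health") == "good"


broken = "Engine status                      : null"
healthy = 'Engine status                      : {"vm": "up", "health": "good", "detail": "Up"}'
print(is_engine_healthy(parse_engine_status(broken)))   # False
print(is_engine_healthy(parse_engine_status(healthy)))  # True
```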

Comment 2 Sandro Bonazzola 2020-05-20 20:02:41 UTC
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be
resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.