Bug 1786458 - Python3: broker fails to update engine health status
Summary: Python3: broker fails to update engine health status
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: Broker
Version: 2.4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ovirt-4.4.0
Target Release: 2.4.2
Assignee: Yedidyah Bar David
QA Contact: Polina
URL:
Whiteboard:
Depends On:
Blocks: 1665138
Reported: 2019-12-25 11:03 UTC by Yedidyah Bar David
Modified: 2020-05-20 20:02 UTC
CC List: 1 user

Fixed In Version: ovirt-hosted-engine-ha-2.4.2
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-20 20:02:41 UTC
oVirt Team: Integration
Embargoed:
sbonazzo: ovirt-4.4?
sbonazzo: planning_ack?
sbonazzo: devel_ack+
mavital: testing_ack+



Links
System        ID      Private  Priority  Status  Summary                                 Last Updated
oVirt gerrit  105937  0        master    MERGED  broker: engine_health: Fix for python3  2020-05-18 08:40:52 UTC

Description Yedidyah Bar David 2019-12-25 11:03:17 UTC
Description of problem:

After deploying, shutting down, and starting up again, 'hosted-engine --vm-status' keeps showing 'Engine status                      : null'. broker.log contains:

Thread-209::ERROR::2019-12-25 12:56:39,241::submonitor_base::119::ovirt_hosted_engine_ha.broker.submonitor_base.SubmonitorBase::(_worker) Error executing submonitor engine-health, args {'address': '0', 'use_ssl': 'true', 'vm_uuid': 'b3bc7f7b-2b88-4758-8192-05242f61ba21'}
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitor_base.py", line 115, in _worker
    self.action(self._options)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors/engine_health.py", line 117, in action
    self._update_stats(stats, vdsm_ts, local_ts)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors/engine_health.py", line 122, in _update_stats
    if not self._newer_timestamp(vdsm_ts, local_ts):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors/engine_health.py", line 160, in _newer_timestamp
    return local_ts > self._stats_local_timestamp
TypeError: '>' not supported between instances of 'int' and 'NoneType'
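The failing line compares an int with None. Python 2 ordered None below every integer, so `local_ts > None` simply evaluated to True; Python 3 raises TypeError instead. A minimal, hypothetical reproduction, assuming `_stats_local_timestamp` is still None because no earlier stats update has stored a value. Only the names `_newer_timestamp`, `_stats_local_timestamp`, `vdsm_ts` and `local_ts` come from the traceback; the class, the two method names and the None guard are illustrative and are not necessarily what gerrit 105937 changed.

```python
class EngineHealthSketch:
    """Hypothetical stand-in for the engine-health submonitor.

    The real _newer_timestamp also receives vdsm_ts; how it uses it is not
    visible in the traceback, so it is accepted but ignored here.
    """

    def __init__(self):
        # Assumption: nothing is stored until the first stats update.
        self._stats_local_timestamp = None

    def newer_timestamp_unguarded(self, vdsm_ts, local_ts):
        # Same comparison as line 160 in the traceback. Python 2 returned
        # True here (None sorted below every int); Python 3 raises TypeError.
        return local_ts > self._stats_local_timestamp

    def newer_timestamp_guarded(self, vdsm_ts, local_ts):
        # Hypothetical guard: with no previous timestamp, any sample is newer.
        if self._stats_local_timestamp is None:
            return True
        return local_ts > self._stats_local_timestamp


if __name__ == "__main__":
    monitor = EngineHealthSketch()
    try:
        monitor.newer_timestamp_unguarded(vdsm_ts=None, local_ts=360)
    except TypeError as exc:
        print("unguarded:", exc)
    print("guarded:", monitor.newer_timestamp_guarded(vdsm_ts=None, local_ts=360))
```

Under Python 3 the unguarded call fails with the same message that appears in broker.log, while the guarded call returns True and leaves later comparisons between two ints unchanged.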

Version-Release number of selected component (if applicable):
Current master

How reproducible:
Always, I think. I am not sure why it does not happen right after deploy, but it does happen after a reboot.

Steps to Reproduce:
1. Deploy hosted-engine.
2. Set global maintenance, shut down the engine machine, and shut down the hosts.
3. Start the hosts.
4. Disable maintenance.

Actual results:
See above

Expected results:
'hosted-engine --vm-status' should show the correct engine status.

Additional info:

Comment 1 Polina 2020-04-07 17:00:29 UTC
Verified on http://bob-dr.lab.eng.brq.redhat.com/builds/4.4/rhv-4.4.0-29

scenario1:
hosted-engine --set-maintenance --mode=global

hosted-engine --vm-status

hosted-engine --vm-poweroff
Status after poweroff: http://pastebin.test.redhat.com/852935

hosted-engine --vm-start
Command VM.getStats with args {'vmID': '9862d825-5d39-493b-b692-597dcb8496be'} failed:
(code=1, message=Virtual machine does not exist: {'vmId': '9862d825-5d39-493b-b692-597dcb8496be'})
VM in WaitForLaunch
Status after start: http://pastebin.test.redhat.com/852934

hosted-engine --set-maintenance --mode=none

scenario2:

hosted-engine --set-maintenance --mode=global
hosted-engine --vm-status
hosted-engine --vm-poweroff
Power off all three hosts in the setup
Start the hosts
hosted-engine --vm-start
hosted-engine --set-maintenance --mode=none
hosted-engine --vm-status

Correct status, no error:

...

--== Host ocelot03.qa.lab.tlv.redhat.com (id: 3) status ==--

Host ID                            : 3
Host timestamp                     : 360
Score                              : 3400
Engine status                      : {"vm": "up", "health": "good", "detail": "Up"}
Hostname                           : ocelot03.qa.lab.tlv.redhat.com
Local maintenance                  : False
stopped                            : False
crc32                              : c841b4e0
conf_on_shared_storage             : True
local_conf_timestamp               : 360
Status up-to-date                  : True
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=360 (Tue Apr  7 19:59:07 2020)
	host-id=3
	score=3400
	vm_conf_refresh_time=360 (Tue Apr  7 19:59:07 2020)
	conf_on_shared_storage=True
	maintenance=False
	state=EngineUp
	stopped=False
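When re-checking this fix, the symptom to look for is the literal `null` after `Engine status`. A minimal sketch, assuming the `hosted-engine --vm-status` layout shown above (one `Engine status : <value>` line per host block, where the value is JSON such as `{"vm": "up", ...}` or `null`), that counts hosts still reporting `null`. The function names are hypothetical helpers, not part of ovirt-hosted-engine-ha; values that do not parse as JSON are kept as raw strings rather than interpreted.

```python
import json
import re

# One match per host block, e.g.
# 'Engine status                      : {"vm": "up", "health": "good", "detail": "Up"}'
# [ \t]* is used instead of \s* so a match never spills onto the next line.
ENGINE_STATUS_RE = re.compile(r"^Engine status[ \t]*:[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def engine_statuses(vm_status_text):
    """Return the decoded 'Engine status' value of every host block.

    'null' decodes to None (the symptom in this bug); anything that is not
    valid JSON is returned unchanged as a string.
    """
    statuses = []
    for match in ENGINE_STATUS_RE.finditer(vm_status_text):
        raw = match.group(1)
        try:
            statuses.append(json.loads(raw))
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            statuses.append(raw)
    return statuses


def hosts_with_null_status(vm_status_text):
    """Count host blocks whose engine status is still 'null'."""
    return sum(1 for status in engine_statuses(vm_status_text) if status is None)
```

For the output above, `engine_statuses` yields `{"vm": "up", "health": "good", "detail": "Up"}` for host 3, and `hosts_with_null_status` returns 0; on an affected host the same check returns a non-zero count.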

Comment 2 Sandro Bonazzola 2020-05-20 20:02:41 UTC
This bugzilla is included in the oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be resolved in the oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

