Description of problem:
Whenever RHV-H is upgraded from 4.1.2 to 4.1.3, the Hosted Engine HA state is left in Local Maintenance.

Version-Release number of selected component (if applicable):
ovirt-host-deploy-1.6.6-1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install an HC setup with a RHV-H 4.1.2 async build.
2. Add all the required repos.
3. An upgrade symbol appears next to the hypervisor.
4. Click on it.

Actual results:
The RHV-H host gets upgraded to 4.1.3, leaving the Hosted Engine HA state in 'Local Maintenance'.

Expected results:
The RHV-H host gets upgraded to 4.1.3 and the Hosted Engine HA state is not left in 'Local Maintenance'.

Additional info:
Adding hosted-engine --vm-status before and after upgrade:

Output of hosted-engine --vm-status before upgrade:
=======================================================

[root@yarrow ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : yarrow.lab.eng.blr.redhat.com
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : b4359588
local_conf_timestamp               : 75583
Host timestamp                     : 75567
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=75567 (Thu Jul 6 15:09:26 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=75583 (Thu Jul 6 15:09:42 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : tettnang.lab.eng.blr.redhat.com
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 1800
stopped                            : False
Local maintenance                  : False
crc32                              : 7bfbbfd5
local_conf_timestamp               : 1440
Host timestamp                     : 1423
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1423 (Thu Jul 6 15:09:07 2017)
    host-id=2
    score=1800
    vm_conf_refresh_time=1440 (Thu Jul 6 15:09:23 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False


--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : zod.lab.eng.blr.redhat.com
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 7caabb48
local_conf_timestamp               : 75597
Host timestamp                     : 75581
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=75581 (Thu Jul 6 15:09:23 2017)
    host-id=3
    score=3400
    vm_conf_refresh_time=75597 (Thu Jul 6 15:09:39 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

Output of hosted-engine --vm-status after upgrade:
===================================================

[root@yarrow ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : yarrow.lab.eng.blr.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
crc32                              : bc34659d
local_conf_timestamp               : 7624
Host timestamp                     : 7608
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=7608 (Thu Jul 6 17:50:33 2017)
    host-id=1
    score=0
    vm_conf_refresh_time=7624 (Thu Jul 6 17:50:48 2017)
    conf_on_shared_storage=True
    maintenance=True
    state=LocalMaintenance
    stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : tettnang.lab.eng.blr.redhat.com
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 1800
stopped                            : False
Local maintenance                  : False
crc32                              : 521f80d4
local_conf_timestamp               : 11121
Host timestamp                     : 11105
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=11105 (Thu Jul 6 17:50:29 2017)
    host-id=2
    score=1800
    vm_conf_refresh_time=11121 (Thu Jul 6 17:50:45 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False


--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : zod.lab.eng.blr.redhat.com
Host ID                            : 3
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 77b3a2d6
local_conf_timestamp               : 85262
Host timestamp                     : 85246
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=85246 (Thu Jul 6 17:50:28 2017)
    host-id=3
    score=3400
    vm_conf_refresh_time=85262 (Thu Jul 6 17:50:44 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False


cat /var/lib/ovirt-hosted-engine-ha/ha.conf
local_maintenance=True
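The persisted flag shown above can be checked directly on the host. A minimal sketch (the path is the one from the report; the `HA_CONF` override is only there for illustration and testing):

```shell
#!/bin/sh
# Check the HA agent's persisted local-maintenance flag.
# Default path is the one from this report; set HA_CONF to test elsewhere.
HA_CONF="${HA_CONF:-/var/lib/ovirt-hosted-engine-ha/ha.conf}"

if grep -qi '^local_maintenance=true' "$HA_CONF" 2>/dev/null; then
    echo "HE local maintenance: active"
else
    echo "HE local maintenance: not active"
fi
```

On an affected host this prints "active" even while the engine UI already shows the host as Up.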
Can you check for a regression in the host activation flow? It is supposed to move the host out of local maintenance.
So it is not a regression in the host activation flow. The problem is:
1) Move the host to maintenance via the engine (this activates the HE "LocalMaintenance" state).
2) Upgrade the host via the engine. After the upgrade the host moves straight to the Up state, so from the engine side the host is UP, but from the HE side the host still has the "LocalMaintenance" state, because no one ran the activate command on the engine side.
See also a bug with a similar problem - https://bugzilla.redhat.com/show_bug.cgi?id=1468875
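Until the engine clears the state automatically, it can be cleared manually on the affected host with the standard `hosted-engine` CLI. A hedged sketch (the `command -v` guard is an illustration-only addition so the snippet degrades to a dry-run message on machines without the CLI):

```shell
#!/bin/sh
# Clear the HE agent's local-maintenance state after the engine already
# shows the host as Up. Dry-runs where the hosted-engine CLI is absent.
clear_he_maintenance() {
    if command -v hosted-engine >/dev/null 2>&1; then
        hosted-engine --set-maintenance --mode=none
        # Confirm the flag was cleared:
        hosted-engine --vm-status | grep -i 'local maintenance'
    else
        echo "would run: hosted-engine --set-maintenance --mode=none"
    fi
}

clear_he_maintenance
```

This is a workaround, not a fix; the fix is for the upgrade flow to run the activate step on the engine side.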
Denis, is this going to land in 4.2.0? If not, please re-target.
This is severe and should not be targeted so far in the future. The HE maintenance mode should be locked to the engine maintenance mode if the engine is up. Maintaining this during an upgrade is elementary. Retargeting.
Moving back to ASSIGNED, since this patch targets ovirt-host-deploy but no patches are attached to this bug and no open patches are pushed in gerrit against ovirt-host-deploy. Please update the status of this bug, setting the correct product and adding references to the patches being pushed for review.
(In reply to Sandro Bonazzola from comment #5)
> Moving back to ASSIGNED since this patch is on ovirt-host-deploy but no
> patches are attaced to this bug and no open patches are pushed in gerrit
> against ovirt-host-deploy.
> Please update the status of this bug setting the correct product and adding
> references to the patches being pushed for review.

It's tricky: the product should be one of ovirt-engine, ovirt-hosted-engine-ha, or ovirt-hosted-engine-setup. I have patches for ovirt-hosted-engine-ha and ovirt-hosted-engine-setup. For this bug (and other bugs) we also need the ovirt-engine patch here as well: https://gerrit.ovirt.org/#/c/86645/
Tested with ovirt-hosted-engine-setup-2.2.19. When updating the host from one nightly build to another, the state of the node returns to Up.
This bugzilla is included in the oVirt 4.2.2 release, published on March 28th 2018. Since the problem described in this bug report should be resolved in the oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.