Bug 1469143 - Hosted Engine HA state is in Local Maintenance when upgrading RHV-H
Summary: Hosted Engine HA state is in Local Maintenance when upgrading RHV-H
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: Tools
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.2.2
Target Release: 2.2.15
Assignee: Yanir Quinn
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On: 1359499
Blocks: 1458709 1489982 1536286
 
Reported: 2017-07-10 13:26 UTC by RamaKasturi
Modified: 2021-12-10 15:15 UTC (History)
11 users

Fixed In Version: ovirt-hosted-engine-setup-2.2.15-1
Doc Type: Bug Fix
Doc Text:
Cause: The hosted engine local maintenance state could be changed both via the hosted-engine tool and by engine logic.
Consequence: Combining the tool and the engine logic could leave the local maintenance state inconsistent.
Fix: When local maintenance is enabled with the hosted-engine tool (hosted-engine --set-maintenance --mode=local), it can only be cleared by using the tool again (hosted-engine --set-maintenance --mode=none).
Result: The HA agent is now aware of a manual local maintenance mode in addition to the existing local maintenance mode.
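
The fix described in the Doc Text can be sketched as a small state model (illustrative only; the class and method names are hypothetical and do not come from the actual ovirt-hosted-engine-ha code): a local maintenance state set manually through the tool is flagged as such, and engine-driven transitions may not clear it.

```python
class HostedEngineHA:
    """Toy model of the fix: local maintenance set manually via the
    hosted-engine tool can only be cleared via the tool, never by
    engine logic. Purely illustrative."""

    def __init__(self):
        self.local_maintenance = False
        self.manual = False  # True when the tool set the state

    def set_maintenance(self, mode, source):
        # source is "tool" (hosted-engine --set-maintenance) or "engine"
        if mode == "local":
            self.local_maintenance = True
            self.manual = (source == "tool")
            return True
        if mode == "none":
            # engine logic must not clear a manually set local maintenance
            if self.manual and source != "tool":
                return False
            self.local_maintenance = False
            self.manual = False
            return True
        raise ValueError("unknown mode: %s" % mode)


ha = HostedEngineHA()
ha.set_maintenance("local", source="tool")    # manual local maintenance
ha.set_maintenance("none", source="engine")   # rejected: manual mode
print(ha.local_maintenance)                   # True
ha.set_maintenance("none", source="tool")     # tool clears it
print(ha.local_maintenance)                   # False
```

Before the fix, the "engine" branch above did not exist, so engine logic and the tool could each flip the flag and end up disagreeing, which is the inconsistency this bug reports.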
Clone Of:
: 1536286 (view as bug list)
Environment:
Last Closed: 2018-05-04 10:47:40 UTC
oVirt Team: SLA
Embargoed:
rule-engine: ovirt-4.2+
sasundar: testing_ack+




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-44197 0 None None None 2021-12-10 15:15:28 UTC
oVirt gerrit 88586 0 master MERGED agent: Add local maintenance manual mode 2020-11-16 16:13:03 UTC
oVirt gerrit 88599 0 master MERGED he: Set local maintenance manual mode 2020-11-16 16:12:41 UTC
oVirt gerrit 88908 0 v2.2.z MERGED agent: Add local maintenance manual mode 2020-11-16 16:12:41 UTC
oVirt gerrit 88909 0 ovirt-hosted-engine-setup-2.2 MERGED he: Set local maintenance manual mode 2020-11-16 16:12:42 UTC

Description RamaKasturi 2017-07-10 13:26:40 UTC
Description of problem:
I see that whenever an upgrade of RHV-H 4.1.2 to 4.1.3 is done, the Hosted Engine HA state is in Local Maintenance.

Version-Release number of selected component (if applicable):
ovirt-host-deploy-1.6.6-1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install the HC setup with the RHV-H 4.1.2 async build.
2. Add all the required repos.
3. An upgrade symbol appears next to the hypervisor.
4. Click on it.


Actual results:
 RHV-H host gets upgraded to 4.1.3, leaving the Hosted Engine HA state in 'Local Maintenance'.

Expected results:
 RHV-H host gets upgraded to 4.1.3 and the Hosted Engine HA state should not be in 'Local Maintenance'.

Additional info:

Adding hosted-engine --vm-status before and after upgrade:

> Output of hosted-engine --vm-status before upgrade:
> =======================================================
> 
> [root@yarrow ~]# hosted-engine --vm-status
> 
> 
> --== Host 1 status ==--
> 
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : yarrow.lab.eng.blr.redhat.com
> Host ID                            : 1
> Engine status                      : {"health": "good", "vm": "up",
> "detail": "up"}
> Score                              : 3400
> stopped                            : False
> Local maintenance                  : False
> crc32                              : b4359588
> local_conf_timestamp               : 75583
> Host timestamp                     : 75567
> Extra metadata (valid at timestamp):
> 	metadata_parse_version=1
> 	metadata_feature_version=1
> 	timestamp=75567 (Thu Jul  6 15:09:26 2017)
> 	host-id=1
> 	score=3400
> 	vm_conf_refresh_time=75583 (Thu Jul  6 15:09:42 2017)
> 	conf_on_shared_storage=True
> 	maintenance=False
> 	state=EngineUp
> 	stopped=False
> 
> 
> --== Host 2 status ==--
> 
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : tettnang.lab.eng.blr.redhat.com
> Host ID                            : 2
> Engine status                      : {"reason": "vm not running on this
> host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score                              : 1800
> stopped                            : False
> Local maintenance                  : False
> crc32                              : 7bfbbfd5
> local_conf_timestamp               : 1440
> Host timestamp                     : 1423
> Extra metadata (valid at timestamp):
> 	metadata_parse_version=1
> 	metadata_feature_version=1
> 	timestamp=1423 (Thu Jul  6 15:09:07 2017)
> 	host-id=2
> 	score=1800
> 	vm_conf_refresh_time=1440 (Thu Jul  6 15:09:23 2017)
> 	conf_on_shared_storage=True
> 	maintenance=False
> 	state=EngineDown
> 	stopped=False
> 
> 
> --== Host 3 status ==--
> 
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : zod.lab.eng.blr.redhat.com
> Host ID                            : 3
> Engine status                      : {"reason": "vm not running on this
> host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score                              : 3400
> stopped                            : False
> Local maintenance                  : False
> crc32                              : 7caabb48
> local_conf_timestamp               : 75597
> Host timestamp                     : 75581
> Extra metadata (valid at timestamp):
> 	metadata_parse_version=1
> 	metadata_feature_version=1
> 	timestamp=75581 (Thu Jul  6 15:09:23 2017)
> 	host-id=3
> 	score=3400
> 	vm_conf_refresh_time=75597 (Thu Jul  6 15:09:39 2017)
> 	conf_on_shared_storage=True
> 	maintenance=False
> 	state=EngineDown
> 	stopped=False
> 
> Output of hosted-engine --vm-status after upgrade:
> ===================================================
> 
> [root@yarrow ~]# hosted-engine --vm-status
> 
> 
> --== Host 1 status ==--
> 
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : yarrow.lab.eng.blr.redhat.com
> Host ID                            : 1
> Engine status                      : {"reason": "vm not running on this
> host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score                              : 0
> stopped                            : False
> Local maintenance                  : True
> crc32                              : bc34659d
> local_conf_timestamp               : 7624
> Host timestamp                     : 7608
> Extra metadata (valid at timestamp):
> 	metadata_parse_version=1
> 	metadata_feature_version=1
> 	timestamp=7608 (Thu Jul  6 17:50:33 2017)
> 	host-id=1
> 	score=0
> 	vm_conf_refresh_time=7624 (Thu Jul  6 17:50:48 2017)
> 	conf_on_shared_storage=True
> 	maintenance=True
> 	state=LocalMaintenance
> 	stopped=False
> 
> 
> --== Host 2 status ==--
> 
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : tettnang.lab.eng.blr.redhat.com
> Host ID                            : 2
> Engine status                      : {"reason": "vm not running on this
> host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score                              : 1800
> stopped                            : False
> Local maintenance                  : False
> crc32                              : 521f80d4
> local_conf_timestamp               : 11121
> Host timestamp                     : 11105
> Extra metadata (valid at timestamp):
> 	metadata_parse_version=1
> 	metadata_feature_version=1
> 	timestamp=11105 (Thu Jul  6 17:50:29 2017)
> 	host-id=2
> 	score=1800
> 	vm_conf_refresh_time=11121 (Thu Jul  6 17:50:45 2017)
> 	conf_on_shared_storage=True
> 	maintenance=False
> 	state=EngineDown
> 	stopped=False
> 
> 
> --== Host 3 status ==--
> 
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : zod.lab.eng.blr.redhat.com
> Host ID                            : 3
> Engine status                      : {"health": "good", "vm": "up",
> "detail": "up"}
> Score                              : 3400
> stopped                            : False
> Local maintenance                  : False
> crc32                              : 77b3a2d6
> local_conf_timestamp               : 85262
> Host timestamp                     : 85246
> Extra metadata (valid at timestamp):
> 	metadata_parse_version=1
> 	metadata_feature_version=1
> 	timestamp=85246 (Thu Jul  6 17:50:28 2017)
> 	host-id=3
> 	score=3400
> 	vm_conf_refresh_time=85262 (Thu Jul  6 17:50:44 2017)
> 	conf_on_shared_storage=True
> 	maintenance=False
> 	state=EngineUp
> 	stopped=False
> 
> 
> cat /var/lib/ovirt-hosted-engine-ha/ha.conf
> local_maintenance=True
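
The ha.conf output above shows what keeps the host stuck: local_maintenance=True persists on disk after the upgrade even though the engine considers the host active. A minimal sketch of reading that flag from ha.conf-style key=value lines (the helper name is hypothetical; in practice the file is /var/lib/ovirt-hosted-engine-ha/ha.conf):

```python
def stale_local_maintenance(lines):
    """Return True if a ha.conf-style config declares local_maintenance=True.

    `lines` is an iterable of "key=value" strings, as found in
    /var/lib/ovirt-hosted-engine-ha/ha.conf. Illustrative helper only.
    """
    for line in lines:
        key, _, value = line.strip().partition("=")
        if key == "local_maintenance":
            return value.strip().lower() == "true"
    return False


print(stale_local_maintenance(["local_maintenance=True"]))   # True
print(stale_local_maintenance(["local_maintenance=False"]))  # False
```

A host whose engine status is Up but whose ha.conf still answers True here is exactly the post-upgrade state described in this report.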

Comment 1 Yaniv Lavi 2017-07-17 09:19:22 UTC
Can you check for a regression in the hot activation flow? It is supposed to move the host out of local maintenance.

Comment 2 Artyom 2017-07-19 11:48:04 UTC
So it is not a regression in the host activation flow. The problem is:
1) Move the host to maintenance via the engine (this activates the HE "LocalMaintenance" state).
2) Upgrade the host via the engine. After the upgrade the host moves straight to the Up state, so from the engine side the host is UP, but from the HE side the host is still in the "LocalMaintenance" state, because no one ran the activate command on the engine side.
See also bug with the similar problem - https://bugzilla.redhat.com/show_bug.cgi?id=1468875
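
The inconsistency described in this comment (engine says Up, HE metadata says LocalMaintenance) can be expressed as a one-line check; the function name and string values are illustrative, taken from the statuses quoted elsewhere in this report:

```python
def states_disagree(engine_host_status, he_state):
    """True when the engine and HE agent views of a host conflict:
    the engine UI shows the host Up while the hosted-engine metadata
    still reports LocalMaintenance. Illustrative sketch only."""
    return engine_host_status == "Up" and he_state == "LocalMaintenance"


print(states_disagree("Up", "LocalMaintenance"))  # True: the bug
print(states_disagree("Up", "EngineUp"))          # False: consistent
```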

Comment 3 Sandro Bonazzola 2017-11-18 07:50:36 UTC
Denis is this going to land in 4.2.0? If not please re-target.

Comment 4 Yaniv Lavi 2018-02-14 12:50:07 UTC
This is severe and should not be targeted so far in the future.
The maintenance mode for HE should be locked to the engine maintenance mode, if the engine is up.

Maintaining this during upgrade is elementary. Retargeting.

Comment 5 Sandro Bonazzola 2018-03-06 12:08:01 UTC
Moving back to ASSIGNED since this bug is filed against ovirt-host-deploy but no patches are attached to this bug and no open patches are pushed in gerrit against ovirt-host-deploy.
Please update the status of this bug setting the correct product and adding references to the patches being pushed for review.

Comment 6 Yanir Quinn 2018-03-07 14:09:16 UTC
(In reply to Sandro Bonazzola from comment #5)
> Moving back to ASSIGNED since this bug is filed against ovirt-host-deploy but
> no patches are attached to this bug and no open patches are pushed in gerrit
> against ovirt-host-deploy.
> Please update the status of this bug setting the correct product and adding
> references to the patches being pushed for review.

It's tricky:
the product should be one of ovirt-engine/ovirt-hosted-engine-ha/ovirt-hosted-engine-setup.

I have patches for ovirt-hosted-engine-ha and ovirt-hosted-engine-setup.

And for this bug (and other bugs) we also need this ovirt-engine patch:
https://gerrit.ovirt.org/#/c/86645/

Comment 7 SATHEESARAN 2018-05-03 17:59:02 UTC
Tested with ovirt-hosted-engine-setup-2.2.19

While updating the host from the previous nightly build to another, the state of the node goes back to UP.

Comment 8 Sandro Bonazzola 2018-05-04 10:47:40 UTC
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

