Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1469143

Summary: Hosted Engine HA state is in Local Maintenance when upgrading RHV-H
Product: [oVirt] ovirt-hosted-engine-setup
Reporter: RamaKasturi <knarra>
Component: Tools
Assignee: Yanir Quinn <yquinn>
Status: CLOSED CURRENTRELEASE
QA Contact: SATHEESARAN <sasundar>
Severity: high
Docs Contact:
Priority: high
Version: ---
CC: alukiano, apinnick, bugs, dchaplyg, dfediuck, mavital, msivak, rgolan, stirabos, ylavi, yquinn
Target Milestone: ovirt-4.2.2
Flags: rule-engine: ovirt-4.2+
       sasundar: testing_ack+
Target Release: 2.2.15
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-hosted-engine-setup-2.2.15-1
Doc Type: Bug Fix
Doc Text:
Cause: The hosted engine local maintenance state could be changed both via the hosted-engine tool and by engine logic.
Consequence: Combining the tool and engine logic could leave the local maintenance state inconsistent.
Fix: When the local maintenance state is set with the hosted-engine tool (hosted-engine --set-maintenance --mode=local, Local maintenance = true), it can only be cleared by using the tool again (hosted-engine --set-maintenance --mode=none, Local maintenance = false).
Result: The HA agent is now aware of the manual local maintenance mode in addition to the existing local maintenance mode. (See the command sketch after this header block.)
Story Points: ---
Clone Of:
: 1536286 (view as bug list)
Environment:
Last Closed: 2018-05-04 10:47:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1359499    
Bug Blocks: 1458709, 1489982, 1536286    
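
As referenced from the Doc Text above, a minimal sketch of the intended manual flow with the fixed tool, using only commands already documented in this bug:

    # enter local maintenance manually; with the fix, engine logic can no longer clear it
    hosted-engine --set-maintenance --mode=local
    # ... perform host work ...
    # exit local maintenance; only the tool can clear a manually set state
    hosted-engine --set-maintenance --mode=none
    # confirm "Local maintenance : False" for this host
    hosted-engine --vm-status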

Description RamaKasturi 2017-07-10 13:26:40 UTC
Description of problem:
Whenever RHV-H is upgraded from 4.1.2 to 4.1.3, the Hosted Engine HA state is left in Local Maintenance.

Version-Release number of selected component (if applicable):
ovirt-host-deploy-1.6.6-1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install the HC setup with the RHV-H 4.1.2 async build.
2. Add all the required repos.
3. An upgrade symbol appears next to the hypervisor.
4. Click on it to upgrade the host.


Actual results:
 The RHV-H host gets upgraded to 4.1.3, leaving the Hosted Engine HA state in 'Local Maintenance'.

Expected results:
 The RHV-H host gets upgraded to 4.1.3 and the Hosted Engine HA state should not be left in 'Local Maintenance'.

Additional info:

Adding hosted-engine --vm-status output from before and after the upgrade:

> Output of hosted-engine --vm-status before upgrade:
> =======================================================
> 
> [root@yarrow ~]# hosted-engine --vm-status
> 
> 
> --== Host 1 status ==--
> 
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : yarrow.lab.eng.blr.redhat.com
> Host ID                            : 1
> Engine status                      : {"health": "good", "vm": "up",
> "detail": "up"}
> Score                              : 3400
> stopped                            : False
> Local maintenance                  : False
> crc32                              : b4359588
> local_conf_timestamp               : 75583
> Host timestamp                     : 75567
> Extra metadata (valid at timestamp):
> 	metadata_parse_version=1
> 	metadata_feature_version=1
> 	timestamp=75567 (Thu Jul  6 15:09:26 2017)
> 	host-id=1
> 	score=3400
> 	vm_conf_refresh_time=75583 (Thu Jul  6 15:09:42 2017)
> 	conf_on_shared_storage=True
> 	maintenance=False
> 	state=EngineUp
> 	stopped=False
> 
> 
> --== Host 2 status ==--
> 
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : tettnang.lab.eng.blr.redhat.com
> Host ID                            : 2
> Engine status                      : {"reason": "vm not running on this
> host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score                              : 1800
> stopped                            : False
> Local maintenance                  : False
> crc32                              : 7bfbbfd5
> local_conf_timestamp               : 1440
> Host timestamp                     : 1423
> Extra metadata (valid at timestamp):
> 	metadata_parse_version=1
> 	metadata_feature_version=1
> 	timestamp=1423 (Thu Jul  6 15:09:07 2017)
> 	host-id=2
> 	score=1800
> 	vm_conf_refresh_time=1440 (Thu Jul  6 15:09:23 2017)
> 	conf_on_shared_storage=True
> 	maintenance=False
> 	state=EngineDown
> 	stopped=False
> 
> 
> --== Host 3 status ==--
> 
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : zod.lab.eng.blr.redhat.com
> Host ID                            : 3
> Engine status                      : {"reason": "vm not running on this
> host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score                              : 3400
> stopped                            : False
> Local maintenance                  : False
> crc32                              : 7caabb48
> local_conf_timestamp               : 75597
> Host timestamp                     : 75581
> Extra metadata (valid at timestamp):
> 	metadata_parse_version=1
> 	metadata_feature_version=1
> 	timestamp=75581 (Thu Jul  6 15:09:23 2017)
> 	host-id=3
> 	score=3400
> 	vm_conf_refresh_time=75597 (Thu Jul  6 15:09:39 2017)
> 	conf_on_shared_storage=True
> 	maintenance=False
> 	state=EngineDown
> 	stopped=False
> 
> Output of hosted-engine --vm-status after upgrade:
> ===================================================
> 
> [root@yarrow ~]# hosted-engine --vm-status
> 
> 
> --== Host 1 status ==--
> 
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : yarrow.lab.eng.blr.redhat.com
> Host ID                            : 1
> Engine status                      : {"reason": "vm not running on this
> host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score                              : 0
> stopped                            : False
> Local maintenance                  : True
> crc32                              : bc34659d
> local_conf_timestamp               : 7624
> Host timestamp                     : 7608
> Extra metadata (valid at timestamp):
> 	metadata_parse_version=1
> 	metadata_feature_version=1
> 	timestamp=7608 (Thu Jul  6 17:50:33 2017)
> 	host-id=1
> 	score=0
> 	vm_conf_refresh_time=7624 (Thu Jul  6 17:50:48 2017)
> 	conf_on_shared_storage=True
> 	maintenance=True
> 	state=LocalMaintenance
> 	stopped=False
> 
> 
> --== Host 2 status ==--
> 
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : tettnang.lab.eng.blr.redhat.com
> Host ID                            : 2
> Engine status                      : {"reason": "vm not running on this
> host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score                              : 1800
> stopped                            : False
> Local maintenance                  : False
> crc32                              : 521f80d4
> local_conf_timestamp               : 11121
> Host timestamp                     : 11105
> Extra metadata (valid at timestamp):
> 	metadata_parse_version=1
> 	metadata_feature_version=1
> 	timestamp=11105 (Thu Jul  6 17:50:29 2017)
> 	host-id=2
> 	score=1800
> 	vm_conf_refresh_time=11121 (Thu Jul  6 17:50:45 2017)
> 	conf_on_shared_storage=True
> 	maintenance=False
> 	state=EngineDown
> 	stopped=False
> 
> 
> --== Host 3 status ==--
> 
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : zod.lab.eng.blr.redhat.com
> Host ID                            : 3
> Engine status                      : {"health": "good", "vm": "up",
> "detail": "up"}
> Score                              : 3400
> stopped                            : False
> Local maintenance                  : False
> crc32                              : 77b3a2d6
> local_conf_timestamp               : 85262
> Host timestamp                     : 85246
> Extra metadata (valid at timestamp):
> 	metadata_parse_version=1
> 	metadata_feature_version=1
> 	timestamp=85246 (Thu Jul  6 17:50:28 2017)
> 	host-id=3
> 	score=3400
> 	vm_conf_refresh_time=85262 (Thu Jul  6 17:50:44 2017)
> 	conf_on_shared_storage=True
> 	maintenance=False
> 	state=EngineUp
> 	stopped=False
> 
> 
> cat /var/lib/ovirt-hosted-engine-ha/ha.conf
> local_maintenance=True

Comment 1 Yaniv Lavi 2017-07-17 09:19:22 UTC
Can you check for a regression in the host activation flow? It is supposed to move the host out of local maintenance.

Comment 2 Artyom 2017-07-19 11:48:04 UTC
So it is not a regression in the host activation flow. The problem is:
1) Move the host to maintenance via the engine (this activates the HE "LocalMaintenance" state).
2) Upgrade the host via the engine. After the upgrade, the host moves straight to the Up state, so from the engine side the host is UP, but from the HE side the host is still in the "LocalMaintenance" state, because nobody ran the activate command on the engine side.
See also a bug with a similar problem - https://bugzilla.redhat.com/show_bug.cgi?id=1468875
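
For reference, a minimal recovery sketch for a host stuck in this state, using only commands already shown in this bug (activating the host from the engine side should be equivalent, per comment 1):

    # on the affected host: confirm the flag persisted by the HA agent
    cat /var/lib/ovirt-hosted-engine-ha/ha.conf    # shows local_maintenance=True
    # clear local maintenance with the hosted-engine tool
    hosted-engine --set-maintenance --mode=none
    # verify that "Local maintenance" reads False again
    hosted-engine --vm-status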

Comment 3 Sandro Bonazzola 2017-11-18 07:50:36 UTC
Denis, is this going to land in 4.2.0? If not, please re-target.

Comment 4 Yaniv Lavi 2018-02-14 12:50:07 UTC
This is severe and should not be targeted so far in the future.
The maintenance mode for HE should be locked to the engine maintenance mode, if the engine is up.

Maintaining this during upgrade is elementary. Retargeting.

Comment 5 Sandro Bonazzola 2018-03-06 12:08:01 UTC
Moving back to ASSIGNED: this bug is on ovirt-host-deploy, but no patches are attached to it and no open patches have been pushed in gerrit against ovirt-host-deploy.
Please update the status of this bug, setting the correct product and adding references to the patches being pushed for review.

Comment 6 Yanir Quinn 2018-03-07 14:09:16 UTC
(In reply to Sandro Bonazzola from comment #5)
> Moving back to ASSIGNED: this bug is on ovirt-host-deploy, but no patches
> are attached to it and no open patches have been pushed in gerrit against
> ovirt-host-deploy.
> Please update the status of this bug, setting the correct product and
> adding references to the patches being pushed for review.

It's tricky:
the product should be one of ovirt-engine, ovirt-hosted-engine-ha, or ovirt-hosted-engine-setup.

I have patches for ovirt-hosted-engine-ha and ovirt-hosted-engine-setup.

For this bug (and other bugs) we also need this ovirt-engine patch:
https://gerrit.ovirt.org/#/c/86645/

Comment 7 SATHEESARAN 2018-05-03 17:59:02 UTC
Tested with ovirt-hosted-engine-setup-2.2.19

While updating the host from one nightly build to the next, the state of the node comes back to UP.
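
For re-verification, a minimal check sketch (assuming the three-host setup from the description): after the upgrade completes, run on the upgraded host:

    hosted-engine --vm-status
    # expected: "Local maintenance : False" and a non-zero score for this host
    cat /var/lib/ovirt-hosted-engine-ha/ha.conf
    # expected: no local_maintenance=True entry (an assumption, based on the pre-fix output in the description)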

Comment 8 Sandro Bonazzola 2018-05-04 10:47:40 UTC
This bugzilla is included in the oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.