Bug 1353600 - hosted-engine-host maintenance mode is not synced with the engine maintenance status
Summary: hosted-engine-host maintenance mode is not synced with the engine maintenance status
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-host-deploy
Classification: oVirt
Component: Plugins.Hosted-Engine
Version: 1.5.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ovirt-4.0.5
Target Release: 1.5.3
Assignee: Jenny Tokar
QA Contact: Nikolai Sednev
URL:
Whiteboard: sla
Depends On:
Blocks: 1379992
 
Reported: 2016-07-07 14:24 UTC by Nikolai Sednev
Modified: 2017-07-17 05:28 UTC
13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When a host that had previously been put into local maintenance was added to the engine, the host remained in local maintenance even though it should have been active. Consequence: The host appeared as Up in the engine, but was still in local maintenance mode and appeared to have no hosted-engine capabilities. Fix: When deploying a hosted-engine host from the UI, the maintenance mode is now always set to none. Result: The hosted-engine host maintenance mode is aligned between the engine and the host itself.
Clone Of:
Environment:
Last Closed: 2017-07-17 05:28:08 UTC
oVirt Team: SLA
Embargoed:
knarra: needinfo-
rule-engine: ovirt-4.0.z+
ylavi: planning_ack+
rgolan: devel_ack+
mavital: testing_ack+


Attachments
sosreport from the engine (18.93 MB, application/x-xz)
2016-07-07 14:31 UTC, Nikolai Sednev
no flags Details
sosreport from the host being added (alma03) (7.37 MB, application/x-xz)
2016-07-07 14:32 UTC, Nikolai Sednev
no flags Details
screencast with reproduction of a working fix (13.57 MB, application/octet-stream)
2017-07-06 13:58 UTC, Nikolai Sednev
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1370907 0 high CLOSED When upgrading hosted engine host from the ui the host remains in maintenance mode 2021-02-22 00:41:40 UTC
oVirt gerrit 62237 0 master MERGED hosted-engine: set local maintenance to false when installing host 2020-02-22 12:12:12 UTC
oVirt gerrit 63384 0 ovirt-host-deploy-1.5 ABANDONED hosted-engine: set local maintenance to false when installing host 2020-02-22 12:12:12 UTC
oVirt gerrit 64202 0 master MERGED Revert "hosted-engine: set local maintenance to false when installing host" 2020-02-22 12:12:12 UTC
oVirt gerrit 64208 0 master MERGED hosted-engine: set local maintenance to false when installing host 2020-02-22 12:12:12 UTC
oVirt gerrit 64213 0 ovirt-host-deploy-1.5 MERGED hosted-engine: set local maintenance to false when installing host 2020-02-22 12:12:12 UTC

Internal Links: 1370907

Description Nikolai Sednev 2016-07-07 14:24:50 UTC
Description of problem:
A hosted-engine host that is re-deployed via the WebUI or the REST API is added while still in local maintenance.

Setting the host to maintenance in the WebUI and then returning it to active works just fine: "/var/lib/ovirt-hosted-engine-ha/ha.conf" is set to "local_maintenance=False" once the host is activated back in the WebUI.

A working workaround is to run "hosted-engine --set-maintenance --mode=none" manually on the host via the CLI.
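
For convenience, here is the full workaround on the affected host followed by a quick check and its expected output ("#" is the root shell prompt, as elsewhere in this report; the grep filter is only illustrative):

# hosted-engine --set-maintenance --mode=none
# cat /var/lib/ovirt-hosted-engine-ha/ha.conf
local_maintenance=False
# hosted-engine --vm-status | grep -i "local maintenance"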

Version-Release number of selected component (if applicable):
Host:
ovirt-vmconsole-host-1.0.3-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.0-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.7.0-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-host-deploy-1.5.0-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.0.2-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.17.x86_64
mom-0.5.5-1.el7ev.noarch
ovirt-vmconsole-1.0.3-1.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
vdsm-4.18.5.1-1.el7ev.x86_64
rhev-release-4.0.1-1-001.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-imageio-daemon-0.3.0-0.el7ev.noarch
Linux version 3.10.0-327.28.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon Jun 27 14:48:28 EDT 2016
Linux 3.10.0-327.28.2.el7.x86_64 #1 SMP Mon Jun 27 14:48:28 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Engine:
rhevm-doc-4.0.0-2.el7ev.noarch
rhevm-setup-plugins-4.0.0.1-1.el7ev.noarch
rhevm-spice-client-x64-msi-4.0-2.el7ev.noarch
rhevm-4.0.2-0.2.rc1.el7ev.noarch
rhev-release-4.0.0-19-001.noarch
rhev-release-4.0.1-1-001.noarch
rhevm-guest-agent-common-1.0.12-2.el7ev.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhevm-branding-rhev-4.0.0-2.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-2.el7ev.noarch
rhev-guest-tools-iso-4.0-2.el7ev.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)


How reproducible:
100%

Steps to Reproduce:
1. Deploy HE over NFS with two hosted-engine hosts.
2. Set one of the hosts into maintenance via the WebUI.
3. Remove the host that was set to maintenance via the WebUI.
4. Add the removed host back as a hosted-engine host via the WebUI or the REST API.

Actual results:
In the WebUI the host becomes active, but not as a hosted-engine host: its "/var/lib/ovirt-hosted-engine-ha/ha.conf" still contains "local_maintenance=True" and "hosted-engine --vm-status" reports "Local maintenance: True".


Expected results:
Once the re-added/re-deployed hosted-engine host becomes active in the WebUI, its "/var/lib/ovirt-hosted-engine-ha/ha.conf" should also be set to "local_maintenance=False".

Additional info:

Comment 1 Nikolai Sednev 2016-07-07 14:31:15 UTC
Created attachment 1177349 [details]
sosreport from the engine

Comment 2 Nikolai Sednev 2016-07-07 14:32:38 UTC
Created attachment 1177351 [details]
sosreport from the host being added (alma03)

Comment 3 Simone Tiraboschi 2016-08-24 11:15:24 UTC
It seems that this also affects host upgrade via the upgrade manager: at the end, the host is in local maintenance mode.

Comment 4 Jenny Tokar 2016-08-24 11:50:01 UTC
(In reply to Simone Tiraboschi from comment #3)
> It seams that it also affects host upgrade using upgrade manager: at the end
> the host is in local maintenance mode.

Do you mean using upgrade from the ui? Or something else?

Comment 5 Yaniv Kaul 2016-08-24 17:31:30 UTC
(In reply to Simone Tiraboschi from comment #3)
> It seams that it also affects host upgrade using upgrade manager: at the end
> the host is in local maintenance mode.

So do we need this for 4.0.4?

Comment 6 Simone Tiraboschi 2016-08-25 08:03:21 UTC
(In reply to Jenny Tokar from comment #4)

> Do you mean using upgrade from the ui? Or something else?

Yes, starting the host upgrade from the GUI.
I think an approach like https://gerrit.ovirt.org/#/c/62237/ that acts only on host-deploy is not the right path, since it doesn't cover all the scenarios.

The issue is that when the engine sets a host into maintenance mode, it also sets the hosted-engine local maintenance mode; but when the engine takes a host out of maintenance mode, it does not also clear the hosted-engine local maintenance mode.
This also happens during a host upgrade triggered by the engine, as in comment 3.
I think that simply syncing the engine maintenance state with the hosted-engine maintenance mode is a more robust solution.
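
A hedged sketch of the host-side effect such a sync would need when the engine activates a host (hypothetical snippet, not actual engine or agent code; it only reuses the commands and files already shown in this report):

if grep -q "local_maintenance=True" /var/lib/ovirt-hosted-engine-ha/ha.conf; then
    # the engine has just activated this host, so HA local maintenance should be cleared
    hosted-engine --set-maintenance --mode=none
fi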


(In reply to Yaniv Kaul from comment #5)
> So do we need this for 4.0.4?

I think so: due to this bug the user has to connect to the host and manually clear the hosted-engine local maintenance mode via the CLI, which is pretty annoying.

Comment 7 Jenny Tokar 2016-08-25 08:37:55 UTC
(In reply to Simone Tiraboschi from comment #6)
> (In reply to Jenny Tokar from comment #4)
> 
> > Do you mean using upgrade from the ui? Or something else?
> 
> Yes, starting the host upgrade from the GUI.
> I'm thinking that an approach like https://gerrit.ovirt.org/#/c/62237/ that
> acts on host-deploy is not the right path since it doesn't cover all the
> scenarios.
> 
> The issue is that when the engine sets an host into maintenance mode, the
> engine will also set the hosted-engine local maintenance mode but when the
> engine terminates the maintenance mode of an host, it will not terminate
> also the hosted-engine local maintenance mode.
> This happens also during the host upgrade triggered by the engine as for
> comment 3.
> I think that simply syncing the engine maintenance with hosted-engine
> maintenance mode is more robust solution.

I agree it's a more robust solution, but I don't think it's that simple. When deploying a hosted-engine host, the engine doesn't know about its hosted-engine capabilities until the VDS statistics are retrieved for the first time. By that point there is no way to know how to sync: if the host is in "Up" status in the engine but is actually in local maintenance, how would you know whether it should be activated (after removing and re-deploying) or actually set to maintenance in the engine (if the user deliberately set it to local maintenance manually)?

The upgrade issue is slightly different, since the engine already knows the host is a hosted-engine host and can simply send the command to remove the host from local maintenance mode, just as it was able to send the command to set it to local maintenance.

Comment 8 Roy Golan 2016-08-28 08:55:22 UTC
(In reply to Jenny Tokar from comment #7)

I agree with Jenny here, and I want to add that, on top of this being a non-trivial task, not syncing those maintenance modes (engine and HA) gives you the flexibility of putting one into maintenance while keeping the other active.

Comment 9 Sandro Bonazzola 2016-09-01 08:20:06 UTC
Doron, according to comment #6 this is a nice-to-have for 4.0.4.
This bug is currently targeted at 4.0.6; any plans to re-target?

Comment 10 SATHEESARAN 2016-09-09 07:41:28 UTC
For your information,
I have also seen a case where, after updating a host (a RHEL 7.2 node in an HC setup) from the UI, the host becomes active but is still in maintenance state.

I have raised bug [1] for that issue.

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1374593

Comment 11 Simone Tiraboschi 2016-09-09 15:53:35 UTC
(In reply to SATHEESARAN from comment #10)
> For your information,
> I have also seen a case where post updating the host(RHEL 7.2 node in HC
> setup) from UI, the host becomes active, but still in maintenance state.
> 
> I have raised a bug[1] for that issue
> 
> [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1374593

Closing it as a duplicate of bug 1370907.

Comment 12 Doron Fediuck 2016-09-15 13:05:21 UTC
(In reply to Sandro Bonazzola from comment #9)
> Doron, according to comment #6 this is a nice to have in 4.0.4.
> This bug is currently targeted 4.0.6, plans to re-target?

We can do this for 4.0.5 if it's ready. Too late for 4.0.4.

Comment 13 Nikolai Sednev 2016-11-02 08:34:07 UTC
After following the reproduction steps, I see that:
# cat /var/lib/ovirt-hosted-engine-ha/ha.conf
local_maintenance=False

# hosted-engine --vm-status
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=58480 (Wed Nov  2 10:31:12 2016)
        host-id=2
        score=3400
        maintenance=False
        state=EngineDown
        stopped=False

Works for me on these components on hosts:
rhev-release-4.0.5-5-001.noarch
sanlock-3.2.4-3.el7_2.x86_64
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
vdsm-4.18.15.2-1.el7ev.x86_64
libvirt-client-1.2.17-13.el7_2.6.x86_64
ovirt-hosted-engine-ha-2.0.4-1.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.23.x86_64
ovirt-hosted-engine-setup-2.0.3-2.el7ev.noarch
ovirt-host-deploy-1.5.3-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-daemon-0.4.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
rhevm-appliance-20160922.0-1.el7ev.noarch
Linux version 3.10.0-327.36.3.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Oct 20 04:56:07 EDT 2016
Linux alma03.qa.lab.tlv.redhat.com 3.10.0-327.36.3.el7.x86_64 #1 SMP Thu Oct 20 04:56:07 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Components on engine:
ovirt-engine-dwh-4.0.5-1.el7ev.noarch
ovirt-engine-dwh-setup-4.0.5-1.el7ev.noarch
ovirt-vmconsole-proxy-1.0.4-1.el7ev.noarch
eap7-wildfly-web-console-eap-2.8.27-1.Final_redhat_1.1.ep7.el7.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-engine-vmconsole-proxy-helper-4.0.5.4-0.1.el7ev.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-4.0.5.4-0.1.el7ev.noarch
qemu-guest-agent-2.3.0-4.el7.x86_64
rhevm-guest-agent-common-1.0.12-3.el7ev.noarch
rhevm-spice-client-x64-msi-4.0-3.el7ev.noarch
rhevm-branding-rhev-4.0.0-5.el7ev.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhev-release-4.0.5-5-001.noarch
rhevm-spice-client-x86-msi-4.0-3.el7ev.noarch
rhevm-4.0.5.4-0.1.el7ev.noarch
rhevm-guest-agent-common-1.0.12-3.el7ev.noarch
rhevm-setup-plugins-4.0.0.3-1.el7ev.noarch
rhev-guest-tools-iso-4.0-6.el7ev.noarch
rhevm-doc-4.0.5-1.el7ev.noarch
Linux version 3.10.0-327.36.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Wed Aug 17 03:02:37 EDT 2016
Linux nsednev-he-1.qa.lab.tlv.redhat.com 3.10.0-327.36.1.el7.x86_64 #1 SMP Wed Aug 17 03:02:37 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Comment 14 RamaKasturi 2017-07-06 13:06:23 UTC
Re-opening this bug because I see that the issue happens again with 4.1.3.

1) Have an RHV-H 4.1.2 async build.
2) Upgrade the engine to 4.1.3.
3) Upgrade the RHV-H nodes to the latest bits.
4) Once the upgrade finishes, the node is still in local maintenance.
5) The node that was upgraded was yarrow.lab.eng.blr.redhat.com.

Output of hosted-engine --vm-status before upgrade:
=======================================================

[root@yarrow ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : yarrow.lab.eng.blr.redhat.com
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : b4359588
local_conf_timestamp               : 75583
Host timestamp                     : 75567
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=75567 (Thu Jul  6 15:09:26 2017)
	host-id=1
	score=3400
	vm_conf_refresh_time=75583 (Thu Jul  6 15:09:42 2017)
	conf_on_shared_storage=True
	maintenance=False
	state=EngineUp
	stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : tettnang.lab.eng.blr.redhat.com
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 1800
stopped                            : False
Local maintenance                  : False
crc32                              : 7bfbbfd5
local_conf_timestamp               : 1440
Host timestamp                     : 1423
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=1423 (Thu Jul  6 15:09:07 2017)
	host-id=2
	score=1800
	vm_conf_refresh_time=1440 (Thu Jul  6 15:09:23 2017)
	conf_on_shared_storage=True
	maintenance=False
	state=EngineDown
	stopped=False


--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : zod.lab.eng.blr.redhat.com
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 7caabb48
local_conf_timestamp               : 75597
Host timestamp                     : 75581
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=75581 (Thu Jul  6 15:09:23 2017)
	host-id=3
	score=3400
	vm_conf_refresh_time=75597 (Thu Jul  6 15:09:39 2017)
	conf_on_shared_storage=True
	maintenance=False
	state=EngineDown
	stopped=False

Output of hosted-engine --vm-status after upgrade:
===================================================

[root@yarrow ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : yarrow.lab.eng.blr.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
crc32                              : bc34659d
local_conf_timestamp               : 7624
Host timestamp                     : 7608
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=7608 (Thu Jul  6 17:50:33 2017)
	host-id=1
	score=0
	vm_conf_refresh_time=7624 (Thu Jul  6 17:50:48 2017)
	conf_on_shared_storage=True
	maintenance=True
	state=LocalMaintenance
	stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : tettnang.lab.eng.blr.redhat.com
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 1800
stopped                            : False
Local maintenance                  : False
crc32                              : 521f80d4
local_conf_timestamp               : 11121
Host timestamp                     : 11105
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=11105 (Thu Jul  6 17:50:29 2017)
	host-id=2
	score=1800
	vm_conf_refresh_time=11121 (Thu Jul  6 17:50:45 2017)
	conf_on_shared_storage=True
	maintenance=False
	state=EngineDown
	stopped=False


--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : zod.lab.eng.blr.redhat.com
Host ID                            : 3
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 77b3a2d6
local_conf_timestamp               : 85262
Host timestamp                     : 85246
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=85246 (Thu Jul  6 17:50:28 2017)
	host-id=3
	score=3400
	vm_conf_refresh_time=85262 (Thu Jul  6 17:50:44 2017)
	conf_on_shared_storage=True
	maintenance=False
	state=EngineUp
	stopped=False


cat /var/lib/ovirt-hosted-engine-ha/ha.conf
local_maintenance=True

Comment 15 Red Hat Bugzilla Rules Engine 2017-07-06 13:06:32 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 16 Nikolai Sednev 2017-07-06 13:56:44 UTC
(In reply to RamaKasturi from comment #14)
> Re-opening this bug because I see that the issue happens again with 4.1.3.
> [...]
> 4) Once the upgrade finishes, the node is still in local maintenance.
> 5) The node that was upgraded was yarrow.lab.eng.blr.redhat.com.
> [...]
> cat /var/lib/ovirt-hosted-engine-ha/ha.conf
> local_maintenance=True

This is a totally different reproduction flow; it is RHEV-H specific and has nothing in common with what I initially reported.
Please open a different bug to keep these two things separate, and let's not mix two different flows within the same bug.

Moving back to VERIFIED, as this is working for me on the latest components with the following flow:
1. Deploy HE over NFS with two hosted-engine hosts.
2. Set one of the hosts into maintenance via the WebUI.
3. Remove the host that was set to maintenance via the WebUI.
4. Add the removed host back as a hosted-engine host via the WebUI or the REST API.

Components on engine:
rhev-guest-tools-iso-4.1-5.el7ev.noarch
rhevm-dependencies-4.1.1-1.el7ev.noarch
rhevm-doc-4.1.3-1.el7ev.noarch
rhevm-branding-rhev-4.1.0-2.el7ev.noarch
rhevm-4.1.3.5-0.1.el7.noarch
rhevm-setup-plugins-4.1.2-1.el7ev.noarch
Linux version 3.10.0-514.21.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sun May 28 17:08:21 EDT 2017
Linux 3.10.0-514.21.2.el7.x86_64 #1 SMP Sun May 28 17:08:21 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.4 (Maipo)

Host:
qemu-kvm-rhev-2.9.0-14.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.9-1.el7ev.noarch
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-setup-lib-1.1.3-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
vdsm-4.19.20-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.1.4-1.el7ev.noarch
libvirt-client-3.2.0-14.el7.x86_64
ovirt-hosted-engine-setup-2.1.3.3-1.el7ev.noarch
sanlock-3.5.0-1.el7.x86_64
ovirt-host-deploy-1.6.6-1.el7ev.noarch
Linux version 3.10.0-691.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Jun 29 10:30:04 EDT 2017
Linux 3.10.0-691.el7.x86_64 #1 SMP Thu Jun 29 10:30:04 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.4 (Maipo)

A screencast showing the reproduction flow working fine for me is attached.

Comment 17 Nikolai Sednev 2017-07-06 13:58:40 UTC
Created attachment 1294974 [details]
screencast with reproduction of a working fix

