Bug 1560574
| Summary: | [downstream clone - 4.1.11] HE host is not taken out of Local Maintenance after reinstall or upgrade | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | RHV bug bot <rhv-bugzilla-bot> |
| Component: | ovirt-engine | Assignee: | Ravi Nori <rnori> |
| Status: | CLOSED ERRATA | QA Contact: | Nikolai Sednev <nsednev> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.1.4 | CC: | gveitmic, jentrena, ldomb, lsurette, mavital, mgoldboi, mkalinin, mperina, msivak, nsednev, pdwyer, pstehlik, rbalakri, Rhev-m-bugs, rnori, sbonazzo, srevivo, ykaul, ylavi |
| Target Milestone: | ovirt-4.1.11 | Keywords: | Reopened, Triaged, ZStream |
| Target Release: | --- | Flags: | lsvaty: testing_plan_complete- |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1489982 | Environment: | |
| Last Closed: | 2018-04-24 15:30:28 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1489982 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Description (RHV bug bot, 2018-03-26 13:18:28 UTC)
I think it actually should be high priority. The end user would expect the host to come out of HE maintenance, and if it does not go back out of maintenance automatically, without informing the user, that is not the right flow.
(Originally by Marina Kalinin)

*** Bug 1501016 has been marked as a duplicate of this bug. ***
(Originally by Sandro Bonazzola)

I am unable to reproduce this on master and 4.2; trying 4.1.
(Originally by Ravi Shankar Nori)

Works in ovirt-engine-backend-4.1.9.1-1.el7.centos.noarch too. Tried with both node-ng and vdsm-4.19.45-1.el7.centos.x86_64 on CentOS 7. The host is activated after reinstall. Please check with the latest 4.1 build.
(Originally by Ravi Shankar Nori)

Created attachment 1388412 [details]
Activate hosts python script
(Originally by Ravi Shankar Nori)
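The attached script is not reproduced in this bug, but for context, a host-activation helper against the oVirt REST API typically looks something like the minimal sketch below. It assumes ovirt-engine-sdk-python 4.x; the engine URL and credentials are placeholders, and this is an illustration, not the attached script itself.

```python
# Minimal sketch (not the attached script): activate all hosts currently in
# maintenance via the oVirt REST API, using ovirt-engine-sdk-python 4.x.
# The URL, username and password below are placeholders.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,  # or ca_file='/etc/pki/ovirt-engine/ca.pem'
)
try:
    hosts_service = connection.system_service().hosts_service()
    for host in hosts_service.list(search='status=maintenance'):
        print('Activating host %s' % host.name)
        hosts_service.host_service(host.id).activate()
finally:
    connection.close()
```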
(In reply to Ravi Nori from comment #13)
> Works in ovirt-engine-backend-4.1.9.1-1.el7.centos.noarch too.
>
> Tried with both node-ng and vdsm-4.19.45-1.el7.centos.x86_64 on cent os 7.
> The host is activated after reinstall.
>
> Please check with latest 4.1 build

Nori, if you tested it on 4.1.9 and it didn't reproduce for you, i.e. after reinstall HE local maintenance was disabled on the host, let's close it as working in 4.1.9.
(Originally by Marina Kalinin)

Nikolai, maybe you can help Nori verify this bug? Thank you!
(Originally by Marina Kalinin)

That is fixed in the latest version of the 4.2 beta.
(Originally by Laurent Domb)

The operation works just fine on 4.2.1.5-0.1.el7.
rhvm-appliance-4.2-20180202.0.el7.noarch
ovirt-hosted-engine-ha-2.2.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.9-1.el7ev.noarch
Linux 3.10.0-693.17.1.el7.x86_64 #1 SMP Sun Jan 14 10:36:03 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
(Originally by Nikolai Sednev)

Reopening to backport the fix to 4.1.10.
(Originally by ylavi)

(In reply to Yaniv Lavi from comment #20)
> Reopening to backport fix to 4.1.10.

What do you want to backport? According to comment 13 it works fine in 4.1.9.
(Originally by Martin Perina)

Added to the 4.1.10 errata and moving to ON_QA. Nikolai, could you please verify that every flow works as expected in 4.1.10 and we haven't missed anything?
(Originally by Martin Perina)

INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [No relevant external trackers attached] For more info please contact: rhv-devops
(Originally by rhv-bugzilla-bot)

(In reply to Martin Perina from comment #22)
> Added to 4.1.10 errata and moving to ON_QA. Nikolai, could you please
> verify, that every flow works as expected in 4.1.10 and we haven't missed
> anything?

Could you please define the required flows?
(Originally by Nikolai Sednev)

The original issue is still reproducible on the latest 4.1.10.1-0.1.el7.
Reproduction steps:
1. Deployed rhevm-4.1.9.1-0.1.el7.noarch on a pair of 4.1.9 ha-hosts; the engine was running on RHEL 7.4, the hosts on RHEL 7.5.
2. Set global maintenance via the UI.
3. "yum update -y ovirt-engine-setup" to rhevm-4.1.10.1-0.1.el7.noarch.
4. Upgraded the engine to rhevm-4.1.10.1-0.1.el7.noarch using "engine-setup".
5. "yum update -y" on the engine to update RHEL 7.4 to RHEL 7.5.
6. Rebooted the engine from within the engine VM.
7. Started the engine from a host using "hosted-engine --vm-start".
8. Removed global maintenance from the ha-hosts.
9. Logged in to the engine's UI, set one of the two hosts (alma03, the first host, which was not hosting the SHE VM and was not SPM) into maintenance and then reinstalled it; after the reinstall the host recovered and was automatically activated.
10. The reinstalled ha-host ended up in local maintenance in the CLI, and in the UI it appeared as "Unavailable due to HA score".
See result in CLI:
alma03 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : alma03
Host ID : 1
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 0
stopped : False
Local maintenance : True
crc32 : bb19601a
local_conf_timestamp : 9806
Host timestamp : 9804
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=9804 (Sun Feb 25 19:03:13 2018)
host-id=1
score=0
vm_conf_refresh_time=9806 (Sun Feb 25 19:03:15 2018)
conf_on_shared_storage=True
maintenance=True
state=LocalMaintenance
stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : alma04
Host ID : 2
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 915c08da
local_conf_timestamp : 9769
Host timestamp : 9767
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=9767 (Sun Feb 25 19:03:19 2018)
host-id=2
score=3400
vm_conf_refresh_time=9769 (Sun Feb 25 19:03:21 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False
A screenshot from the UI and sosreports from both hosts and the engine are attached; a small status-check sketch follows this comment.
Moving back to assigned.
(Originally by Nikolai Sednev)
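Given status output like the one above, the stuck state can be detected without the UI by scanning the agent report for hosts that still show "Local maintenance : True". The following is a rough sketch, not part of this bug's fix; it only assumes the field labels seen in the output above.

```python
# Sketch: detect HE hosts left in local maintenance by parsing the
# human-readable "hosted-engine --vm-status" output, as captured in this bug.
import subprocess

def hosts_in_local_maintenance():
    out = subprocess.check_output(['hosted-engine', '--vm-status'],
                                  universal_newlines=True)
    hosts = []
    hostname = None
    for line in out.splitlines():
        if ':' not in line:
            continue
        key, _, value = line.partition(':')
        key, value = key.strip(), value.strip()
        if key == 'Hostname':
            hostname = value
        elif key == 'Local maintenance' and value == 'True':
            hosts.append(hostname)
    return hosts

if __name__ == '__main__':
    stuck = hosts_in_local_maintenance()
    if stuck:
        print('Hosts still in local maintenance: %s' % ', '.join(stuck))
    else:
        print('No host reports local maintenance.')
```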
Created attachment 1400614 [details]
Screenshot from 2018-02-25 19-07-06.png
(Originally by Nikolai Sednev)
Created attachment 1400615 [details]
engine logs
(Originally by Nikolai Sednev)
Created attachment 1400616 [details]
alma03 in local maintenance
(Originally by Nikolai Sednev)
Created attachment 1400617 [details]
alma04 logs
(Originally by Nikolai Sednev)
To re-enable alma03, I manually had to run "hosted-engine --set-maintenance --mode=none" from the CLI. See also the attached screencast.
(Originally by Nikolai Sednev)

Created attachment 1400618 [details]
screencast
(Originally by Nikolai Sednev)
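For completeness, the manual workaround above can be scripted on the affected host. This is only a sketch of the command sequence described in the comment, not part of the fix; the wait interval is an assumption.

```python
# Sketch of the manual workaround described above, to be run on the affected
# host (e.g. alma03): clear HE local maintenance, then re-check the status.
import subprocess
import time

subprocess.check_call(['hosted-engine', '--set-maintenance', '--mode=none'])
time.sleep(30)  # give ovirt-ha-agent time to publish an updated score (interval is an assumption)
print(subprocess.check_output(['hosted-engine', '--vm-status'],
                              universal_newlines=True))
```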
*** Bug 1536286 has been marked as a duplicate of this bug. ***
(Originally by ylavi)

Ravi, are you looking into this?
(Originally by Yaniv Kaul)

I was able to reproduce the issue on 4.1.9. The patch https://gerrit.ovirt.org/#/c/86645/ for BZ 1532709 fixes the issue and has not been merged.
(Originally by Ravi Shankar Nori)

This is not going to make it to 4.1.10 - please re-target.
(Originally by Yaniv Kaul)

In the CLI the host reports its status correctly:
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date : False
Hostname : alma04
Host ID : 2
Engine status : unknown stale-data
Score : 3400
stopped : False
Local maintenance : False
crc32 : 585c8d69
local_conf_timestamp : 12922
Host timestamp : 13079
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=13079 (Thu Apr 12 19:11:47 2018)
host-id=2
score=3400
vm_conf_refresh_time=12922 (Thu Apr 12 19:09:10 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
However, in the web UI I clearly see that the host shows "Hosted Engine HA: Not Active" instead of a score of 3400, and it appears up, although not as an ha-capable host (a sketch for cross-checking the engine-side view via the API follows the second status output below).
Moving back to assigned.
Tested on these components:
ovirt-hosted-engine-setup-2.1.4.2-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.11-1.el7ev.noarch
rhvm-appliance-4.1.20180125.0-1.el7.noarch
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
After some time, the host appears in the UI as:
Hosted Engine HA: Local Maintenance Enabled
In the CLI the host also appears to be in local maintenance:
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : alma04
Host ID : 2
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 0
stopped : False
Local maintenance : True
crc32 : 4f42e83e
local_conf_timestamp : 14048
Host timestamp : 14206
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=14206 (Thu Apr 12 19:30:33 2018)
host-id=2
score=0
vm_conf_refresh_time=14048 (Thu Apr 12 19:27:55 2018)
conf_on_shared_storage=True
maintenance=True
state=LocalMaintenance
stopped=False
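To cross-check the engine-side view against the agent output above, the host's hosted-engine properties can be read from the REST API. A minimal sketch, assuming ovirt-engine-sdk-python 4.x and placeholder connection details; the hosted_engine sub-resource is only populated when hosts are listed with all_content.

```python
# Sketch: print the engine's view of HE HA state for every host, to compare
# with "hosted-engine --vm-status" on the hosts. Connection details are
# placeholders; assumes ovirt-engine-sdk-python 4.x.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)
try:
    hosts_service = connection.system_service().hosts_service()
    # all_content=True is needed for the hosted_engine details to be filled in.
    for host in hosts_service.list(all_content=True):
        he = host.hosted_engine
        if he is None:
            print('%s: no hosted-engine data reported' % host.name)
        else:
            print('%s: active=%s, local_maintenance=%s, score=%s'
                  % (host.name, he.active, he.local_maintenance, he.score))
finally:
    connection.close()
```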
Created attachment 1420937 [details]
Screenshot from 2018-04-12 19-32-32.png
Created attachment 1420951 [details]
sosreport from alma04
Created attachment 1420952 [details]
engine logs
(In reply to RHV Bugzilla Automation and Verification Bot from comment #27)
> Original issue is still being reproduced on latest 4.1.10.1-0.1.el7
>
> Reproduction steps:
> 1. Deployed rhevm-4.1.9.1-0.1.el7.noarch on pair of 4.1.9 ha-hosts, engine
> was running on RHEL7.4, hosts on RHEL7.5.
> 2. Set global maintenance via UI.
> 3. "yum update -y ovirt-engine-setup" to rhevm-4.1.10.1-0.1.el7.noarch.
> 4. Upgraded the engine to rhevm-4.1.10.1-0.1.el7.noarch using "engine-setup".

As mentioned in the Target Milestone, this fix is included in 4.1.11; please retest with the correct version.

Works for me on these components:
Host:
ovirt-hosted-engine-setup-2.1.4.2-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.11-1.el7ev.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Engine:
ovirt-engine-4.1.11.1-0.1.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Moving forth to https://bugzilla.redhat.com/show_bug.cgi?id=1560574#c46.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1219