Bug 1489982 - HE host is not taken out of Local Maintenance after reinstall or upgrade
Summary: HE host is not taken out of Local Maintenance after reinstall or upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.1.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.2.3
Target Release: ---
Assignee: Ravi Nori
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Duplicates: 1501016 1536286 (view as bug list)
Depends On: 1469143
Blocks: CEECIR_RHV43_proposed 1536286 1540310 1560574
 
Reported: 2017-09-08 21:42 UTC by Marina Kalinin
Modified: 2019-08-28 13:19 UTC
CC List: 19 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1560574 (view as bug list)
Environment:
Last Closed: 2018-05-15 17:43:37 UTC
oVirt Team: Infra
Target Upstream Version:


Attachments
Activate hosts python script (882 bytes, text/plain)
2018-01-30 14:01 UTC, Ravi Nori
Screenshot from 2018-02-25 19-07-06.png (29.96 KB, image/png)
2018-02-25 17:07 UTC, Nikolai Sednev
engine logs (9.61 MB, application/x-xz)
2018-02-25 17:14 UTC, Nikolai Sednev
alma03 in local maintenance (10.55 MB, application/x-xz)
2018-02-25 17:16 UTC, Nikolai Sednev
alma04 logs (10.20 MB, application/x-xz)
2018-02-25 17:17 UTC, Nikolai Sednev
screencast (9.79 MB, application/octet-stream)
2018-02-25 17:24 UTC, Nikolai Sednev


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:1488 None None None 2018-05-15 17:45:35 UTC
oVirt gerrit 86645 None MERGED core: honor ServerRebootTimeout during host reboot 2020-06-10 11:46:01 UTC
oVirt gerrit 88214 None ABANDONED engine: HE host is not taken out of Local Maintenance after reinstall or upgrade 2020-06-10 11:46:01 UTC
oVirt gerrit 89381 None MERGED core: honor ServerRebootTimeout during host reboot 2020-06-10 11:46:01 UTC

Description Marina Kalinin 2017-09-08 21:42:08 UTC
Description of problem:
The HE host is not taken out of Local Maintenance after Reinstall, which is incorrect: the host was put into HE local maintenance when regular maintenance was enabled for it in the UI, so once it is activated again, HE local maintenance should be canceled as well.

Version-Release number of selected component (if applicable):
4.1.4


Steps to Reproduce:
1. Put host in maintenance.
2. Select the Reinstall option in the UI and wait until the reinstall completes and the host is active again in the UI.


Actual results:
The host is still in HE local maintenance and requires manual intervention from the command line to disable it.

Expected results:
The host should be fully operational once it is activated.
If that is impossible, we should at least provide a UI option to disable HE local maintenance.
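
For context, the hosted-engine local-maintenance state is exposed through the engine's REST API, so the behavior above can be checked programmatically after activating the host. A minimal sketch using the ovirtsdk4 Python SDK (the SDK usage, engine URL, and credentials here are illustrative assumptions, not something taken from this report):

import ovirtsdk4 as sdk

# Connect to the engine API; URL and credentials are placeholders.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,  # lab setting; use ca_file in production
)

# all_content=True asks the engine to include hosted-engine details.
hosts_service = connection.system_service().hosts_service()
for host in hosts_service.list(all_content=True):
    he = host.hosted_engine
    if he is not None:
        # After activation, local_maintenance should be False.
        print(host.name, 'local_maintenance:', he.local_maintenance,
              'score:', he.score)

connection.close()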

Comment 1 Marina Kalinin 2017-09-12 17:34:03 UTC
I think it actually should be high severity.
The end user would expect the host to be out of HE maintenance, and if it does not come back out of maintenance automatically, without informing the user, that is not the right flow.

Comment 5 Sandro Bonazzola 2017-10-12 05:44:41 UTC
*** Bug 1501016 has been marked as a duplicate of this bug. ***

Comment 12 Ravi Nori 2018-01-23 19:37:24 UTC
I am unable to reproduce this on master and 4.2; trying 4.1.

Comment 13 Ravi Nori 2018-01-24 15:52:04 UTC
Works in ovirt-engine-backend-4.1.9.1-1.el7.centos.noarch too.

Tried with both node-ng and vdsm-4.19.45-1.el7.centos.x86_64 on CentOS 7. The host is activated after reinstall.

Please check with the latest 4.1 build.

Comment 15 Ravi Nori 2018-01-30 14:01:16 UTC
Created attachment 1388412 [details]
Activate hosts python script
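
The attached script itself is not reproduced here; purely as an illustration (not the attachment), a host-activation script along these lines could use ovirtsdk4, with the same placeholder URL and credentials as in the earlier sketch:

import ovirtsdk4 as sdk

# Hypothetical sketch: activate every host currently in Maintenance.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,
)

hosts_service = connection.system_service().hosts_service()
for host in hosts_service.list(search='status=maintenance'):
    print('Activating', host.name)
    hosts_service.host_service(host.id).activate()

connection.close()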

Comment 16 Marina Kalinin 2018-01-30 18:35:15 UTC
(In reply to Ravi Nori from comment #13)
> Works in ovirt-engine-backend-4.1.9.1-1.el7.centos.noarch too.
> 
> Tried with both node-ng and vdsm-4.19.45-1.el7.centos.x86_64 on CentOS 7.
> The host is activated after reinstall.
> 
> Please check with the latest 4.1 build.

Nori, if you tested it on 4.1.9 and it didn't reproduce for you, i.e. after reinstall HE local maintenance was disabled on the host, then let's close it, since it works in 4.1.9.

Comment 17 Marina Kalinin 2018-01-30 18:36:02 UTC
Nikolai, maybe you can help Nori verify this bug? Thank you!

Comment 18 ldomb 2018-01-30 19:18:45 UTC
That is fixed in the latest 4.2 beta.

Comment 19 Nikolai Sednev 2018-02-12 11:47:42 UTC
The operation works just fine on 4.2.1.5-0.1.el7.
rhvm-appliance-4.2-20180202.0.el7.noarch
ovirt-hosted-engine-ha-2.2.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.9-1.el7ev.noarch
Linux 3.10.0-693.17.1.el7.x86_64 #1 SMP Sun Jan 14 10:36:03 EST 2018 x86_64 x86_64 x86_64 GNU/Linux

Comment 20 Yaniv Lavi 2018-02-21 13:43:30 UTC
Reopening to backport fix to 4.1.10.

Comment 21 Martin Perina 2018-02-21 14:36:52 UTC
(In reply to Yaniv Lavi from comment #20)
> Reopening to backport fix to 4.1.10.

What do you want to backport? According to Comment 13 it works fine in 4.1.9.

Comment 22 Martin Perina 2018-02-22 07:28:18 UTC
Added to the 4.1.10 errata and moving to ON_QA. Nikolai, could you please verify that every flow works as expected in 4.1.10 and that we haven't missed anything?

Comment 23 RHV bug bot 2018-02-22 16:02:04 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No relevant external trackers attached]

For more info please contact: rhv-devops@redhat.com

Comment 24 RHV bug bot 2018-02-22 16:07:46 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No relevant external trackers attached]

For more info please contact: rhv-devops@redhat.com

Comment 25 Nikolai Sednev 2018-02-25 13:03:43 UTC
(In reply to Martin Perina from comment #22)
> Added to 4.1.10 errata and moving to ON_QA. Nikolai, could you please
> verify, that every flow works as expected in 4.1.10 and we haven't missed
> anything?

Could you please define required flows?

Comment 26 Nikolai Sednev 2018-02-25 17:07:19 UTC
The original issue still reproduces on the latest 4.1.10.1-0.1.el7.

Reproduction steps:
1. Deployed rhevm-4.1.9.1-0.1.el7.noarch on a pair of 4.1.9 ha-hosts; the engine was running on RHEL 7.4, the hosts on RHEL 7.5.
2. Set global maintenance via the UI.
3. Ran "yum update -y ovirt-engine-setup" to get rhevm-4.1.10.1-0.1.el7.noarch.
4. Upgraded the engine to rhevm-4.1.10.1-0.1.el7.noarch using "engine-setup".
5. Ran "yum update -y" on the engine to update RHEL 7.4 to RHEL 7.5.
6. Rebooted the engine from within the engine.
7. Started the engine from a host using "hosted-engine --vm-start".
8. Removed global maintenance from the ha-hosts.
9. Logged in to the engine's UI, set one of the two hosts (alma03, the first host, which was not hosting the SHE VM and was not SPM) into maintenance, and then reinstalled it; after the reinstall, the host recovered and was automatically activated.
10. The reinstalled ha-host was in local maintenance according to the CLI, while in the UI it appeared as "Unavailable due to HA score".

See result in CLI:
alma03 ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : alma03
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
crc32                              : bb19601a
local_conf_timestamp               : 9806
Host timestamp                     : 9804
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=9804 (Sun Feb 25 19:03:13 2018)
        host-id=1
        score=0
        vm_conf_refresh_time=9806 (Sun Feb 25 19:03:15 2018)
        conf_on_shared_storage=True
        maintenance=True
        state=LocalMaintenance
        stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : alma04
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 915c08da
local_conf_timestamp               : 9769
Host timestamp                     : 9767
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=9767 (Sun Feb 25 19:03:19 2018)
        host-id=2
        score=3400
        vm_conf_refresh_time=9769 (Sun Feb 25 19:03:21 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False
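
The same state can also be read programmatically on a host; a small sketch parsing "hosted-engine --vm-status --json" (the JSON key names are assumptions inferred from the plain-text output above):

import json
import subprocess

# Runs on the ha-host itself (e.g. alma03).
out = subprocess.check_output(['hosted-engine', '--vm-status', '--json'])
status = json.loads(out)
for key, host in status.items():
    if not isinstance(host, dict):
        continue  # skip non-host entries such as the global maintenance flag
    # On the affected host, maintenance stays True and score stays 0.
    print(host.get('hostname'), 'maintenance:', host.get('maintenance'),
          'score:', host.get('score'))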

A screenshot from the UI and sosreports from both hosts and the engine are attached.

Moving back to ASSIGNED.

Comment 27 Nikolai Sednev 2018-02-25 17:07:49 UTC
Created attachment 1400614 [details]
Screenshot from 2018-02-25 19-07-06.png

Comment 28 Nikolai Sednev 2018-02-25 17:14:58 UTC
Created attachment 1400615 [details]
engine logs

Comment 29 Nikolai Sednev 2018-02-25 17:16:18 UTC
Created attachment 1400616 [details]
alma03 in local maintenance

Comment 30 Nikolai Sednev 2018-02-25 17:17:24 UTC
Created attachment 1400617 [details]
alma04 logs

Comment 31 Nikolai Sednev 2018-02-25 17:20:18 UTC
To activate alma03, I manually had to run "hosted-engine --set-maintenance --mode=none" from the CLI.
See also attached screencast.
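
For repeated cleanups, the manual workaround is easy to wrap; a trivial sketch, assuming it runs directly on the affected host:

import subprocess

# Clear HE local maintenance on this host, then re-check the status.
subprocess.run(['hosted-engine', '--set-maintenance', '--mode=none'], check=True)
subprocess.run(['hosted-engine', '--vm-status'], check=True)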

Comment 32 Nikolai Sednev 2018-02-25 17:24:41 UTC
Created attachment 1400618 [details]
screencast

Comment 33 Yaniv Lavi 2018-02-28 13:19:05 UTC
*** Bug 1536286 has been marked as a duplicate of this bug. ***

Comment 34 Yaniv Kaul 2018-03-14 11:12:14 UTC
Ravi, are you looking into this?

Comment 35 Ravi Nori 2018-03-14 13:13:15 UTC
I was able to reproduce the issue on 4.1.9. The patch https://gerrit.ovirt.org/#/c/86645/ for BZ 1532709 fixes the issue but has not yet been merged.

Comment 37 Yaniv Kaul 2018-03-19 13:54:19 UTC
This is not going to make it to 4.1.10 - please re-target.

Comment 39 Martin Perina 2018-04-03 10:46:42 UTC
Moving to MODIFIED, as the fix for BZ 1532709 also fixes this issue.

Comment 41 Nikolai Sednev 2018-04-22 15:37:51 UTC
Works for me on these components:
rhvm-appliance-4.2-20180420.0.el7.noarch
ovirt-hosted-engine-setup-2.2.18-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
ovirt-engine-setup-4.2.3.2-0.1.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Comment 45 errata-xmlrpc 2018-05-15 17:43:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488

Comment 46 Franta Kust 2019-05-16 12:54:27 UTC
BZ<2>Jira re-sync

Comment 47 Franta Kust 2019-05-16 13:08:13 UTC
BZ<2>Jira Resync

Comment 48 Daniel Gur 2019-08-28 13:14:30 UTC
sync2jira

Comment 49 Daniel Gur 2019-08-28 13:19:33 UTC
sync2jira

