+++ This bug is a downstream clone. The original bug is: +++
+++ bug 1489982 +++
======================================================================

Description of problem:
HE host is not taken out of Local Maintenance after Reinstall. This is incorrect, since it was put into HE local maintenance when we enabled regular maintenance for it in the UI. So once it is activated again, HE local maintenance should be canceled as well.

Version-Release number of selected component (if applicable):
4.1.4

Steps to Reproduce:
1. Put the host in maintenance.
2. Select the Reinstall option in the UI and wait until the reinstall is performed and the host is active again in the UI.

Actual results:
The host is still in local HE maintenance and requires manual intervention from the command line to disable HE maintenance.

Expected results:
The host should be fully operational once it is activated. Or, if that is impossible, we should at least provide a UI option to disable local HE maintenance.

(Originally by Marina Kalinin)
I think it actually should be high. The end user would expect the host to come out of HE maintenance, and if it does not leave maintenance automatically, without informing the user, that is not the right flow. (Originally by Marina Kalinin)
*** Bug 1501016 has been marked as a duplicate of this bug. *** (Originally by Sandro Bonazzola)
I am unable to reproduce this on master or 4.2; trying 4.1 now. (Originally by Ravi Shankar Nori)
Works in ovirt-engine-backend-4.1.9.1-1.el7.centos.noarch too. Tried with both node-ng and vdsm-4.19.45-1.el7.centos.x86_64 on CentOS 7; the host is activated after reinstall. Please check with the latest 4.1 build. (Originally by Ravi Shankar Nori)
Created attachment 1388412 [details] Activate hosts python script (Originally by Ravi Shankar Nori)
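The attached "Activate hosts" script itself is not reproduced in this thread. As a rough illustration of what such a cleanup script needs to decide, the sketch below filters hosts that are still stuck in maintenance after reinstall. The `Host` record and the `hosts_needing_activation` helper are assumptions for illustration, not the contents of the attachment.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Minimal stand-in for an engine host record (illustrative only)."""
    name: str
    status: str               # e.g. "up", "maintenance"
    ha_local_maintenance: bool  # the flag left set by the bug in this report


def hosts_needing_activation(hosts):
    """Return names of hosts a cleanup script would still have to activate:
    either in regular maintenance, or (as in this bug) up in the engine UI
    but still flagged as in HE local maintenance."""
    return [h.name for h in hosts
            if h.status == "maintenance" or h.ha_local_maintenance]


if __name__ == "__main__":
    hosts = [
        Host("alma03", "up", True),   # the broken state described in this bug
        Host("alma04", "up", False),  # fully operational
    ]
    print(hosts_needing_activation(hosts))  # -> ['alma03']
```

(Originally by Ravi Shankar Nori)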
(In reply to Ravi Nori from comment #13)
> Works in ovirt-engine-backend-4.1.9.1-1.el7.centos.noarch too.
>
> Tried with both node-ng and vdsm-4.19.45-1.el7.centos.x86_64 on cent os 7.
> The host is activated after reinstall.
>
> Please check with latest 4.1 build

Nori, if you tested it on 4.1.9 and it didn't reproduce for you, i.e. after reinstall HE local maintenance was disabled on the host, then let's close it, since it works in 4.1.9.

(Originally by Marina Kalinin)
Nikolai, maybe you can help Nori verifying this bug? Thank you! (Originally by Marina Kalinin)
That is fixed in the latest version of 4.2 beta (Originally by Laurent Domb)
The operation works just fine on 4.2.1.5-0.1.el7.

rhvm-appliance-4.2-20180202.0.el7.noarch
ovirt-hosted-engine-ha-2.2.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.9-1.el7ev.noarch
Linux 3.10.0-693.17.1.el7.x86_64 #1 SMP Sun Jan 14 10:36:03 EST 2018 x86_64 x86_64 x86_64 GNU/Linux

(Originally by Nikolai Sednev)
Reopening to backport fix to 4.1.10. (Originally by ylavi)
(In reply to Yaniv Lavi from comment #20)
> Reopening to backport fix to 4.1.10.

What do you want to backport? According to comment 13 it works fine in 4.1.9.

(Originally by Martin Perina)
Added to 4.1.10 errata and moving to ON_QA. Nikolai, could you please verify that every flow works as expected in 4.1.10 and that we haven't missed anything? (Originally by Martin Perina)
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [No relevant external trackers attached] For more info please contact: rhv-devops (Originally by rhv-bugzilla-bot)
(In reply to Martin Perina from comment #22)
> Added to 4.1.10 errata and moving to ON_QA. Nikolai, could you please
> verify, that every flow works as expected in 4.1.10 and we haven't missed
> anything?

Could you please define the required flows?

(Originally by Nikolai Sednev)
Original issue is still reproduced on latest 4.1.10.1-0.1.el7.

Reproduction steps:
1. Deployed rhevm-4.1.9.1-0.1.el7.noarch on a pair of 4.1.9 ha-hosts; the engine was running on RHEL 7.4, the hosts on RHEL 7.5.
2. Set global maintenance via the UI.
3. "yum update -y ovirt-engine-setup" to rhevm-4.1.10.1-0.1.el7.noarch.
4. Upgraded the engine to rhevm-4.1.10.1-0.1.el7.noarch using "engine-setup".
5. "yum update -y" on the engine to get RHEL 7.4 updated to RHEL 7.5.
6. Rebooted the engine from the engine.
7. Started the engine from a host using "hosted-engine --vm-start".
8. Removed global maintenance from the ha-hosts.
9. Logged in to the engine's UI, set one of the two hosts (alma03, the first host, which was not hosting the SHE VM and was not SPM) into maintenance and then reinstalled it; after reinstall the host recovered and was automatically activated.
10. The reinstalled ha-host was in local maintenance in the CLI, and in the UI it appeared as "Unavailable due to HA score".

Result in CLI:

alma03 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : alma03
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
crc32                              : bb19601a
local_conf_timestamp               : 9806
Host timestamp                     : 9804
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=9804 (Sun Feb 25 19:03:13 2018)
    host-id=1
    score=0
    vm_conf_refresh_time=9806 (Sun Feb 25 19:03:15 2018)
    conf_on_shared_storage=True
    maintenance=True
    state=LocalMaintenance
    stopped=False

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : alma04
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 915c08da
local_conf_timestamp               : 9769
Host timestamp                     : 9767
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=9767 (Sun Feb 25 19:03:19 2018)
    host-id=2
    score=3400
    vm_conf_refresh_time=9769 (Sun Feb 25 19:03:21 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False

A print screen from the UI and sosreports from both hosts and the engine are attached.

Moving back to assigned.

(Originally by Nikolai Sednev)
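The stuck state above can also be detected programmatically by parsing the "Extra metadata" block of `hosted-engine --vm-status`. The sketch below is my own parsing approach for triage, not part of the product; it only relies on the `key=value` layout shown in this thread.

```python
def parse_extra_metadata(text):
    """Parse key=value lines from the 'Extra metadata' section of
    `hosted-engine --vm-status` output into a dict."""
    meta = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" in line:
            key, _, value = line.partition("=")
            meta[key] = value
    return meta


def in_local_maintenance(meta):
    """True when the host reports HE local maintenance, as in this bug."""
    return meta.get("maintenance") == "True" or meta.get("state") == "LocalMaintenance"


# Sample taken from the alma03 output in this report.
sample = """\
metadata_parse_version=1
metadata_feature_version=1
timestamp=9804 (Sun Feb 25 19:03:13 2018)
host-id=1
score=0
vm_conf_refresh_time=9806 (Sun Feb 25 19:03:15 2018)
conf_on_shared_storage=True
maintenance=True
state=LocalMaintenance
stopped=False
"""

meta = parse_extra_metadata(sample)
print(in_local_maintenance(meta))  # -> True
```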
Created attachment 1400614 [details] Screenshot from 2018-02-25 19-07-06.png (Originally by Nikolai Sednev)
Created attachment 1400615 [details] engine logs (Originally by Nikolai Sednev)
Created attachment 1400616 [details] alma03 in local maintenance (Originally by Nikolai Sednev)
Created attachment 1400617 [details] alma04 logs (Originally by Nikolai Sednev)
To enable alma03, I had to manually run "hosted-engine --set-maintenance --mode=none" from the CLI. See also the attached screencast. (Originally by Nikolai Sednev)
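That manual workaround can be scripted while the fix is pending. A minimal sketch follows; only the `hosted-engine --set-maintenance --mode=...` invocation comes from this thread, while the wrapper functions and the set of accepted modes are assumptions for illustration.

```python
import subprocess

# Maintenance modes accepted by `hosted-engine --set-maintenance`
# (assumed here; "none" is the one used as the workaround in this bug).
VALID_MODES = ("global", "local", "none")


def set_maintenance_cmd(mode):
    """Build the hosted-engine command line that sets the HE maintenance mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return ["hosted-engine", "--set-maintenance", f"--mode={mode}"]


def clear_local_maintenance(run=subprocess.run):
    """Disable HE maintenance on this host, mirroring the manual workaround.
    `run` is injectable so the logic can be exercised without the real CLI."""
    return run(set_maintenance_cmd("none"), check=True)
```

For example, `set_maintenance_cmd("none")` yields `['hosted-engine', '--set-maintenance', '--mode=none']`, the exact command used above.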
Created attachment 1400618 [details] screencast (Originally by Nikolai Sednev)
*** Bug 1536286 has been marked as a duplicate of this bug. *** (Originally by ylavi)
Ravi, are you looking into this? (Originally by Yaniv Kaul)
I was able to reproduce the issue on 4.1.9. The patch https://gerrit.ovirt.org/#/c/86645/ for BZ 1532709 fixes the issue and has not been merged. (Originally by Ravi Shankar Nori)
This is not going to make it to 4.1.10 - please re-target. (Originally by Yaniv Kaul)
In the CLI the host reports its status correctly:

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : alma04
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 585c8d69
local_conf_timestamp               : 12922
Host timestamp                     : 13079
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=13079 (Thu Apr 12 19:11:47 2018)
    host-id=2
    score=3400
    vm_conf_refresh_time=12922 (Thu Apr 12 19:09:10 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

However, in the web UI I clearly see that the host shows "Hosted Engine HA: Not Active" instead of a score of 3400, and it appears up, though not as an HA-capable host.

Moving back to assigned.

Tested on these components:
ovirt-hosted-engine-setup-2.1.4.2-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.11-1.el7ev.noarch
rhvm-appliance-4.1.20180125.0-1.el7.noarch
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
After some time, the host appears in the UI as "Hosted Engine HA: Local Maintenance Enabled". In the CLI the host also appears as in local maintenance:

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : alma04
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
crc32                              : 4f42e83e
local_conf_timestamp               : 14048
Host timestamp                     : 14206
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=14206 (Thu Apr 12 19:30:33 2018)
    host-id=2
    score=0
    vm_conf_refresh_time=14048 (Thu Apr 12 19:27:55 2018)
    conf_on_shared_storage=True
    maintenance=True
    state=LocalMaintenance
    stopped=False
Created attachment 1420937 [details] Screenshot from 2018-04-12 19-32-32.png
Created attachment 1420951 [details] sosreport from alma04
Created attachment 1420952 [details] engine logs
(In reply to RHV Bugzilla Automation and Verification Bot from comment #27)
> Original issue is still being reproduced on latest 4.1.10.1-0.1.el7
>
> Reproduction steps:
> 1.Deployed rhevm-4.1.9.1-0.1.el7.noarch on pair of 4.1.9 ha-hosts, engine
> was running on RHEL7.4, hosts on RHEL7.5.
> 2.Set global maintenance via UI.
> 3."yum update -y ovirt-engine-setup" to rhevm-4.1.10.1-0.1.el7.noarch.
> 4.Upgraded the engine to rhevm-4.1.10.1-0.1.el7.noarch using "engine-setup".

As mentioned in the Target Milestone, this fix is included in 4.1.11; please retest with the correct version.
Works for me on these components:

Host:
ovirt-hosted-engine-setup-2.1.4.2-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.11-1.el7ev.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Engine:
ovirt-engine-4.1.11.1-0.1.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Moving back per https://bugzilla.redhat.com/show_bug.cgi?id=1560574#c46.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1219