Bug 1560574
| Summary: | [downstream clone - 4.1.11] HE host is not taken out of Local Maintenance after reinstall or upgrade | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | RHV bug bot <rhv-bugzilla-bot> |
| Component: | ovirt-engine | Assignee: | Ravi Nori <rnori> |
| Status: | CLOSED ERRATA | QA Contact: | Nikolai Sednev <nsednev> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.1.4 | CC: | gveitmic, jentrena, ldomb, lsurette, mavital, mgoldboi, mkalinin, mperina, msivak, nsednev, pdwyer, pstehlik, rbalakri, Rhev-m-bugs, rnori, sbonazzo, srevivo, ykaul, ylavi |
| Target Milestone: | ovirt-4.1.11 | Keywords: | Reopened, Triaged, ZStream |
| Target Release: | --- | Flags: | lsvaty: testing_plan_complete- |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1489982 | Environment: | |
| Last Closed: | 2018-04-24 15:30:28 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1489982 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Description (RHV bug bot, 2018-03-26 13:18:28 UTC)
I think it actually should be high priority. The end user would expect the host to come out of HE maintenance, and if it does not go back out of maintenance automatically, without informing the user, that is not the right flow.
(Originally by Marina Kalinin)

*** Bug 1501016 has been marked as a duplicate of this bug. ***
(Originally by Sandro Bonazzola)

I am unable to reproduce this on master and 4.2; trying 4.1.
(Originally by Ravi Shankar Nori)

Works in ovirt-engine-backend-4.1.9.1-1.el7.centos.noarch too. Tried with both node-ng and vdsm-4.19.45-1.el7.centos.x86_64 on CentOS 7. The host is activated after reinstall. Please check with the latest 4.1 build.
(Originally by Ravi Shankar Nori)

Created attachment 1388412 [details]
Activate hosts python script
(Originally by Ravi Shankar Nori)
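The attached script is not reproduced in this bug, but for context, a host-activation helper against the oVirt REST API typically looks something like the minimal sketch below. It assumes ovirt-engine-sdk-python 4.x; the engine URL and credentials are placeholders, and this is an illustration, not the attached script itself.

```python
# Minimal sketch (not the attached script): activate all hosts currently in
# maintenance via the oVirt REST API, using ovirt-engine-sdk-python 4.x.
# The URL, username and password below are placeholders.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,  # or ca_file='/etc/pki/ovirt-engine/ca.pem'
)
try:
    hosts_service = connection.system_service().hosts_service()
    for host in hosts_service.list(search='status=maintenance'):
        print('Activating host %s' % host.name)
        hosts_service.host_service(host.id).activate()
finally:
    connection.close()
```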
(In reply to Ravi Nori from comment #13)
> Works in ovirt-engine-backend-4.1.9.1-1.el7.centos.noarch too.
>
> Tried with both node-ng and vdsm-4.19.45-1.el7.centos.x86_64 on cent os 7.
> The host is activated after reinstall.
>
> Please check with latest 4.1 build

Nori, if you tested it on 4.1.9 and it didn't reproduce for you, i.e. after reinstall HE local maintenance was disabled on the host, let's close it as working in 4.1.9.
(Originally by Marina Kalinin)

Nikolai, maybe you can help Nori verify this bug? Thank you!
(Originally by Marina Kalinin)

That is fixed in the latest version of the 4.2 beta.
(Originally by Laurent Domb)

The operation works just fine on 4.2.1.5-0.1.el7.
rhvm-appliance-4.2-20180202.0.el7.noarch
ovirt-hosted-engine-ha-2.2.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.9-1.el7ev.noarch
Linux 3.10.0-693.17.1.el7.x86_64 #1 SMP Sun Jan 14 10:36:03 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
(Originally by Nikolai Sednev)

Reopening to backport the fix to 4.1.10.
(Originally by ylavi)

(In reply to Yaniv Lavi from comment #20)
> Reopening to backport fix to 4.1.10.

What do you want to backport? According to comment 13 it works fine in 4.1.9.
(Originally by Martin Perina)

Added to the 4.1.10 errata and moving to ON_QA. Nikolai, could you please verify that every flow works as expected in 4.1.10 and we haven't missed anything?
(Originally by Martin Perina)

INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [No relevant external trackers attached] For more info please contact: rhv-devops
(Originally by rhv-bugzilla-bot)

(In reply to Martin Perina from comment #22)
> Added to 4.1.10 errata and moving to ON_QA. Nikolai, could you please
> verify, that every flow works as expected in 4.1.10 and we haven't missed
> anything?

Could you please define the required flows?
(Originally by Nikolai Sednev)

The original issue is still reproducible on the latest 4.1.10.1-0.1.el7.
Reproduction steps:
1. Deployed rhevm-4.1.9.1-0.1.el7.noarch on a pair of 4.1.9 ha-hosts; the engine was running on RHEL 7.4, the hosts on RHEL 7.5.
2. Set global maintenance via the UI.
3. "yum update -y ovirt-engine-setup" to rhevm-4.1.10.1-0.1.el7.noarch.
4. Upgraded the engine to rhevm-4.1.10.1-0.1.el7.noarch using "engine-setup".
5. "yum update -y" on the engine to update RHEL 7.4 to RHEL 7.5.
6. Rebooted the engine from within the engine VM.
7. Started the engine from a host using "hosted-engine --vm-start".
8. Removed global maintenance from the ha-hosts.
9. Logged in to the engine's UI, set one of the two hosts (alma03, the first host, which was not hosting the SHE VM and was not SPM) into maintenance and then reinstalled it; after the reinstall the host recovered and was automatically activated.
10. The reinstalled ha-host ended up in local maintenance in the CLI, and in the UI it appeared as "Unavailable due to HA score".
See result in CLI:
alma03 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : alma03
Host ID : 1
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 0
stopped : False
Local maintenance : True
crc32 : bb19601a
local_conf_timestamp : 9806
Host timestamp : 9804
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=9804 (Sun Feb 25 19:03:13 2018)
host-id=1
score=0
vm_conf_refresh_time=9806 (Sun Feb 25 19:03:15 2018)
conf_on_shared_storage=True
maintenance=True
state=LocalMaintenance
stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : alma04
Host ID : 2
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 915c08da
local_conf_timestamp : 9769
Host timestamp : 9767
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=9767 (Sun Feb 25 19:03:19 2018)
host-id=2
score=3400
vm_conf_refresh_time=9769 (Sun Feb 25 19:03:21 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False
A screenshot from the UI and sosreports from both hosts and the engine are attached; a small status-check sketch follows this comment.
Moving back to assigned.
(Originally by Nikolai Sednev)
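Given status output like the one above, the stuck state can be detected without the UI by scanning the agent report for hosts that still show "Local maintenance : True". The following is a rough sketch, not part of this bug's fix; it only assumes the field labels seen in the output above.

```python
# Sketch: detect HE hosts left in local maintenance by parsing the
# human-readable "hosted-engine --vm-status" output, as captured in this bug.
import subprocess

def hosts_in_local_maintenance():
    out = subprocess.check_output(['hosted-engine', '--vm-status'],
                                  universal_newlines=True)
    hosts = []
    hostname = None
    for line in out.splitlines():
        if ':' not in line:
            continue
        key, _, value = line.partition(':')
        key, value = key.strip(), value.strip()
        if key == 'Hostname':
            hostname = value
        elif key == 'Local maintenance' and value == 'True':
            hosts.append(hostname)
    return hosts

if __name__ == '__main__':
    stuck = hosts_in_local_maintenance()
    if stuck:
        print('Hosts still in local maintenance: %s' % ', '.join(stuck))
    else:
        print('No host reports local maintenance.')
```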
Created attachment 1400614 [details]
Screenshot from 2018-02-25 19-07-06.png
(Originally by Nikolai Sednev)
Created attachment 1400615 [details]
engine logs
(Originally by Nikolai Sednev)
Created attachment 1400616 [details]
alma03 in local maintenance
(Originally by Nikolai Sednev)
Created attachment 1400617 [details]
alma04 logs
(Originally by Nikolai Sednev)
To re-enable alma03, I manually had to run "hosted-engine --set-maintenance --mode=none" from the CLI. See also the attached screencast.
(Originally by Nikolai Sednev)

Created attachment 1400618 [details]
screencast
(Originally by Nikolai Sednev)
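For completeness, the manual workaround above can be scripted on the affected host. This is only a sketch of the command sequence described in the comment, not part of the fix; the wait interval is an assumption.

```python
# Sketch of the manual workaround described above, to be run on the affected
# host (e.g. alma03): clear HE local maintenance, then re-check the status.
import subprocess
import time

subprocess.check_call(['hosted-engine', '--set-maintenance', '--mode=none'])
time.sleep(30)  # give ovirt-ha-agent time to publish an updated score (interval is an assumption)
print(subprocess.check_output(['hosted-engine', '--vm-status'],
                              universal_newlines=True))
```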
*** Bug 1536286 has been marked as a duplicate of this bug. ***
(Originally by ylavi)

Ravi, are you looking into this?
(Originally by Yaniv Kaul)

I was able to reproduce the issue on 4.1.9. The patch https://gerrit.ovirt.org/#/c/86645/ for BZ 1532709 fixes the issue and has not been merged.
(Originally by Ravi Shankar Nori)

This is not going to make it to 4.1.10 - please re-target.
(Originally by Yaniv Kaul)

In the CLI the host reports its status correctly:
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date : False
Hostname : alma04
Host ID : 2
Engine status : unknown stale-data
Score : 3400
stopped : False
Local maintenance : False
crc32 : 585c8d69
local_conf_timestamp : 12922
Host timestamp : 13079
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=13079 (Thu Apr 12 19:11:47 2018)
host-id=2
score=3400
vm_conf_refresh_time=12922 (Thu Apr 12 19:09:10 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
However, in the web UI I clearly see that the host shows "Hosted Engine HA: Not Active" instead of a score of 3400, and it appears up, although not as an ha-capable host (a sketch for cross-checking the engine-side view via the API follows the second status output below).
Moving back to assigned.
Tested on these components:
ovirt-hosted-engine-setup-2.1.4.2-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.11-1.el7ev.noarch
rhvm-appliance-4.1.20180125.0-1.el7.noarch
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
After some time, the host appears in the UI as:
Hosted Engine HA: Local Maintenance Enabled
In the CLI the host also appears to be in local maintenance:
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : alma04
Host ID : 2
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 0
stopped : False
Local maintenance : True
crc32 : 4f42e83e
local_conf_timestamp : 14048
Host timestamp : 14206
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=14206 (Thu Apr 12 19:30:33 2018)
host-id=2
score=0
vm_conf_refresh_time=14048 (Thu Apr 12 19:27:55 2018)
conf_on_shared_storage=True
maintenance=True
state=LocalMaintenance
stopped=False
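To cross-check the engine-side view against the agent output above, the host's hosted-engine properties can be read from the REST API. A minimal sketch, assuming ovirt-engine-sdk-python 4.x and placeholder connection details; the hosted_engine sub-resource is only populated when hosts are listed with all_content.

```python
# Sketch: print the engine's view of HE HA state for every host, to compare
# with "hosted-engine --vm-status" on the hosts. Connection details are
# placeholders; assumes ovirt-engine-sdk-python 4.x.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)
try:
    hosts_service = connection.system_service().hosts_service()
    # all_content=True is needed for the hosted_engine details to be filled in.
    for host in hosts_service.list(all_content=True):
        he = host.hosted_engine
        if he is None:
            print('%s: no hosted-engine data reported' % host.name)
        else:
            print('%s: active=%s, local_maintenance=%s, score=%s'
                  % (host.name, he.active, he.local_maintenance, he.score))
finally:
    connection.close()
```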
Created attachment 1420937 [details]
Screenshot from 2018-04-12 19-32-32.png
Created attachment 1420951 [details]
sosreport from alma04
Created attachment 1420952 [details]
engine logs
(In reply to RHV Bugzilla Automation and Verification Bot from comment #27)
> Original issue is still being reproduced on latest 4.1.10.1-0.1.el7
>
> Reproduction steps:
> 1. Deployed rhevm-4.1.9.1-0.1.el7.noarch on pair of 4.1.9 ha-hosts, engine
> was running on RHEL7.4, hosts on RHEL7.5.
> 2. Set global maintenance via UI.
> 3. "yum update -y ovirt-engine-setup" to rhevm-4.1.10.1-0.1.el7.noarch.
> 4. Upgraded the engine to rhevm-4.1.10.1-0.1.el7.noarch using "engine-setup".

As mentioned in the Target Milestone, this fix is included in 4.1.11; please retest with the correct version.

Works for me on these components:
Host:
ovirt-hosted-engine-setup-2.1.4.2-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.11-1.el7ev.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Engine:
ovirt-engine-4.1.11.1-0.1.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Moving forth to https://bugzilla.redhat.com/show_bug.cgi?id=1560574#c46.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1219