Bug 1130173
| Summary: | can't start hosted engine VM in cluster with 3+ hosts | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Jiri Moskovcak <jmoskovc> |
| Component: | ovirt-hosted-engine-ha | Assignee: | Doron Fediuck <dfediuck> |
| Status: | CLOSED ERRATA | QA Contact: | Nikolai Sednev <nsednev> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.5.0 | CC: | dfediuck, ecohen, gklein, iheim, lsurette, mavital, rbalakri, sbonazzo, yeylon |
| Target Milestone: | --- | Keywords: | Triaged, ZStream |
| Target Release: | 3.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | sla | | |
| Fixed In Version: | ovirt-hosted-engine-ha-1.2.2-1.el6ev | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1147411 (view as bug list) | Environment: | |
| Last Closed: | 2015-02-11 21:09:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description
Jiri Moskovcak
2014-08-14 13:41:17 UTC
Missing merge on the 1.2 branch.

Same behaviour as bug 1147411; here it happens even with only two hosts. The host on which the VM is manually started at first reports it as powering up, but then it goes to powering down:

```
[root@brown-vdsd ~]# hosted-engine --vm-status
--== Host 1 status ==--
Status up-to-date : True
Hostname : 10.35.103.12
Host ID : 1
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 2400
Local maintenance : False
Host timestamp : 96751
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=96751 (Wed Oct 22 11:10:48 2014)
host-id=1
score=2400
maintenance=False
state=EngineDown
--== Host 3 status ==--
Status up-to-date : True
Hostname : 10.35.106.13
Host ID : 3
Engine status : {"reason": "bad vm status", "health": "bad", "vm": "up", "detail": "powering up"}
Score : 2400
Local maintenance : False
Host timestamp : 77390
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=77390 (Wed Oct 22 08:10:47 2014)
host-id=3
score=2400
maintenance=False
state=EngineStop
timeout=Thu Jan 1 23:34:27 1970
```
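For reference, the manual start and the status polling above can be reproduced with the hosted-engine helper; a minimal sketch (the 10-second interval is an arbitrary choice, not taken from this report):

```
# Start the hosted engine VM by hand on one HA host,
# then keep watching the shared-metadata view of all hosts.
hosted-engine --vm-start
watch -n 10 'hosted-engine --vm-status'
```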
```
[root@brown-vdsd ~]# hosted-engine --vm-status
--== Host 1 status ==--
Status up-to-date : True
Hostname : 10.35.103.12
Host ID : 1
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 2400
Local maintenance : False
Host timestamp : 96961
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=96961 (Wed Oct 22 11:14:18 2014)
host-id=1
score=2400
maintenance=False
state=EngineDown
--== Host 3 status ==--
Status up-to-date : True
Hostname : 10.35.106.13
Host ID : 3
Engine status : {"health": "good", "vm": "up", "detail": "powering down"}
Score : 2400
Local maintenance : False
Host timestamp : 77593
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=77593 (Wed Oct 22 08:14:10 2014)
host-id=3
score=2400
maintenance=False
state=EngineStop
timeout=Thu Jan 1 23:34:27 1970
```
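Each host's Engine status field in this output is a small JSON blob, so it can be pulled out and pretty-printed to compare hosts at a glance; a minimal sketch, assuming a python interpreter is available on the host:

```
# Extract every "Engine status" JSON value from --vm-status output
# and pretty-print it (one blob per reporting host).
hosted-engine --vm-status \
  | grep '^Engine status' \
  | sed 's/^Engine status[[:space:]]*:[[:space:]]*//' \
  | while read -r blob; do
      echo "$blob" | python -m json.tool
    done
```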
The engine is actually up; in the GUI the hosted engine VM is shown as powering down, then powering up, and then it stays up. After some time the engine returns to UP in the GUI, while the CLI shows the following:

```
[root@brown-vdsd ~]# hosted-engine --vm-status
--== Host 1 status ==--
Status up-to-date : True
Hostname : 10.35.103.12
Host ID : 1
Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
Score : 2400
Local maintenance : False
Host timestamp : 97097
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=97097 (Wed Oct 22 11:16:34 2014)
host-id=1
score=2400
maintenance=False
state=EngineStarting
--== Host 3 status ==--
Status up-to-date : True
Hostname : 10.35.106.13
Host ID : 3
Engine status : {"reason": "bad vm status", "health": "bad", "vm": "up", "detail": "powering up"}
Score : 2400
Local maintenance : False
Host timestamp : 77732
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=77732 (Wed Oct 22 08:16:28 2014)
host-id=3
score=2400
maintenance=False
state=EngineStarting
You have new mail in /var/spool/mail/root
```

```
[root@brown-vdsd ~]# hosted-engine --vm-status
--== Host 1 status ==--
Status up-to-date : True
Hostname : 10.35.103.12
Host ID : 1
Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
Score : 2400
Local maintenance : False
Host timestamp : 97131
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=97131 (Wed Oct 22 11:17:08 2014)
host-id=1
score=2400
maintenance=False
state=EngineStarting
--== Host 3 status ==--
Status up-to-date : True
Hostname : 10.35.106.13
Host ID : 3
Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Score : 2400
Local maintenance : False
Host timestamp : 77765
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=77765 (Wed Oct 22 08:17:02 2014)
host-id=3
score=2400
maintenance=False
state=EngineStarting
```
The behaviour should be stable and the VM should remain powered up.
The behaviour is not the same: in this bug the liveliness check fails, which means the agent fails to communicate with the engine (it cannot access the health status page). My guess is that your network is somehow broken, or the VM running the engine is overloaded. Either way, this is expected behaviour, and you should wait for a while to see if it comes back to 'up' with health 'good'. When you reproduce this again, please run this command to test whether the engine status page is accessible and fetches correctly:

```
curl http://{fqdn}/ovirt-engine/services/health
```

Also please note

Works for me on these components:

```
ovirt-host-deploy-1.3.0-2.el6ev.noarch
ovirt-hosted-engine-setup-1.2.1-8.el6ev.noarch
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
mom-0.4.1-4.el6ev.noarch
sanlock-2.8-1.el6.x86_64
vdsm-4.16.8.1-3.el6ev.x86_64
libvirt-0.10.2-46.el6_6.2.x86_64
ovirt-hosted-engine-ha-1.2.4-3.el6ev.noarch
rhevm-3.5.0-0.25.el6ev.noarch
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0194.html
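As a footnote to the liveliness check discussion above, the suggested curl test can be looped until the page answers; a small sketch (the FQDN value and the retry interval are illustrative placeholders, not from this report):

```
# Poll the engine health page until it responds; this is the same
# endpoint the agent's liveliness check fetches.
FQDN="engine.example.com"   # replace with the hosted engine VM's FQDN
until curl -sf "http://${FQDN}/ovirt-engine/services/health"; do
    echo "health page not reachable yet, retrying in 10s..."
    sleep 10
done
echo "engine health page is up"
```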