Bug 1264085
| Summary: | [RFE] Reserve enough free memory for hosted-engine to start on a host being part of self-hosted env | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Jiri Belka <jbelka> |
| Component: | RFEs | Assignee: | Martin Sivák <msivak> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Nikolai Sednev <nsednev> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.5.4 | CC: | bugs, dfediuck, gklein, lsurette, mavital, melewis, michal.skrivanek, msivak, nsednev, rbalakri, Rhev-m-bugs, srevivo, ykaul, ylavi |
| Target Milestone: | ovirt-4.1.0-alpha | Keywords: | FutureFeature, Triaged |
| Target Release: | --- | Flags: | dfediuck: ovirt-4.1? gklein: testing_plan_complete+ rule-engine: planning_ack? rule-engine: devel_ack+ rule-engine: testing_ack+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Enhancement |
| Doc Text: | With this update, the user can configure the number of memory slots reserved for spare self-hosted engine hosts. Previously, on a loaded cluster there was a chance that the self-hosted engine virtual machine would have no host to start on, which compromised the high-availability feature. Now, memory is kept free on a backup host so that it is ready to accept the self-hosted engine virtual machine if the current host crashes. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-02-15 14:53:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1403956, 1406001 | | |
| Bug Blocks: | 1411319, 1427748, 1436613 | | |
Description
Jiri Belka
2015-09-17 13:17:14 UTC
(I wanted to go with quota, but if the memory on the two involved hosts differs, I can't define a max memory quota for all VMs other than hosted-engine.) Our HostedEngine is down again and could not be started because there was not enough free RAM. As a workaround we are going to waste memory with a process that will "eat" as much RAM as the HostedEngine, and we will kill it via hooks when needed.

Martin, is this already implemented?

Yes, this is already in.

The fix for this issue should be included in oVirt 4.1.0 beta 1, released on December 1st. If not included, please move back to MODIFIED.

I've deployed 2 hosts with 32Gig of RAM each and a HE-VM with 16384MB of RAM.
I've created VMs with 1Gig of guaranteed RAM each and ended up with 43 active VMs plus the HE-VM overall.
43*1024 = 44032MB of RAM in total; 29 VMs were running on puma18, and 14 guest VMs plus the HE-VM were running on puma19 (15 VMs including the HE-VM on puma19).
When I tried to start an additional VM, I received this error:
"
Operation Canceled
Error while executing action:
Memory_load-41:
Cannot run VM. There is no host that satisfies current scheduling constraints. See below for details:
The host puma18.scl.lab.tlv.redhat.com did not satisfy internal filter Memory because its available memory is too low (100.000000 MB) to run the VM.
The host puma19.scl.lab.tlv.redhat.com did not satisfy internal filter Memory because its available memory is too low (0.000000 MB) to run the VM.
"
RAM reported from WEBUI on puma18 (29 guest-VMs running on host):
Physical Memory:
32067 MB total, 6413 MB used, 25654 MB free
Swap Size:
2047 MB total, 0 MB used, 2047 MB free
Shared Memory:
27%
Max free Memory for scheduling new VMs:
100 MB
RAM reported from WEBUI on puma19 (host that was running HE-VM with 16Gig RAM and 14 guest-VMs):
Physical Memory:
32067 MB total, 18278 MB used, 13789 MB free
Swap Size:
2047 MB total, 0 MB used, 2047 MB free
Shared Memory:
7%
Max free Memory for scheduling new VMs:
0 MB
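The 0 MB figure on puma19 is consistent with guaranteed-memory accounting. A back-of-the-envelope check, assuming 1024 MB guaranteed per guest VM and 16384 MB for the HE-VM as in the setup above (the remark about host overhead is an inference, not something stated in the report):

```python
# Back-of-the-envelope check of puma19's schedulable memory; the figures
# are taken from the report above.
host_total_mb = 32067             # physical memory reported for puma19
committed_mb = 14 * 1024 + 16384  # 14 guest VMs at 1 GiB each + the HE-VM
print(host_total_mb - committed_mb)  # -> 1347
```

Only about 1.3 GB remains uncommitted, which is presumably consumed by the host's own reserved overhead, leaving 0 MB of "Max free Memory for scheduling new VMs".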
Now I've manually stopped the HE-VM on puma19 to see whether it gets started on puma18: killing the HE-VM on puma19 zeroes that host's HA score, so HA should start the HE-VM on another host with the best positive score, which in my case should be puma18, as I have only 2 hosted-engine hosts in my environment.
puma19 ~]# hosted-engine --vm-poweroff
puma19 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : puma18.scl.lab.tlv.redhat.com
Host ID : 1
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 24c902b4
local_conf_timestamp : 12304
Host timestamp : 518905
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=518905 (Mon Jan 2 14:52:08 2017)
host-id=1
score=3400
vm_conf_refresh_time=12304 (Tue Dec 27 18:08:37 2016)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : puma19.scl.lab.tlv.redhat.com
Host ID : 2
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 0
stopped : False
Local maintenance : False
crc32 : 0b49f26a
local_conf_timestamp : 10819
Host timestamp : 517458
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=517458 (Mon Jan 2 14:52:34 2017)
host-id=2
score=0
vm_conf_refresh_time=10819 (Tue Dec 27 18:08:23 2016)
conf_on_shared_storage=True
maintenance=False
state=EngineUnexpectedlyDown
stopped=False
puma18 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : puma18.scl.lab.tlv.redhat.com
Host ID : 1
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : f5d68fe8
local_conf_timestamp : 12304
Host timestamp : 519035
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=519035 (Mon Jan 2 14:54:19 2017)
host-id=1
score=3400
vm_conf_refresh_time=12304 (Tue Dec 27 18:08:37 2016)
conf_on_shared_storage=True
maintenance=False
state=EngineStarting
stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : puma19.scl.lab.tlv.redhat.com
Host ID : 2
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 0
stopped : False
Local maintenance : False
crc32 : 613a9a8b
local_conf_timestamp : 10819
Host timestamp : 517581
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=517581 (Mon Jan 2 14:54:37 2017)
host-id=2
score=0
vm_conf_refresh_time=10819 (Tue Dec 27 18:08:23 2016)
conf_on_shared_storage=True
maintenance=False
state=EngineUnexpectedlyDown
stopped=False
timeout=Wed Jan 7 01:54:53 1970
puma18 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : puma18.scl.lab.tlv.redhat.com
Host ID : 1
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 29dfc563
local_conf_timestamp : 12304
Host timestamp : 521268
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=521268 (Mon Jan 2 15:31:32 2017)
host-id=1
score=3400
vm_conf_refresh_time=12304 (Tue Dec 27 18:08:37 2016)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : puma19.scl.lab.tlv.redhat.com
Host ID : 2
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : fb1a3aa2
local_conf_timestamp : 10819
Host timestamp : 519787
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=519787 (Mon Jan 2 15:31:23 2017)
host-id=2
score=3400
vm_conf_refresh_time=10819 (Tue Dec 27 18:08:23 2016)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
Finally the HE-VM was started on puma18, and the host was running 30 VMs overall: 29 guest VMs plus the HE-VM.
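The score-based failover just demonstrated can be sketched as follows (an illustrative toy, not the actual ovirt-hosted-engine-ha agent code; `pick_restart_host` is invented for this example):

```python
# Toy model of the failover rule seen above: a host whose HE-VM was killed
# gets score 0, and the host with the best positive score starts the engine VM.
def pick_restart_host(scores):
    """Return the host with the highest positive HA score, or None."""
    candidates = {host: score for host, score in scores.items() if score > 0}
    return max(candidates, key=candidates.get) if candidates else None

# Scores from the --vm-status output: puma19 zeroed, puma18 still at 3400.
print(pick_restart_host({"puma18": 3400, "puma19": 0}))  # -> puma18
```

With only two hosts and puma19's score zeroed, puma18 is the only candidate, matching the observed behaviour.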
puma18:
Physical Memory:
32067 MB total, 10582 MB used, 21485 MB free
Swap Size:
2047 MB total, 0 MB used, 2047 MB free
Shared Memory:
30%
Max free Memory for scheduling new VMs:
Works for me on these components on hosts:
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0-0.0.master.20161221071755.git46cacd3.el7.centos.noarch
ovirt-setup-lib-1.1.0-1.el7.centos.noarch
libvirt-client-2.0.0-10.el7_3.2.x86_64
ovirt-release41-pre-4.1.0-0.6.beta2.20161221025826.gitc487776.el7.centos.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.2.x86_64
ovirt-hosted-engine-ha-2.1.0-0.0.master.20161221070856.20161221070854.git387fa53.el7.centos.noarch
rhevm-appliance-20161116.0-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-host-deploy-1.6.0-0.0.master.20161215101008.gitb76ad50.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-common-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
vdsm-4.18.999-1218.gitd36143e.el7.centos.x86_64
ovirt-imageio-daemon-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Nov 16 13:15:13 EST 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Wed Nov 16 13:15:13 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)
On engine:
ovirt-engine-setup-plugin-ovirt-engine-common-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-imageio-proxy-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
ovirt-iso-uploader-4.1.0-0.0.master.20160909154152.git14502bd.el7.centos.noarch
ovirt-engine-userportal-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-dbscripts-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-extensions-api-impl-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-imageio-common-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
ovirt-host-deploy-1.6.0-0.0.master.20161215101008.gitb76ad50.el7.centos.noarch
python-ovirt-engine-sdk4-4.1.0-0.1.a0.20161215git77fce51.el7.centos.x86_64
ovirt-host-deploy-java-1.6.0-0.0.master.20161215101008.gitb76ad50.el7.centos.noarch
ovirt-release41-pre-4.1.0-0.6.beta2.20161221025826.gitc487776.el7.centos.noarch
ovirt-setup-lib-1.1.0-1.el7.centos.noarch
ovirt-engine-extension-aaa-jdbc-1.1.2-1.el7.noarch
ovirt-engine-dwh-setup-4.1.0-0.0.master.20161129154019.el7.centos.noarch
ovirt-imageio-proxy-setup-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
ovirt-engine-tools-backup-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-websocket-proxy-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-setup-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-backend-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-tools-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-webadmin-portal-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-restapi-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-vmconsole-proxy-helper-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-wildfly-overlay-10.0.0-1.el7.noarch
ovirt-engine-cli-3.6.9.2-1.el7.centos.noarch
ovirt-web-ui-0.1.1-2.el7.centos.x86_64
ovirt-engine-setup-base-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-vmconsole-1.0.4-1.el7.centos.noarch
ovirt-engine-dwh-4.1.0-0.0.master.20161129154019.el7.centos.noarch
ovirt-engine-setup-plugin-websocket-proxy-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-hosts-ansible-inventory-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-dashboard-1.1.0-0.4.20161128git5ed6f96.el7.centos.noarch
ovirt-engine-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-guest-agent-common-1.0.13-1.20161220085008.git165fff1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7.centos.noarch
ovirt-engine-wildfly-10.1.0-1.el7.x86_64
ovirt-engine-lib-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-vmconsole-proxy-1.0.4-1.el7.centos.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (builder.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Dec 6 23:06:41 UTC 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
CentOS Linux release 7.3.1611 (Core)
*** Bug 1436002 has been marked as a duplicate of this bug. ***