Bug 1314377
Summary: | Confusing message when the HE VM cannot be migrated to "failing" HE host | ||
---|---|---|---|
Product: | [oVirt] ovirt-engine | Reporter: | Jiri Belka <jbelka> |
Component: | Backend.Core | Assignee: | Andrej Krejcir <akrejcir> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Nikolai Sednev <nsednev> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 3.6.3.3 | CC: | bugs, dfediuck, jbelka, mavital |
Target Milestone: | ovirt-4.2.0 | Keywords: | Triaged |
Target Release: | 4.2.0 | Flags: | rule-engine: ovirt-4.2+, rule-engine: planning_ack+, rgolan: devel_ack+, mavital: testing_ack+ |
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | No Doc Update |
Doc Text: | |
Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2017-12-20 11:01:37 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1512534 | ||
Bug Blocks: |
Description
Jiri Belka
2016-03-03 13:22:36 UTC
Something seems wrong in the logic: I was migrating from dell-r210ii-03 to dell-r210ii-04, but the free-memory computation refers to the _source_ host (i.e. dell-r210ii-03):

~~~
...
2016-07-22 14:13:08,648 DEBUG [org.ovirt.engine.core.bll.scheduling.policyunits.HostedEngineHAClusterFilterPolicyUnit] (default task-63) [3610c5f5] Host 'dell-r210ii-04' was filtered out as it doesn't have a positive score (the score is 0)
2016-07-22 14:13:08,648 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-63) [3610c5f5] Candidate host 'dell-r210ii-04' ('c088b29d-30ac-454b-890f-d09301758433') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'HA' (correlation id: null)
^^^ Candidate host 'dell-r210ii-04' !
2016-07-22 14:13:08,648 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-63) [3610c5f5] hasPhysMemoryToRunVM: host 'dell-r210ii-03'; free memory is : 3848.0 MB (+ 0 MB pending); free swap is: 8191 MB, required memory is 4161.0 MB; Guest overhead 65 MB
^^^ hasPhysMemorytoRunVM: host 'dell-r210ii-03' !
2016-07-22 14:13:08,648 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-63) [3610c5f5] 4161.0 <= ??? 12039.0
2016-07-22 14:13:08,649 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-63) [3610c5f5] hasOvercommitMemoryToRunVM: host 'dell-r210ii-03'; max scheduling memory : 3151.0 MB; required memory is 4161.0 MB; Guest overhead 65 MB
2016-07-22 14:13:08,649 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-63) [3610c5f5] 4161.0 <= ??? 3151.0
2016-07-22 14:13:08,649 DEBUG [org.ovirt.engine.core.bll.scheduling.policyunits.MemoryPolicyUnit] (default task-63) [3610c5f5] Host 'dell-r210ii-03' is already too close to the memory overcommitment limit. It can only accept 3151.0 MB of additional memory load.
2016-07-22 14:13:08,649 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-63) [3610c5f5] Candidate host 'dell-r210ii-03' ('320d4f9c-4360-40bb-af7e-d93dcc87401b') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2016-07-22 14:13:08,649 WARN [org.ovirt.engine.core.bll.MigrateVmCommand] (default task-63) [3610c5f5] Validation of action 'MigrateVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__MIGRATE,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName dell-r210ii-03,$filterName Memory,$availableMem 3151.000000,VAR__DETAIL__NOT_ENOUGH_MEMORY,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL,VAR__FILTERTYPE__INTERNAL,$hostName dell-r210ii-04,$filterName HA,VAR__DETAIL__NOT_HE_HOST,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL
^^ uh?
...
~~~

My environment was SHE (self-hosted engine), migrated from oVirt 3.6 to a 4.0 snapshot, but the DC/cluster compatibility version was still 3.6 on the 4.x engine.

ovirt-engine-4.0.3-0.0.master.20160720203246.git9c88731.el7.centos.noarch

My hosts:
oVirt Node Hypervisor release 3.6 (0.999.201607211021.el7.centos)
ovirt-release-host-node-4.0.2-1.el7.noarch

1. Does the cluster have any scheduling policy (other than none)?
2. How was the migration done: via the UI, REST, or load balancing?

(In reply to Yanir from comment #5)
> 1.does the cluster have any scheduling policy (different from none)?
> 2.how was the migration done ? via the UI ? REST ? load balancing ?
My memory is a little dim two months after submitting this issue.
1. No.
2. Via the 'Migrate' button in the UI.
The logs are attached.

Regarding https://bugzilla.redhat.com/show_bug.cgi?id=1314377#c2 :
1. The source host appears in the log messages because all hosts are considered when computing the target host (unless you choose a specific host to migrate to).
2. The memory DEBUG messages give statistics about the memory state of each host; the memory state is not why the target host was filtered out (note again that these are debug messages).

Now for the main reason of this bug: "Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details: The host _slot-5b did not satisfy internal filter HA because it is not a Hosted Engine host."

The host's hosted engine score was zero, so the HA filter removed it and reported it as not being a hosted engine host. The score might be zero because the engine has not noticed the deployment yet, or because something failed on the host. Since the score comes from VDSM, we could add something like "... if it is a designated hosted engine host, check the host status" to the UI or debug log message. (The status can be checked by logging into any hosted engine host and running the 'hosted-engine --vm-status' command.) The other option is to leave the log message as it is and treat this as a documentation bug.
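The two filters that rejected both hosts in the reported migration, together with the message change suggested above (telling a non-HE host apart from an HE host whose score is 0), can be modelled in a few lines. This is a toy sketch in Python, not the engine's code (the engine scheduler is Java); HostStats, ha_filter, memory_filter and schedule are hypothetical names. The memory comparisons are inferred from the logged numbers only: the required 4161 MB is read as an assumed 4096 MB VM plus the logged 65 MB guest overhead, the physical check as 3848 MB free + 8191 MB swap = 12039 MB, and the overcommit check as required versus the 3151 MB maximum scheduling memory. The source host's HA score is assumed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HostStats:
    """Hypothetical per-host snapshot; field names are illustrative only."""
    name: str
    he_configured: bool      # host was deployed as a hosted-engine (HE) host
    he_score: int            # HA agent score as the engine sees it
    free_memory_mb: float = 0.0
    free_swap_mb: float = 0.0
    max_scheduling_memory_mb: float = 0.0


def ha_filter(host: HostStats) -> Optional[str]:
    """Return a rejection reason, or None if the host may run the HE VM.

    The two branches model the proposed split: 'not an HE host' versus
    'HE host that currently cannot run the HE VM'.
    """
    if not host.he_configured:
        return "it is not a Hosted Engine host"
    if host.he_score <= 0:
        return (f"it is a Hosted Engine host but its HA score is {host.he_score}; "
                "check 'hosted-engine --vm-status'")
    return None


def memory_filter(host: HostStats, required_mb: float) -> Optional[str]:
    """Return a rejection reason, or None if the host has enough memory.

    Both comparisons are inferred from the logged numbers, not engine source:
    physical:   required <= free memory + free swap (3848 + 8191 = 12039)
    overcommit: required <= max scheduling memory (3151)
    """
    if required_mb > host.free_memory_mb + host.free_swap_mb:
        return "it does not have enough free physical memory"
    if required_mb > host.max_scheduling_memory_mb:
        return (f"it can only accept {host.max_scheduling_memory_mb:.1f} MB "
                "of additional memory load")
    return None


def schedule(hosts: List[HostStats], required_mb: float) -> List[Tuple[str, str, str]]:
    """Apply the HA filter, then the memory filter; collect (host, filter, reason)."""
    rejected = []
    for host in hosts:
        reason = ha_filter(host)
        if reason is not None:
            rejected.append((host.name, "HA", reason))
            continue  # memory is never evaluated for a host the HA filter dropped
        reason = memory_filter(host, required_mb)
        if reason is not None:
            rejected.append((host.name, "Memory", reason))
    return rejected


if __name__ == "__main__":
    required = 4096 + 65  # assumed 4096 MB VM + logged 65 MB overhead = 4161 MB
    hosts = [
        # Source host: memory numbers from the log; HA score 3400 is assumed.
        HostStats("dell-r210ii-03", he_configured=True, he_score=3400,
                  free_memory_mb=3848, free_swap_mb=8191,
                  max_scheduling_memory_mb=3151),
        # Target host: HE-enabled but reporting score 0, as in the log.
        HostStats("dell-r210ii-04", he_configured=True, he_score=0),
    ]
    for name, flt, reason in schedule(hosts, required):
        print(f"The host {name} did not satisfy internal filter {flt} because {reason}.")
```

With these inputs dell-r210ii-03 passes the physical check (4161 <= 12039) but fails the overcommit check (4161 > 3151), and dell-r210ii-04 is reported as an HE host with a zero score rather than as a non-HE host. The sketch iterates per host for brevity; the engine may instead apply each filter to the whole host list, which gives the same rejections here.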
The hosted engine score was 0 because the host was not ready:

~~~
MainThread::INFO::2016-03-03 14:10:46,292::hosted_engine::875::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::ERROR::2016-03-03 14:10:46,292::hosted_engine::845::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=23c03bb6-9889-4cbf-b7ad-55b9a2c70653, host_id=2): timeout during domain acquisition
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 452, in start_monitoring
    self._initialize_domain_monitor()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 846, in _initialize_domain_monitor
    raise Exception(msg)
Exception: Failed to start monitoring domain (sd_uuid=23c03bb6-9889-4cbf-b7ad-55b9a2c70653, host_id=2): timeout during domain acquisition
~~~

VDSM took a long time to synchronize its internals. This bug can now be considered just an error message update bug: the scheduling result should distinguish between a non-HE host and an HE-enabled host that is not operational.

Deployed over Gluster using:
ovirt-hosted-engine-ha-2.2.0-0.0.master.20171128125909.20171128125907.gitfa5daa6.el7.centos.noarch
ovirt-hosted-engine-setup-2.2.0-0.0.master.20171129192644.git440040c.el7.centos.noarch
ovirt-engine-appliance-4.2-20171129.1.el7.centos.noarch

Caused the destination host to get a zero score prior to migrating the SHE VM to it. Tried to migrate the SHE VM to the destination HA host with the zero score and received this message in the UI:
"Operation Canceled Error while executing action: HostedEngine: Cannot migrate VM because the VM is in Not Responding status."
Moving to verified.

This bugzilla is included in the oVirt 4.2.0 release, published on Dec 20th 2017.
Since the problem described in this bug report should be resolved in the oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.