Bug 1670701
| Summary: | Could not fetch data needed for VM migrate operation | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Sandro Bonazzola <sbonazzo> |
| Component: | ovirt-engine-ui-extensions | Assignee: | Greg Sheremeta <gshereme> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Petr Matyáš <pmatyas> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.3.0 | CC: | akrejcir, bugs, dagur, izuckerm, lleistne, lsvaty, michal.skrivanek, mjohnson, pmatyas, rbarry, stirabos |
| Target Milestone: | ovirt-4.3.3 | Flags: | aoconnor: ovirt-4.3+, dagur: testing_ack+ |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ovirt-engine-ui-extensions-1.0.4-1 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-04-16 13:58:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Sandro Bonazzola
2019-01-30 08:21:55 UTC
Greg, any ideas?

Probably an issue with api/hosts?migration_target_of

relevant code:

    const json = await engineGet(`api/hosts?migration_target_of=${vms.map(vm => vm.id).join(',')}`)
    const targetHosts = json.host
    if (!Array.isArray(targetHosts)) {
      throw new Error('VmMigrateDataProvider: Failed to fetch target hosts')
    }

waiting on access to the environment

*** Bug 1690862 has been marked as a duplicate of this bug. ***

We found another reproducer. Do you have access to the environment for the reproducer?

This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

This bug is a regression testing blocker. We have another reproducer. The environment credentials are in the private message.

Steps to reproduce:
1. Log in to the web Admin UI and try migrating VM "HostedEngine" to host b01-h03-r620

Please specify a workaround, if one exists, for forcing the HE VM to migrate to the target host (just so we can proceed with our testing).

Thank you.
scale and performance QE team

Also happens from the admin portal, with the following in the logs. Taking back to Virt:

2019-03-20 15:50:51,129Z INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-127) [54a31206-41e3-41f3-93e4-565497ab6a30] Candidate host 'host_mixed_1' ('42a138b6-efd6-4a7e-a6b2-d962b279f9ec') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPU' (correlation id: null)
2019-03-20 15:50:51,129Z INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-127) [54a31206-41e3-41f3-93e4-565497ab6a30] Candidate host 'b01-h03-r620.rhev.openstack.engineering.redhat.com' ('c83a8008-cd71-4671-9f10-1325f3364034') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPU' (correlation id: null)

In the environment, the problem is that the VM needs more CPU cores than the hosts have. The VM needs 16 cores, but the hosts have only 12. The SMT threads are not counted as cores.

A workaround is to set the 'Count Threads as Cores' option for the cluster. The hosts will then have 24 cores.

Ok, so it's another instance of api/hosts?migration_target_of correctly returning an empty list {}

So I think the fix is just updating the error message to something like "no acceptable target hosts found"

That work for you, Ryan?

Works for me

I looked through the git log and didn't see any scheduler changes which would affect this.

So the real question now is "is this a change in behavior?"

If it is, let's change it back to treating cores as CPUs for this calculation. If it's not, let's fix the scale tests, remove "Regression" and the blocker flag, and see about giving a more meaningful error message.

I think this is basically a bug in ui-extensions, complicated by some unexpected data from engine.

If there are no hosts that a VM can be migrated to, this is supposed to show:
https://imgur.com/a/XL0f4yM [no available hosts]

If I create a fresh environment with one host, and create one VM, and click migrate, that's the message I get. The VM is on the one host, and so there's nowhere to migrate it to.

This is what the API call looks like:
https://imgur.com/a/aq584mp

The ui-extensions plugin JavaScript recognizes that the JSON is valid, filters out the host that the VM is already on, gets an empty list, and acts correctly.

    // If all VMs are currently running on the same host (currentHostIds.length === 1),
    // this particular host cannot be used as a migration target to any of the selected
    // VMs (since those VMs are already running on it). Otherwise, don't filter target
    // hosts, since each of them is a potential migration target to each of the VMs.
    return (currentHostIds.length === 1)
      ? targetHosts.filter(host => !currentHostIds.includes(host.id))
      : targetHosts

^ so in an environment with 1 host and 1 VM, that filter runs on the populated JSON, and we end up with a proper empty JavaScript array, targetHosts = []

But it looks like something is happening in the API such that sometimes a completely empty JSON object is returned, not an array.

    if (!Array.isArray(targetHosts)) {
      throw new Error('VmMigrateDataProvider: Failed to fetch target hosts')
    }

and then we see
https://imgur.com/a/6cxiXNL [could not fetch data]

So there is some condition in the engine that causes the REST API to return { } from /api/hosts when the ui-extensions code expects to always see at the very least { host: [ ] }

I don't know exactly what condition this is -- maybe when a VM is already on a host, but that host is saturated and the API doesn't return that same host in the results.

I can replicate the { } by commenting out some code in the API
https://imgur.com/a/gnkllPw
and it does give me the error
https://imgur.com/a/6cxiXNL [could not fetch data]

So ui-extensions expects the API to always return at least { host: [ ] }, not { }

We can change this in engine, or in ui-extensions. Thoughts?

My POV is basically:

* Fix in ui-extensions (since this is probably not the only case in which it happens)
* If it's a behavior change in the scheduler, go back to what it used to be so we don't trigger this "live" in the field

(In reply to Ryan Barry from comment #16)
> My POV is basically:
>
> * Fix in ui-extensions (since this is probably not the only case in which it
> happens)

Will do.

> * If it's a behavior change in the scheduler, go back to what it used to be
> so we don't trigger this "live" in the field

No idea there.
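The ui-extensions-side fix discussed above could look roughly like the sketch below. It is not the patch that was merged; `extractTargetHosts` and `filterTargetHosts` are hypothetical names introduced here for illustration, and the only behavior assumed about the engine is what this thread reports: the response body is either `{ host: [...] }` or a bare `{ }` when no host passes the scheduler filters.

```javascript
// Hypothetical helper (not from the ui-extensions source): normalize the
// body returned by api/hosts?migration_target_of=... so that `{}` and
// `{ host: [] }` both mean "no acceptable target hosts".
function extractTargetHosts (json) {
  // A body that is not a plain object is still treated as a fetch failure.
  if (json === null || typeof json !== 'object' || Array.isArray(json)) {
    throw new Error('VmMigrateDataProvider: Failed to fetch target hosts')
  }
  // Missing `host` key: per this thread, the engine found no candidates.
  if (json.host === undefined) {
    return []
  }
  // A `host` key that is present but not an array is unexpected data.
  if (!Array.isArray(json.host)) {
    throw new Error('VmMigrateDataProvider: Failed to fetch target hosts')
  }
  return json.host
}

// Same filtering rule as the snippet quoted in the thread: if all selected
// VMs run on one host, that host cannot be a migration target.
function filterTargetHosts (targetHosts, currentHostIds) {
  return (currentHostIds.length === 1)
    ? targetHosts.filter(host => !currentHostIds.includes(host.id))
    : targetHosts
}
```

With this shape, both the empty-object case and the single-host case end in an empty array, so the dialog can show the "no available hosts" message instead of "could not fetch data".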
(In reply to Ryan Barry from comment #16)
> My POV is basically:
>
> * Fix in ui-extensions (since this is probably not the only case in which it
> happens)
> * If it's a behavior change in the scheduler, go back to what it used to be
> so we don't trigger this "live" in the field

It is not a recent scheduler change. This behavior has been there a long time. The problem is that all hosts are filtered out, even the one where the VM is currently running.

We could change the scheduler code, so that the host where a VM is currently running is always a valid target for migration. At least if it is not overloaded.

Great that it's not a regression.

Rather than changing the scheduler code (and this calculation may affect running workloads for customers), let's go with comment#15, which will resolve appropriately (and show a meaningful error message)

Ilan/Daniel, see above:

Removing blocker+ and the Regression keyword. Let's fix the scale tests to be aware of the scheduler logic in comment#11

(In reply to Andrej Krejcir from comment #18)
> It is not a recent scheduler change. This behavior has been there a long
> time. The problem is that all hosts are filtered out, even the one where the
> VM is currently running.

Ok, cool. We didn't know this when writing the migrate-vm plugin. Makes sense now.

> We could change the scheduler code, so that the host where a VM is currently
> running is always a valid target for migration. At least if it is not
> overloaded.

Agree with Ryan, no need. I posted a patch to correctly handle empty object and empty array the same.

Thanks!

Verified on ovirt-engine-ui-extensions-1.0.4-1.el7ev.noarch

This bugzilla is included in oVirt 4.3.3 release, published on April 16th 2019.

Since the problem described in this bug report should be resolved in oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.