Bug 1670701

Summary: Could not fetch data needed for VM migrate operation
Product: [oVirt] ovirt-engine
Component: ovirt-engine-ui-extensions
Version: 4.3.0
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Reporter: Sandro Bonazzola <sbonazzo>
Assignee: Greg Sheremeta <gshereme>
QA Contact: Petr Matyáš <pmatyas>
CC: akrejcir, bugs, dagur, izuckerm, lleistne, lsvaty, michal.skrivanek, mjohnson, pmatyas, rbarry, stirabos
Target Milestone: ovirt-4.3.3
Target Release: ---
Flags: aoconnor: ovirt-4.3+, dagur: testing_ack+
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ovirt-engine-ui-extensions-1.0.4-1
oVirt Team: Virt
Type: Bug
Last Closed: 2019-04-16 13:58:20 UTC

Description Sandro Bonazzola 2019-01-30 08:21:55 UTC
Selected hosted engine VM, hit Migrate button.
Got "Could not fetch data needed for VM migrate operation"

browser console shows:

DataProvider.js:35 DataProvider failed to fetch data Error: VmMigrateDataProvider: Failed to fetch target hosts
    at VmMigrateDataProvider.js:70
    at r (vendor.f6c3dc5f.js:46)
    at Generator._invoke (vendor.f6c3dc5f.js:46)
    at Generator.e.(/ovirt-engine/webadmin/anonymous function) [as next] (https://ovirt4.home/ovirt-engine/webadmin/plugin/ui-extensions/js/vendor.f6c3dc5f.js:46:146147)
    at r (plugin.9e692736.js:1886)
    at plugin.9e692736.js:1886

I can't find any related info in engine.log or ui.log. I have the environment available, so if any specific log can help, let me know.

Comment 1 Ryan Barry 2019-01-31 08:03:20 UTC
Greg, any ideas?

Comment 2 Greg Sheremeta 2019-01-31 12:02:40 UTC
Probably an issue with api/hosts?migration_target_of

relevant code:

  const json = await engineGet(`api/hosts?migration_target_of=${vms.map(vm => vm.id).join(',')}`)
  const targetHosts = json.host

  if (!Array.isArray(targetHosts)) {
    throw new Error('VmMigrateDataProvider: Failed to fetch target hosts')
  }
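
For reference, a rough standalone sketch of the same request and shape check (the fetchTargetHosts name and the direct fetch call are illustrative assumptions; the real plugin goes through its engineGet helper):

  // Hypothetical diagnostic: call the endpoint directly and inspect the shape.
  async function fetchTargetHosts (vmIds) {
    const response = await fetch(
      `/ovirt-engine/api/hosts?migration_target_of=${vmIds.join(',')}`,
      { headers: { Accept: 'application/json' }, credentials: 'include' }
    )
    const json = await response.json()
    // The provider throws whenever json.host is not an array, so log the shape.
    console.log('host key is an array:', Array.isArray(json.host))
    return json
  }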


waiting on access to the environment

Comment 3 Simone Tiraboschi 2019-03-20 11:19:11 UTC
*** Bug 1690862 has been marked as a duplicate of this bug. ***

Comment 4 Simone Tiraboschi 2019-03-20 11:21:43 UTC
We found another reproducer.

Comment 5 Ryan Barry 2019-03-20 11:22:51 UTC
Do you have access to the environment for the reproducer?

Comment 6 RHEL Program Management 2019-03-20 12:16:24 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 7 Ilan Zuckerman 2019-03-20 12:21:34 UTC
This bug is a regression testing blocker.

We have another reproducer.
The environment credentials are in the private message.

Steps to reproduce:
1. Log in to the web Admin UI and try migrating VM "HostedEngine" to host b01-h03-r620.

Please specify a workaround, if one exists, for forcing the HE VM to migrate to the target host (just in order to proceed with our testing).

Thank you. Scale and performance QE team

Comment 10 Ryan Barry 2019-03-20 15:58:53 UTC
Also happens from the admin portal, with the following in the logs. Taking back to Virt:

2019-03-20 15:50:51,129Z INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-127) [54a31206-41e3-41f3-93e4-565497ab6a30] Candidate host 'host_mixed_1' ('42a138b6-efd6-4a7e-a6b2-d962b279f9ec') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPU' (correlation id: null)
2019-03-20 15:50:51,129Z INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-127) [54a31206-41e3-41f3-93e4-565497ab6a30] Candidate host 'b01-h03-r620.rhev.openstack.engineering.redhat.com' ('c83a8008-cd71-4671-9f10-1325f3364034') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPU' (correlation id: null)

Comment 11 Andrej Krejcir 2019-03-20 16:01:04 UTC
In the environment, the problem is that the VM needs more CPU cores than the hosts have.
The VM needs 16 cores, but the hosts have only 12. The SMT threads are not counted as cores.

A workaround can be to set the 'Count Threads as Cores' option for the cluster. Then the hosts will have 24 cores.
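
To make the arithmetic concrete, an illustrative sketch of the capacity check described above (the names and formula are assumptions, not the actual scheduler code):

  // Hosts have 12 physical cores with 2 SMT threads per core; the VM wants 16 CPUs.
  const cores = 12
  const threadsPerCore = 2
  const vmCpus = 16

  const capacity = (countThreadsAsCores) =>
    countThreadsAsCores ? cores * threadsPerCore : cores

  capacity(false) >= vmCpus  // 12 >= 16 -> false: host filtered out by the CPU filter
  capacity(true)  >= vmCpus  // 24 >= 16 -> true:  host becomes a valid migration target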

Comment 12 Greg Sheremeta 2019-03-20 16:22:03 UTC
Ok, so it's another instance of api/hosts?migration_target_of correctly returning an empty object, { }

So I think the fix is just updating the error message to something like "no acceptable target hosts found"
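
A minimal sketch of that wording change against the check shown in comment 2 (the exact message is an assumption):

  if (!Array.isArray(targetHosts)) {
    // Proposed: keep the shape check, but make the message actionable.
    throw new Error('VmMigrateDataProvider: No acceptable target hosts found')
  }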

That work for you, Ryan?

Comment 13 Ryan Barry 2019-03-20 16:24:44 UTC
Works for me

Comment 14 Ryan Barry 2019-03-20 19:48:57 UTC
I looked through the git log and didn't see any scheduler changes which would affect this.

So the real question now is "is this a change in behavior?" If it is, let's change it back to treating cores as CPUs for this calculation.

If it's not, let's fix the scale tests, remove "Regression" and the blocker flag, and see about giving a more meaningful error message

Comment 15 Greg Sheremeta 2019-03-20 21:12:29 UTC
I think this is basically a bug in ui-extensions, complicated by some unexpected data from engine.

If there are no hosts that a VM can be migrated to, this is supposed to show:
https://imgur.com/a/XL0f4yM  [no available hosts]

If I create a fresh environment with one host, and create one VM, and click migrate, that's the message I get. The VM is on the one host, and so there's nowhere to migrate it to. This is what the API call looks like:
https://imgur.com/a/aq584mp

The ui-extensions plugin JavaScript recognizes that the JSON is valid, filters out the host that the VM is already on, gets an empty list, and acts correctly:
  // If all VMs are currently running on the same host (currentHostIds.length === 1),
  // this particular host cannot be used as a migration target to any of the selected
  // VMs (since those VMs are already running on it). Otherwise, don't filter target
  // hosts, since each of them is a potential migration target to each of the VMs.
  return (currentHostIds.length === 1)
    ? targetHosts.filter(host => !currentHostIds.includes(host.id))
    : targetHosts

^ So in an environment with 1 host and 1 VM, that filter runs on the populated JSON, and we end up with a proper empty JavaScript array: targetHosts = []
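
A tiny worked example of that filter with illustrative values (the filterHosts wrapper name is hypothetical; the snippet above is its return expression):

  const filterHosts = (targetHosts, currentHostIds) =>
    currentHostIds.length === 1
      ? targetHosts.filter(host => !currentHostIds.includes(host.id))
      : targetHosts

  filterHosts([{ id: 'h1' }], ['h1'])  // -> []: the only host is the VM's current host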

But it looks like something is happening in the API such that sometimes a completely empty JSON object is returned, not an array.
  if (!Array.isArray(targetHosts)) {
    throw new Error('VmMigrateDataProvider: Failed to fetch target hosts')
  }
and then we see
https://imgur.com/a/6cxiXNL  [could not fetch data]

So there is some condition in the engine that causes the REST API to return
{ }
from /api/hosts when the ui-extensions code expects to always see at the very least
{ host: [ ] }
I don't know exactly what condition this is -- maybe when a VM is already on a host, but that host is saturated and the API doesn't return that same host in the results.

I can replicate the { } by commenting out some code in the API
https://imgur.com/a/gnkllPw
and it does give me the error
https://imgur.com/a/6cxiXNL  [could not fetch data]

So ui-extensions expects the API to always return at least { host: [ ] }, not { }.

We can change this in engine, or in ui-extensions.

Thoughts?

Comment 16 Ryan Barry 2019-03-20 21:15:13 UTC
My POV is basically:

* Fix in ui-extensions (since this is probably not the only case in which it happens)
* If it's a behavior change in the scheduler, go back to what it used to be so we don't trigger this "live" in the field

Comment 17 Greg Sheremeta 2019-03-20 21:16:58 UTC
(In reply to Ryan Barry from comment #16)
> My POV is basically:
> 
> * Fix in ui-extensions (since this is probably not the only case in which it
> happens)

Will do.

> * If it's a behavior change in the scheduler, go back to what it used to be
> so we don't trigger this "live" in the field

No idea there.

Comment 18 Andrej Krejcir 2019-03-21 10:07:20 UTC
(In reply to Ryan Barry from comment #16)
> My POV is basically:
> 
> * Fix in ui-extensions (since this is probably not the only case in which it
> happens)
> * If it's a behavior change in the scheduler, go back to what it used to be
> so we don't trigger this "live" in the field

It is not a recent scheduler change. This behavior has been there a long time. The problem is that all hosts are filtered out, even the one where the VM is currently running.

We could change the scheduler code, so that the host where a VM is currently running is always a valid target for migration. At least if it is not overloaded.

Comment 19 Ryan Barry 2019-03-21 10:44:21 UTC
Great that it's not a regression.

Rather than changing the scheduler code (and this calculation may affect running workloads for customers), let's go with comment #15, which will resolve appropriately (and show a meaningful error message).

Ilan/Daniel, see above:

Removing blocker+ and the Regression keyword. Let's fix the scale tests to be aware of the scheduler logic in comment #11.

Comment 20 Greg Sheremeta 2019-03-21 10:55:10 UTC
(In reply to Andrej Krejcir from comment #18)
> It is not a recent scheduler change. This behavior has been there a long
> time. The problem is that all hosts are filtered out, even the one where the
> VM is currently running.

Ok, cool. We didn't know this when writing the migrate-vm plugin. Makes sense now.

> We could change the scheduler code, so that the host where a VM is currently
> running is always a valid target for migration. At least if it is not
> overloaded.

Agree with Ryan, no need. I posted a patch to correctly handle empty object and empty array the same.
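
The gist of that handling is a one-line normalization where the response is first read (a sketch consistent with the snippets in comment 15, not the literal patch):

  // Treat a bare { } the same as { host: [] }: both mean "no candidate hosts",
  // so the existing empty-array path shows the "no available hosts" dialog
  // instead of throwing.
  const targetHosts = Array.isArray(json.host) ? json.host : []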

Thanks!

Comment 21 Petr Matyáš 2019-04-04 08:40:35 UTC
Verified on ovirt-engine-ui-extensions-1.0.4-1.el7ev.noarch

Comment 22 Sandro Bonazzola 2019-04-16 13:58:20 UTC
This bugzilla is included in the oVirt 4.3.3 release, published on April 16th 2019.

Since the problem described in this bug report should be resolved in the oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.