Bug 1263357 - [RFE] REST-API: Bulk API information on RHEV Hosts/VMs/templates/etc
Summary: [RFE] REST-API: Bulk API information on RHEV Hosts/VMs/templates/etc
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: RestAPI
Version: ---
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: ---
Assignee: Juan Hernández
QA Contact: Pavel Stehlik
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-09-15 15:56 UTC by Alex Krzos
Modified: 2019-04-28 14:00 UTC
10 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-07 20:50:44 UTC
oVirt Team: Infra
Embargoed:
ylavi: ovirt-future?
ylavi: planning_ack?
ylavi: devel_ack?
ylavi: testing_ack?


Attachments
Network utilization from Cloud Forms perspective while obtaining inventory of RHEVM environments (129.81 KB, image/png)
2015-09-15 16:06 UTC, Alex Krzos

Description Alex Krzos 2015-09-15 15:56:59 UTC
Description of problem:
When CloudForms is inventorying a RHEVM provider, it must query multiple API collections to retrieve a full inventory. This is fine for small-scale environments, but as environments scale up, the time spent querying the API for detailed information grows considerably. If RHEVM offered a bulk request that could return all of this information in fewer requests, we could significantly reduce the time spent querying for data.

Version-Release number of selected component (if applicable):
RHEV 3.4
CFME 5.4
ManageIQ Master

How reproducible:
Always

Steps to Reproduce:
1. Create a large-scale RHEVM environment (I have a 3,000 simulated virtual machine environment to test against)
2. Add the provider to CloudForms and observe the amount of time spent querying the RHEVM API

Actual results:
11 minutes spent obtaining inventory, plus 3 minutes afterwards wrapping up the refresh on CloudForms.
Many requests against RHEVM in ssl_access_log/ssl_request_log, proportional to the number of VMs and other queried inventory components.

Expected results:
Far fewer requests to obtain the entire datacenter's detailed inventory (currently >3000 requests for 3000 VMs).

Additional info:
Additionally, there is a spike in CPU usage on the RHEVM machine while CFME/ManageIQ requests this data.

Comment 1 Alex Krzos 2015-09-15 16:06:16 UTC
Created attachment 1073726 [details]
Network utilization from Cloud Forms perspective while obtaining inventory of RHEVM environments

Attached are three graphs showing the network utilization from a CloudForms appliance to three separate RHEVM environments.

RHEVM environment sizes:
Top: 100 VMs
Middle: 1000 VMs
Bottom: 3000 VMs

Note how the time spent retrieving inventory grows with the size of the environment. A bulk transfer of the information would help alleviate the first portion of that growth as environments scale.

Comment 2 Jason Frey 2015-09-15 16:22:23 UTC
So, to understand what we are doing on the CloudForms side:

When collecting inventory, we cannot request detailed information for an entity without creating a new request for that entity. Taking Hosts->VMs as an example: when we query for hosts, we can see the VMs and their IDs, but we cannot get detailed information for each VM in that initial query. Instead we need one request per VM, so 1000 VMs means 1000 requests. Even worse, since we also need the disks, snapshots, and NICs of each VM, and those are separate queries as well, that turns into 4000 requests for 1000 VMs.
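The request count above can be written out as a small calculation. This is a sketch under the assumptions stated in the comment: one request to list hosts, one detail request per VM, and one more request per VM for each of the disks, snapshots, and NICs sub-collections. `inventory_request_count` is a hypothetical helper, not part of the ManageIQ code.

```ruby
# Hypothetical helper: estimates how many API requests a per-entity
# inventory crawl needs. Assumes one initial list request, one detail
# request per VM, and one extra request per VM for each sub-collection.
def inventory_request_count(vm_count, subcollections: %w[disks snapshots nics], list_requests: 1)
  per_vm = 1 + subcollections.size # VM details + each sub-collection
  list_requests + vm_count * per_vm
end

puts inventory_request_count(1000) # => 4001
puts inventory_request_count(3000) # => 12001
```

The count grows linearly with the number of VMs, which is consistent with the request volume in ssl_access_log being proportional to the inventory size.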

To make this even doable, we create a pool of threads (one per CPU, so an 8-CPU machine has a pool of 8 threads) and farm the requests out to the pool to parallelize them. Without parallelization, collecting all the inventory takes forever.
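A minimal Ruby sketch of the worker-pool pattern described above, using only the standard library (`Queue`, `Thread`, `Mutex`, `Etc.nprocessors`). The `fetch` callable is a hypothetical stand-in for one REST request, and `fetch_all` is an illustrative name; this is not the actual implementation in the ManageIQ ovirt gem's inventory.rb.

```ruby
require "etc"

# Sketch of a fixed-size pool: one worker thread per CPU by default.
# `fetch` is a hypothetical callable standing in for one REST request
# (e.g. fetching one VM's details); it is not an oVirt library method.
def fetch_all(ids, fetch, pool_size: Etc.nprocessors)
  queue = Queue.new
  ids.each { |id| queue << id }
  results = {}
  lock = Mutex.new

  workers = Array.new(pool_size) do
    Thread.new do
      loop do
        begin
          id = queue.pop(true) # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break # queue drained, this worker is done
        end
        value = fetch.call(id)
        lock.synchronize { results[id] = value }
      end
    end
  end

  workers.each(&:join)
  results
end

details = fetch_all((1..10).to_a, ->(id) { { id: id, name: "vm#{id}" } }, pool_size: 4)
puts details.size # => 10
```

Capping the pool size bounds how many requests hit the engine at once, but the total number of requests is unchanged; parallelism only hides per-request latency.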

Here is the inventory collection code, if you want an idea of how it works: https://github.com/ManageIQ/ovirt/blob/master/lib/ovirt/inventory.rb

Comment 7 Yaniv Kaul 2017-06-07 18:34:01 UTC
Juan, while I don't think we've implemented this, I believe the issue is far less problematic with pipelining, multiplexing, compression and other improvements?

Can we close this for the time being?

Comment 8 Juan Hernández 2017-06-07 20:50:44 UTC
Yes, with those improvements the inventory collection will be much faster. See the following example script:

  https://github.com/oVirt/ovirt-engine-sdk-ruby/blob/master/sdk/examples/asynchronous_inventory.rb

In an environment with approx 4000 virtual machines, 10000 disks and 150ms of latency that completes in less than 3 minutes, without reducing the number of requests or using multiple threads or workers. Note however that these changes haven't been applied to ManageIQ yet.
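To see why overlapping requests matters at this scale, here is a back-of-the-envelope estimate using the figures above. It assumes (this is not stated in the comment) one request per VM and one per disk, and that each request costs at least one 150 ms round trip. Both helpers are hypothetical.

```ruby
# Back-of-the-envelope estimate. Assumes one request per VM and one per
# disk (an assumption; the example script's actual request mix may differ)
# and that each request costs at least one network round trip.
def serial_seconds(requests, latency_ms)
  requests * latency_ms / 1000.0
end

# Smallest number of requests that must be in flight at once for the
# round-trip cost alone to fit within the target time.
def required_concurrency(requests, latency_ms, target_seconds)
  (serial_seconds(requests, latency_ms) / target_seconds).ceil
end

requests = 4000 + 10_000
puts serial_seconds(requests, 150)            # => 2100.0
puts required_concurrency(requests, 150, 180) # => 12
```

Under those assumptions, issuing the requests strictly one after another would spend about 35 minutes on round trips alone, so finishing in under 3 minutes implies keeping roughly a dozen or more requests in flight at a time, which pipelining and multiplexing over a few connections can provide.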

We are also currently working on what we call "link following", a mechanism to retrieve related objects with a single request. For example, we will be able to retrieve a virtual machine together with its disks and NICs in only one request. That should further improve inventory collection time.
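A sketch of what a link-following request might look like. It assumes the mechanism is exposed as a `follow` query parameter listing the links to embed (here `disk_attachments` and `nics`); the parameter name, the link names, and the `vm_with_links_url` helper are assumptions for illustration, so check the API documentation of your engine version.

```ruby
require "uri"

# Hypothetical helper: builds one request URL asking the engine to embed
# linked objects in the response. The `follow` parameter and the link
# names are assumptions; verify them against your engine's API docs.
def vm_with_links_url(base_url, vm_id, links = %w[disk_attachments nics])
  uri = URI.join(base_url, "vms/#{vm_id}")
  uri.query = URI.encode_www_form(follow: links.join(","))
  uri.to_s
end

puts vm_with_links_url("https://engine.example.com/ovirt-engine/api/", "123")
# => https://engine.example.com/ovirt-engine/api/vms/123?follow=disk_attachments%2Cnics
```

If the engine honors such a parameter, fetching a VM with its disks and NICs drops from three requests to one.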

So, I am closing the bug, as the work is tracked in other bugs.

