Description of problem:
When Cloud Forms is inventorying a RHEVM provider it must query multiple API collections in order to retrieve a full inventory. This is fine for small-scale environments, but as environments scale up, the amount of time spent querying the API for detailed information grows considerably. If RHEVM offered a bulk request that could provide all of this information in fewer requests, we could significantly reduce the amount of time spent querying for data.

Version-Release number of selected component (if applicable):
RHEV 3.4
CFME 5.4
ManageIQ Master

How reproducible:
Always

Steps to Reproduce:
1. Create a large-scale RHEVM environment (I have a 3,000 simulated virtual machine environment to test against)
2. Add the provider to Cloud Forms and observe the amount of time spent querying the RHEVM API

Actual results:
11 minutes spent obtaining inventory, plus 3 minutes spent afterwards wrapping up the refresh on Cloud Forms. ssl_access_log/ssl_request_log show many requests against RHEVM, proportional to the number of VMs and other queried inventory components (>3,000 requests for 3,000 VMs).

Expected results:
Far fewer requests to obtain the entire datacenter's detailed inventory.

Additional info:
Additionally, there is a spike in CPU usage on the RHEVM machine as CFME/ManageIQ requests this data.
Created attachment 1073726 [details]
Network utilization from Cloud Forms perspective while obtaining inventory of RHEVM environments

Attached are three graphs showing the network utilization from a Cloud Forms appliance to 3 separate RHEVM environments. RHEVM environment sizes: top - 100 VMs, middle - 1,000 VMs, bottom - 3,000 VMs. Note the growth in the amount of time spent retrieving inventory as the environment size grows. A bulk transfer of the information would help alleviate the first portion of that growth as environments scale.
So, to understand what we are doing on the CloudForms side: when collecting inventory, we cannot request detailed information for an entity without creating a new request for that entity. Taking Hosts->Vms as an example: when we query for hosts, we can see the VMs and their ids, but cannot get detailed information for each VM in that initial query. Instead we need 1 request per VM, so 1,000 VMs means 1,000 requests. Even worse, since we also need the disks, snapshots, and nics on the VMs, each of those is a separate query as well, turning that into 4,000 requests for 1,000 VMs. To make this at all feasible, we create a pool of threads (1 per CPU, so an 8-CPU machine would have a pool of 8 threads) and farm the requests out to the pool to parallelize them. Without parallelization it takes far too long to collect all the inventory. Here is the inventory collection code, if you want an idea of how it works: https://github.com/ManageIQ/ovirt/blob/master/lib/ovirt/inventory.rb
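To illustrate the fan-out pattern described above, here is a minimal sketch of a fixed-size thread pool issuing one detail request per VM. This is not the actual collector code (see the linked inventory.rb for that); `fetch_vm_details` is a hypothetical stand-in for the per-VM API call and its sub-collection requests.

```ruby
require "etc"

# Hypothetical stand-in for the real API call: in the actual collector this
# would be an HTTPS GET for the VM, plus follow-up requests for its disks,
# snapshots, and nics.
def fetch_vm_details(vm_id)
  { id: vm_id, disks: [], snapshots: [], nics: [] }
end

# Farm one request per VM out to a pool of worker threads,
# one thread per CPU by default, as described in the comment above.
def collect_inventory(vm_ids, pool_size: Etc.nprocessors)
  work    = Queue.new
  vm_ids.each { |id| work << id }
  results = Queue.new

  workers = Array.new(pool_size) do
    Thread.new do
      loop do
        id = begin
          work.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break
        end
        results << fetch_vm_details(id)
      end
    end
  end

  workers.each(&:join)
  Array.new(results.size) { results.pop }
end
```

The point of the sketch is only the shape of the problem: the number of API round trips is proportional to the VM count, and the pool merely overlaps their latency rather than reducing the request count.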
Juan, while I don't think we've implemented this, I believe the issue is far less problematic now with pipelining, multiplexing, compression, and other improvements. Can we close this for the time being?
Yes, with those improvements the inventory collection will be much faster. See the following example script: https://github.com/oVirt/ovirt-engine-sdk-ruby/blob/master/sdk/examples/asynchronous_inventory.rb In an environment with approximately 4,000 virtual machines, 10,000 disks, and 150 ms of latency, that completes in less than 3 minutes, without reducing the number of requests or using multiple threads or workers. Note however that these changes haven't been applied to ManageIQ yet. We are also currently working on what we call "link following", which is a mechanism to retrieve related objects with a single request. We will, for example, be able to retrieve a virtual machine with its disks and NICs in only one request. That should further improve the inventory collection time. So I am closing the bug, as the work is tracked in other bugs.
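As a rough sketch of what "link following" looks like from the client side: recent oVirt engine API versions expose it as a `follow` query parameter, so a single request can return each VM together with the named sub-collections. The host name below is a placeholder, and the exact sub-collection names accepted by `follow` vary by API version, so treat this as an assumption rather than a reference.

```ruby
require "uri"

# Build a single "link following" inventory request: one GET to /vms that
# asks the engine to embed each VM's nics and disks in the response,
# instead of one extra request per VM per sub-collection.
# The base URL is a placeholder; the follow values are illustrative.
def inventory_url(base = "https://engine.example.com/ovirt-engine/api")
  uri = URI.parse("#{base}/vms")
  uri.query = URI.encode_www_form(follow: "nics,disk_attachments.disk")
  uri
end
```

Compared with the per-VM fan-out, this trades many small requests for one larger response, which is exactly the bulk-transfer behavior the original report asked for.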