1047923 – [RFE] OVIRT-CLI: Automatically use pagination to return all collection elements when using --max -1

Keywords:
Status:	CLOSED DUPLICATE of bug 1025320
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	ovirt-engine-cli
Sub Component:
Version:	3.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Juan Hernández
QA Contact:	Shai Revivo
Docs Contact:
URL:
Whiteboard:	infra
Depends On:
Blocks:

Reported:	2014-01-02 15:17 UTC by Evgheni Dereveanchin
Modified:	2019-04-28 09:34 UTC
CC List:	10 users
Fixed In Version:
Doc Type:	Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-05-20 15:35:15 UTC
oVirt Team:	Infra
Target Upstream Version:
Embargoed:

Description Evgheni Dereveanchin 2014-01-02 15:17:58 UTC

Description of problem:
when "--max 0" parameter is passed by rhev-m shell, the API displays 0 entries instead of displaying all of them.

Version-Release number of selected component (if applicable):
3.2.5

How reproducible:
always

Steps to Reproduce:
1. # rhevm-shell
2. # connect
3. # list vms --show-all --max 0

Actual results:
0 lines displayed

Expected results:
all existing VMs displayed (no limit)

Comment 2 Juan Hernández 2014-01-28 15:23:27 UTC

In fact this is the expected behavior of the API: the "max" parameter indicates the maximum number of results to return, so 0 means no results. If the user wants all the results then they should use the value -1, either in the API directly or in the shell.

I'm moving the bug to the API component, and closing it. If you think that we should change this behavior of the engine open it again.

Comment 3 Juan Hernández 2014-01-28 15:25:50 UTC

In case it isn't clear, this is the way to request all the VMs:

  # rhevm-shell
  # connect
  # list vms --show-all --max -1

Comment 5 Juan Hernández 2014-02-17 11:43:35 UTC

When the value -1 is given it disables the limit of results imposed by the API. There is still a limit imposed by the backend, and controlled by the configuration parameter SearchResultsLimit. The default value of this parameter is 100, and it controls the max number of results returned both to the GUI and to the API. It can be changed as follows:

engine-config -s SearchResultsLimit=200

However, changing it to a larger value isn't recommended, and in environments with large numbers of objects it can severely impact performance.

If the user still wants to list all the VMs then they can use the "summary" command to find the total number of VMs and then pass that number to the "list vms" command:

[RHEVM shell (connected)]# summary

hosts-active          : 100
hosts-total           : 100
storage_domains-active: 1
storage_domains-total : 1
users-active          : 1
users-total           : 1
vms-active            : 1234
vms-total             : 1234 

[RHEVM shell (connected)]# list vms --max 1234

id         : c46bf225-4e58-48b8-bfac-112351b00619
name       : vm0
...

However, this isn't the best way to enumerate all the VMs: with a large number of them it can take a long time for the engine to generate the list, and it can put severe pressure on the server. It is better to use pagination, and in that case it is better to use the SDK directly, for example with a script like this:

#!/usr/bin/python

import ovirtsdk.api

# Create the connection to the server:
api = ovirtsdk.api.API(
    url='https://the_server/api',
    username='admin@internal',
    password='the_password',
    ca_file='/etc/pki/ovirt-engine/ca.pem')

# Get the reference to the collection of VMs:
collection = api.vms

# Define how you want to iterate the collection, in this
# case we will retrieve a block of 10 VMs each time
# (named page_size so it doesn't shadow the built-in max):
page_size = 10

# Retrieve the VMs, block by block, and stop when there are
# no more VMs:
page = 1
while True:
    vms = collection.list(max=page_size, query="page %s" % page)
    if len(vms) == 0:
        break
    for vm in vms:
        print(vm.name)
    page += 1
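
The same loop can also be packaged as a reusable generator. What follows is only a sketch, not part of the SDK: iter_pages and fake_list are hypothetical names, and fake_list stands in for collection.list(max=..., query="page N") so that the example runs without an engine.

```python
def iter_pages(list_page, page_size=10):
    """Yield every item of a paginated collection, one page at a time.

    list_page(page, page_size) is a hypothetical callable that returns the
    items of the given 1-based page, or an empty list past the last page.
    """
    page = 1
    while True:
        items = list_page(page, page_size)
        if not items:
            break
        for item in items:
            yield item
        page += 1


# Stand-in for the engine: 25 VM names served in pages (an assumption, for
# illustration only; with the SDK this would wrap collection.list).
ALL_VMS = ["vm%d" % i for i in range(25)]


def fake_list(page, page_size):
    start = (page - 1) * page_size
    return ALL_VMS[start:start + page_size]


names = list(iter_pages(fake_list, page_size=10))
print(len(names))
```

With the SDK, list_page could presumably be a small wrapper such as lambda page, size: collection.list(max=size, query="page %s" % page), which keeps the block size in the caller's hands.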

We could do this automatically from the CLI, but I'm not in favor, as it will be very easy to overload a large system using an apparently innocent command.

I'm changing the target release to 3.5.0 so that we consider this when reviewing 3.5 features.

Comment 6 Evgheni Dereveanchin 2014-02-17 12:47:20 UTC

Juan, thanks for the insight, however the question remains open: how do we disable the limit? What SearchResultsLimit must be set?

Comment 8 Juan Hernández 2014-02-17 13:07:51 UTC

The limit shouldn't be disabled, it is there to protect the system.

The way to retrieve all the VMs is to use a script and pagination, as described in comment 5.

The problem with increasing the SearchResultsLimit is that it affects the API and also the GUI. Imagine an environment with 1000 VMs. If you increase the value to 1000 then whenever the administrator opens the GUI it will request the 1000 VMs, even if the user is only going to see a few of them, as there isn't enough space on the screen.

Even worse, the GUI refreshes every few seconds (every 5 seconds by default), so a simple GUI user is going to be requesting 1000 VMs every 5 seconds. This is what can exhaust the resources of the server.

Even if it affected only the API it would be bad idea to retrieve many VMs with one request. Retrieving 1000 VMs with one request, for example, means that the server has to load those 1000 VMs in memory, generate a large XML document containing those 1000 VMs, send that large document through the network, etc. Same for the client. It is better, in terms of use of resources, to do this using pagination and smaller sets of VMs.

This doesn't mean that the SearchResultsLimit parameter can't be changed, just that it has to be changed with care. Changing it to 1000, for example, is something that I don't recommend.

Comment 9 Evgheni Dereveanchin 2014-04-16 13:26:45 UTC

I agree that a default limit is required, however not being able to disable it is surprising. Let's say I understand the consequences and still want to get an XML with 1000 VMs.

The whole PostgreSQL DB size is normally below 200 megabytes on-disk (out of which the biggest table is the audit_log with thousands of rows).

Our recommended system requirements for the engine are 16GB RAM and 4 CPUs:
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.3/html/Installation_Guide/Red_Hat_Enterprise_Virtualization_Manager_Hardware_Requirements.html

Is that not enough to store the list of VMs and generate an XML? Doesn't the engine store all VMs in memory at all times?

I will not mention that doing 10 requests in a row to get the 1000 VMs will be slower for my application. Let's just imagine that between page requests someone adds/removes virtual machines - then the order will shift and I risk getting an inconsistent list.
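
The consistency concern can be illustrated with a small simulation. This is a sketch under assumptions: the VM list is a plain in-memory list, fetch_page is a hypothetical helper, and the engine's real ordering behaviour is not modelled. It shows how removing an item between two page requests lets a later item slip past unseen.

```python
def fetch_page(items, page, page_size):
    """Return the 1-based page of items (hypothetical stand-in for a paged API call)."""
    start = (page - 1) * page_size
    return items[start:start + page_size]


# Six VMs in a stable order, fetched three at a time.
vms = ["vm0", "vm1", "vm2", "vm3", "vm4", "vm5"]

seen = fetch_page(vms, 1, 3)       # first request: vm0, vm1, vm2

# Someone removes vm1 between the two requests, so everything after it
# shifts one position to the left.
vms.remove("vm1")

seen += fetch_page(vms, 2, 3)      # second request starts at index 3

# VMs that still exist but were never returned by any page.
missed = [vm for vm in vms if vm not in seen]
print(seen)
print(missed)
```

In this run vm3 is never returned although it existed throughout, and vm1 is reported although it no longer exists.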

From comment #5 the workaround is to figure out the number of objects in the collection beforehand, then supply that value as the --max parameter. This was done and works properly. Nothing crashes. This RFE is here to remove the redundant step of calculating the number of items to be displayed, and to enable the API to show everything for the user automatically.

Comment 13 Arthur Berezin 2014-05-20 07:37:04 UTC

I agree, default behaviour should remain the same, but we do need to have a flag to override SearchResultsLimit default behaviour in API / SDK / rhevm-shell.

We should add a caveat notice to the documentation mentioning that this operation might cause high load in stressed environments.

Comment 14 Juan Hernández 2014-05-20 15:35:15 UTC

This has already been changed in 3.4, in bug 1025320.

*** This bug has been marked as a duplicate of bug 1025320 ***
