Bug 1205788 - Cinder (Icehouse) code poorly handles lookups of a large number of volumes
Summary: Cinder (Icehouse) code poorly handles lookups of a large number of volumes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 5.0 (RHEL 6)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 5.0 (RHEL 6)
Assignee: Gorka Eguileor
QA Contact: lkuchlan
URL:
Whiteboard:
Depends On:
Blocks: 1323406
 
Reported: 2015-03-25 15:39 UTC by Aaron Thomas
Modified: 2023-02-22 23:02 UTC (History)
CC List: 11 users

Fixed In Version: openstack-cinder-2014.1.5-7.el6ost
Doc Type: Bug Fix
Doc Text:
Previously, post-processing of volume listings did not use cached values, so an additional database query was executed to retrieve volume information for each volume in the listing. Now, cached values are used, eliminating these extra database queries when listing volumes.
Clone Of:
Cloned To: 1323406
Environment:
Last Closed: 2016-06-01 12:29:55 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1308255 0 None None None Never
Red Hat Product Errata RHBA-2016:1193 0 normal SHIPPED_LIVE openstack-cinder bug fix advisory 2016-06-01 16:22:27 UTC

Description Aaron Thomas 2015-03-25 15:39:14 UTC
Description of problem:
-----------------------------------------
One of our customers is reporting huge lookup times when querying large numbers of Cinder volumes. The customer's messaging services appear to experience timeouts when "osapi_max_limit" in cinder.api.common is set to a value over 1000. The customer started running into load balancer timeout limits; server and client sessions are set globally to 30s timeouts. Listing 1000 volumes (an artificial restriction set by the osapi_max_limit parameter, as the customer describes it) usually returned OK, but would occasionally time out if it hit the 30s mark. This problem potentially affects Nova and other core services that use osapi_max_limit. Since Cinder and Nova retain information on deleted resources, the time to list resources may increase over time.


How reproducible:
-----------------------------------------
Reproducible 

Customer Reproduction:
-----------------------------------------
1. Create 2000 volumes in OpenStack: for i in {1..2000}; do cinder create --display-name debug-volume_$i 1; sleep 2; done

2. Execute keystone tenant-list followed by a cinder quota-usage <tenantid>

3. Then execute cinder list | wc -l; the result will be 1004 (1000 volumes due to the default osapi_max_limit cap, plus the table-formatting lines), even though quota usage shows 1166 volumes.

Actual results:
-----------------------------------------
cinder list | wc -l returns 1004.


Expected results:
-----------------------------------------
cinder list reflects all volumes reported by quota usage (1166).


Additional info:
-----------------------------------------

Comment 5 Sergey Gotliv 2015-03-29 22:31:12 UTC
I guess "cinder list | wc -l" returns the wrong count - 1004 instead of 1166 - due to the issue described here:

https://bugzilla.redhat.com/show_bug.cgi?id=1157642#c4

I agree that "cinder list" performance can probably be better. In this particular case cinder-api returned a result containing MORE than 1000 volumes in 36 seconds, mostly due to volume object serialization. It also produces a lot of unnecessary log messages at INFO severity for each volume in the list, and maybe we can even optimize the database query, but at the end of the day we have to return a reasonable number of volumes. I don't know if that should be 1000 (the default "osapi_max_limit" value), or whether the user should control it via configuration as today or via the --limit (V2) parameter, but returning 100,000 volumes will take longer than returning 1000 volumes simply because cinder-api has to serialize an additional 99,000 volumes, so in that case it is reasonable to increase the cinder-api timeout as well. By the way, this timeout has nothing to do with Cinder itself but with the HA deployment.

Comment 6 Aaron Thomas 2015-04-07 21:13:03 UTC
Hello Sergey, 

The customer case attached to this bug outlines what you've relayed concerning issues with HA deployments in case comment #c21. Since the issue appears to revolve around the HAP (load balancer) timeout, would it be advisable to open an additional Bugzilla for that issue?

#c21:
-----------------------------------------
RCA:
Cinder (Icehouse) code poorly handles lookups of a large number of volumes. As noted above, listing 1688 volumes takes about 40s. This is a known problem and I have RedHat already looking at patches to fix the problem. Because of this huge lookup time, we started bumping into the load balancer timeout limits. Server and client sessions were set globally to timeout at 30s. Listing 1000 volumes (an artificial restriction set by the osapi_max_limit parameter) usually returned ok. It would occasionally timeout if it hit the 30s mark. This problem potentially affects nova and other core services that use osapi_max_timeout. Since Cinder and Nova retain information on deleted resources, the amount of time to list any may increase over time.

To test this theory, I set the cinder-api timeouts to 120s. I then set osapi_max_limit=1000000. Cinder list now shows the correct number of volumes, in an (questionably) appropriate amount of time.

Resolution:
Increase the global timeouts for HAP to 600s (my recommendation).

Comment 7 Sergey Gotliv 2015-04-12 11:11:47 UTC
Aaron,

Please explain why the customer really needs to retrieve 1M volumes in one request.
Retrieving 1000 volumes is always faster than retrieving 1M, and you can't display all of them without scrolling anyway, so what is the use case?
According to the bug description, he is just counting his active volumes; if that is the case, we probably need to create another API instead of increasing timeouts.

Comment 8 wdaniel 2015-04-13 16:53:28 UTC
Sergey,

The customer in particular here is using CloudForms to manage their VMs, and while these are rough estimates, this is what they plan to scale out to:

~2000 instances
4x cinder volumes per instance
Volume snapshot of each volume, once a day
Volume snapshots stored for a rolling 30 days

With these kinds of numbers (roughly 2000 instances × 4 volumes × 30 retained daily snapshots ≈ 240,000), they could be approaching a quarter million volume snapshots in very little time. With CloudForms keeping inventory and needing to refresh it via OpenStack's API, the customer is hoping this performance can be improved so CloudForms isn't slowed down.

Comment 9 Dave Maley 2015-04-24 18:52:28 UTC
So is the recommendation here to increase the timeouts as needed based on the number of volumes per request?  Or is there any other guidance from engineering on this issue?  Thanks!

Comment 10 Sergey Gotliv 2015-04-26 08:15:49 UTC
Dave,

From the Cinder perspective, the recommendation is to use the Cinder V2 API [1], which introduces pagination, instead of the deprecated V1, which doesn't support that feature. Of course, the page size should be reasonable and as small as possible to prevent huge responses from being sent from the server to the client.

[1] cinder --os-volume-api-version 2 list --limit 10 --sort created_at:desc

We are still investigating this issue to make Cinder performance even better and detect possible bottlenecks, but V2 is supposed to be the baseline.
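
For illustration only, here is a minimal client-side sketch of that limit/marker pagination against the Block Storage v2 /volumes resource; the token, endpoint URL, and page size below are placeholders rather than values from this bug:

import requests

TOKEN = "<keystone-token>"                       # placeholder: obtain from Keystone
ENDPOINT = "http://controller:8776/v2/<tenant>"  # placeholder cinder v2 endpoint

def list_all_volumes(page_size=1000):
    """Fetch every volume, one page of `page_size` at a time, using limit/marker."""
    headers = {"X-Auth-Token": TOKEN}
    volumes = []
    marker = None
    while True:
        params = {"limit": page_size}
        if marker:
            params["marker"] = marker            # id of the last volume on the previous page
        resp = requests.get(ENDPOINT + "/volumes", headers=headers, params=params)
        resp.raise_for_status()
        page = resp.json().get("volumes", [])
        if not page:                             # an empty page means the listing is exhausted
            break
        volumes.extend(page)
        marker = page[-1]["id"]
    return volumes

print("total volumes:", len(list_all_volumes()))

Keeping the page size at or below the server's osapi_max_limit keeps each response small, while the marker loop still retrieves the complete listing.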

Comment 12 Dave Maley 2015-04-26 20:54:24 UTC
Engineering had a call w/ the customer in which they discussed this issue.  The customer now understands that this specific problem should be addressed from the CloudForms side:

"1. Fetching 1M volumes will be always slower than fetching 1k volumes, because you have to process additional 999,000 volumes, no API can fix that, therefore we recommended them to use 1k as a default.

2. Using Cinder's V2 (because it implements a pagination at least for volumes) instead of the deprecated but default V1 should be better."

This bug is being left open to review whether further improvements can be made in Cinder (V2); however, this is not planned for OSP 5/OSP 6, so it is being re-targeted to OSP 7.

Comment 14 Gorka Eguileor 2015-06-02 14:57:43 UTC
We are looking to include efficient polling functionality [1] in Cinder, with the same behavior as Nova's [2], which will allow CloudForms, or anyone else, to request only the volumes modified since a given time.

This, together with v2 pagination, should allow fast volume list updates by doing a full volume get the first time and only differentials from then on.


[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1226875
[2]: http://docs.openstack.org/developer/nova/v2/polling_changes-since_parameter.html
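
As a rough illustration of the "full fetch once, then differentials" pattern, here is a sketch of how Nova's changes-since parameter [2] is typically used; the proposed Cinder feature [1] would allow the same approach against /volumes. The token, endpoint, and helper below are placeholders for the example, not part of either project:

import datetime
import requests

TOKEN = "<keystone-token>"                            # placeholder
NOVA_ENDPOINT = "http://controller:8774/v2/<tenant>"  # placeholder nova endpoint

def poll_servers(last_poll=None):
    """Full listing on the first call; afterwards only resources changed since last_poll."""
    headers = {"X-Auth-Token": TOKEN}
    params = {}
    if last_poll is not None:
        # changes-since also returns deleted resources, so the local
        # inventory can be updated in place instead of being rebuilt.
        params["changes-since"] = last_poll.isoformat()
    # Record the poll time before the request so nothing that changes mid-request
    # is missed; seeing an item twice is harmless, missing one is not.
    poll_time = datetime.datetime.utcnow()
    resp = requests.get(NOVA_ENDPOINT + "/servers/detail", headers=headers, params=params)
    resp.raise_for_status()
    return resp.json()["servers"], poll_time

# First call does the full fetch; later calls return only what changed.
all_servers, stamp = poll_servers()
changed, stamp = poll_servers(last_poll=stamp)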

Comment 15 Sergey Gotliv 2015-11-12 07:42:16 UTC
Pagination is already implemented in Liberty. The efficient polling functionality mentioned in comment #14 is targeted for Mitaka.

Comment 16 Gorka Eguileor 2015-12-15 18:30:09 UTC
We can speed up volume listings considerably by using caching in the wsgi postprocessing, which removes as many DB queries as there are entries returned.

This caching is internal to the specific request and expires once a reply is issued, so listing behavior is the same.

In my tests, listing 1000 volumes took 16 seconds; after the code was changed to use caching, this was reduced to 1.6 seconds.

Listing 10,000 volumes with caching (after changing osapi_max_limit to that value) now takes only 7 seconds.
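
As a sketch of the idea (illustrative only, not the actual Cinder patch; the names below are made up for the example): cache the volume objects that the list query already returned for the lifetime of the request, and have the per-entry postprocessing read from that cache instead of issuing a fresh DB lookup per entry.

def db_volume_get(context, volume_id):
    """Placeholder for the per-volume DB query that postprocessing would otherwise issue."""
    raise NotImplementedError

def list_volumes(context, db_list_query):
    volumes = db_list_query(context)                       # one query for the whole page
    # Request-scoped cache: it lives on the request context and is dropped with it
    # once the reply has been built, so listing behavior stays the same.
    context.volume_cache = {vol["id"]: vol for vol in volumes}
    return volumes

def postprocess_entry(context, volume_id):
    cached = getattr(context, "volume_cache", {}).get(volume_id)
    if cached is not None:
        return cached                                      # no extra DB round trip
    return db_volume_get(context, volume_id)               # fall back only on a cache miss

With the default osapi_max_limit of 1000, this removes up to 1000 per-entry queries from a single listing, which is in line with the 16 s to 1.6 s improvement above.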

Comment 19 Mike McCune 2016-03-28 22:38:41 UTC
This bug was accidentally moved from POST to MODIFIED due to an error in automation; please contact mmccune with any questions.

Comment 23 nlevinki 2016-05-16 08:12:36 UTC
We don't have the resources to verify this ticket.
We talked to Engineering, and they confirmed that the customer received a hotfix, tested it, and it passed.
Based on this, I am moving this ticket to VERIFIED.

Comment 25 errata-xmlrpc 2016-06-01 12:29:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1193

