Bug 1444541 - Slow Performance with Gnocchi Posting new Measures via Ceilometer Notification Agent into Ceph Storage
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-gnocchi
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: Upstream M2
Target Release: 12.0 (Pike)
Assignee: Julien Danjou
QA Contact: Sasha Smolyak
URL:
Whiteboard: scale_lab
Depends On:
Blocks:
Reported: 2017-04-22 03:44 UTC by Alex Krzos
Modified: 2021-03-11 15:09 UTC (History)
CC List: 8 users

Fixed In Version: openstack-gnocchi-3.1.6-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-13 21:23:38 UTC
Target Upstream Version:
Embargoed:


Links
System ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit 459333 | 0 | None | MERGED | storage: introduce add_measures_batch for Ceph | 2021-01-26 08:19:43 UTC
Red Hat Product Errata RHEA-2017:3462 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 12.0 Enhancement Advisory | 2018-02-16 01:43:25 UTC

Description Alex Krzos 2017-04-22 03:44:52 UTC
Description of problem:
Scale testing with the Ceilometer Notification Agent publishing directly to Gnocchi has revealed that spawning threads to create measure objects and add a key to the omap object "measure" for the Gnocchi backlog causes performance issues. The current Gnocchi 3.1 code creates N threads (based on core count) [0] to create the Ceph backlog measures, but all of them write a key to a single object. When measuring Gnocchi API request times (Apache log %D), we see POST requests taking >1min, much longer at large scale, and eventually longer than the httpd timeout (120s). HTTP gateway timeout error messages in the Ceilometer Notification Agent logs are a good indicator that this problem is occurring.
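
As a rough illustration of the measurement described above, here is a minimal sketch that picks slow POST requests out of an Apache access log. It assumes a log format in which the quoted request line is followed by other fields and %D (request time in microseconds) is the last field on the line; the regular expression, the function name slow_posts, and the 60-second threshold are hypothetical choices for this example, not something taken from the test environment.

```python
import re

# Assumption: the request line is quoted and %D (microseconds) is the
# last whitespace-separated field. Adjust to the actual LogFormat in use.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" .* (?P<usec>\d+)$'
)


def slow_posts(lines, threshold_s=60.0):
    """Yield (path, seconds) for POST requests slower than threshold_s."""
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match or match.group("method") != "POST":
            continue
        seconds = int(match.group("usec")) / 1_000_000
        if seconds > threshold_s:
            yield match.group("path"), seconds


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        '10.0.0.1 - - [22/Apr/2017:03:44:52 +0000] '
        '"POST /v1/batch/resources/metrics/measures HTTP/1.1" 202 0 75000000',
        '10.0.0.1 - - [22/Apr/2017:03:44:53 +0000] '
        '"GET /v1/metric HTTP/1.1" 200 512 90000000',
    ]
    for path, seconds in slow_posts(sample):
        print(f"{path} took {seconds:.1f}s")
```

Requests that approach the 120s httpd timeout in this output would be expected to line up with the gateway timeout errors in the Notification Agent logs.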

By batching the creation of all new measure objects and the addition of their keys to the omap object, we reduce the request time from >1min-30s or greater to ~2-5s on the same hardware. The example patch [1] reduces the overall time required to move data from the Ceilometer Notification Agent into Gnocchi Ceph storage for processing.
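
The difference between the two write patterns can be sketched with an in-memory stand-in for the Ceph pool. This is not the Gnocchi code or the patch in [1]: FakeObjectStore, its methods, and both add_measures_* functions below are hypothetical and only model the idea that every update to the shared "measure" omap object costs one operation, so batching replaces N updates to a single contended object with one.

```python
import uuid
from collections import defaultdict

MEASURE_INDEX = "measure"  # the single omap object named in this report


class FakeObjectStore:
    """In-memory stand-in for a Ceph pool; counts index (omap) updates."""

    def __init__(self):
        self.objects = {}
        self.omap = defaultdict(dict)
        self.omap_ops = 0

    def write_full(self, name, data):
        self.objects[name] = data

    def set_omap(self, oid, keys, values):
        # One call models one write operation against the shared object.
        self.omap_ops += 1
        self.omap[oid].update(zip(keys, values))


def _measure_name(metric_id):
    # Hypothetical naming scheme: prefix, metric id, random suffix.
    return "_".join((MEASURE_INDEX, str(metric_id), uuid.uuid4().hex))


def add_measures_per_metric(store, metrics_and_measures):
    """Unbatched pattern: one index update per metric."""
    for metric_id, data in metrics_and_measures.items():
        name = _measure_name(metric_id)
        store.write_full(name, data)
        store.set_omap(MEASURE_INDEX, (name,), (b"",))


def add_measures_batch(store, metrics_and_measures):
    """Batched pattern: write all objects, then update the index once."""
    names = []
    for metric_id, data in metrics_and_measures.items():
        name = _measure_name(metric_id)
        store.write_full(name, data)
        names.append(name)
    store.set_omap(MEASURE_INDEX, tuple(names), (b"",) * len(names))


if __name__ == "__main__":
    payload = {f"metric-{i}": b"encoded-measures" for i in range(100)}
    for func in (add_measures_per_metric, add_measures_batch):
        store = FakeObjectStore()
        func(store, payload)
        print(func.__name__, "index updates:", store.omap_ops)
```

Both functions leave the same set of keys in the index; only the number of operations against the single "measure" object differs, which is where the contention described above comes from.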



Version-Release number of selected component (if applicable):
Ocata Beta (OSP11)
Build 2017-04-06.4

openstack-gnocchi-api-3.1.2-3.el7ost.noarch
openstack-gnocchi-indexer-sqlalchemy-3.1.2-3.el7ost.noarch
python-gnocchiclient-3.1.0-1.el7ost.noarch
openstack-gnocchi-common-3.1.2-3.el7ost.noarch
openstack-gnocchi-metricd-3.1.2-3.el7ost.noarch
puppet-gnocchi-10.3.0-2.el7ost.noarch
python-gnocchi-3.1.2-3.el7ost.noarch
openstack-gnocchi-statsd-3.1.2-3.el7ost.noarch


How reproducible:
With a large enough scale you will see this issue.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=1430588

[0] https://github.com/openstack/gnocchi/blob/stable/3.1/gnocchi/rest/__init__.py#L1460-L1463
[1] https://gist.github.com/akrzos/9d841feff51050c12913faf39634383a#file-gistfile1-txt-L1427-L1440

Comment 1 Julien Danjou 2017-06-22 18:33:48 UTC
This has even been backported to OSP 10, so marking as done.

Comment 4 Julien Danjou 2017-11-15 14:50:16 UTC
This is a performance issue that is not verifiable by the standard QE process. What Alex described has been solved in Gnocchi 4 because data writes are now batched. The patch has already been backported to OSP 10 and OSP 11 and shipped to customers.

Comment 8 Julien Danjou 2017-12-04 12:49:23 UTC
This has been merged in Gnocchi 3.1.5. I realize that the Fixed in Version is wrong here.

Comment 16 errata-xmlrpc 2017-12-13 21:23:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

