Bug 1444541 - Slow Performance with Gnocchi Posting new Measures via Ceilometer Notification Agent into Ceph Storage
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-gnocchi
Version: 11.0 (Ocata)
Hardware: Unspecified OS: Unspecified
Priority: high Severity: high
Target Milestone: Upstream M2
Target Release: 12.0 (Pike)
Assigned To: Julien Danjou
QA Contact: Sasha Smolyak
Whiteboard: scale_lab
Keywords: Triaged
Depends On:
Blocks:
Reported: 2017-04-21 23:44 EDT by Alex Krzos
Modified: 2018-02-27 06:59 EST

See Also:
Fixed In Version: openstack-gnocchi-3.1.6-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-13 16:23:38 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
OpenStack gerrit | 459333 | None | None | None | 2017-04-24 10:38 EDT
Red Hat Product Errata | RHEA-2017:3462 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 12.0 Enhancement Advisory | 2018-02-15 20:43:25 EST

Description Alex Krzos 2017-04-21 23:44:52 EDT
Description of problem:
Scale testing with the Ceilometer Notification Agent publishing directly to Gnocchi has revealed that spawning threads to create measure objects, each of which also adds a key to the omap object "measure" for the Gnocchi backlog, causes performance issues. The current code in Gnocchi 3.1 creates N threads (based on core count) [0] to create the Ceph backlog measures, but all of them write a key to a single object. When measuring Gnocchi API request times (Apache log %D), we can see POST requests taking >1min, much longer at large scale, and eventually longer than the httpd timeout (120s). HTTP gateway timeout error messages in the Ceilometer Notification Agent logs are a good indicator that this problem is occurring.
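
For anyone who wants to check their own logs for the same symptom, here is a minimal sketch that picks slow POST requests out of an Apache access log. It assumes the LogFormat ends with %D (request time in microseconds) as the last field; the function name `slow_posts`, the regular expression, and the sample log lines are illustrative and do not come from this report.

```python
import re

# Assumption: the access log line ends with %D (microseconds), e.g. a
# combined-style LogFormat with " %D" appended. Adjust to your own format.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+.* (?P<usec>\d+)$'
)


def slow_posts(lines, threshold_s=60.0):
    """Return (path, status, seconds) for POST requests slower than threshold_s."""
    slow = []
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match or match.group("method") != "POST":
            continue
        seconds = int(match.group("usec")) / 1e6  # %D is in microseconds
        if seconds > threshold_s:
            slow.append((match.group("path"), match.group("status"), seconds))
    return slow


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    sample = [
        '192.0.2.1 - - [21/Apr/2017:23:00:00 -0400] "POST /v1/batch/resources/metrics/measures HTTP/1.1" 202 - "-" "agent" 75000000',
        '192.0.2.1 - - [21/Apr/2017:23:00:05 -0400] "POST /v1/batch/resources/metrics/measures HTTP/1.1" 202 - "-" "agent" 2000000',
    ]
    for path, status, seconds in slow_posts(sample):
        print(path, status, seconds)
```

The 60 second default threshold mirrors the ">1min" figure above; lowering it helps spot the trend before requests reach the 120s httpd timeout.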

By batching the creation of all new measure objects and the addition of their keys to the omap object, we reduce the request time from >1min-30s or greater to ~2-5s on the same hardware. The example patch [1] reduces the overall time required to move data from the Ceilometer Notification Agent into Gnocchi Ceph storage for processing.
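
To illustrate the idea only (this is not the code from the patch [1]), here is a small self-contained sketch that contrasts one index update per metric with a single batched update. `FakeStore`, `post_unbatched`, and `post_batched` are hypothetical names; the store is an in-memory stand-in, not the Ceph/rados API, and it only counts how many times the shared "measure" index object is touched.

```python
class FakeStore:
    """In-memory stand-in for an object store (hypothetical, not rados)."""

    def __init__(self):
        self.objects = {}        # object name -> payload
        self.index = {}          # index object name -> set of keys
        self.index_updates = 0   # how often the shared index object is written

    def write_object(self, name, payload):
        self.objects[name] = payload

    def update_index(self, index_name, keys):
        # One call may carry many keys; each call counts as one write to
        # the shared index object.
        self.index_updates += 1
        self.index.setdefault(index_name, set()).update(keys)


def post_unbatched(store, measures_by_metric):
    # One index write per metric: every writer touches the same object.
    for metric_id, measures in measures_by_metric.items():
        name = "measure_%s" % metric_id
        store.write_object(name, measures)
        store.update_index("measure", [name])


def post_batched(store, measures_by_metric):
    # Write all measure objects first, then add every key in one index write.
    names = []
    for metric_id, measures in measures_by_metric.items():
        name = "measure_%s" % metric_id
        store.write_object(name, measures)
        names.append(name)
    store.update_index("measure", names)


if __name__ == "__main__":
    batch = {"metric-%d" % i: [(i, 1.0)] for i in range(100)}
    unbatched, batched = FakeStore(), FakeStore()
    post_unbatched(unbatched, batch)
    post_batched(batched, batch)
    print(unbatched.index_updates, batched.index_updates)
```

Both variants leave the store in the same final state; the batched one just writes to the shared index object once per request instead of once per metric, which is the contention point described above.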



Version-Release number of selected component (if applicable):
Ocata Beta (OSP11)
Build 2017-04-06.4

openstack-gnocchi-api-3.1.2-3.el7ost.noarch
openstack-gnocchi-indexer-sqlalchemy-3.1.2-3.el7ost.noarch
python-gnocchiclient-3.1.0-1.el7ost.noarch
openstack-gnocchi-common-3.1.2-3.el7ost.noarch
openstack-gnocchi-metricd-3.1.2-3.el7ost.noarch
puppet-gnocchi-10.3.0-2.el7ost.noarch
python-gnocchi-3.1.2-3.el7ost.noarch
openstack-gnocchi-statsd-3.1.2-3.el7ost.noarch


How reproducible:
With a large enough scale you will see this issue.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=1430588

[0] https://github.com/openstack/gnocchi/blob/stable/3.1/gnocchi/rest/__init__.py#L1460-L1463
[1] https://gist.github.com/akrzos/9d841feff51050c12913faf39634383a#file-gistfile1-txt-L1427-L1440
Comment 1 Julien Danjou 2017-06-22 14:33:48 EDT
This has even been backported to OSP 10, so marking as done.
Comment 4 Julien Danjou 2017-11-15 09:50:16 EST
This is a performance issue that is not verifiable by the standard QE process. What Alex described has been solved in Gnocchi 4 because data writes are now batched. The patch has already been backported to OSP 10 and OSP 11 and shipped to customers.
Comment 8 Julien Danjou 2017-12-04 07:49:23 EST
This has been merged in Gnocchi 3.1.5. I realize that the Fixed in Version is wrong here.
Comment 16 errata-xmlrpc 2017-12-13 16:23:38 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462
