Description of problem:
Scale testing with Ceilometer Agent-Notification publishing directly to Gnocchi has revealed that spawning threads to create measure objects and add a key to the omap object "measure" for the Gnocchi backlog causes performance issues. The current code in Gnocchi 3.1 creates N threads (based on core count) [0] to create the Ceph backlog measures, but all of them write a key to a single object. When measuring Gnocchi API request times (Apache log %D), we can see POST requests taking >1min (much longer at large scale, and eventually longer than the httpd timeout of 120s). HTTP gateway timeout error messages in the Ceilometer Notification Agent logs are a good indicator that this problem is occurring.

By batching the creation of all new measure objects and the adding of the keys to the omap object, we reduce the request time from >1min-30s or greater to ~2-5s on the same hardware. Example patch [1] reduces the overall time required to move data with Ceilometer Notification-Agent into Gnocchi Ceph storage for processing.

Version-Release number of selected component (if applicable):
Ocata Beta (OSP11) Build 2017-04-06.4
openstack-gnocchi-api-3.1.2-3.el7ost.noarch
openstack-gnocchi-indexer-sqlalchemy-3.1.2-3.el7ost.noarch
python-gnocchiclient-3.1.0-1.el7ost.noarch
openstack-gnocchi-common-3.1.2-3.el7ost.noarch
openstack-gnocchi-metricd-3.1.2-3.el7ost.noarch
puppet-gnocchi-10.3.0-2.el7ost.noarch
python-gnocchi-3.1.2-3.el7ost.noarch
openstack-gnocchi-statsd-3.1.2-3.el7ost.noarch

How reproducible:
With a large enough scale you will see this issue.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=1430588
[0] https://github.com/openstack/gnocchi/blob/stable/3.1/gnocchi/rest/__init__.py#L1460-L1463
[1] https://gist.github.com/akrzos/9d841feff51050c12913faf39634383a#file-gistfile1-txt-L1427-L1440
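For readers who want to see the difference in write patterns, here is a minimal sketch of the batching idea described above. It is not the Gnocchi code or the patch in [1]. FakeOmapStore, store_unbatched and store_batched are hypothetical names, and the store is an in-memory stand-in rather than a real Ceph client. It only counts how many write operations hit the single "measure" omap object in each approach.

```python
from collections import defaultdict


class FakeOmapStore:
    """Hypothetical in-memory stand-in for a Ceph pool (not a real rados API)."""

    def __init__(self):
        self.objects = {}                    # object name -> payload
        self.omaps = defaultdict(dict)       # object name -> omap keys
        self.omap_writes = defaultdict(int)  # omap write operations per object

    def write_object(self, name, data):
        self.objects[name] = data

    def set_omap_keys(self, obj, keys):
        # One call models one write operation against `obj`,
        # however many keys it carries.
        self.omaps[obj].update({key: b"" for key in keys})
        self.omap_writes[obj] += 1


def store_unbatched(store, measures):
    # Pattern from the report: every new measure object is followed by its
    # own key write to the shared "measure" omap object.
    for name, data in measures.items():
        store.write_object(name, data)
        store.set_omap_keys("measure", [name])


def store_batched(store, measures):
    # Batched pattern: create all measure objects first, then register
    # every key on the shared omap object in a single write.
    for name, data in measures.items():
        store.write_object(name, data)
    store.set_omap_keys("measure", list(measures))


measures = {"measure_%d" % i: b"payload" for i in range(100)}

unbatched = FakeOmapStore()
store_unbatched(unbatched, measures)

batched = FakeOmapStore()
store_batched(batched, measures)

print(unbatched.omap_writes["measure"], batched.omap_writes["measure"])
```

Both runs leave the same keys on the "measure" omap object. The batched run touches that object once per request instead of once per measure, which is the contention the description points at. Real latency depends on the cluster, so the counts here only illustrate the access pattern, not the timings reported above.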
This has already been backported to OSP 10, so marking as done.
This is a performance issue that is not verifiable by the standard QE process. What Alex described has been solved in Gnocchi 4 because data writes are now batched. The patch has already been backported to OSP 10 and OSP 11 and shipped to customers.
This has been merged in Gnocchi 3.1.5. I realize that the Fixed in Version is wrong here.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462