Bug 1454939 - Gnocchi with Ceph Storage Driver creates many objects
Summary: Gnocchi with Ceph Storage Driver creates many objects
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-gnocchi
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: async
Target Release: 10.0 (Newton)
Assignee: Pradeep Kilambi
QA Contact: Sasha Smolyak
URL:
Whiteboard: scale_lab
Duplicates: 1454943 1526574
Depends On: 1430588 1454941 1454943
Blocks: 1414467
 
Reported: 2017-05-23 20:17 UTC by Pradeep Kilambi
Modified: 2022-08-16 12:59 UTC
CC List: 29 users

Fixed In Version: openstack-gnocchi-3.0.11-1.el7ost openstack-tripleo-heat-templates-5.2.0-21.el7ost puppet-gnocchi-9.5.0-2.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1430588
Environment:
Last Closed: 2017-07-12 14:07:53 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 464757 0 None MERGED ceph: store measures in omap values 2021-01-20 18:47:21 UTC
OpenStack gerrit 469226 0 None MERGED Expose metric delay processing metric 2021-01-20 18:47:21 UTC
OpenStack gerrit 469230 0 None MERGED Add metric_processing_delay param 2021-01-20 18:47:21 UTC
Red Hat Issue Tracker OSP-4635 0 None None None 2022-08-16 12:59:32 UTC
Red Hat Knowledge Base (Solution) 2971771 0 None None None 2019-10-30 10:07:44 UTC
Red Hat Knowledge Base (Solution) 3117631 0 None None None 2019-10-30 10:07:27 UTC
Red Hat Product Errata RHBA-2017:1748 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 10 Bug Fix and Enhancement Advisory 2017-07-12 18:07:30 UTC

Description Pradeep Kilambi 2017-05-23 20:17:53 UTC
+++ This bug was initially created as a clone of Bug #1430588 +++

Description of problem:
Gnocchi configured with the Ceph storage driver will create many objects in the Ceph pool "metrics".  There are two issues with this approach as I understand it from our storage team.

1. Disk space per object - Objects have been found to range between 16 bytes and 20 KB in a bimodal distribution.  Because the Ceph FileStore backend creates a file for every single object, each object, no matter its size, also consumes inode space.  Each object therefore pays roughly a 2 KB inode price, so a 16-byte object uses about 2 KB of inode space as well.  This compounds when lots of tiny objects are created and then replicated 3 times over, meaning a single 16-byte object costs about 6 KB of space across the Ceph cluster (see the sizing sketch after this list).

2. Slab memory pressure - More important than the disk space usage, each object consumes precious slab memory on the Ceph nodes.  This is a larger problem in HCI deployments, but my assumption is that the metrics pool should not be on the compute nodes.
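
To make the arithmetic in point 1 concrete, here is a rough sizing sketch in Python. The ~2 KB per-file inode/metadata cost and the 3x replication factor are taken from the description above; both are assumptions for illustration, not measured constants.

# Back-of-the-envelope footprint of many tiny RADOS objects under FileStore.
INODE_OVERHEAD_BYTES = 2 * 1024   # assumed per-file metadata cost
REPLICAS = 3                      # assumed pool size

def cluster_footprint(object_size, object_count):
    """Approximate bytes consumed across the whole cluster."""
    return (object_size + INODE_OVERHEAD_BYTES) * REPLICAS * object_count

print(cluster_footprint(16, 1))                    # ~6 KB for one 16-byte measure
print(cluster_footprint(16, 20_000_000) / 2**30)   # ~115 GiB for a 20M-object backlog,
                                                   # almost all of it metadata overhead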

Version-Release number of selected component (if applicable):
OpenStack Newton (OSP Director 10 deployed Red Hat OpenStack)

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
I will investigate whether the 16-byte objects are the Gnocchi "backlog" or whether they grow continuously, as seen in my team member's environment.  (I believe there may not have been enough capacity to handle the number of instances in his cloud, so the backlog kept growing without the processing ever catching up.)
Additional bugs to look at/refer to:
https://bugzilla.redhat.com/show_bug.cgi?id=1428888#c9
https://bugzilla.redhat.com/show_bug.cgi?id=1430084

--- Additional comment from Alex Krzos on 2017-03-08 21:26:20 EST ---

> 2. Slab memory pressure - More important than the disk space usage, each
> object can invade precious slab memory spoace on the Ceph nodes.  This is a
> larger problem in HCi deployments but my assumption there is the metrics
> pool should not be on the compute nodes.

I wanted to point out that if the objects stored in the metrics pool evict other inodes from slab memory, then performance will suffer for all the other pools, such as vms, images, volumes, etc.

One short term option could be to reduce the replica count from 3 to 2 in Ceph for the metrics pool.  Another suggested long term option would be to use Ceph bluestore.

--- Additional comment from Ben England on 2017-03-09 07:51:32 EST ---

Thanks Alex,

just wanted to add that some Ceph OSDs were failing with a "suicide timeout" and would not stay running, until we deleted the Gnocchi "metrics" pool, at which point Ceph recovered immediately.  So this may be a more serious problem than we thought.  Some additional detail is available in

https://bugzilla.redhat.com/show_bug.cgi?id=1428888
- has suicide timeout behavior in initial post, but diagnosis was not correct
https://bugzilla.redhat.com/show_bug.cgi?id=1428888#c9
- has info about Gnocchi pool usage

The bimodal RADOS object size distribution was observed by looking at any of the /var/lib/ceph/osd/ceph-*/current/1.*/ directory trees on Ceph OSD nodes to find the replica data files, where the "metrics" pool used by Gnocchi was pool 1 in this cluster, found by "ceph osd pool ls detail".  

The derived statistics objects that I saw, like "mean", were about 20 KB, but the individual "measure" objects were all 16 bytes, and more than 99% of the objects were of this type.

What is the upper bound on the number of objects that Gnocchi will create?

There are other ways to use the RADOS API - for example, instead of creating separate objects for each sample, it is possible to append to existing objects.  This would allow Gnocchi to make more efficient use of Ceph and avoid the metadata overhead it is inducing now; see Ioctx.aio_append() and Ioctx.write().  This would lower Ceph latency for Gnocchi as well.
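
As an illustration of the append idea, a minimal python-rados sketch. It assumes a reachable cluster configured via /etc/ceph/ceph.conf and a pool named "metrics"; the object name is a made-up placeholder, and this is only the API shape being suggested, not how Gnocchi is actually wired:

import struct
import rados

# Sketch: append each 16-byte (timestamp, value) sample to a per-metric object
# instead of creating a brand-new RADOS object per sample.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("metrics")              # pool name assumed
    sample = struct.pack("<dd", 1489000000.0, 42.0)    # timestamp + value = 16 bytes
    # append() is the synchronous sibling of the aio_append() mentioned above;
    # repeated calls accumulate samples in the same object.
    ioctx.append("measures_METRIC_UUID", sample)       # object name is a placeholder
    ioctx.close()
finally:
    cluster.shutdown()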

--- Additional comment from Julien Danjou on 2017-03-09 08:05:39 EST ---

The measure objects are measures posted by external systems (in OSP it's Ceilometer). Since most of those measures are not batched but sent one by one, each consists of two 8-byte floats (timestamp + value), which is why they are 16 bytes.
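
For reference, a minimal sketch of that encoding (Gnocchi's actual serialization may differ in detail):

import struct

# Two 8-byte doubles: (unix timestamp, measured value) -> exactly 16 bytes.
measure = struct.pack("<dd", 1489000000.0, 3.14)
assert len(measure) == 16
timestamp, value = struct.unpack("<dd", measure)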

There is no upper bound on the number of objects, but they are processed and deleted by the metricd workers as fast as possible.

Appending could be a nice strategy to implement, I concur. We'll look into that!

--- Additional comment from Alex Krzos on 2017-03-10 13:31:01 EST ---


> Additional info:
> I will investigate if the 16 byte objects are the Gnocchi "backlog" or if
> those continously grow as seen in my team member's environment.  (I believe
> that possibly there was not enough Capacity to handle the number of
> instances he had in his cloud thus, the backlog was growing without any
> catching up on the processing occurring.)


Further investigation so far shows that the Gnocchi backlog is simply objects in the ceph metrics pool. 

I will attach a screenshot of Grafana showing the correlation of the Gnocchi backlog to the number of Ceph objects in a cloud of static size over several hours.

--- Additional comment from Alex Krzos on 2017-03-10 13:43 EST ---

Attached is a screenshot that compares Gnocchi Backlog of Measures and Metrics vs the # of Ceph Objects along with the number of Instances booted in the cloud.

You can see that shortly after all the instances (500) are booted, the backlog correlates with the number of Ceph objects.  Next we need to analyze the size of the objects in this environment during the time period when the object count is "stable" and does not include the backlog.


You're probably asking, though: what does this mean?

Basically, if your Gnocchi capacity is less than the number of resources (instances, NICs, disks, volumes, or anything else that is sampled in the Ceilometer pipeline), you can expect a continuous stream of small objects to be put into the Ceph cluster and overall Ceph performance to suffer.

--- Additional comment from Ben England on 2017-03-10 14:56:01 EST ---

Another factor: Tim's cluster, where this problem occurred, was running high-intensity Ceph workloads with fio for many hours at a time; this may have interfered with Gnocchi's creation of Ceph objects, causing the backlog.  However, my point is that this might not have been as much of an issue if Gnocchi weren't creating 16-byte objects and were just appending to existing objects.

Another possible factor: Alex's cluster does not run Cinder volumes, whereas Tim's cluster had 500 of them, not sure how much additional work Gnocchi has to do for Cinder.

My suspicion is that Gnocchi stays stable when fio isn't running, and that the problem occurs when both fio and Gnocchi are running.  If so, this is still an issue, because people don't buy storage hardware to have it sit idle.

--- Additional comment from Ben England on 2017-03-21 07:38:18 EDT ---

So 2 out of 2 OSP 10 deployments I've seen where OSPd deploys Ceph have this problem.  This is because when OSP Director deploys Ceph, it puts the OpenStack Gnocchi service's "metrics" storage pool on Ceph.  It happened again in the CNCF deployment, with exactly the same symptoms:

- multiple Ceph OSDs stop with "suicide timeout" in logs
- "metrics" Ceph storage pool had ~20 million objects in it
- after dropping the metrics pool, the ceph OSDs can be successfully restarted
- it appears that there are far more RADOS objects in the pool than normal, as if Gnocchi is not keeping up with processing these 16-byte objects
- it doesn't happen right away, so automated tests might have missed this.

What's different:
- configuration: CNCF had only 90 OSDs across 9 Ceph storage nodes
- workload: there was no intense fio application workload

We had to disable Gnocchi to stop use of this storage pool (how did we do it, Alex?).  Then I dropped the "metrics" pool.  Later in the day I simply restarted the OSDs that had been down to bring them back to life.

The Ceph cluster has been stable ever since and is extremely efficient for its intended use of creating 2000 Nova guests backed by a single glance image.

Will add the ceph logs that show the suicide timeouts as an attachment.

Am going to try to reproduce the Ceph portion of the problem on a separate cluster using rados bench with 16-byte objects.  I've seen slowdowns before with Ceph FileStore and very small RADOS objects (e.g. 6 KB), but have never seen OSDs start to drop like this.  Ceph shouldn't die because of this workload.

Even if the Ceph problem is fixed, 16-byte objects are still not a good idea for the reasons discussed above.  We want to minimize the overhead of resource-monitoring subsystems like Gnocchi.

--- Additional comment from Ben England on 2017-03-21 07:40 EDT ---

this shows the suicide timeout stack trace that the 3 OSDs generate.

--- Additional comment from Julien Danjou on 2017-03-21 08:24:45 EDT ---

20 million is a huge backlog. Are you sure gnocchi-metricd processes are running and processing metrics? How many are running?

--- Additional comment from Ben England on 2017-03-21 18:01:44 EDT ---

re the previous comment: we'll try to find out for you; Alex Krzos would know.

I'm trying to reproduce this in a Ceph cluster with this script:

[root@ip-172-31-56-131 to-installer]# more rbench.sh 
#!/bin/bash -x
ansible -i ~/to-installer/internal-hosts.list -m shell \
-a "rados bench --run-name \`hostname -s\`.`date '+%H-%M-%S'` -b $1 -o $1 -p rados-bench $2 write --no-cleanup" \
all

With a command like this to have it create 16-byte objects for 3000 seconds:

# ./rbench.sh 16 3000

It's starting to slow down but I haven't seen an OSD drop yet.

--- Additional comment from Ben England on 2017-03-28 17:12:16 EDT ---

Alex thinks this explains the problem.
-------------
---------- Forwarded message ----------
From: Mike Lowe <jomlowe>
Date: Tue, Mar 28, 2017 at 3:55 PM
Subject: Re: [Openstack-operators] scaling gnocchi metricd
To: Ionut Biru - Fleio <ionut>
Cc: "openstack-operators.org"
<openstack-operators.org>


I recently got into trouble with a large backlog. What I found was that
at some point the backlog got too large for Gnocchi to effectively
function.  When using Ceph, the list of metric objects is kept in an
omap object, which normally is a quick and efficient way to store this
list.  However, at some point the list grows too large to be managed
by the leveldb which implements the omap k/v store.  I finally moved
to some SSDs to get enough IOPS for leveldb/omap to function.  What
I’m guessing is that if you are using Ceph, the increased number of
metrics grabbed per pass reduced the number of times a now-expensive
operation is performed.  Indications are that the new BlueStore should
make omap scale much better, but it isn’t slated to go stable for a
few months, with the release of Luminous.

-----------------------

So this explanation makes some sense, but then the question is: why is the list being stored in an omap, and how big can it get?  RADOS has no notion of a directory in an object name; the namespace is flat.  Can RADOS efficiently look up all objects with a specified prefix string?  I still think that if individual samples were appended to a RADOS object instead of being stored in separate ones, as suggested in comment 2, at least a good chunk of this problem would go away, since there would be fewer RADOS objects to manage and the omap list size would go down.  But this might just put off the day of reckoning.
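
On the prefix-lookup question: the librados listing API iterates the whole (flat) pool namespace; there is no server-side "list objects with prefix X", so any prefix match has to be filtered client-side, which is part of why listing is expensive. A minimal python-rados sketch (pool name assumed; the measure_ prefix is the one used elsewhere in this bug):

import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("metrics")   # pool name assumed
    # Walk every object in the pool and filter on the client side.
    backlog = [o.key for o in ioctx.list_objects()
               if o.key.startswith("measure_")]
    print(len(backlog))
    ioctx.close()
finally:
    cluster.shutdown()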

--- Additional comment from Ben England on 2017-03-28 22:03:01 EDT ---

BTW, I tried using Ceph's "rados bench" command to create 7 million 16-byte objects on a 6-OSD EC2 cluster; it never broke (though it does slow down and stall a bit).  So I don't think it's just the 16-byte objects that cause this problem.

--- Additional comment from Alex Krzos on 2017-03-28 23:28:24 EDT ---

(In reply to Ben England from comment #11)
> Alex thinks this explains the problem.
> -------------
> ---------- Forwarded message ----------
> From: Mike Lowe <jomlowe>
> Date: Tue, Mar 28, 2017 at 3:55 PM
> Subject: Re: [Openstack-operators] scaling gnocchi metricd
> To: Ionut Biru - Fleio <ionut>
> Cc: "openstack-operators.org"
> <openstack-operators.org>
> 
> 
> I recently got into trouble with a large backlog. What I found was at
> some point the backlog got too large for gnocchi to effectivly
> function.  When using ceph list of metric objects is kept in a omap
> object which normally is a quick and efficient way to store this list.
> However, at some point the list grows too large for it to be managed
> by the leveldb which implements the omap k/v store.  I finally moved
> to some ssd’s to get enough iops for leveldb/omap to function.  What
> I’m guessing is that if you are using ceph the increased number of
> metrics grabbed per pass reduced the number of times a now expensive
> operation is performed.  Indications are that the new bluestore should
> make omap scale much better but isn’t slated to go stable for a few
> months with the release of Luminous.
> 
> -----------------------
> 
> So this explanation makes some sense, but then the question is: why is list
> being stored in an omap and how big can it get?   RADOS has no notion of a
> directory in an object name, which is a flat namespace.  Can RADOS
> efficiently lookup all objects with a specified prefix string?  I still
> think if individual samples were appended to a RADOS object instead of being
> stored in separate ones, as suggested in comment 2, at least a good chunk of
> this problem would go away since there would be fewer RADOS objects to
> manage, so the omap list size would go down.  But this might just put off
> the day of reckoning.

Yes I forwarded you that email earlier.

--- Additional comment from Julien Danjou on 2017-03-29 04:31:00 EDT ---

(In reply to Ben England from comment #11)
> So this explanation makes some sense, but then the question is: why is list
> being stored in an omap and how big can it get?   RADOS has no notion of a
> directory in an object name, which is a flat namespace.  Can RADOS
> efficiently lookup all objects with a specified prefix string?  I still
> think if individual samples were appended to a RADOS object instead of being
> stored in separate ones, as suggested in comment 2, at least a good chunk of
> this problem would go away since there would be fewer RADOS objects to
> manage, so the omap list size would go down.  But this might just put off
> the day of reckoning.

Listing objects in Ceph is extremely slow; we managed to "break" Ceph while trying to list 20k objects, which takes about 2 minutes. That's why we switched to using OMAP, as a "hack".

If the main problem is the OMAP listing, this could be fixed pretty easily in the next Gnocchi version, where we split this OMAP index across several thousand objects.

If the main problem is a large number of small objects, this is tougher. Appending to a file is obviously a good solution. But unfortunately there's no way to implement the entire workflow of creating measures and processing them in an atomic way. You can see my current attempt at improving that here: https://review.openstack.org/#/c/450783/
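
The splitting described here amounts to sharding the one index object into many smaller ones, so that no single omap (and no single PG) carries the whole backlog. A rough sketch of the idea; the sack naming and count are illustrative only, not Gnocchi's actual implementation:

import hashlib

NUM_SACKS = 2048  # illustrative shard count

def sack_for_metric(metric_id):
    # Hash the metric id to one of NUM_SACKS small index objects, so each
    # unprocessed-measure entry lands in a bounded omap instead of one
    # giant "measure" omap.
    h = int(hashlib.md5(metric_id.encode()).hexdigest(), 16)
    return "incoming_sack-%d" % (h % NUM_SACKS)

print(sack_for_metric("00000000-0000-0000-0000-000000000000"))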

--- Additional comment from Ben England on 2017-04-03 09:48:31 EDT ---

In my experience with Gnocchi, the "measure" object contains an omap with almost all of the objects in the pool in it, at least when the problem is occurring.  So there is not a lot of informational value in the omap as it exists today.  Perhaps when you split the omap index, things will be different.

Why is atomicity so important?  If you lose a measure, how far off will your results be?  Couldn't this happen if a node or service was down, for example?  

There is a rados_write_op_t data type that lets you specify a set of operations to be performed atomically, which in theory should allow you to do a sequence of writes and then update an omap as an atomic transaction, if I understand it correctly.  I'm looking at the librados C API and am not sure how much of this, if any, applies to the Python API.
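
For what it's worth, newer python-rados builds expose this through WriteOpCtx/operate_write_op: every operation queued on one write op is applied atomically to a single object, which also gives batched omap updates for free. A minimal sketch (pool, object, and key names assumed; not Gnocchi's actual code):

import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("metrics")                   # pool name assumed
try:
    # Register two backlog entries in the index object's omap in one
    # atomic, batched operation.
    with rados.WriteOpCtx() as op:
        ioctx.set_omap(op,
                       ("measure_aaa", "measure_bbb"),  # keys (placeholders)
                       (b"", b""))                      # empty values
        ioctx.operate_write_op(op, "measure")           # the single index object
finally:
    ioctx.close()
    cluster.shutdown()

Note that a write op applies to one object only, so this batches and atomizes updates to that object's omap; it cannot atomically span the separate data objects and the index object, which is the limitation Julien discusses in a later comment.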

Am going to post a rados omap test program that can perhaps simulate some of the behavior of Gnocchi+Ceph.

--- Additional comment from Ben England on 2017-04-03 13:04:38 EDT ---

Can Gnocchi batch updates to key-value pairs in omap?

rados-omap.c test program is available from here for now:

https://raw.githubusercontent.com/bengland2/rados_object_perf/master/rados-omap.c

I ran some tests with it and am digesting the numbers now. Write throughput seems to be ~300 keys/sec in this test with 0-length values:

[root@ip-172-31-56-131 ~]# rados rm -p ben hw && ./rados-omap --kvpairs-per-call 1 --total-kvpairs 100 --value-size 0
          1 : key-value pairs per call
        100 : total key-value pairs
          0 : value size in bytes
      write : operation type
elapsed time = 0.380065923 sec
[root@ip-172-31-56-131 ~]# rados rm -p ben hw && ./rados-omap --kvpairs-per-call 1 --total-kvpairs 1000 --value-size 0
          1 : key-value pairs per call
       1000 : total key-value pairs
          0 : value size in bytes
      write : operation type
elapsed time = 3.729845786 sec

I did notice that if you batch changes to key-value pairs, it gets a lot faster, meaning that the performance cost is per batch, not per key-value pair.

[root@ip-172-31-56-131 ~]# rados rm -p ben hw && ./rados-omap --kvpairs-per-call 1 --total-kvpairs 10000 --value-size 0
          1 : key-value pairs per call
      10000 : total key-value pairs
          0 : value size in bytes
      write : operation type
elapsed time = 40.249775283 sec
[root@ip-172-31-56-131 ~]# rados rm -p ben hw && ./rados-omap --kvpairs-per-call 10 --total-kvpairs 10000 --value-size 0
         10 : key-value pairs per call
      10000 : total key-value pairs
          0 : value size in bytes
      write : operation type
elapsed time = 3.886266531 sec
[root@ip-172-31-56-131 ~]# rados rm -p ben hw && ./rados-omap --kvpairs-per-call 100 --total-kvpairs 10000 --value-size 0
        100 : key-value pairs per call
      10000 : total key-value pairs
          0 : value size in bytes
      write : operation type
elapsed time = 0.564232162 sec

The last result is roughly 70 times faster than the one-pair-per-call run (40.2 s vs 0.56 s) to write the same number of keys.

--- Additional comment from Ben England on 2017-04-04 17:07:52 EDT ---

Jason Dillaman explained why I saw exactly 3 OSDs drop out in a cluster when 20,000,000 objects were created in the "metrics" pool (see comment 2): the omap for the "measure" object is stored in a single PG!  For a size-3 storage pool, that PG involves 3 OSDs.  So it makes sense that those OSDs are going to get nailed when we try to read or write that object's omap.

Is there a way to avoid this problem short-term (other than disabling Gnocchi and dropping the metrics pool)?  Is there a tuning that will prevent the problem from happening?  My (limited) understanding is that the problem is caused by the consumer of measure_* objects not keeping up with the producer, since the consumer (is it metricd?) is what reads the measure_* objects, then deletes them and removes them from the "measure" object's omap.  So if the consumer stops consuming these measure objects for any reason, the size of the omap will grow in an unbounded way until the 3 OSDs storing that omap can't handle it anymore.

Sebastien Han provided this test program for python RADOS omap access:

https://github.com/bengland2/rados_object_perf/blob/master/ceph-fill-omap.py 

which can apparently deal with omaps in Python, which the standard Python rados module does not seem to do.

--- Additional comment from Julien Danjou on 2017-04-13 07:53:58 EDT ---

(In reply to Ben England from comment #15)
> In my experience with Gnocchi, the "measure" object contains an omap with
> almost all of the objects in the pool in it, at least when the problem is
> occurring.  So there is not a lot of informational value in the omap as it
> exists today.  Perhaps when you split the omap index then things will be
> different.

There is zero value in the OMAP itself, but listing objects in Ceph is utterly slow in our experience. That's why the OMAP is used; it's basically a workaround for that.

> Why is atomicity so important?  If you lose a measure, how far off will your
> results be?  Couldn't this happen if a node or service was down, for
> example?  

No, everything is meant to be safe currently – at least when using Ceph for both incoming measure and archive storage.

> There is a rados_write_op_t data type that lets you specify a set of
> operations to be performed atomically, which in theory should allow you to
> do a sequence of writes, and then update an omap, as an atomic transaction,
> if I understand it correctly.  I'm looking at librados "C" API and am not
> sure how much of this if any applies to the python API.

Right, but that's only true for a single write or a single read operation. To do both, Gnocchi would need to lock the object, which is currently avoided. Writing new measures and processing old ones are lock-free operations, which would no longer be possible in such a world. It's a trade-off that we need to consider carefully.

> Am going to post a rados omap test program that can perhaps simulate some of
> the behavior of Gnocchi+Ceph.

That'd be great to have more insight on what's going wrong.

--- Additional comment from Julien Danjou on 2017-04-13 08:47:38 EDT ---

(In reply to Ben England from comment #17)
> Jason Dillaman explained why I saw exactly 3 OSDs drop out in a cluster when
> 20,000,000 objects were created in "metrics" pool (see comment 2) - the omap
> for the "measure" object is stored in a single PG!  For a size 3 storage
> pool, that PG involves 3 OSDs.  So it makes sense that those OSDs are going
> to get nailed when we try to read or write to that object's omap.   
> 
> Is there a way to avoid this problem short-term (other than disabling
> Gnocchi and dropping metrics pool)?  Is there a tuning that will prevent the
> problem from happening?  My (limited) understanding of it is that the
> problem is caused because the consumer of measure_* objects is not keeping
> up with the producer, since the consumer (is it metricd?) is what reads the
> measure_* objects and then deletes them and removes them from the "measure"
> object omap.  So if the consumer stops consuming these measure objects for
> any reason, the size of the omap will grow in an unbounded way until the 3
> OSDs storing that omap can't handle it anymore. 

You understood everything correctly. The best thing is to increase the number of metricd workers and decrease the processing delay. Alex wrote a nice kbase article about that:
https://access.redhat.com/solutions/2971771
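
For context, that tuning lives in gnocchi.conf under [metricd]. A hedged example of the kind of change involved; the values shown are illustrative only, not recommendations from this bug, and defaults may differ between releases:

[metricd]
# More workers drain the incoming measure backlog faster (size to spare CPU).
workers = 12
# Seconds metricd sleeps between processing passes; lowering it keeps the
# backlog (and therefore the "measure" omap) smaller.
metric_processing_delay = 30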

> Sebastien Han provided this test program for python RADOS omap access:
> 
> https://github.com/bengland2/rados_object_perf/blob/master/ceph-fill-omap.py 

Yeah, I wrote this for Seb (https://gist.github.com/jd/e679e7c43a0d8e54181b257e8f733c97) so we could do some tests (but then I left for PTO :-)


The OMAP problem should go away in the next version of Gnocchi and OSP, as we're working on splitting the backlog across several hundred/thousand OMAPs.

--- Additional comment from Ben England on 2017-04-26 07:45:13 EDT ---

Great news from Alex Krzos via e-mail:

I actually characterized the difference in writing to the omap via
batch in Gnocchi vs the threaded model they had.  Here is a quick
Grafana graph [0] showing the huge difference; you can see when I
implemented the batching method.  (Unselect the count of requests so
you can see the min/avg/max latencies graphed for POST-ing new data
through the Gnocchi API in httpd.)  I opened a bug [1] and Julien
already put in a patch [2] to fix it.  When I get back from training,
I'll look at how metricd reads from and deletes from the measure
object, since I suspect that is also an issue I encountered but did
not have time to instrument metricd or adequately understand all of
the debug-level messages you can have metricd output.

[0] http://norton.perf.lab.eng.rdu.redhat.com:3000/dashboard/snapshot/DKagyJjCT8DRonxCJkDGWD3LET7pOK09
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1444541
[2] https://review.openstack.org/#/c/459333

--- Additional comment from Julien Danjou on 2017-05-16 08:58:39 EDT ---

Alex, is your batch write patch enough to mark this bug as fixed or do we need to fix anything else at this stage?

--- Additional comment from Alex Krzos on 2017-05-16 09:23:49 EDT ---

(In reply to Julien Danjou from comment #21)
> Alex, is your batch write patch enough to mark this bug as fixed or do we
> need to fix anything else at this stage?

I would like to keep this open, as I believe there are further issues that will require a multi-pronged approach of patches and testing to reveal whether they are solved, or at least whether the scale limits have been moved "sufficiently" (with "sufficiently" being a subjective guess at how many instances/objects we need Gnocchi to be able to store metrics for).

The issues/actions left, as I see them, are:

1. Characterize the multiple-Ceph-object backlog (rather than a single Ceph object hosting the unprocessed backlog) - both posting new data and processing data out of the backlog.
2. Investigate metricd limits/bottlenecks (we can only keep bumping the worker count for so long; improving the throughput of measures into metrics per process/worker would be a less costly optimization) - also the elimination of the scheduler process.
3. Storing measures/metrics in Ceph as small objects - Ceph BlueStore is supposed to help; we need to characterize this behavior and, furthermore, understand whether the driver needs further work to support BlueStore.
4. Using Redis as the incoming storage driver - I've heard it is fast but don't have any comparison data on the same hardware.

That is all I can think of right now.

--- Additional comment from Julien Danjou on 2017-05-19 10:35:20 EDT ---

Starting with Gnocchi 4, the new measures will be stored in the OMAP database for faster access. This has been merged: https://review.openstack.org/#/c/464757/

--- Additional comment from Pradeep Kilambi on 2017-05-23 09:01:24 EDT ---

patch merged upstream

Comment 1 Pradeep Kilambi 2017-05-24 12:09:31 UTC
*** Bug 1454943 has been marked as a duplicate of this bug. ***

Comment 2 Pradeep Kilambi 2017-05-24 12:11:14 UTC
https://review.openstack.org/464757 needs to be backported to stable/3.0 branch

Comment 3 Julien Danjou 2017-05-24 13:31:45 UTC
(In reply to Pradeep Kilambi from comment #2)
> https://review.openstack.org/464757 needs to be backported to stable/3.0
> branch

Unfortunately, this is absolutely not possible. It is not compatible with 3.x, as it completely changes how objects are treated.

Even an upgrade from 3.x to 4.0 currently requires a completely empty backlog.

I don't really see any way to fix this potential issue on 3.x, unfortunately.

Comment 13 Sasha Smolyak 2017-07-09 13:04:38 UTC
Tested with 3.0.11; the number of definitions in the low and medium archive policies has been reduced:

[heat-admin@controller-0 ~]$ gnocchi archive-policy list
+--------+-------------+-------------------------------------------------------------------+------------------------------------------------+
| name   | back_window | definition                                                        | aggregation_methods                            |
+--------+-------------+-------------------------------------------------------------------+------------------------------------------------+
| high   |           0 | - points: 3600, granularity: 0:00:01, timespan: 1:00:00           | std, count, 95pct, min, max, sum, median, mean |
|        |             | - points: 10080, granularity: 0:01:00, timespan: 7 days, 0:00:00  |                                                |
|        |             | - points: 8760, granularity: 1:00:00, timespan: 365 days, 0:00:00 |                                                |
| low    |           0 | - points: 8640, granularity: 0:05:00, timespan: 30 days, 0:00:00  | std, count, 95pct, min, max, sum, median, mean |
| medium |           0 | - points: 10080, granularity: 0:01:00, timespan: 7 days, 0:00:00  | std, count, 95pct, min, max, sum, median, mean |
|        |             | - points: 8760, granularity: 1:00:00, timespan: 365 days, 0:00:00 |                                                |
+--------+-------------+-------------------------------------------------------------------+------------------------------------------------+

Comment 15 errata-xmlrpc 2017-07-12 14:07:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1748

Comment 16 Julien Danjou 2017-12-19 17:04:51 UTC
*** Bug 1526574 has been marked as a duplicate of this bug. ***

