Bug 1430588 - Gnocchi with Ceph Storage Driver creates many objects
Summary: Gnocchi with Ceph Storage Driver creates many objects
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-gnocchi
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: Upstream M2
Target Release: 12.0 (Pike)
Assignee: Julien Danjou
QA Contact: Sasha Smolyak
URL:
Whiteboard: scale_lab
Depends On: 1457767 1569192
Blocks: 1414467 1454939 1454941 1454943
 
Reported: 2017-03-09 02:18 UTC by Alex Krzos
Modified: 2023-09-07 18:51 UTC
CC: 36 users

Fixed In Version: openstack-gnocchi-4.0.0-0.20170612001619.4e17932.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned To: 1454939 1454941
Environment:
Last Closed: 2018-11-21 10:54:46 UTC
Target Upstream Version:
Embargoed:


Attachments
Gnocchi Backlog vs # of Ceph Objects while 500 Instances are booted and sit static (60.97 KB, image/png)
2017-03-10 18:43 UTC, Alex Krzos
OSD logs + some info about the ceph configuration (7.42 MB, application/x-gzip)
2017-03-21 11:40 UTC, Ben England


Links
- OpenStack gerrit 464757: MERGED, "ceph: store measures in omap values" (last updated 2021-02-15 11:57:36 UTC)
- Red Hat Issue Tracker OSP-4607 (last updated 2022-03-13 14:45:30 UTC)
- Red Hat Product Errata RHEA-2017:3462: SHIPPED_LIVE, Red Hat OpenStack Platform 12.0 Enhancement Advisory (last updated 2018-02-16 01:43:25 UTC)

Description Alex Krzos 2017-03-09 02:18:29 UTC
Description of problem:
Gnocchi configured with the Ceph storage driver will create many objects in the Ceph pool "metrics".  There are two issues with this approach, as I understand it from our storage team.

1. Disk space per object - Objects have been found to range between 16 bytes and 20KB in a bimodal distribution.  Ceph's FileStore backend creates a file for every single object that is added, so each object, no matter its size, consumes inode space as well.  This means each object pays a roughly 2KB inode price: a 16 byte object uses about 2KB of inode space on top of its data.  This is compounded when lots of tiny objects are created and then replicated 3 times over, meaning a single 16 byte object costs about 6KB of space across the Ceph cluster.
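
A rough back-of-the-envelope version of that overhead, as a sketch; the 2KB inode cost and 3x replication factor are the figures quoted above and will vary per deployment:

~~~
# Approximate cluster-wide cost of one tiny measure object under FileStore,
# using the figures quoted in the description (illustrative only).
object_size = 16           # bytes of actual measure data
inode_cost = 2 * 1024      # ~2KB of inode/file metadata per object on each OSD
replicas = 3               # default pool size

raw_cost = (object_size + inode_cost) * replicas
overhead = raw_cost / (object_size * replicas)
print(f"{raw_cost} bytes stored cluster-wide for {object_size} bytes of data "
      f"(~{overhead:.0f}x overhead)")
~~~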

2. Slab memory pressure - More important than the disk space usage, each object can invade precious slab memory space on the Ceph nodes.  This is a larger problem in HCI deployments, though my assumption is that the metrics pool should not be placed on the compute nodes.

Version-Release number of selected component (if applicable):
OpenStack Newton (Red Hat OpenStack Platform 10, deployed with OSP director)

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
I will investigate whether the 16 byte objects are the Gnocchi "backlog" or whether those continuously grow as seen in my team member's environment.  (I believe there may not have been enough capacity to handle the number of instances he had in his cloud, so the backlog kept growing and the processing never caught up.)
Additional bugs to look at/refer to:
https://bugzilla.redhat.com/show_bug.cgi?id=1428888#c9
https://bugzilla.redhat.com/show_bug.cgi?id=1430084

Comment 1 Alex Krzos 2017-03-09 02:26:20 UTC
> 2. Slab memory pressure - More important than the disk space usage, each
> object can invade precious slab memory spoace on the Ceph nodes.  This is a
> larger problem in HCi deployments but my assumption there is the metrics
> pool should not be on the compute nodes.

I wanted to point out that if the objects stored in the metrics pool evict other inodes from the slab memory, then performance will suffer for all the other pools such as vms, images, volumes, etc.

One short-term option could be to reduce the replica count from 3 to 2 in Ceph for the metrics pool.  Another suggested long-term option would be to use Ceph BlueStore.

Comment 2 Ben England 2017-03-09 12:51:32 UTC
Thanks Alex,

just wanted to add that some Ceph OSDs were failing with a "suicide timeout" and would not stay running, until we deleted the Gnocchi "metrics" pool, at which point Ceph recovered immediately.  So this may be a more serious problem than we thought.  Some additional detail is available in

https://bugzilla.redhat.com/show_bug.cgi?id=1428888
- has suicide timeout behavior in initial post, but diagnosis was not correct
https://bugzilla.redhat.com/show_bug.cgi?id=1428888#c9
- has info about Gnocchi pool usage

The bimodal RADOS object size distribution was observed by looking at any of the /var/lib/ceph/osd/ceph-*/current/1.*/ directory trees on Ceph OSD nodes to find the replica data files; the "metrics" pool used by Gnocchi was pool 1 in this cluster, as found by "ceph osd pool ls detail".

The derived statistics objects that I saw, like "mean", were about 20K, but the individual "measure" objects were all 16 bytes, and more than 99% of the objects were of this type.

What is the upper bound on the number of objects that Gnocchi will create?

There are other ways to use the RADOS API - for example, instead of creating separate objects for each sample, it is possible to append to existing objects.  This would allow Gnocchi to make more efficient use of Ceph and avoid the metadata overhead it is inducing now; see Ioctx.aio_append() and Ioctx.write().  This would lower Ceph latency for Gnocchi as well.
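
A minimal sketch of that idea with the Python rados bindings (the object names here are hypothetical and this is not Gnocchi's actual code path, just a comparison of object-per-sample vs. append-per-metric):

~~~
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("metrics")   # pool name from this report

measure = b"\x00" * 16                  # one 16-byte sample (timestamp + value)

# Current pattern described above: one tiny RADOS object per sample.
ioctx.write_full("measure_00000001", measure)      # hypothetical object name

# Suggested pattern: append each sample to one object per metric, so the
# object count stays bounded by the number of metrics, not samples.
ioctx.append("incoming_metric_42", measure)        # hypothetical object name

ioctx.close()
cluster.shutdown()
~~~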

Comment 3 Julien Danjou 2017-03-09 13:05:39 UTC
The measure objects are measures posted by external systems (in OSP it's Ceilometer). Since most of those measures are not batched but sent one by one, they consist of two 8-byte floats (timestamp + value), which is why they are 16 bytes.
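
For illustration, two 8-byte floats packed together do come out to exactly 16 bytes; the exact on-disk encoding Gnocchi uses is not shown here, this is just the size arithmetic:

~~~
import struct
import time

timestamp = time.time()   # 8-byte double
value = 42.0              # 8-byte double

payload = struct.pack("<dd", timestamp, value)
print(len(payload))       # 16 bytes per un-batched measure
~~~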

There is no upper bound on the number of objects, but they are processed and deleted by the metricd workers as fast as possible.

Appending could be a nice strategy to implement, I concur. We'll look into that!

Comment 4 Alex Krzos 2017-03-10 18:31:01 UTC
> Additional info:
> I will investigate if the 16 byte objects are the Gnocchi "backlog" or if
> those continously grow as seen in my team member's environment.  (I believe
> that possibly there was not enough Capacity to handle the number of
> instances he had in his cloud thus, the backlog was growing without any
> catching up on the processing occurring.)


Further investigation so far shows that the Gnocchi backlog is simply objects in the Ceph metrics pool.

I will attach a screenshot of Grafana showing the correlation of the Gnocchi backlog to the number of Ceph objects in a cloud of static size over several hours.

Comment 5 Alex Krzos 2017-03-10 18:43:21 UTC
Created attachment 1262041 [details]
Gnocchi Backlog vs # of Ceph Objects while 500 Instances are booted and sit static

Attached is a screenshot that compares Gnocchi Backlog of Measures and Metrics vs the # of Ceph Objects along with the number of Instances booted in the cloud.

You can see that shortly after all 500 instances are booted, the backlog correlates with the number of Ceph objects.  Next we need to analyze the size of the objects in this environment during the time period when the object count is "stable" and does not include the backlog.


You're probably asking, though: what does this mean?

Basically, if your Gnocchi capacity is less than what the number of resources requires (instances, NICs, disks, volumes, or anything else that is sampled in the Ceilometer pipeline), you can expect a continuous stream of small objects to be put into the Ceph cluster and overall Ceph performance to suffer.
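
A hedged illustration of that capacity argument; the rates are made up, only the relationship matters:

~~~
# If measures arrive faster than metricd can process them, the backlog
# (and thus the object count in the "metrics" pool) grows without bound.
resources = 500                    # instances/NICs/disks being sampled
samples_per_resource_per_min = 10  # hypothetical Ceilometer pipeline rate
processing_rate_per_min = 4000     # hypothetical metricd throughput

incoming = resources * samples_per_resource_per_min
growth_per_min = incoming - processing_rate_per_min
if growth_per_min > 0:
    print(f"backlog grows by {growth_per_min} objects/minute")
else:
    print("metricd keeps up; the backlog stays bounded")
~~~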

Comment 6 Ben England 2017-03-10 19:56:01 UTC
Another factor: Tim's cluster, where this problem occurred, was running high-intensity Ceph workloads with fio for many hours at a time; this may have interfered with Gnocchi's creation of Ceph objects, causing the backlog.  However, my point would be that this may not have been as much of an issue if Gnocchi wasn't creating 16 byte objects and was just appending to existing objects.

Another possible factor: Alex's cluster does not run Cinder volumes, whereas Tim's cluster had 500 of them; I'm not sure how much additional work Gnocchi has to do for Cinder.

My suspicion is that Gnocchi stays stable when fio isn't running, and that the problem occurs when both fio and Gnocchi are running.  If so, this is still an issue, because people don't buy storage hardware to have it sit idle.

Comment 7 Ben England 2017-03-21 11:38:18 UTC
So 2 out of 2 OSP 10 deployments I've seen where OSPd deploys Ceph have this problem.  This is because when OSP director deploys Ceph, it puts the OpenStack Gnocchi service's "metrics" storage pool on Ceph.  It happened again in the CNCF deployment, with exactly the same symptoms:

- multiple Ceph OSDs stop with "suicide timeout" in logs
- "metrics" Ceph storage pool had ~20 million objects in it
- after dropping the metrics pool, the ceph OSDs can be successfully restarted
- it appears that there are way more RADOS objects in the pool than normal, as if Gnocchi is not keeping up with the processing of these 16-byte objects
- it doesn't happen right away, so automated tests might have missed this.

What's different:
- configuration: CNCF had only 90 OSDs across 9 Ceph storage nodes
- workload: there was no intense fio application workload

We had to disable Gnocchi to avoid use of this storage pool (how did we do it, Alex?).  Then I dropped the "metrics" pool.  Later in the day I simply restarted the OSDs that had been down to bring them back to life.

The Ceph cluster has been stable ever since and is extremely efficient for its intended use of creating 2000 Nova guests backed by a single Glance image.

Will add the ceph logs that show the suicide timeouts as an attachment.

I'm going to try to reproduce the Ceph portion of the problem on a separate cluster using rados bench with 16-byte objects.  I've seen slowdowns before with Ceph FileStore and very small RADOS objects (e.g. 6 KB), but I have never seen OSDs start to drop like this.  Ceph shouldn't die because of this workload.

Even if the Ceph problem is fixed, 16-byte objects are still not a good idea, for the reasons discussed above.  We want to minimize the overhead of resource monitoring subsystems like Gnocchi.

Comment 8 Ben England 2017-03-21 11:40:51 UTC
Created attachment 1265021 [details]
OSD logs + some info about the ceph configuration

This shows the suicide timeout stack trace that the 3 OSDs generated.

Comment 9 Julien Danjou 2017-03-21 12:24:45 UTC
20 million is a huge backlog. Are you sure gnocchi-metricd processes are running and processing metrics? How many are running?

Comment 10 Ben England 2017-03-21 22:01:44 UTC
Re the previous comment: we'll try to find out for you; Alex Krzos would know.

I'm trying to reproduce this in a Ceph cluster with this script:

[root@ip-172-31-56-131 to-installer]# more rbench.sh 
#!/bin/bash -x
ansible -i ~/to-installer/internal-hosts.list -m shell \
-a "rados bench --run-name \`hostname -s\`.`date '+%H-%M-%S'` -b $1 -o $1 -p rados-bench $2 write --no-cleanup" \
all

With a command like this to have it create 16-byte objects for 3000 seconds:

# ./rbench.sh 16 3000

It's starting to slow down but I haven't seen an OSD drop yet.

Comment 11 Ben England 2017-03-28 21:12:16 UTC
Alex thinks this explains the problem.
-------------
---------- Forwarded message ----------
From: Mike Lowe <jomlowe>
Date: Tue, Mar 28, 2017 at 3:55 PM
Subject: Re: [Openstack-operators] scaling gnocchi metricd
To: Ionut Biru - Fleio <ionut>
Cc: "openstack-operators.org"
<openstack-operators.org>


I recently got into trouble with a large backlog. What I found was at
some point the backlog got too large for gnocchi to effectivly
function.  When using ceph list of metric objects is kept in a omap
object which normally is a quick and efficient way to store this list.
However, at some point the list grows too large for it to be managed
by the leveldb which implements the omap k/v store.  I finally moved
to some ssd’s to get enough iops for leveldb/omap to function.  What
I’m guessing is that if you are using ceph the increased number of
metrics grabbed per pass reduced the number of times a now expensive
operation is performed.  Indications are that the new bluestore should
make omap scale much better but isn’t slated to go stable for a few
months with the release of Luminous.

-----------------------

So this explanation makes some sense, but then the question is: why is the list being stored in an omap, and how big can it get?  RADOS has no notion of a directory in an object name; the namespace is flat.  Can RADOS efficiently look up all objects with a specified prefix string?  I still think that if individual samples were appended to a RADOS object instead of being stored in separate ones, as suggested in comment 2, at least a good chunk of this problem would go away, since there would be fewer RADOS objects to manage and the omap list size would go down.  But this might just put off the day of reckoning.
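
For reference, the Python rados bindings do let you walk an object's omap and filter keys by prefix, which is roughly how the backlog kept on the "measure" object can be inspected.  A sketch, assuming a "metrics" pool and a backlog object literally named "measure" as described in the later comments; the key prefix is an assumption:

~~~
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("metrics")

# Read up to 1000 omap keys from the backlog object, filtered by prefix.
with rados.ReadOpCtx() as read_op:
    iterator, ret = ioctx.get_omap_vals(read_op, "", "measure_", 1000)
    ioctx.operate_read_op(read_op, "measure")   # assumed backlog object name
    backlog = [key for key, _val in iterator]

print(f"{len(backlog)} backlog entries, e.g.:", backlog[:5])
ioctx.close()
cluster.shutdown()
~~~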

Comment 12 Ben England 2017-03-29 02:03:01 UTC
BTW, I tried using Ceph's "rados bench" command to create 7 million 16-byte objects on a 6-OSD EC2 cluster; it never broke (though it does slow down and stall a bit).  So I don't think it's just the 16-byte objects that cause this problem.

Comment 13 Alex Krzos 2017-03-29 03:28:24 UTC
(In reply to Ben England from comment #11)
> Alex thinks this explains the problem.
> -------------
> ---------- Forwarded message ----------
> From: Mike Lowe <jomlowe>
> Date: Tue, Mar 28, 2017 at 3:55 PM
> Subject: Re: [Openstack-operators] scaling gnocchi metricd
> To: Ionut Biru - Fleio <ionut>
> Cc: "openstack-operators.org"
> <openstack-operators.org>
> 
> 
> I recently got into trouble with a large backlog. What I found was at
> some point the backlog got too large for gnocchi to effectivly
> function.  When using ceph list of metric objects is kept in a omap
> object which normally is a quick and efficient way to store this list.
> However, at some point the list grows too large for it to be managed
> by the leveldb which implements the omap k/v store.  I finally moved
> to some ssd’s to get enough iops for leveldb/omap to function.  What
> I’m guessing is that if you are using ceph the increased number of
> metrics grabbed per pass reduced the number of times a now expensive
> operation is performed.  Indications are that the new bluestore should
> make omap scale much better but isn’t slated to go stable for a few
> months with the release of Luminous.
> 
> -----------------------
> 
> So this explanation makes some sense, but then the question is: why is list
> being stored in an omap and how big can it get?   RADOS has no notion of a
> directory in an object name, which is a flat namespace.  Can RADOS
> efficiently lookup all objects with a specified prefix string?  I still
> think if individual samples were appended to a RADOS object instead of being
> stored in separate ones, as suggested in comment 2, at least a good chunk of
> this problem would go away since there would be fewer RADOS objects to
> manage, so the omap list size would go down.  But this might just put off
> the day of reckoning.

Yes I forwarded you that email earlier.

Comment 14 Julien Danjou 2017-03-29 08:31:00 UTC
(In reply to Ben England from comment #11)
> So this explanation makes some sense, but then the question is: why is list
> being stored in an omap and how big can it get?   RADOS has no notion of a
> directory in an object name, which is a flat namespace.  Can RADOS
> efficiently lookup all objects with a specified prefix string?  I still
> think if individual samples were appended to a RADOS object instead of being
> stored in separate ones, as suggested in comment 2, at least a good chunk of
> this problem would go away since there would be fewer RADOS objects to
> manage, so the omap list size would go down.  But this might just put off
> the day of reckoning.

Listing objects in Ceph is extremely slow; we managed to "break" Ceph while trying to list 20k objects, which took 2 minutes.  That's why we switched to using OMAP, as a "hack".

If the main problem is the OMAP listing, this could be fixed pretty easily in the next Gnocchi version, where we split this OMAP index across several thousand objects.

If the main problem is a large number of small objects, this is tougher.  Appending to a file is obviously a good solution, but unfortunately there's no way to implement the entire workflow of creating measures and processing them in an atomic way.  You can see my current attempt at improving that here: https://review.openstack.org/#/c/450783/

Comment 15 Ben England 2017-04-03 13:48:31 UTC
In my experience with Gnocchi, the "measure" object contains an omap with almost all of the objects in the pool in it, at least when the problem is occurring.  So there is not a lot of informational value in the omap as it exists today.  Perhaps when you split the omap index then things will be different.

Why is atomicity so important?  If you lose a measure, how far off will your results be?  Couldn't this happen if a node or service was down, for example?  

There is a rados_write_op_t data type that lets you specify a set of operations to be performed atomically, which in theory should allow you to do a sequence of writes and then update an omap as an atomic transaction, if I understand it correctly.  I'm looking at the librados C API and am not sure how much of this, if any, applies to the Python API.
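
The Python bindings expose the same compound-operation concept through rados.WriteOpCtx, which wraps rados_write_op_t: mutations queued on one op are applied to a single object atomically by operate_write_op().  A sketch (whether plain data writes can be combined with omap updates in the same op depends on the binding version, so only omap mutations are shown; object and key names are hypothetical):

~~~
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("metrics")

keys = ("measure_0001", "measure_0002")   # hypothetical backlog keys
vals = (b"", b"")

with rados.WriteOpCtx() as write_op:
    ioctx.set_omap(write_op, keys, vals)                  # queue inserts
    ioctx.remove_omap_keys(write_op, ("measure_0000",))   # queue a delete
    ioctx.operate_write_op(write_op, "measure")           # applied atomically

ioctx.close()
cluster.shutdown()
~~~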

I'm going to post a rados omap test program that can perhaps simulate some of the behavior of Gnocchi+Ceph.

Comment 16 Ben England 2017-04-03 17:04:38 UTC
Can Gnocchi batch updates to key-value pairs in omap?

The rados-omap.c test program is available here for now:

https://raw.githubusercontent.com/bengland2/rados_object_perf/master/rados-omap.c

I ran some tests with it and am digesting the numbers now.  Write throughput seems to be ~300 keys/sec in this test with 0-length values:

[root@ip-172-31-56-131 ~]# rados rm -p ben hw && ./rados-omap --kvpairs-per-call 1 --total-kvpairs 100 --value-size 0
          1 : key-value pairs per call
        100 : total key-value pairs
          0 : value size in bytes
      write : operation type
elapsed time = 0.380065923 sec
[root@ip-172-31-56-131 ~]# rados rm -p ben hw && ./rados-omap --kvpairs-per-call 1 --total-kvpairs 1000 --value-size 0
          1 : key-value pairs per call
       1000 : total key-value pairs
          0 : value size in bytes
      write : operation type
elapsed time = 3.729845786 sec

I did notice that if you batch changes to key-value pairs, it gets a lot faster, meaning that the performance cost is per batch, not per key-value pair.

[root@ip-172-31-56-131 ~]# rados rm -p ben hw && ./rados-omap --kvpairs-per-call 1 --total-kvpairs 10000 --value-size 0
          1 : key-value pairs per call
      10000 : total key-value pairs
          0 : value size in bytes
      write : operation type
elapsed time = 40.249775283 sec
[root@ip-172-31-56-131 ~]# rados rm -p ben hw && ./rados-omap --kvpairs-per-call 10 --total-kvpairs 10000 --value-size 0
         10 : key-value pairs per call
      10000 : total key-value pairs
          0 : value size in bytes
      write : operation type
elapsed time = 3.886266531 sec
[root@ip-172-31-56-131 ~]# rados rm -p ben hw && ./rados-omap --kvpairs-per-call 100 --total-kvpairs 10000 --value-size 0
        100 : key-value pairs per call
      10000 : total key-value pairs
          0 : value size in bytes
      write : operation type
elapsed time = 0.564232162 sec

The last result is roughly 70 times faster than the first, for writing the same number of keys.
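
A rough Python equivalent of that comparison, as a sketch using the "ben" pool and "hw" object from the runs above (same caveats as earlier about the rados omap API):

~~~
import time
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("ben")          # test pool used in the runs above

total = 10000
keys = tuple("key-%05d" % i for i in range(total))
vals = tuple(b"" for _ in range(total))

# One key-value pair per write op: pays the per-operation cost 10000 times.
start = time.time()
for k, v in zip(keys, vals):
    with rados.WriteOpCtx() as op:
        ioctx.set_omap(op, (k,), (v,))
        ioctx.operate_write_op(op, "hw")
print("unbatched: %.2f sec" % (time.time() - start))

# All pairs in a single write op: one round trip and one commit.
start = time.time()
with rados.WriteOpCtx() as op:
    ioctx.set_omap(op, keys, vals)
    ioctx.operate_write_op(op, "hw")
print("batched:   %.2f sec" % (time.time() - start))

ioctx.close()
cluster.shutdown()
~~~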

Comment 17 Ben England 2017-04-04 21:07:52 UTC
Jason Dillaman explained why I saw exactly 3 OSDs drop out in a cluster when 20,000,000 objects were created in "metrics" pool (see comment 2) - the omap for the "measure" object is stored in a single PG!  For a size 3 storage pool, that PG involves 3 OSDs.  So it makes sense that those OSDs are going to get nailed when we try to read or write to that object's omap.   

Is there a way to avoid this problem short-term (other than disabling Gnocchi and dropping metrics pool)?  Is there a tuning that will prevent the problem from happening?  My (limited) understanding of it is that the problem is caused because the consumer of measure_* objects is not keeping up with the producer, since the consumer (is it metricd?) is what reads the measure_* objects and then deletes them and removes them from the "measure" object omap.  So if the consumer stops consuming these measure objects for any reason, the size of the omap will grow in an unbounded way until the 3 OSDs storing that omap can't handle it anymore. 

Sebastien Han provided this test program for python RADOS omap access:

https://github.com/bengland2/rados_object_perf/blob/master/ceph-fill-omap.py 

which apparently can deal with omaps in Python, which the standard python rados module does not seem to do.

Comment 18 Julien Danjou 2017-04-13 11:53:58 UTC
(In reply to Ben England from comment #15)
> In my experience with Gnocchi, the "measure" object contains an omap with
> almost all of the objects in the pool in it, at least when the problem is
> occurring.  So there is not a lot of informational value in the omap as it
> exists today.  Perhaps when you split the omap index then things will be
> different.

There is zero informational value in the OMAP, but listing objects in Ceph is utterly slow in our experience.  That's why the OMAP is used; it's basically a workaround for that.

> Why is atomicity so important?  If you lose a measure, how far off will your
> results be?  Couldn't this happen if a node or service was down, for
> example?  

No, everything is meant to be safe currently – at least when using Ceph for both incoming measures and archive storage.

> There is a rados_write_op_t data type that lets you specify a set of
> operations to be performed atomically, which in theory should allow you to
> do a sequence of writes, and then update an omap, as an atomic transaction,
> if I understand it correctly.  I'm looking at librados "C" API and am not
> sure how much of this if any applies to the python API.

Right, but that's only true for a single write or a single read operation. Doing both would mean Gnocchi would need to lock the file, which is currently avoided. Writing new measures and processing old ones are lock-free operations, which would not be possible anymore in such a world. It's a trade-off that we need to consider carefully.

> Am going to post a rados omap test program that can perhaps simulate some of
> the behavior of Gnocchi+Ceph.

That'd be great; it would give us more insight into what's going wrong.

Comment 19 Julien Danjou 2017-04-13 12:47:38 UTC
(In reply to Ben England from comment #17)
> Jason Dillaman explained why I saw exactly 3 OSDs drop out in a cluster when
> 20,000,000 objects were created in "metrics" pool (see comment 2) - the omap
> for the "measure" object is stored in a single PG!  For a size 3 storage
> pool, that PG involves 3 OSDs.  So it makes sense that those OSDs are going
> to get nailed when we try to read or write to that object's omap.   
> 
> Is there a way to avoid this problem short-term (other than disabling
> Gnocchi and dropping metrics pool)?  Is there a tuning that will prevent the
> problem from happening?  My (limited) understanding of it is that the
> problem is caused because the consumer of measure_* objects is not keeping
> up with the producer, since the consumer (is it metricd?) is what reads the
> measure_* objects and then deletes them and removes them from the "measure"
> object omap.  So if the consumer stops consuming these measure objects for
> any reason, the size of the omap will grow in an unbounded way until the 3
> OSDs storing that omap can't handle it anymore. 

You understood everything right. The best thing is to increase the number of metricd workers and decrease the processing delay. Alex wrote a nice kbase article about that:
https://access.redhat.com/solutions/2971771
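
For reference, those knobs live in the [metricd] section of gnocchi.conf; something along these lines (the values are illustrative, not a recommendation; see the kbase article above for sizing guidance):

~~~
[metricd]
# More workers drain the incoming backlog in parallel.
workers = 12
# Poll for new measures to process more often than the 60-second default.
metric_processing_delay = 15
~~~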

> Sebastien Han provided this test program for python RADOS omap access:
> 
> https://github.com/bengland2/rados_object_perf/blob/master/ceph-fill-omap.py 

Yeah, I wrote this for Seb (https://gist.github.com/jd/e679e7c43a0d8e54181b257e8f733c97) so we could run some tests (but then I left for PTO :-)


The OMAP problem should go away in the next version of Gnocchi and OSP, as we're working on splitting the backlog across several hundred or thousand OMAPs.

Comment 20 Ben England 2017-04-26 11:45:13 UTC
Great news from Alex Krzos via e-mail:

I actually characterized the difference in writing to the omap via
batch in Gnocchi vs the threaded model they had.  Here is a quick
Grafana graph [0] showing the huge difference; you can see when I
implemented a batching method.  (Unselect the count of requests so you
can see the min/avg/max latencies graphed for POST-ing new data through
the Gnocchi API in httpd.)  I opened a bug [1] and Julien already put
in a patch [2] to fix it.  When I get back from training, I'll look at
how metricd reads and deletes from the measure object, since I suspect
that is also an issue I encountered, but I did not have time to
instrument metricd or adequately understand all of the debug-level
messages you can have metricd output.

[0] http://norton.perf.lab.eng.rdu.redhat.com:3000/dashboard/snapshot/DKagyJjCT8DRonxCJkDGWD3LET7pOK09
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1444541
[2] https://review.openstack.org/#/c/459333

Comment 21 Julien Danjou 2017-05-16 12:58:39 UTC
Alex, is your batch write patch enough to mark this bug as fixed or do we need to fix anything else at this stage?

Comment 22 Alex Krzos 2017-05-16 13:23:49 UTC
(In reply to Julien Danjou from comment #21)
> Alex, is your batch write patch enough to mark this bug as fixed or do we
> need to fix anything else at this stage?

I would like to keep this open, as I believe there are further issues that will require a multi-pronged approach of patches and testing to reveal whether they are solved, or at least whether the scale limits have been pushed out "sufficiently" (with "sufficiently" being a subjective guess at how many instances/objects we need Gnocchi to be able to store metrics for).

The issues/actions left as I see it are:

1. Characterize the backlog split across multiple Ceph objects (rather than a single Ceph object hosting the unprocessed backlog), for both posting new data and processing data out of the backlog
2. Investigate metricd limits/bottlenecks (we can only keep bumping the worker count for so long; improving the per-process/per-worker throughput of turning measures into metrics would be a less costly optimization), plus the elimination of the scheduler process
3. Storing measures/metrics in Ceph as small objects: Ceph BlueStore is supposed to help; we need to characterize this behavior and furthermore understand whether the driver needs additional work to support BlueStore
4. Using Redis as the incoming storage driver: I've heard it is fast, but I don't have any comparison data on the same hardware

That is all I can think of right now.

Comment 23 Julien Danjou 2017-05-19 14:35:20 UTC
Starting with Gnocchi 4, the new measures will be stored in the OMAP database for faster access. This has been merged: https://review.openstack.org/#/c/464757/

Comment 24 Pradeep Kilambi 2017-05-23 13:01:24 UTC
Patch merged upstream.

Comment 25 Julien Danjou 2017-06-01 09:27:26 UTC
I've reported a bug about this issue against Ceph as https://bugzilla.redhat.com/show_bug.cgi?id=1457767. I'd love to have some feedback from the Ceph team on it.

Comment 34 Andreas Karis 2017-07-18 15:16:43 UTC
Oh, so I can only use a single definition and lower the computational overhead?

Simply changing this doesn't work, but ...

[stack@undercloud-1 ~]$ gnocchi archive-policy update low -d granularity:1h,timespan:3d
Archive policy low does not support change: Cannot add or drop granularities (HTTP 400)

I also cannot delete it, as it's still in use (so https://access.redhat.com/node/2971771 on an already deployed cluster may be difficult to achieve).
~~~
[stack@undercloud-1 ~]$ gnocchi archive-policy delete low
Archive policy low is still in use (HTTP 400)
[stack@undercloud-1 ~]$ gnocchi archive-policy create low_overhead -d granularity:1h,timespan:3d
+---------------------+---------------------------------------------------------------+
| Field               | Value                                                         |
+---------------------+---------------------------------------------------------------+
| aggregation_methods | std, count, 95pct, min, max, sum, median, mean                |
| back_window         | 0                                                             |
| definition          | - points: 72, granularity: 1:00:00, timespan: 3 days, 0:00:00 |
| name                | low_overhead                                                  |
+---------------------+---------------------------------------------------------------+
~~~

Followed by an update of this on all controllers and restart of all ceilometer and gnocchi services?
~~~
/etc/ceilometer/ceilometer.conf:archive_policy=low_overhead
/etc/ceilometer/gnocchi_resources.yaml:    archive_policy: low_overhead
~~~

Comment 35 Julien Danjou 2017-07-18 15:29:32 UTC
(In reply to Andreas Karis from comment #34)
> Oh, so I can only use a single definition and lower the computational
> overhead?

Yes.

> Simply changing this doesn't work, but ...
> 
> [stack@undercloud-1 ~]$ gnocchi archive-policy update low -d
> granularity:1h,timespan:3d
> Archive policy low does not support change: Cannot add or drop granularities
> (HTTP 400)

This is why the KCS talks about removing/re-creating it.

> I also cannot delete it, as it's still in use (so
> https://access.redhat.com/node/2971771 on an already deployed cluster may be
> difficult to achieve).
> ~~~
> [stack@undercloud-1 ~]$ gnocchi archive-policy delete low
> Archive policy low is still in use (HTTP 400)
> [stack@undercloud-1 ~]$ gnocchi archive-policy create low_overhead -d
> granularity:1h,timespan:3d
> +---------------------+------------------------------------------------------
> ---------+
> | Field               | Value                                               
> |
> +---------------------+------------------------------------------------------
> ---------+
> | aggregation_methods | std, count, 95pct, min, max, sum, median, mean      
> |
> | back_window         | 0                                                   
> |
> | definition          | - points: 72, granularity: 1:00:00, timespan: 3
> days, 0:00:00 |
> | name                | low_overhead                                        
> |
> +---------------------+------------------------------------------------------
> ---------+
> ~~~
> 
> Followed by an update of this on all controllers and restart of all
> ceilometer and gnocchi services?
> ~~~
> /etc/ceilometer/ceilometer.conf:archive_policy=low_overhead
> /etc/ceilometer/gnocchi_resources.yaml:    archive_policy: low_overhead
> ~~~

Yes, that'll work and it's a good way to go for future metrics. It's a good trade-off if you have already deployed your cluster.

Comment 36 Andreas Karis 2017-07-18 15:35:51 UTC
Merci!

Comment 39 Julien Danjou 2017-11-15 14:52:02 UTC
Starting with Gnocchi 4.0, new incoming measures are no longer stored as Ceph objects but directly in 128 (by default) different OMAP databases.
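
A sketch of the idea (the exact hashing and naming Gnocchi uses are not reproduced here; this just shows how spreading the incoming backlog across a fixed set of OMAP objects bounds the size of any single one):

~~~
import hashlib

NUM_SACKS = 128  # Gnocchi 4.0 default mentioned above

def sack_for_metric(metric_id):
    """Map a metric to one of NUM_SACKS incoming backlog objects."""
    h = int(hashlib.md5(metric_id.encode()).hexdigest(), 16)
    return "incoming128-%d" % (h % NUM_SACKS)   # hypothetical object name

print(sack_for_metric("5c1f7df2-76ef-4f30-8f4a-0a1a0b3f1c11"))
~~~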

Comment 42 Don Weeks 2017-12-04 15:50:41 UTC
Can a KBD be written for this so that we can fix it in RHOSP 10? This is affecting our ability to gather metrics and is causing issues across our customer base. I don't see a link to a knowledge base article.

Comment 43 Julien Danjou 2017-12-04 16:32:34 UTC
(In reply to Don Weeks from comment #42)
> Can this a KBD be written for this so that we can fix it in RHOSP 10. This
> is affecting our ability to gather metrics and is causing issues across our
> customer base. I don't see a link to a knowledge base article?

Can you be more precise about what you would want in the KBD?
Upgrading to >= OSP10z5 should fix the issue.

Comment 46 errata-xmlrpc 2017-12-13 21:11:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

Comment 50 rohit londhe 2018-11-21 02:01:20 UTC
Hello,

Do we have any update here?

