Description of problem:

The mongodb storage driver currently uses an aggregation pipeline over the meter collection to construct a list of resources adorned with first and last sample timestamps etc.

The problem with this approach is that the mongodb aggregation framework performs sorting in-memory, in this case operating over a potentially very large collection (particularly if the GET /v2/resources request was not constrained with query params, e.g. to limit it to a single tenant).

It turns out the mongodb innards are hardcoded to abort any sort in an aggregation pipeline that will consume more than 10% of physical memory.

Version-Release number of selected component (if applicable):

mongodb-server-2.4.6-1.el6.x86_64
openstack-ceilometer-api-2013.2-4.el6ost.noarch
openstack-ceilometer-central-2013.2-4.el6ost.noarch
openstack-ceilometer-collector-2013.2-4.el6ost.noarch
openstack-ceilometer-common-2013.2-4.el6ost.noarch

How reproducible:

100% if the meter collection is sufficiently large.

Steps to Reproduce:

1. Allow the meter collection to grow to at least X elements (actual value of X to be filled in by gilles, who has observed this issue in production with the new internal lab).

   Note that the meter collection size can be retrieved via:

     $ mongo ceilometer
     > db.meter.count()

2. Attempt to list resources with an unconstrained query:

     $ ceilometer resource-list

Actual results:

The resource listing fails:

  $ ceilometer resource-list
  WARNING (http:172) Request returned failure status.
  HTTPInternalServerError (HTTP 500)

with an error similar to the following observed in the API logfile /var/log/ceilometer/api.log:

  2013-12-17 03:56:57.516 21917 ERROR wsme.api [-] Server-side error: "command SON([('aggregate', u'meter'), ('pipeline', [{'$match': {}}, {'$sort': {'timestamp': -1, 'project_id': -1, 'user_id': -1}}, {'$group': {'meters_unit': {'$push': '$counter_unit'}, 'source': {'$first': '$source'}, 'project_id': {'$first': '$project_id'}, 'user_id': {'$first': '$user_id'}, 'last_sample_timestamp': {'$max': '$timestamp'}, 'meters_name': {'$push': '$counter_name'}, 'first_sample_timestamp': {'$min': '$timestamp'}, 'meters_type': {'$push': '$counter_type'}, '_id': '$resource_id', 'metadata': {'$first': '$resource_metadata'}}}])]) failed: exception: terminating request: request heap use exceeded 10% of physical RAM". Detail:<TRUNCATED>

Expected results:

The resource list should display all known resources.

Additional info:

The issue can be worked around by partitioning the resource query per-tenant, e.g.:

  for project in $(keystone tenant-list | awk '/ True / {print $2}')
  do
    ceilometer resource-list -q project=$project | grep -vE '(\+-|Resource ID)'
  done
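To illustrate why the per-tenant workaround helps, here is a minimal Python sketch that rebuilds the pipeline shown in the log with a narrowed $match stage, so that each in-memory sort only sees one tenant's samples. The stages are copied from the error message above; the function names (build_resource_pipeline, partitioned_pipelines) are hypothetical, and the assumption that "-q project=..." ends up as a match on project_id is mine, not something the log confirms. No database connection is made.

```python
import copy

# Stages copied from the failing command in api.log. The key order inside
# each dict is as printed there and may not match the driver's real order.
SORT_STAGE = {'$sort': {'timestamp': -1, 'project_id': -1, 'user_id': -1}}
GROUP_STAGE = {'$group': {
    '_id': '$resource_id',
    'user_id': {'$first': '$user_id'},
    'project_id': {'$first': '$project_id'},
    'source': {'$first': '$source'},
    'first_sample_timestamp': {'$min': '$timestamp'},
    'last_sample_timestamp': {'$max': '$timestamp'},
    'metadata': {'$first': '$resource_metadata'},
    'meters_name': {'$push': '$counter_name'},
    'meters_type': {'$push': '$counter_type'},
    'meters_unit': {'$push': '$counter_unit'},
}}


def build_resource_pipeline(project_id=None):
    """Return the pipeline from the log, optionally limited to one tenant.

    Hypothetical helper: with project_id=None the $match is empty, which is
    the unconstrained case that hits the sort memory limit.
    """
    match = {} if project_id is None else {'project_id': project_id}
    return [{'$match': match},
            copy.deepcopy(SORT_STAGE),
            copy.deepcopy(GROUP_STAGE)]


def partitioned_pipelines(project_ids):
    """Build one narrowed pipeline per tenant, mirroring the shell loop."""
    return {pid: build_resource_pipeline(pid) for pid in project_ids}


if __name__ == '__main__':
    for pid, pipeline in partitioned_pipelines(['tenant-a', 'tenant-b']).items():
        print(pid, pipeline[0])
```

Each per-tenant pipeline still sorts in memory, so this only helps while no single tenant's samples are large enough to reach the limit on their own.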
Fix part 1 proposed on master upstream:

  https://review.openstack.org/65671

and duly landed:

  https://github.com/openstack/ceilometer/commit/7c4c0e8f

Backport proposed on stable/havana upstream:

  https://review.openstack.org/65947
Fix part 1 proposed on master upstream: https://review.openstack.org/65962
Typo in Comment 2 above: s/part 1/part 2/
Internal backports for both fixes:

  https://code.engineering.redhat.com/gerrit/18265
  https://code.engineering.redhat.com/gerrit/18266
Internal backports have landed.
Fix part 1 landed on stable/havana upstream:

  https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=ef71dc6a11

Fix part 2 landed on master upstream:

  https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=ba6641af
Backport of fix part 2 proposed to stable/havana upstream: https://review.openstack.org/66861
Verified

1) Create a meter (dummy) and resource (resdummy)

2) Add 15 samples to meter dummy

  $ mongo ceilometer
  > db.resource.find({ "meter.counter_name": "dummy", "_id" : "resdummy"}).count()
  1
  > db.meter.find({"counter_name":"dummy"}).count()
  15

   Examine the resource document and verify it does not contain an entry for each data sample.

  > db.resource.find({ "meter.counter_name": "dummy", "_id" : "resdummy"})
  { "_id" : "res45000", "metadata" : { }, "meter" : [ { "counter_name" : "dummy", "counter_unit" : "something", "counter_type" : "cumulative" } ], "project_id" : "e97a90c759f64dfaadf319cf08cb1ab2", "source" : "e97a90c759f64dfaadf319cf08cb1ab2:openstack", "user_id" : "e1aa40339c9d45a582b4a13640ae3eab" }

3) Create 25,000 resources

  $ ceilometer resource-list
  <works as expected>
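The property checked in step 2 can be sketched in plain Python: recording many samples for the same meter should leave a single entry in the resource document's "meter" list. This is an in-memory illustration only, not the driver's code; record_sample and the sample field values are hypothetical, and only the document shape is taken from the output above.

```python
def record_sample(resources, sample):
    """Update the in-memory resource 'collection' for one sample.

    Hypothetical helper: a meter entry is appended only if an identical
    (name, unit, type) entry is not already present, so the resource
    document does not grow with the number of samples.
    """
    doc = resources.setdefault(sample['resource_id'], {
        '_id': sample['resource_id'],
        'metadata': {},
        'meter': [],
    })
    entry = {
        'counter_name': sample['counter_name'],
        'counter_unit': sample['counter_unit'],
        'counter_type': sample['counter_type'],
    }
    if entry not in doc['meter']:
        doc['meter'].append(entry)
    return doc


# Mirror the verification: 15 samples of meter "dummy" on resource "resdummy".
resources = {}
samples = [
    {
        'resource_id': 'resdummy',
        'counter_name': 'dummy',
        'counter_unit': 'something',
        'counter_type': 'cumulative',
        'counter_volume': i,  # differs per sample, not stored on the resource
    }
    for i in range(15)
]
for s in samples:
    record_sample(resources, s)

if __name__ == '__main__':
    print(len(samples), len(resources['resdummy']['meter']))
```

With the resource document bounded this way, listing resources no longer needs to sort and group every sample in the meter collection.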
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2014-0046.html
*** Bug 1065420 has been marked as a duplicate of this bug. ***