Bug 1065420

Summary: MongoDB error when using aggregate: 16MB document size limit
Product: Red Hat OpenStack
Reporter: Dave Sullivan <dsulliva>
Component: openstack-ceilometer
Assignee: Eoghan Glynn <eglynn>
Status: CLOSED DUPLICATE
QA Contact: Shai Revivo <srevivo>
Severity: urgent
Priority: urgent
Version: 4.0
CC: dsulliva, eglynn, fpercoco, jruzicka, pbrady, yeylon
Keywords: ZStream
Target Release: 4.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2014-02-17 19:23:16 UTC
Type: Bug

Description Dave Sullivan 2014-02-14 15:59:19 UTC
Description of problem:

Ceilometer appears to fail when Resource Usage is clicked in the Horizon dashboard. The Ceilometer API log shows the following traceback:

2014-02-13 06:43:08.178 8904 ERROR wsme.api [-] Server-side error: "command SON([('aggregate', u'meter'), ('pipeline', [{'$match': {u'resource_metadata.OS-EXT-AZ:availability_zone': u'nova'}}, {'$sort': {'timestamp': -1, 'project_id': -1, 'user_id': -1}}, {'$group': {'meters_unit': {'$push': '$counter_unit'}, 'source': {'$first': '$source'}, 'project_id': {'$first': '$project_id'}, 'user_id': {'$first': '$user_id'}, 'last_sample_timestamp': {'$max': '$timestamp'}, 'meters_name': {'$push': '$counter_name'}, 'first_sample_timestamp': {'$min': '$timestamp'}, 'meters_type': {'$push': '$counter_type'}, '_id': '$resource_id', 'metadata': {'$first': '$resource_metadata'}}}])]) failed: exception: aggregation result exceeds maximum document size (16MB)". Detail:
Traceback (most recent call last):

  File "/usr/lib/python2.6/site-packages/wsmeext/pecan.py", line 72, in callfunction
    result = f(self, *args, **kwargs)

  File "/usr/lib/python2.6/site-packages/ceilometer/api/controllers/v2.py", line 965, in get_all
    for r in pecan.request.storage_conn.get_resources(**kwargs)]

  File "/usr/lib/python2.6/site-packages/ceilometer/storage/impl_mongodb.py", line 651, in get_resources
    "meters_unit": {"$push": "$counter_unit"},

  File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 1061, in aggregate
    _use_master=use_master)

  File "/usr/lib64/python2.6/site-packages/pymongo/database.py", line 393, in command
    msg, allowable_errors)

  File "/usr/lib64/python2.6/site-packages/pymongo/helpers.py", line 147, in _check_command_response
    raise OperationFailure(msg % errmsg, code)

OperationFailure: command SON([('aggregate', u'meter'), ('pipeline', [{'$match': {u'resource_metadata.OS-EXT-AZ:availability_zone': u'nova'}}, {'$sort': {'timestamp': -1, 'project_id': -1, 'user_id': -1}}, {'$group': {'meters_unit': {'$push': '$counter_unit'}, 'source': {'$first': '$source'}, 'project_id': {'$first': '$project_id'}, 'user_id': {'$first': '$user_id'}, 'last_sample_timestamp': {'$max': '$timestamp'}, 'meters_name': {'$push': '$counter_name'}, 'first_sample_timestamp': {'$min': '$timestamp'}, 'meters_type': {'$push': '$counter_type'}, '_id': '$resource_id', 'metadata': {'$first': '$resource_metadata'}}}])]) failed: exception: aggregation result exceeds maximum document size (16MB)

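The 16MB limit comes from MongoDB's maximum BSON document size: the aggregate command returns its entire result as a single BSON document, so the command fails once that result exceeds 16MB. Here the $group stage uses $push to collect counter_name, counter_unit and counter_type for every matching sample, so the result grows with the number of stored samples, not just the number of resources.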
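To see why the $push accumulators hit the limit, here is a minimal pure-Python sketch that mimics only the $push parts of the $group stage from the traceback and estimates the BSON-encoded size of the result. It is an approximation under stated assumptions: bson_size handles only str, bool, int, list and dict values; the $first/$min/$max fields (which add a roughly constant amount per resource) are omitted; the whole output is assumed to come back wrapped in one reply document, modeled here as {"result": [...]}; and make_samples, with its "cpu_util"/"%"/"gauge" values, is a hypothetical helper, not Ceilometer code.

```python
BSON_MAX_SIZE = 16 * 1024 * 1024  # MongoDB's maximum BSON document size: 16MB


def bson_size(doc):
    """Return the encoded size in bytes of a BSON document.

    Only str, bool, int, list and dict values are handled: enough for this sketch.
    """
    size = 4 + 1  # int32 total-length prefix + trailing NUL byte
    for key, value in doc.items():
        size += 1 + len(key.encode("utf-8")) + 1  # type byte + NUL-terminated element name
        size += _value_size(value)
    return size


def _value_size(value):
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return 1
    if isinstance(value, int):
        return 4 if -2**31 <= value < 2**31 else 8  # int32 or int64
    if isinstance(value, str):
        return 4 + len(value.encode("utf-8")) + 1  # int32 length + UTF-8 bytes + NUL
    if isinstance(value, list):
        # A BSON array is encoded as a document whose keys are "0", "1", "2", ...
        return bson_size({str(i): v for i, v in enumerate(value)})
    if isinstance(value, dict):
        return bson_size(value)
    raise TypeError("unsupported type in this sketch: %r" % type(value))


def group_by_resource(samples):
    """Mimic the $push accumulators of the $group stage (keyed on resource_id)."""
    groups = {}
    for s in samples:
        g = groups.setdefault(s["resource_id"], {
            "_id": s["resource_id"],
            "meters_name": [], "meters_unit": [], "meters_type": [],
        })
        # $push appends one entry per sample, so these lists grow with the sample count.
        g["meters_name"].append(s["counter_name"])
        g["meters_unit"].append(s["counter_unit"])
        g["meters_type"].append(s["counter_type"])
    return list(groups.values())


def aggregate_result_size(samples):
    # Assumption: the whole pipeline output is returned inside one reply document.
    return bson_size({"result": group_by_resource(samples)})


def make_samples(n, resource_id="instance-1"):
    # Hypothetical data: one resource reporting a single gauge meter n times.
    return [{"resource_id": resource_id, "counter_name": "cpu_util",
             "counter_unit": "%", "counter_type": "gauge"} for _ in range(n)]


for n in (1000, 500000):
    size = aggregate_result_size(make_samples(n))
    print(n, size, size > BSON_MAX_SIZE)
```

In this model each sample adds a few dozen bytes across the three pushed arrays, so a single busy resource with a few hundred thousand samples is already enough to push the result past 16MB.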

Version-Release number of selected component (if applicable):

RHOS 4.0 Havana

See above

This is a known upstream issue, fixed by replacing the aggregate call with mapReduce.

A backport to stable/havana already exists:

https://review.openstack.org/#/c/66861/
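The idea of replacing aggregate with mapReduce can be illustrated with a small pure-Python model of map-reduce semantics. This is a conceptual sketch, not the upstream patch: map_sample, reduce_values and map_reduce are hypothetical names, and keeping only distinct meter names per resource is an assumption made for illustration. The reduce step returns values in the same shape it receives, because MongoDB may call the reduce function more than once for the same key on partial results.

```python
from functools import reduce


def map_sample(sample):
    """Map step: emit one (key, value) pair per sample, keyed by resource_id."""
    yield sample["resource_id"], {
        "first_sample_timestamp": sample["timestamp"],
        "last_sample_timestamp": sample["timestamp"],
        "meters": [sample["counter_name"]],
    }


def reduce_values(a, b):
    """Reduce step: merge two partial values for the same resource.

    The output has the same shape as the emitted values, so it can be fed back
    into reduce_values; the meter list holds distinct names only (an assumption
    of this sketch), so it does not grow with the number of samples.
    """
    return {
        "first_sample_timestamp": min(a["first_sample_timestamp"], b["first_sample_timestamp"]),
        "last_sample_timestamp": max(a["last_sample_timestamp"], b["last_sample_timestamp"]),
        "meters": sorted(set(a["meters"]) | set(b["meters"])),
    }


def map_reduce(samples):
    """Run the map step over every sample, then fold each key's values with reduce."""
    emitted = {}
    for sample in samples:
        for key, value in map_sample(sample):
            emitted.setdefault(key, []).append(value)
    return {key: reduce(reduce_values, values) for key, values in emitted.items()}


# Hypothetical samples; timestamps are plain integers for brevity.
samples = [
    {"resource_id": "r1", "timestamp": 3, "counter_name": "cpu_util"},
    {"resource_id": "r1", "timestamp": 1, "counter_name": "cpu_util"},
    {"resource_id": "r1", "timestamp": 2, "counter_name": "disk.read.bytes"},
    {"resource_id": "r2", "timestamp": 5, "counter_name": "cpu_util"},
]
print(map_reduce(samples))
```

Here "r1" reduces to first/last timestamps 1 and 3 with meters ["cpu_util", "disk.read.bytes"], and "r2" keeps its single emitted value unchanged.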

Comment 2 Eoghan Glynn 2014-02-16 20:57:19 UTC
This looks like a duplicate of https://bugzilla.redhat.com/1047872 which I have fixed upstream by changing the problematic mongo aggregation usage to a conventional map-reduce.

Due to the urgency, I landed that fix internally for RHOS 4.0.z A1 *prior* to my upstream fixes being landed and backported to stable:

  https://review.openstack.org/#/q/Ibef4a95acada411af385ff75ccb36c5724068b59,n,z

and then recently released upstream in 2013.2.2.

Since RHOS 4.0.z A2 has been rebased onto 2013.2.2, the fix I've landed upstream (identical for all intents and purposes to the original internal fix) will be available in RHOS at that point.

Dave - is the customer seeing the issue in a bare 4.0 install?

If so, they'll need to update to A1 immediately, or alternatively wait for A2 and pick up a number of other fixes in the process:

  https://launchpad.net/ceilometer/+milestone/2013.2.2