Bug 1252486 - Ceilometer taking over 20 minutes to return queries
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ceilometer
Version: 5.0 (RHEL 7)
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 8.0 (Liberty)
Assigned To: Pradeep Kilambi
QA Contact: Yurii Prokulevych
Depends On:
Reported: 2015-08-11 10:25 EDT by Chris Henderson
Modified: 2016-01-04 12:31 EST (History)

Doc Type: Bug Fix
Last Closed: 2016-01-04 12:31:34 EST
Type: Bug

Attachments: None

Description Chris Henderson 2015-08-11 10:25:08 EDT
Customer environment:
Cisco UCS B200 blades for controllers and computes
Cisco UCS C240 rack servers for Ceph storage
RHEL 7.1
RHEL-OSP5 A4 (openstack-ceilometer-api-2014.1.4-1.el7ost.noarch)

ceilometer commands (see below) get bogged down fairly quickly, after only about 1-2 days of gathering metering samples.
The client's DEV OpenStack environment is quite active, with ~1000 instances across 30 nova compute nodes.
The slowness just gets worse as we approach the TTL expiration (set to 5 days).  The targeted queries (-q resource=<UUID>) seem to work ok at first, but also get sluggish as more samples are gathered.
We've checked on disk IO (behind /var/lib/mongodb), but it does not look to be the source of the problem.  We used Ceph rbd initially, then went back to using local disk (300GB SAS RAID1).  But the slowness has persisted.

# time ceilometer resource-list|wc -l

real        0m12.273s
user        0m2.075s
sys        0m0.194s

# time ceilometer meter-list|wc -l

real        0m25.125s
user        0m15.027s
sys        0m0.398s

# time ceilometer sample-list -m volume|wc -l

real        0m25.073s
user        0m0.979s
sys        0m0.096s

# time ceilometer sample-list -m image|wc -l

real        1m7.497s
user        0m13.377s
sys        0m0.706s

# time ceilometer sample-list -m instance|wc -l
Error communicating with timed out

real        10m0.583s
user        0m0.271s
sys        0m0.056s
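
For anyone trying to reproduce the timings above in a repeatable way, here is a minimal sketch of a wrapper that runs a command under a time cap and prints the elapsed wall-clock seconds and exit status. `timed_run` is a hypothetical helper name, not part of the ceilometer tooling; it assumes bash and the coreutils `timeout` command are available, and the 600-second cap in the usage comment simply mirrors the 10-minute client timeout seen above.

```shell
#!/bin/bash
# Hypothetical helper (not part of ceilometer): run a command with a time cap
# and print one summary line: "<elapsed seconds> <exit status> <command>".
timed_run() {
    local limit=$1; shift
    local start end status
    start=$(date +%s)
    # coreutils `timeout` exits with status 124 when the limit is reached
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$((end - start)) $status $*"
}

# Example usage, with meter names taken from the runs above:
# for m in volume image instance; do
#     timed_run 600 ceilometer sample-list -m "$m"
# done
```

Discarding the command output keeps the measurement focused on query time; a status of 124 in the second column marks a run that hit the cap rather than one that finished.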

# from ceilometer db
> db.meter.find().count()

NOTE:  Open-ended queries on the 'instance' meter consistently hang and eventually time out after 10 minutes. During these queries, the ceilometer-api and mongod processes spike on CPU.
Also, it seems to be an issue only in the customer's DEV environment, where the activity is high.  Other environments (PROD, TEST) have much less activity and do not exhibit this slowness.
