Red Hat Bugzilla – Bug 1252486
Ceilometer taking over 20 minutes to return queries
Last modified: 2016-01-04 12:31:34 EST
Cisco UCS B200 blades for controllers and computes
Cisco UCS C240 rack servers for Ceph storage
RHEL-OSP5 A4 (openstack-ceilometer-api-2014.1.4-1.el7ost.noarch)
RH CEPH 1.3
Ceilometer commands (see below) become sluggish quickly, after only about 1-2 days of collecting metering samples.
The client's DEV OpenStack environment is very active, with ~1000 instances across 30 Nova compute nodes.
The slowness steadily worsens as we approach the TTL expiration (set to 5 days). Targeted queries (-q resource=<UUID>) seem to work OK at first, but they also become sluggish as more samples are gathered.
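The growth pattern follows from how a TTL works: until samples start expiring, the amount of stored data increases with every polling cycle. It only levels off once the oldest samples pass the 5-day limit. The following shell sketch makes that arithmetic concrete. Only the ~1000 instances and the 5-day TTL come from this report. The 600-second polling interval and 10 meters per instance are assumed, illustrative values, and estimate_samples is a hypothetical helper, not part of Ceilometer.

```shell
#!/usr/bin/env bash
# Rough estimate of how many samples are stored after a given number of days.
# ASSUMPTIONS (hypothetical, not taken from this report):
#   - every instance emits $meters samples per polling cycle
#   - the polling interval is $interval_s seconds
# Samples accumulate linearly until the TTL starts expiring them,
# so the stored count stops growing once days >= ttl_days.
estimate_samples() {
  local instances=$1 meters=$2 interval_s=$3 days=$4 ttl_days=$5
  local polls_per_day=$(( 86400 / interval_s ))
  # Only the most recent ttl_days worth of samples are retained.
  local retained_days=$(( days < ttl_days ? days : ttl_days ))
  echo $(( instances * meters * polls_per_day * retained_days ))
}

# ~1000 instances and a 5-day TTL (from this report);
# 10 meters per instance and a 600 s interval are illustrative guesses.
for day in 1 2 3 4 5 6; do
  echo "day $day: $(estimate_samples 1000 10 600 "$day" 5) samples"
done
```

With these inputs the loop reports 1440000 samples on day 1. It adds the same amount each day up to 7200000 on day 5, then stays at 7200000 on day 6, once expiry keeps pace with new samples. The absolute numbers depend entirely on the assumed inputs. The point is that the data set, and with it the work done by any open-ended query, keeps growing for the full TTL window.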
We checked disk I/O on the volume backing /var/lib/mongodb, but it does not appear to be the source of the problem. We initially used Ceph RBD, then moved back to local disk (300 GB SAS RAID1), but the slowness persisted.
# time ceilometer resource-list|wc -l
# time ceilometer meter-list|wc -l
# time ceilometer sample-list -m volume|wc -l
# time ceilometer sample-list -m image|wc -l
# time ceilometer sample-list -m instance|wc -l
Error communicating with http://10.63.168.100:8777 timed out
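To collect comparable timings for the commands above, each one can be wrapped with a hard time limit. A hung query is then recorded as a timeout rather than blocking the session. The sketch below uses a hypothetical helper, timed_count, which is not part of the ceilometer client. It relies only on GNU coreutils timeout and date, and treats exit status 124 as a timeout, which is the status GNU timeout returns when it kills the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command under a time limit and report how many
# output lines it produced, how long it took, and whether it timed out.
# Mirrors "time <cmd> | wc -l" but never blocks longer than the limit.
timed_count() {
  local limit=$1; shift
  local start out rc lines elapsed
  start=$(date +%s)
  out=$(timeout "$limit" "$@")
  rc=$?                      # 124 means GNU timeout killed the command
  elapsed=$(( $(date +%s) - start ))
  # Count lines, including a final line without a trailing newline.
  lines=$(printf '%s' "$out" | grep -c '')
  if [ "$rc" -eq 124 ]; then
    echo "TIMEOUT after ${elapsed}s: $*"
  else
    echo "lines=$lines elapsed=${elapsed}s rc=$rc: $*"
  fi
}

# Example usage with commands from this report (600 s = 10 minutes):
# timed_count 600 ceilometer resource-list
# timed_count 600 ceilometer sample-list -m instance
```

A 600-second limit matches the roughly 10-minute point at which the open-ended 'instance' queries were seen to time out. Each query then produces either a line count with an elapsed time or an explicit TIMEOUT entry.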
# from ceilometer db
NOTE: Open-ended queries on the 'instance' meter consistently hang and eventually time out after 10 minutes. During these queries, CPU usage of the ceilometer-api and mongod processes spikes.
The issue appears only in the customer's DEV environment, where activity is high. The other environments (PROD, TEST) have much less activity and do not exhibit this slowness.