Cloned from launchpad blueprint https://blueprints.launchpad.net/ceilometer/+spec/central-agent-partitioning. Description: Provide a mechanism to allow the central agent to be horizontally scaled out, such that each agent polls a disjoint subset of resources. Specification URL (additional information): None
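For context, the mechanism boils down to every agent deterministically mapping each resource to exactly one member of a shared coordination group, so the per-agent subsets are disjoint and together cover all resources. Below is a minimal Python sketch of that idea only; the function and agent names are illustrative and the stable-hash choice is an assumption, not the actual ceilometer implementation:

import hashlib

def extract_my_subset(my_id, members, resources):
    """Return the subset of resources this agent should poll."""
    ordered = sorted(members)
    my_position = ordered.index(my_id)

    def bucket(resource):
        # Stable hash so every agent computes the same mapping,
        # regardless of process or Python version.
        digest = hashlib.md5(resource.encode('utf-8')).hexdigest()
        return int(digest, 16) % len(ordered)

    return [r for r in resources if bucket(r) == my_position]

# Example: two agents splitting ten tenants into disjoint subsets.
tenants = ['swift-tenant-%d' % i for i in range(1, 11)]
agents = ['agent-1', 'agent-2']
for agent in agents:
    print(agent, extract_my_subset(agent, agents, tenants))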
Here's an approach to verifying central agent partitioning based on tooz, by examining the way swift polling is partitioned by tenant.

Ensure the tooz and redis packages are installed and the redis service is running on the controller host:

$ sudo yum install -y python-tooz python-redis redis
$ sudo service redis restart

(Note that these packages are not all yet available for RHEL7 via the LPC channel, so for now you could use the Fedora/EPEL & RDO packages to get started.)

Configure the tooz backend as redis with the following setting in /etc/ceilometer/ceilometer.conf:

[coordination]
backend_url = redis://CONTROLLER_HOSTNAME:6379

Accelerate the polling interval so that we see a steady stream of samples:

$ sudo sed -i 's/interval: 600$/interval: 60/' \
    /etc/ceilometer/pipeline.yaml

Restart the central agent and check the subset of swift tenants that this agent is taking as its partition (swift polling is partitioned per-tenant):

$ grep 'My subset.*Tenant' /var/log/ceilometer/central.log | head

This subset should initially comprise only the stock tenants such as admin and services, assuming a fresh installation.

We then proceed to create 10 new tenants and post a swift container for each:

$ for i in {1..10} ; do
    keystone tenant-create --name swift-tenant-$i \
      --description "swift tenant $i"
    keystone user-create --name swift-user-$i --tenant swift-tenant-$i \
      --pass swift-pass-$i
    keystone user-role-add --user swift-user-$i --role ResellerAdmin \
      --tenant swift-tenant-$i
    swift --os-project-name swift-tenant-$i --os-username swift-user-$i \
      --os-password swift-pass-$i post swift-container-$i
  done

Now we check that all these tenants are being taken care of by the single central agent:

$ function swift_tenants_in_subset {
    grep 'My subset' $1 | awk '/swift-tenant/ \
      {count=0; for (i = 1; i <= NF; i++) { if ($i ~ "swift-tenant-") { printf("%s ", $i); count++; } if (i == NF) printf(" count: %d\n", count); } }' | sed "s/u'/'/g"
  }
$ sleep 60 ; swift_tenants_in_subset /var/log/ceilometer/central.log

The count in this case indicates that all 10 new tenants are included in the partition for the single central agent (i.e. a trivial partitioning).

We also ensure that the same number of storage.objects.containers samples has been submitted for each tenant over the past 5 minutes:

$ function count_samples_per_tenant {
    FIVE_MINS_AGO=$(date -u +"%Y-%m-%dT%H:%M:%SZ" -d '-5mins')
    ceilometer statistics -a count -g project_id -m $1 \
      -q "timestamp>=$FIVE_MINS_AGO"
  }
$ count_samples_per_tenant storage.objects.containers

Either install & run openstack-ceilometer-central on a second node, or as a shortcut simply launch a second process on the controller:

$ sudo /usr/bin/python /usr/bin/ceilometer-agent-central \
    --logfile /var/log/ceilometer/central-2.log &

Check that an approximately even split of swift tenants is allocated to each central agent:

$ sleep 60 ; swift_tenants_in_subset /var/log/ceilometer/central.log
$ swift_tenants_in_subset /var/log/ceilometer/central-2.log

Ensure that there is no duplication in the samples collected per tenant:

$ count_samples_per_tenant storage.objects.containers

Then kill the additional central agent and check that all tenants revert to the original agent:

$ sudo kill $(ps -fe | grep central-2 | awk '{print $2}')
$ sleep 60 ; swift_tenants_in_subset /var/log/ceilometer/central.log
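Beyond grepping the agent logs, you can also inspect the coordination group membership directly through the tooz API against the same redis backend. A rough sketch follows; the group id used here ('central-global') and the checker's member id are assumptions for illustration only, so check the agent's debug output for the real group name:

from tooz import coordination

# Hypothetical group id; the real one may differ.
GROUP_ID = b'central-global'

coordinator = coordination.get_coordinator(
    'redis://CONTROLLER_HOSTNAME:6379', b'membership-checker')
coordinator.start()
try:
    # List the members currently joined to the group; with two central
    # agents running you would expect to see two member ids here.
    members = coordinator.get_members(GROUP_ID).get()
    print('coordination group members:', sorted(members))
finally:
    coordinator.stop()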
Note that debug logging must be enabled for the testing approach described in comment #2, by ensuring the debug & verbose config options are set in /etc/ceilometer/ceilometer.conf and restarting the central agent if necessary.
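For example, the following settings in /etc/ceilometer/ceilometer.conf enable debug-level logging:

[DEFAULT]
debug = True
verbose = True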
Please see the testing description in the comments for the related configuration settings used to enable the feature.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2015-0149.html
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days