Cloned from launchpad blueprint https://blueprints.launchpad.net/ceilometer/+spec/hash-based-alarm-partitioning.

Description:

The assignment of alarms to individual partitioned alarm evaluators could follow the same pattern as the division of resources between scaled-out central agents. The evaluators will each join a tooz group, emit a periodic heartbeat to tooz, and accept callbacks from tooz when other evaluators join or leave the group. The set of evaluators thus shares minimal knowledge, but this is sufficient to guide a hash-based approach to determining whether an individual alarm UUID falls under the responsibility of an individual evaluator. As a result, the current RPC-fanout-based presence reporting and the master/slave division of responsibilities can be dropped, and the rebalancing logic triggered when a certain threshold of alarm deletions is crossed will no longer be required.

Specification URL (additional information): None
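To illustrate the idea, here is a minimal sketch of hash-based partitioning on top of tooz group membership. This is not the actual ceilometer implementation: the group name, backend URL, helper names and the md5-based ownership test are all made up for the example, and only standard tooz calls (get_coordinator, create_group, join_group, get_members, heartbeat, run_watchers, watch_join_group/watch_leave_group) are used.

import hashlib
import uuid

from tooz import coordination

# Illustrative values only -- in a real deployment these come from configuration.
backend_url = 'memcached://CONTROLLER_HOSTNAME'
my_id = uuid.uuid4().hex.encode('ascii')   # unique member id for this evaluator

coordinator = coordination.get_coordinator(backend_url, my_id)
coordinator.start()

group = b'alarm_evaluator'                 # illustrative group name
try:
    coordinator.create_group(group).get()
except coordination.GroupAlreadyExist:
    pass
coordinator.join_group(group).get()

# No explicit rebalancing logic is needed: a membership change simply alters
# the outcome of the hash-based ownership test below.
coordinator.watch_join_group(group, lambda event: None)
coordinator.watch_leave_group(group, lambda event: None)

def my_alarms(alarm_ids):
    """Return the subset of alarm UUIDs this evaluator is responsible for."""
    members = sorted(coordinator.get_members(group).get())
    me = members.index(my_id)
    return [a for a in alarm_ids
            if int(hashlib.md5(a.encode('utf-8')).hexdigest(), 16)
               % len(members) == me]

def evaluation_cycle(all_alarm_ids):
    # Keep our membership alive, process any join/leave notifications,
    # then evaluate only our share of the alarms.
    coordinator.heartbeat()
    coordinator.run_watchers()
    for alarm_id in my_alarms(all_alarm_ids):
        pass  # evaluate this alarm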
Here's an approach to test the alarm partitioning using tooz.

Ensure memcached is running on the controller host (it should be in any case for keystone):

$ sudo rpm -qa memcached
$ sudo service memcached status

Configure the tooz backend as memcached with the following setting in /etc/ceilometer/ceilometer.conf:

[coordination]
backend_url = memcached://CONTROLLER_HOSTNAME

Restart the alarm-evaluator service and ensure that it's evaluating zero alarms initially:

$ sudo service openstack-ceilometer-alarm-evaluator restart
$ tail -f /var/log/ceilometer/alarm-evaluator.log | grep 'initiating evaluation cycle'
...
INFO ceilometer.alarm.service [-] initiating evaluation cycle on 0 alarms

Spin up a nova instance in the usual way if not already running, and record the instance ID:

$ INSTANCE_ID=`nova list | awk '/ACTIVE/ {print $2}' | head -1`

For speed of testing, modify the interval on the cpu_source in /etc/ceilometer/pipeline.yaml from 600s to 60s, then restart the compute agent:

$ sudo service openstack-ceilometer-compute restart

Create a sequence of alarms based on the CPU utilization of that instance:

$ for i in {0..20}
  do
    ceilometer alarm-threshold-create --name cpu_high_$i --description 'instance running hot' \
      --meter-name cpu_util --threshold 70.0 --comparison-operator gt \
      --statistic avg --period 60 --evaluation-periods 1 \
      --alarm-action 'log://' --query resource_id=$INSTANCE_ID
  done

Check that all 21 alarms are being evaluated by the single evaluator:

$ tail -f /var/log/ceilometer/alarm-evaluator.log | grep 'initiating evaluation cycle'
...
INFO ceilometer.alarm.service [-] initiating evaluation cycle on 21 alarms

Ensure that the alarm state flips to ok (or alarm, if the instance is indeed running hot) within 60s:

$ sleep 60 ; ceilometer alarm-list | grep cpu_high

Either install and run openstack-ceilometer-alarm-evaluator on a second node, or as a shortcut simply launch a second process on the controller:

$ sudo /usr/bin/python /usr/bin/ceilometer-alarm-evaluator --logfile /var/log/ceilometer/alarm-evaluator-2.log &

Check that an approximately even split of alarms is allocated to each evaluator:

$ tail -f /var/log/ceilometer/alarm-evaluator.log | grep 'initiating evaluation cycle'
$ tail -f /var/log/ceilometer/alarm-evaluator-2.log | grep 'initiating evaluation cycle'

To ensure every alarm is being actively evaluated, flip all alarms back to 'insufficient data' and then expect them all to revert to the appropriate state (ok or alarm) on the next evaluation cycle:

$ for a in `ceilometer alarm-list | awk '/cpu_high_/ {print $2}'`
  do
    ceilometer alarm-state-set -a $a --state "insufficient data"
  done
$ sleep 60 ; ceilometer alarm-list | grep cpu_high

Kill one of the alarm evaluators and ensure that all alarms are then allocated to the surviving evaluator (see the note below for one way to do this). Repeat the step above to flip all alarms to 'insufficient data' and check that all are being actively evaluated, i.e. that they revert to the appropriate state (ok or alarm) on the next evaluation cycle.
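For the final step, assuming the second evaluator was started in the background as above, one way to kill it is to match on the --logfile argument in its command line, e.g.:

$ sudo pkill -f 'alarm-evaluator-2.log'

After it exits, the surviving evaluator's log should show it picking up all 21 alarms on a subsequent evaluation cycle.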
One additional setup step for the above, before restarting the alarm-evaluator service: ensure that the alarm evaluation_service config option is set to the upstream default (and not overridden to "ceilometer.alarm.service.SingletonAlarmService"), i.e. in /etc/ceilometer/ceilometer.conf:

[alarm]
# ...
evaluation_service=default
In addition to the doc notes above, one of the configuration settings described above could do with some adjustment:

[coordination]
backend_url = memcached://CONTROLLER_HOSTNAME

The preferred backend is now redis and, where possible, redis+sentinel.
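For illustration only (the exact URL syntax depends on the tooz version, so check the tooz documentation for your release), a redis-backed configuration might look something like:

[coordination]
backend_url = redis://REDIS_HOSTNAME:6379

and, with sentinel, something like:

[coordination]
backend_url = redis://SENTINEL_HOSTNAME:26379?sentinel=MASTER_NAME&sentinel_fallback=SENTINEL2_HOSTNAME:26379

where REDIS_HOSTNAME, SENTINEL_HOSTNAME, SENTINEL2_HOSTNAME and MASTER_NAME are placeholders for the actual deployment values.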
This bug has been closed as a part of the RHEL-OSP 6 general availability release. For details, see https://rhn.redhat.com/errata/rhel7-rhos-6-errata.html