We need logic to compare observed sample datapoints against alarm thresholds. This logic must scale either narrowly, when hosted by a singleton service, or very widely, in order to trigger timely notifications on a very large population of alarms.

The threshold evaluator should encapsulate all the logic required to manage the dynamic state of a constrained set of alarms: polling for the required statistics over an appropriate time window, correcting for metric lag, handling sparse metrics, and initiating state transitions and notifications when a threshold crossing is detected.

Upstream blueprint: https://blueprints.launchpad.net/ceilometer/+spec/alarm-distributed-threshold-evaluation
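As a rough illustration of the core comparison only, the following shell sketch maps a single statistic and a threshold onto the 'alarm' and 'ok' states used later in this report. The function name threshold_state is hypothetical and is not part of ceilometer. The sketch assumes one evaluation period and does not model metric lag, sparse metrics, or notification.

```shell
# Hypothetical helper (not part of ceilometer): compare one statistic
# against a threshold using one of the comparison operators
# gt, ge, lt, le, eq, ne.
# usage: threshold_state VALUE OPERATOR THRESHOLD
threshold_state() {
    awk -v v="$1" -v op="$2" -v t="$3" 'BEGIN {
        v += 0; t += 0    # force numeric comparison
        if      (op == "gt") crossed = (v >  t)
        else if (op == "ge") crossed = (v >= t)
        else if (op == "lt") crossed = (v <  t)
        else if (op == "le") crossed = (v <= t)
        else if (op == "eq") crossed = (v == t)
        else if (op == "ne") crossed = (v != t)
        else { print "unknown operator: " op > "/dev/stderr"; exit 2 }
        # A crossed threshold maps to "alarm", anything else to "ok".
        print (crossed ? "alarm" : "ok")
    }'
}
```

For example, `threshold_state 5.2 gt 0.01` prints alarm, while `threshold_state 0.005 gt 0.01` prints ok.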
Hi Eoghan, will this be part of RHOS 4.0? Currently it doesn't have the "blocks" havana tracker bug.
Merged upstream as an FFE for RC1, so it was not in the packages based on havana-3 but will be in the packages rebuilt for Havana RC1:

  https://github.com/openstack/ceilometer/commit/ede2329e
How To Test
===========

Similarly to https://bugzilla.redhat.com/986381

0. Install packstack allinone, and also on an additional compute node.

Ensure the compute agent is gathering metrics at a reasonable cadence (every 60s for example instead of every 10mins as per the default):

  sudo sed -i '/^ *name: cpu_pipeline$/ { n ; s/interval: 600$/interval: 60/ }' /etc/ceilometer/pipeline.yaml
  sudo service openstack-ceilometer-compute restart

1. Ensure the ceilometer-alarm-evaluator and ceilometer-alarm-notifier services are running on the controller node:

  sudo yum install -y openstack-ceilometer-alarm
  sudo openstack-config --set /etc/ceilometer/ceilometer.conf alarm evaluation_service ceilometer.alarm.service.PartitionedAlarmService
  export CEILO_ALARM_SVCS='evaluator notifier'
  for svc in $CEILO_ALARM_SVCS; do sudo service openstack-ceilometer-alarm-$svc restart; done

2. Ensure a second ceilometer-alarm-evaluator service is running on the compute node:

  sudo yum install -y openstack-ceilometer-alarm
  sudo openstack-config --set /etc/ceilometer/ceilometer.conf alarm evaluation_service ceilometer.alarm.service.PartitionedAlarmService
  export CEILO_ALARM_SVCS='evaluator'
  for svc in $CEILO_ALARM_SVCS; do sudo service openstack-ceilometer-alarm-$svc start; done

3. Spin up an instance in the usual way:

  nova boot --image $IMAGE_ID --flavor 1 test_instance

4. Create multiple alarms with thresholds sufficiently low that they are guaranteed to go into alarm:

  for i in $(seq 10)
  do
    ceilometer alarm-threshold-create --name high_cpu_alarm_${i} --description 'instance running hot' \
      --meter-name cpu_util --threshold 0.01 --comparison-operator gt --statistic avg \
      --period 60 --evaluation-periods 1 \
      --alarm-action 'log://' \
      --query resource_id=$INSTANCE_ID
  done

5. Ensure that the alarms are partitioned over the multiple evaluators:

  tail -f /var/log/alarm-evaluator.log | grep 'initiating evaluation cycle'

On each host, expect approximately half the alarms to be evaluated, i.e. '... initiating evaluation cycle on 5 alarms'

6. Ensure all alarms have transitioned to the 'alarm' state:

  ceilometer alarm-list

7. Create some more alarms:

  for i in $(seq 10)
  do
    ceilometer alarm-threshold-create --name low_cpu_alarm_${i} --description 'instance running cold' \
      --meter-name cpu_util --threshold 99.9 --comparison-operator le --statistic avg \
      --period 60 --evaluation-periods 1 \
      --alarm-action 'log://' \
      --query resource_id=$INSTANCE_ID
  done

and also delete a few alarms:

  ceilometer alarm-delete -a $ALARM_ID

and ensure that the alarm allocation is still roughly even between the evaluation services:

  tail -f /var/log/alarm-evaluator.log | grep 'initiating evaluation cycle'

8. Shut down the partitioned ceilometer alarm service on each host:

  sudo service openstack-ceilometer-alarm-evaluator stop

then restart on the controller host *only* with the singleton evaluator:

  sudo openstack-config --set /etc/ceilometer/ceilometer.conf alarm evaluation_service ceilometer.alarm.service.SingletonAlarmService
  sudo service openstack-ceilometer-alarm-evaluator start

9. Reset all alarms to the 'ok' state and ensure that they flip back to 'alarm':

  for a in $(ceilometer alarm-list | grep _cpu_alarm_ | awk -F\| '{print $2}')
  do
    ceilometer alarm-update --state ok -a $a
  done
  sleep 60
  ceilometer alarm-list
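
To compare the partition sizes in steps 5 and 7 without reading the log by eye, a small filter along these lines may help. It is only a sketch: the function name cycle_sizes is hypothetical, and it assumes every matching log line has the form '... initiating evaluation cycle on N alarms' as quoted in step 5.

```shell
# Hypothetical helper: read evaluator log lines on stdin and print the
# alarm count from each 'initiating evaluation cycle on N alarms' line.
# Lines that do not match the assumed format are ignored.
cycle_sizes() {
    sed -n 's/.*initiating evaluation cycle on \([0-9][0-9]*\) alarms.*/\1/p'
}
```

Run on each host, `tail -n 200 /var/log/alarm-evaluator.log | cycle_sizes | tail -n 1` should print the size of the most recent evaluation cycle on that host, and the values across the hosts should add up to roughly the total number of alarms.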
Pending the fix for:

  https://bugzilla.redhat.com/1040404

testing this requires that a less constrained firewall rule is added for the ceilometer-api service:

  $ INDEX=$(sudo iptables -L | grep -A 20 'INPUT.*policy ACCEPT' | grep -- -- | grep -n ceilometer-api | cut -f1 -d:)
  $ sudo iptables -I INPUT $INDEX -p tcp --dport 8777 -j ACCEPT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2013-1859.html