Bug 986410

Summary: [RFE] Aggregation over multiple alarm states
Product: Red Hat OpenStack
Reporter: Eoghan Glynn <eglynn>
Component: openstack-ceilometer
Assignee: Eoghan Glynn <eglynn>
Status: CLOSED ERRATA
QA Contact: Kevin Whitney <kwhitney>
Severity: low
Docs Contact:
Priority: high
Version: 4.0
CC: ajeain, eglynn, jruzicka, mlopes, pbrady, sgordon, sradvan, srevivo
Target Milestone: rc
Keywords: FutureFeature, OtherQA
Target Release: 4.0
Hardware: Unspecified
OS: Unspecified
URL: https://blueprints.launchpad.net/ceilometer/+spec/alarming-logical-combination
Whiteboard:
Fixed In Version: openstack-ceilometer-2013.2-0.12.rc2.el6ost
Doc Type: Enhancement
Doc Text:
Feature: Aggregation of the states of multiple basic alarms into overarching meta-alarms.
Reason: Reducing noise from detailed monitoring, and allowing alarm-driven workflows (e.g. Heat autoscaling) to be gated by more complex logical conditions.
Result: Combination alarms may be created to combine the states of the underpinning alarms via logical AND or OR.
Last Closed: 2013-12-20 00:14:29 UTC
Type: Bug
Bug Depends On: 988358    
Bug Blocks: 973191, 975499, 1055813    

Description Eoghan Glynn 2013-07-19 17:11:45 UTC
A mechanism to combine the states of multiple basic alarms into overarching meta-alarms could be useful in reducing noise from detailed monitoring.

The aggregation styles supported should be on the basis of both logical combination (and, or, not) and also subset-of (>X%). 
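
The subset-of (>X%) style proposed above can be illustrated with a small sketch. This is only an illustration of the idea, not Ceilometer code: the helper name subset_state and its argument order are hypothetical, and only the states "alarm" and "ok" are considered.

```shell
# Hypothetical helper (not part of ceilometer): print "alarm" when strictly
# more than PCT percent of the given states are "alarm", otherwise "ok".
# Usage: subset_state PCT STATE...
subset_state() {
  pct=$1; shift
  total=$#
  # With no underlying alarms there is nothing to fire on.
  if [ "$total" -eq 0 ]; then echo "ok"; return 0; fi
  firing=0
  for s in "$@"; do
    if [ "$s" = "alarm" ]; then firing=$((firing + 1)); fi
  done
  # Integer form of firing/total > pct/100.
  if [ $((firing * 100)) -gt $((pct * total)) ]; then
    echo "alarm"
  else
    echo "ok"
  fi
}
```

For example, `subset_state 50 alarm alarm ok` reports alarm because two of three states (about 67%) exceed the 50% threshold, while `subset_state 50 alarm ok` reports ok because exactly 50% is not strictly greater.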

Upstream blueprint: https://blueprints.launchpad.net/ceilometer/+spec/alarming-logical-combination

Comment 3 Eoghan Glynn 2013-09-23 13:29:26 UTC
Merged upstream as a feature freeze exception (FFE) for Havana RC1:

  https://github.com/openstack/ceilometer/commit/d30bf2fa2
  https://github.com/openstack/ceilometer/commit/985f48270

Comment 5 Eoghan Glynn 2013-10-21 16:40:14 UTC
How To Test
===========

0. Install packstack allinone, then spin up an instance in the usual way. 

Ensure the compute agent is gathering metrics at a reasonable cadence (for example every 60s, instead of the default of every 10 minutes):

  sudo sed -i '/^ *name: cpu_pipeline$/ { n ; s/interval: 600$/interval: 60/ }' /etc/ceilometer/pipeline.yaml
  sudo service openstack-ceilometer-compute restart

Ensure the ceilometer alarm services are installed and running:

  sudo yum install -y openstack-ceilometer-alarm
  export CEILO_ALARM_SVCS='evaluator notifier'
  for svc in $CEILO_ALARM_SVCS; do sudo service openstack-ceilometer-alarm-$svc start; done
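
The alarm-creation steps that follow use $INSTANCE_ID without setting it. One possible way to populate it is sketched here, assuming the instance is the first row reported by `nova list` and that the ID is the first column of that table. The helper name first_id is hypothetical.

```shell
# Hypothetical helper: read a CLI table on stdin and print the first column
# of the first data row, skipping border lines and the "ID" header row.
first_id() {
  awk -F'|' '/^\|/ {
    id = $2
    gsub(/ /, "", id)          # strip the padding around the cell
    if (id != "" && id != "ID") { print id; exit }
  }'
}
```

It could then be used as `export INSTANCE_ID=$(nova list | first_id)`. If more than one instance exists, pick the ID by hand instead.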


1. Create multiple basic alarms with thresholds sufficiently low that they are guaranteed to go into alarm (INSTANCE_ID must hold the ID of the instance spun up in step 0):

  for i in $(seq 5)
  do
    ceilometer alarm-threshold-create --name basic_cpu_low_threshold_${i} \
     --meter-name cpu_util  --threshold 0.01 --comparison-operator gt  --statistic avg \
     --period 60 --evaluation-periods 1 \
     --alarm-action 'log://' \
     --query resource_id=$INSTANCE_ID
  done


2. Create a meta-alarm combining (with logical AND) the state of these basic alarms:

  ALARM_IDS=
  for a in `ceilometer alarm-list | awk -F\| '/basic_cpu_/ {print $2}'`; do ALARM_IDS="$ALARM_IDS --alarm_ids $a"; done
  ceilometer --debug alarm-combination-create --name combination_cpu_low --description 'combination of high CPU util alarms' --alarm-action 'log://' $ALARM_IDS

Ensure that this combination alarm transitions into the alarm state within one evaluation period (60s by default):

  sleep 60 ; ceilometer alarm-list | grep combination_cpu_low
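
A fixed sleep can miss the transition if the evaluator cycle does not line up with it. A polling variant is sketched below as one possible alternative. It assumes the alarm-list table has the layout | Alarm ID | Name | State | ... |, and the helper names alarm_state and wait_for_state and the ALARM_LIST_CMD override are hypothetical.

```shell
# Print the State cell for the alarm called NAME, reading the table on stdin.
# Assumes the columns are: | Alarm ID | Name | State | ... |
alarm_state() {
  awk -F'|' -v name="$1" '{
    n = $3; s = $4
    gsub(/^ +| +$/, "", n)     # trim cell padding
    gsub(/^ +| +$/, "", s)
    if (n == name) { print s; exit }
  }'
}

# Poll until alarm NAME reaches EXPECTED, or give up after TIMEOUT seconds.
# Usage: wait_for_state NAME EXPECTED [TIMEOUT] [INTERVAL]
wait_for_state() {
  name=$1; expected=$2; timeout=${3:-120}; interval=${4:-10}
  elapsed=0
  while [ "$elapsed" -le "$timeout" ]; do
    # ALARM_LIST_CMD is a hypothetical override, handy for dry runs.
    state=$(${ALARM_LIST_CMD:-ceilometer alarm-list} | alarm_state "$name")
    if [ "$state" = "$expected" ]; then return 0; fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}
```

For this step the call would be `wait_for_state combination_cpu_low alarm`, which returns non-zero if the state is not reached in time.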


3. Create another set of basic alarms with thresholds sufficiently high that they are guaranteed not to go into alarm:

  for i in $(seq 5)
  do
    ceilometer alarm-threshold-create --name basic_cpu_high_threshold_${i} \
     --meter-name cpu_util --threshold 99.99 --comparison-operator gt  --statistic max \
     --period 60 --evaluation-periods 1 \
     --alarm-action 'log://' \
     --query resource_id=$INSTANCE_ID
  done


4. Create 2 further meta-alarms combining (with logical AND and OR respectively) the states of all the basic alarms, both low- and high-threshold:

  ALARM_IDS=
  for a in `ceilometer alarm-list | awk -F\| '/basic_cpu_/ {print $2}'`; do   ALARM_IDS="$ALARM_IDS --alarm_ids $a"; done
  ceilometer --debug alarm-combination-create --name combination_cpu_mixed_and --description 'combination (AND) of mixed CPU util alarms' --alarm-action 'log://' $ALARM_IDS --operator and
  ceilometer --debug alarm-combination-create --name combination_cpu_mixed_or --description 'combination (OR) of mixed CPU util alarms' --alarm-action 'log://' $ALARM_IDS --operator or

Ensure that these combination alarms transition into the alarm state (combination_cpu_mixed_or) and the ok state (combination_cpu_mixed_and) within one evaluation period (60s by default):

  sleep 60 ; ceilometer alarm-list | grep combination_cpu_mixed
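
The expected outcome follows from the mixed inputs: five underlying alarms are in alarm and five are ok. The sketch below works through that logic. It is a simplified model, not the Ceilometer evaluator: the helper name combine is hypothetical and states other than "alarm" and "ok" are ignored.

```shell
# Hypothetical model of a combination alarm over "alarm"/"ok" states.
# Usage: combine and|or STATE...
combine() {
  op=$1; shift
  firing=0
  for s in "$@"; do
    if [ "$s" = "alarm" ]; then firing=$((firing + 1)); fi
  done
  case "$op" in
    # AND fires only when every underlying alarm is firing.
    and) if [ "$firing" -eq "$#" ]; then echo alarm; else echo ok; fi ;;
    # OR fires as soon as any underlying alarm is firing.
    or)  if [ "$firing" -gt 0 ]; then echo alarm; else echo ok; fi ;;
    *)   echo "unknown operator: $op" >&2; return 2 ;;
  esac
}

# Five low-threshold alarms firing, five high-threshold alarms quiet.
combine and alarm alarm alarm alarm alarm ok ok ok ok ok   # prints "ok"
combine or  alarm alarm alarm alarm alarm ok ok ok ok ok   # prints "alarm"
```

The same model explains step 2: with only the five low-threshold alarms as input, AND yields alarm.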

Comment 10 errata-xmlrpc 2013-12-20 00:14:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html