Bug 986410 - [RFE] Aggregation over multiple alarm states
Summary: [RFE] Aggregation over multiple alarm states
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ceilometer
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: low
Target Milestone: rc
Target Release: 4.0
Assignee: Eoghan Glynn
QA Contact: Kevin Whitney
URL: https://blueprints.launchpad.net/ceil...
Whiteboard:
Depends On: 988358
Blocks: 973191 RHOS40RFE 1055813
Reported: 2013-07-19 17:11 UTC by Eoghan Glynn
Modified: 2016-04-26 13:25 UTC (History)

Fixed In Version: openstack-ceilometer-2013.2-0.12.rc2.el6ost
Doc Type: Enhancement
Doc Text:
Feature: Aggregation of the states of multiple basic alarms into overarching meta-alarms.
Reason: Reduces noise from detailed monitoring, and allows alarm-driven workflows (e.g. Heat autoscaling) to be gated by more complex logical conditions.
Result: Combination alarms may be created to combine the states of the underlying alarms via logical AND or OR.
Clone Of:
Environment:
Last Closed: 2013-12-20 00:14:29 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status        Summary                                                            Last Updated
OpenStack gerrit        41971           0        None      None          None                                                               Never
OpenStack gerrit        42832           0        None      None          None                                                               Never
OpenStack gerrit        43413           0        None      None          None                                                               Never
OpenStack gerrit        45085           0        None      None          None                                                               Never
Red Hat Product Errata  RHEA-2013:1859  0        normal    SHIPPED_LIVE  Red Hat Enterprise Linux OpenStack Platform Enhancement Advisory   2013-12-21 00:01:48 UTC

Description Eoghan Glynn 2013-07-19 17:11:45 UTC
A mechanism to combine the states of multiple basic alarms into overarching meta-alarms could be useful in reducing noise from detailed monitoring.

The supported aggregation styles should cover both logical combination (and, or, not) and subset-of (more than X% of the basic alarms in the alarm state).

Upstream blueprint: https://blueprints.launchpad.net/ceilometer/+spec/alarming-logical-combination
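
As a rough illustration of the logical AND/OR combination described above, the sketch below folds a list of basic alarm states into one meta-alarm state. The helper name combine_states is hypothetical and is not part of ceilometer; only the "alarm" and "ok" states are modelled, and the "not" and subset-of styles are left out.

```shell
# combine_states OPERATOR STATE [STATE ...]
# Hypothetical helper for illustration only, not ceilometer code.
# Prints "alarm" or "ok" for the combined state.
combine_states() {
  op=$1
  shift
  total=$#
  firing=0
  # Count how many of the basic alarms are currently in the alarm state.
  for state in "$@"; do
    if [ "$state" = "alarm" ]; then
      firing=$((firing + 1))
    fi
  done
  case "$op" in
    and)
      # AND: every basic alarm must be firing.
      if [ "$total" -gt 0 ] && [ "$firing" -eq "$total" ]; then
        echo alarm
      else
        echo ok
      fi
      ;;
    or)
      # OR: at least one basic alarm must be firing.
      if [ "$firing" -gt 0 ]; then
        echo alarm
      else
        echo ok
      fi
      ;;
    *)
      echo "unknown operator: $op" >&2
      return 2
      ;;
  esac
}
```

For example, "combine_states and alarm ok" yields ok, while "combine_states or alarm ok" yields alarm.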

Comment 3 Eoghan Glynn 2013-09-23 13:29:26 UTC
Merged upstream as an FFE for Havana RC1:

  https://github.com/openstack/ceilometer/commit/d30bf2fa2
  https://github.com/openstack/ceilometer/commit/985f48270

Comment 5 Eoghan Glynn 2013-10-21 16:40:14 UTC
How To Test
===========

0. Install packstack allinone, then spin up an instance in the usual way. 

Ensure the compute agent is gathering metrics at a reasonable cadence (for example every 60s instead of the default of every 10 minutes):

  sudo sed -i '/^ *name: cpu_pipeline$/ { n ; s/interval: 600$/interval: 60/ }' /etc/ceilometer/pipeline.yaml
  sudo service openstack-ceilometer-compute restart
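
The sed expression above only rewrites the interval if it sits on the line directly after "name: cpu_pipeline", so it may be worth confirming that the edit took effect. A minimal sketch, assuming that layout; the helper name cpu_pipeline_interval is hypothetical:

```shell
# cpu_pipeline_interval FILE
# Hypothetical helper: prints the interval found on the line immediately
# following "name: cpu_pipeline" in FILE, or nothing if the layout differs.
cpu_pipeline_interval() {
  awk '/^ *name: cpu_pipeline$/ {
         # Read the next line and report its value if it is the interval.
         if ((getline) > 0 && $1 == "interval:") print $2
       }' "$1"
}
```

Running "cpu_pipeline_interval /etc/ceilometer/pipeline.yaml" after the sed command should then print 60.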

Ensure the ceilometer alarm services are installed and running:

  sudo yum install -y openstack-ceilometer-alarm
  export CEILO_ALARM_SVCS='evaluator notifier'
  for svc in $CEILO_ALARM_SVCS; do sudo service openstack-ceilometer-alarm-$svc start; done


1. Create multiple basic alarms with thresholds sufficiently low that they are guaranteed to go into alarm (INSTANCE_ID must hold the ID of the instance spun up in step 0):

  for i in $(seq 5)
  do
    ceilometer alarm-threshold-create --name basic_cpu_low_threshold_${i} \
     --meter-name cpu_util  --threshold 0.01 --comparison-operator gt  --statistic avg \
     --period 60 --evaluation-periods 1 \
     --alarm-action 'log://' \
     --query resource_id=$INSTANCE_ID
  done
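
The procedure does not show how INSTANCE_ID is populated. One possible way, sketched here under the assumption that the nova CLI is installed, credentials are sourced, and "nova list" prints its usual table with the instance ID in the first column; the helper name first_instance_id is hypothetical:

```shell
# first_instance_id
# Hypothetical helper: prints the ID of the first instance in "nova list".
# Assumes a table whose first column holds UUID-style instance IDs.
first_instance_id() {
  nova list | awk -F'|' '
    { id = $2; gsub(/ /, "", id) }
    # Skip borders and the header row; keep only UUID-shaped values.
    id ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/ { print id; exit }'
}
```

It could be used as INSTANCE_ID=$(first_instance_id) before running the loop above.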


2. Create a meta-alarm combining (with logical AND) the state of these basic alarms:

  ALARM_IDS=
  for a in `ceilometer alarm-list | awk -F\| '/basic_cpu_/ {print $2}'`; do ALARM_IDS="$ALARM_IDS --alarm_ids $a"; done
  ceilometer --debug alarm-combination-create --name combination_cpu_low --description 'combination of high CPU util alarms' --alarm-action 'log://' $ALARM_IDS

Ensure that this combination alarm transitions into the alarm state within one evaluation period (60s by default):

  sleep 60 ; ceilometer alarm-list | grep combination_cpu_low
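
A fixed sleep can miss the transition if the evaluator runs slightly late. As an alternative, the sketch below polls until the alarm reaches the wanted state or a timeout expires. The helper names alarm_state and wait_for_alarm_state are hypothetical, and the parsing assumes that "ceilometer alarm-list" prints a table whose columns start with Alarm ID, Name, State.

```shell
# alarm_state NAME
# Hypothetical helper: prints the State column for the alarm called NAME.
# Assumes alarm-list columns are: Alarm ID | Name | State | ...
alarm_state() {
  ceilometer alarm-list | awk -F'|' -v name="$1" '
    { n = $3; gsub(/^ +| +$/, "", n) }
    n == name { s = $4; gsub(/^ +| +$/, "", s); print s; exit }'
}

# wait_for_alarm_state NAME STATE [TIMEOUT_SECONDS]
# Polls every 5 seconds; returns 0 once NAME is in STATE, 1 on timeout.
wait_for_alarm_state() {
  name=$1
  want=$2
  timeout=${3:-120}
  waited=0
  while [ "$waited" -le "$timeout" ]; do
    if [ "$(alarm_state "$name")" = "$want" ]; then
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  return 1
}
```

For this step the call would be "wait_for_alarm_state combination_cpu_low alarm".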


3. Create another set of basic alarms with thresholds sufficiently high that they are guaranteed not to go into alarm:

  for i in $(seq 5)
  do
    ceilometer alarm-threshold-create --name basic_cpu_high_threshold_${i} \
     --meter-name cpu_util --threshold 99.99 --comparison-operator gt  --statistic max \
     --period 60 --evaluation-periods 1 \
     --alarm-action 'log://' \
     --query resource_id=$INSTANCE_ID
  done


4. Create 2 further meta-alarms combining (with logical AND and OR respectively) the states of both sets of basic alarms, low-threshold and high-threshold (the awk pattern below matches both sets):

  ALARM_IDS=
  for a in `ceilometer alarm-list | awk -F\| '/basic_cpu_/ {print $2}'`; do   ALARM_IDS="$ALARM_IDS --alarm_ids $a"; done
  ceilometer --debug alarm-combination-create --name combination_cpu_mixed_and --description 'combination (AND) of mixed CPU util alarms' --alarm-action 'log://' $ALARM_IDS --operator and
  ceilometer --debug alarm-combination-create --name combination_cpu_mixed_or --description 'combination (OR) of mixed CPU util alarms' --alarm-action 'log://' $ALARM_IDS --operator or

Ensure that these combination alarms transition into the alarm and ok states (for the combination_cpu_mixed_or and combination_cpu_mixed_and alarms respectively) within one evaluation period (60s by default):

  sleep 60 ; ceilometer alarm-list | grep combination_cpu_mixed
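
Rather than reading the grep output by eye, the expected outcome can be asserted in one pass. This is a sketch only: check_alarm_states is a hypothetical helper, and it assumes that "ceilometer alarm-list" prints a table whose columns start with Alarm ID, Name, State.

```shell
# check_alarm_states NAME=STATE [NAME=STATE ...]
# Hypothetical helper: returns 0 only if every named alarm is in the
# expected state; reports each mismatch on stderr.
check_alarm_states() {
  listing=$(ceilometer alarm-list) || return 2
  rc=0
  for pair in "$@"; do
    name=${pair%%=*}
    want=${pair#*=}
    # Pull the State column for the row whose Name column equals $name.
    got=$(printf '%s\n' "$listing" | awk -F'|' -v name="$name" '
      { n = $3; gsub(/^ +| +$/, "", n) }
      n == name { s = $4; gsub(/^ +| +$/, "", s); print s; exit }')
    if [ "$got" != "$want" ]; then
      echo "$name: expected '$want', got '${got:-not found}'" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For this step the call would be "check_alarm_states combination_cpu_mixed_or=alarm combination_cpu_mixed_and=ok".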

Comment 10 errata-xmlrpc 2013-12-20 00:14:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html

