Bug 986410 - [RFE] Aggregation over multiple alarm states
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ceilometer
Version: 4.0
Hardware: Unspecified OS: Unspecified
Priority: high Severity: low
Target Milestone: rc
Target Release: 4.0
Assigned To: Eoghan Glynn
QA Contact: Kevin Whitney
URL: https://blueprints.launchpad.net/ceil...
Keywords: FutureFeature, OtherQA
Depends On: 988358
Blocks: 973191 RHOS40RFE 1055813
Reported: 2013-07-19 13:11 EDT by Eoghan Glynn
Modified: 2016-04-26 09:25 EDT
Fixed In Version: openstack-ceilometer-2013.2-0.12.rc2.el6ost
Doc Type: Enhancement
Doc Text:
Feature: Aggregation of the states of multiple basic alarms into overarching meta-alarms.
Reason: Reduces noise from detailed monitoring and allows alarm-driven workflows (e.g. Heat autoscaling) to be gated by more complex logical conditions.
Result: Combination alarms may be created to combine the states of the underlying alarms via logical AND or OR.
Last Closed: 2013-12-19 19:14:29 EST
Type: Bug
External Trackers
Tracker           ID     Priority  Status  Summary  Last Updated
OpenStack gerrit  41971  None      None    None     Never
OpenStack gerrit  42832  None      None    None     Never
OpenStack gerrit  43413  None      None    None     Never
OpenStack gerrit  45085  None      None    None     Never

Description Eoghan Glynn 2013-07-19 13:11:45 EDT
A mechanism to combine the states of multiple basic alarms into overarching meta-alarms could be useful in reducing noise from detailed monitoring.

The supported aggregation styles should cover both logical combination (and, or, not) and subset-of (more than X% of the underlying alarms).

Upstream blueprint: https://blueprints.launchpad.net/ceilometer/+spec/alarming-logical-combination
Comment 3 Eoghan Glynn 2013-09-23 09:29:26 EDT
Merged upstream as an FFE for Havana RC1:

  https://github.com/openstack/ceilometer/commit/d30bf2fa2
  https://github.com/openstack/ceilometer/commit/985f48270
Comment 5 Eoghan Glynn 2013-10-21 12:40:14 EDT
How To Test
===========

0. Install packstack allinone, then spin up an instance in the usual way. 

Ensure the compute agent gathers metrics at a reasonable cadence (for example every 60s instead of the default of every 10 minutes):

  sudo sed -i '/^ *name: cpu_pipeline$/ { n ; s/interval: 600$/interval: 60/ }' /etc/ceilometer/pipeline.yaml
  sudo service openstack-ceilometer-compute restart
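
Before editing the live file, the sed expression above can be dry-run against a sample fragment. The following is only a sketch: the sample stanza is a hypothetical stand-in for /etc/ceilometer/pipeline.yaml (whose exact layout may differ by release), and the helper names sample_pipeline and rewrite_cpu_interval are made up for illustration.

```shell
# Hypothetical sample of two pipeline stanzas; only cpu_pipeline should change.
sample_pipeline() {
  cat <<'EOF'
-
    name: meter_pipeline
    interval: 600
    meters:
        - "*"
-
    name: cpu_pipeline
    interval: 600
    meters:
        - "cpu"
EOF
}

# Same expression as above without -i, so the result goes to stdout instead
# of being written back. A ';' is added before '}' so that non-GNU sed
# implementations accept it as well.
rewrite_cpu_interval() {
  sed '/^ *name: cpu_pipeline$/ { n ; s/interval: 600$/interval: 60/ ; }'
}

sample_pipeline | rewrite_cpu_interval
```

Because the expression only rewrites the line immediately following "name: cpu_pipeline", it has no effect if the interval is not on that next line; checking the output this way shows whether the edit took hold.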

Ensure the ceilometer alarm services are installed and running:

  sudo yum install -y openstack-ceilometer-alarm
  export CEILO_ALARM_SVCS='evaluator notifier'
  for svc in $CEILO_ALARM_SVCS; do sudo service openstack-ceilometer-alarm-$svc start; done
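
Steps 1 and 3 pass --query resource_id=$INSTANCE_ID, but nothing in these instructions sets that variable. One possible way to populate it, assuming the nova CLI is installed, credentials are sourced, and the instance of interest is the first row of the listing, is to pull the ID column out of the table output. The helper name first_instance_id is hypothetical.

```shell
# Print the ID cell of the first data row of a nova-style ASCII table read
# from stdin. Data rows start with '|'; the header row has "ID" in that cell.
first_instance_id() {
  awk -F'|' '
    /^[|]/ {
      id = $2
      gsub(/^[ \t]+|[ \t]+$/, "", id)   # trim the padding around the cell
      if (id != "ID" && id != "") { print id; exit }
    }'
}

# Usage against a live cloud (assumption: the first listed instance is the
# one spun up in step 0):
#   INSTANCE_ID=$(nova list | first_instance_id)
```

If several instances exist, it is safer to copy the wanted ID by hand than to rely on row order.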


1. Create multiple basic alarms with thresholds sufficiently low that they are guaranteed to go into alarm:

  for i in $(seq 5)
  do
    ceilometer alarm-threshold-create --name basic_cpu_low_threshold_${i} \
     --meter-name cpu_util  --threshold 0.01 --comparison-operator gt  --statistic avg \
     --period 60 --evaluation-periods 1 \
     --alarm-action 'log://' \
     --query resource_id=$INSTANCE_ID
  done


2. Create a meta-alarm combining (with logical AND) the state of these basic alarms:

  ALARM_IDS=
  for a in `ceilometer alarm-list | awk -F\| '/basic_cpu_/ {print $2}'`; do ALARM_IDS="$ALARM_IDS --alarm_ids $a"; done
  ceilometer --debug alarm-combination-create --name combination_cpu_low --description 'combination of high CPU util alarms' --alarm-action 'log://' $ALARM_IDS

Ensure that this combination alarm transitions into the alarm state within one evaluation period (60s by default):

  sleep 60 ; ceilometer alarm-list | grep combination_cpu_low


3. Create another set of basic alarms with thresholds sufficiently high that they are guaranteed not to go into alarm:

  for i in $(seq 5)
  do
    ceilometer alarm-threshold-create --name basic_cpu_high_threshold_${i} \
     --meter-name cpu_util --threshold 99.99 --comparison-operator gt  --statistic max \
     --period 60 --evaluation-periods 1 \
     --alarm-action 'log://' \
     --query resource_id=$INSTANCE_ID
  done


4. Create 2 further meta-alarms combining (with logical AND and OR respectively) the state of all the basic alarms created so far (the awk pattern /basic_cpu_/ matches both the low- and high-threshold sets):

  ALARM_IDS=
  for a in `ceilometer alarm-list | awk -F\| '/basic_cpu_/ {print $2}'`; do   ALARM_IDS="$ALARM_IDS --alarm_ids $a"; done
  ceilometer --debug alarm-combination-create --name combination_cpu_mixed_and --description 'combination (AND) of mixed CPU util alarms' --alarm-action 'log://' $ALARM_IDS --operator and
  ceilometer --debug alarm-combination-create --name combination_cpu_mixed_or --description 'combination (OR) of mixed CPU util alarms' --alarm-action 'log://' $ALARM_IDS --operator or

Ensure that these combination alarms transition into the alarm state (combination_cpu_mixed_or) and the ok state (combination_cpu_mixed_and) within one evaluation period (60s by default):

  sleep 60 ; ceilometer alarm-list | grep combination_cpu_mixed
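
The expected outcome of step 4 follows from plain AND/OR logic over the ten underlying states. The sketch below is a simplified model written for illustration, not the Ceilometer evaluator's code: the function name combine is hypothetical, and it only considers the states "alarm" and "ok", deliberately ignoring "insufficient data".

```shell
# combine OPERATOR STATE... : prints "alarm" or "ok".
combine() {
  op=$1; shift
  total=$#
  in_alarm=0
  for s in "$@"; do
    if [ "$s" = "alarm" ]; then
      in_alarm=$((in_alarm + 1))
    fi
  done
  case $op in
    and)  # every underlying alarm must be in alarm
      if [ "$total" -gt 0 ] && [ "$in_alarm" -eq "$total" ]; then echo alarm; else echo ok; fi ;;
    or)   # at least one underlying alarm must be in alarm
      if [ "$in_alarm" -gt 0 ]; then echo alarm; else echo ok; fi ;;
    *)
      echo "unknown operator: $op" >&2; return 2 ;;
  esac
}

# Five low-threshold alarms firing, five high-threshold alarms quiet:
states="alarm alarm alarm alarm alarm ok ok ok ok ok"
combine and $states   # prints "ok"
combine or  $states   # prints "alarm"
```

The same model also covers step 2: with only the five low-threshold alarms, all in alarm, the AND combination yields "alarm".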
Comment 10 errata-xmlrpc 2013-12-19 19:14:29 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html
