Bug 986393

Summary: [RFE] Alarm audit/history API
Product: Red Hat OpenStack Reporter: Eoghan Glynn <eglynn>
Component: openstack-ceilometer Assignee: Eoghan Glynn <eglynn>
Status: CLOSED ERRATA QA Contact: Kevin Whitney <kwhitney>
Severity: low Docs Contact:
Priority: high    
Version: 4.0 CC: ajeain, breeler, eglynn, jruzicka, mlopes, pbrady, sgordon, srevivo
Target Milestone: Upstream M3 Keywords: FutureFeature, OtherQA
Target Release: 4.0   
Hardware: Unspecified   
OS: Unspecified   
URL: https://blueprints.launchpad.net/ceilometer/+spec/alarm-audit-api
Fixed In Version: openstack-ceilometer-2013.2-0.10.1.b3.el6ost Doc Type: Enhancement
Doc Text:
A feature has been added to OpenStack Metering (Ceilometer) that retains alarm history in terms of lifecycle events, rule changes, and state transitions. This was required because an alarm encapsulates only a transient state and a snapshot of its current evaluation rule, whereas users also need to inspect how the alarm's state and rules changed over a longer timespan, including after the alarm no longer exists. Alarm history is now configurably retained for lifecycle events, rule changes, and state transitions.
Last Closed: 2013-12-20 00:14:24 UTC Type: Bug
Bug Depends On: 988358    
Bug Blocks: 973191, 975499, 1055813    

Description Eoghan Glynn 2013-07-19 16:22:00 UTC
We need to persist and expose a limited period of alarm history to users.

For each alarm, this would be composed of lifecycle events (creation, deletion), state transitions (in and out of alarm), and attribute updates (especially those attributes that pertain to threshold evaluation).

The retention period must necessarily be limited, as alarm state may potentially flap rapidly, producing high volumes of history data.

The history retrieval API should be:

* paginated with a limit and next marker
* constrainable by timestamp
* filterable by lifecycle event, state transition, attribute update

Upstream blueprint: https://blueprints.launchpad.net/ceilometer/+spec/alarm-audit-api
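
For illustration, a timestamp-constrained history query via the CLI might look like the following (a sketch only, assuming the client's standard -q filter syntax also applies to alarm-history, and that $ALARM_ID holds an existing alarm's UUID):

  ceilometer alarm-history -a $ALARM_ID \
     -q 'timestamp>2013-10-01T00:00:00;timestamp<2013-10-02T00:00:00'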

Comment 4 Eoghan Glynn 2013-10-21 15:36:31 UTC
How To Test
===========

0. Install packstack allinone, then spin up an instance in the usual way. 

Ensure the compute agent is gathering metrics at a reasonable cadence (for example, every 60s instead of the default of every 10 minutes):

  sudo sed -i '/^ *name: cpu_pipeline$/ { n ; s/interval: 600$/interval: 60/ }' /etc/ceilometer/pipeline.yaml
  sudo service openstack-ceilometer-compute restart
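
To confirm that samples are now arriving at the faster cadence, something like the following can be used (assuming the instance UUID from the previous step is in $INSTANCE_ID):

  ceilometer sample-list -m cpu_util -q resource_id=$INSTANCE_ID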


1. Create an alarm with a threshold sufficiently low that it's guaranteed to go into alarm:

  ceilometer alarm-threshold-create --name cpu_high --description 'instance running hot'  \
     --meter-name cpu_util  --threshold 0.01 --comparison-operator gt  --statistic avg \
     --period 60 --evaluation-periods 1 \
     --alarm-action 'log://' \
     --query resource_id=$INSTANCE_ID
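
The alarm ID needed by the following steps can be taken from the command output, or captured from alarm-list; for example (a sketch that assumes the default table output format):

  ALARM_ID=$(ceilometer alarm-list | awk '/cpu_high/ {print $2}')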


2. Update the alarm:

  ceilometer alarm-update --threshold 75.0 -a $ALARM_ID
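
The rule change can be verified before moving on, for example:

  ceilometer alarm-show -a $ALARM_ID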



3. Wait a while, then delete the alarm:

  ceilometer alarm-delete -a $ALARM_ID
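
At this point the alarm should no longer appear in alarm-list, though its history remains queryable (retention after deletion is part of this feature):

  ceilometer alarm-list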


4. Ensure that the alarm-history reports the following events:

  * creation
  * rule change
  * state transition
  * deletion

  ceilometer alarm-history -a $ALARM_ID
 +------------------+----------------------------+----------------------------------------+
 | Type             | Timestamp                  | Detail                                 |
 +------------------+----------------------------+----------------------------------------+
 | creation         | 2013-10-01T16:20:29.238000 | name: cpu_high                         |
 |                  |                            | description: instance running hot      |
 |                  |                            | type: threshold                        |
 |                  |                            | rule: cpu_util > 0.01 during 1 x 60s   |
 | state transition | 2013-10-01T16:20:40.626000 | state: alarm                           |
 | rule change      | 2013-10-01T16:22:40.718000 | rule: cpu_util > 75.0 during 1 x 60s   |
 +------------------+----------------------------+----------------------------------------+
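
The history can also be narrowed to a single event category, per the filterable-by-event-type requirement in the description; a sketch, assuming the history events expose a queryable type field:

  ceilometer alarm-history -a $ALARM_ID -q 'type=rule change'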

Comment 5 Ami Jeain 2013-10-28 11:48:01 UTC
QANAK'ing due to QE capacity

Comment 11 errata-xmlrpc 2013-12-20 00:14:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html