Bug 986393 - [RFE] Alarm audit/history API
[RFE] Alarm audit/history API
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ceilometer
Hardware: Unspecified
OS: Unspecified
Priority: high   Severity: low
Target Milestone: Upstream M3
Target Release: 4.0
Assigned To: Eoghan Glynn
QA Contact: Kevin Whitney
Keywords: FutureFeature, OtherQA
Depends On: 988358
Blocks: 973191 RHOS40RFE 1055813
Reported: 2013-07-19 12:22 EDT by Eoghan Glynn
Modified: 2016-04-26 12:19 EDT (History)
8 users

See Also:
Fixed In Version: openstack-ceilometer-2013.2-0.10.1.b3.el6ost
Doc Type: Enhancement
Doc Text:
A feature has been added to OpenStack Metering (Ceilometer) that retains alarm history: lifecycle events, rule changes, and state transitions. This was required because an alarm encapsulates only a transient state and a snapshot of its current evaluation rule, but users also need to inspect how the alarm's state and rules changed over a longer timespan, including the period after the alarm no longer exists. Alarm history is now retained, for a configurable period, for lifecycle events, rule changes, and state transitions.
Story Points: ---
Clone Of:
Last Closed: 2013-12-19 19:14:24 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
OpenStack gerrit 41065 None None None Never
OpenStack gerrit 41135 None None None Never
OpenStack gerrit 43848 None None None Never
OpenStack gerrit 43849 None None None Never
OpenStack gerrit 43850 None None None Never
OpenStack gerrit 44908 None None None Never
OpenStack gerrit 45244 None None None Never

Description Eoghan Glynn 2013-07-19 12:22:00 EDT
We need to persist and expose a limited period of alarm history to users.

For each alarm, this would be composed of lifecycle events (creation, deletion), state transitions (in and out of alarm), and attribute updates (especially those attributes that pertain to threshold evaluation).

The retention period must necessarily be limited, as alarm state may flap rapidly, producing high volumes of history data.

The history retrieval API should be:

* paginated with a limit and next marker
* constrainable by timestamp
* filterable by lifecycle event, state transition, attribute update

Upstream blueprint: https://blueprints.launchpad.net/ceilometer/+spec/alarm-audit-api
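The requirements above map onto the underlying REST call roughly as sketched below. The endpoint address and alarm ID are placeholder values, and the query-string shape follows the v2 API's q.field/q.op/q.value convention; a limit and marker would page through the results as per the pagination requirement:

```shell
# Hypothetical endpoint and alarm ID -- substitute real values.
CEILOMETER_URL="http://localhost:8777"
ALARM_ID="16fd2706-8baf-433b-82eb-8c7fada847da"

# Constrain the history to events after a given timestamp.
HISTORY_URL="$CEILOMETER_URL/v2/alarms/$ALARM_ID/history?q.field=timestamp&q.op=gt&q.value=2013-10-01T16:00:00"
echo "$HISTORY_URL"

# With a running service and a valid token, the request itself would be:
#   curl -s -H "X-Auth-Token: $OS_TOKEN" "$HISTORY_URL"
```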
Comment 4 Eoghan Glynn 2013-10-21 11:36:31 EDT
How To Test

0. Install packstack allinone, then spin up an instance in the usual way. 

Ensure the compute agent is gathering metrics at a reasonable cadence (e.g. every 60s instead of the default 600s):

  sudo sed -i '/^ *name: cpu_pipeline$/ { n ; s/interval: 600$/interval: 60/ }' /etc/ceilometer/pipeline.yaml
  sudo service openstack-ceilometer-compute restart
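The sed edit can be sanity-checked on a scratch copy first. The snippet below assumes the Havana-era pipeline.yaml layout, where the name: and interval: keys sit on consecutive lines:

```shell
# Minimal stand-in for the relevant fragment of /etc/ceilometer/pipeline.yaml
# (assumed layout: 'name:' followed by 'interval:' on the next line).
cat > /tmp/pipeline.yaml <<'EOF'
-
    name: cpu_pipeline
    interval: 600
EOF

# Same edit as above: find the cpu_pipeline entry, advance to the next
# line ('n'), and rewrite the interval from 600s to 60s.
sed -i '/^ *name: cpu_pipeline$/ { n ; s/interval: 600$/interval: 60/ }' /tmp/pipeline.yaml

grep 'interval:' /tmp/pipeline.yaml
# -> interval: 60
```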

1. Create an alarm with a threshold sufficiently low that it's guaranteed to go into alarm:

  ceilometer alarm-threshold-create --name cpu_high --description 'instance running hot'  \
     --meter-name cpu_util  --threshold 0.01 --comparison-operator gt  --statistic avg \
     --period 60 --evaluation-periods 1 \
     --alarm-action 'log://' \
     --query resource_id=$INSTANCE_ID
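The $ALARM_ID used in the later steps can be captured at creation time by parsing the CLI's output table. The sample row below is a stand-in for the real Property/Value table the client prints, and the awk field positions are an assumption about that layout:

```shell
# Stand-in for the alarm_id row of the `ceilometer alarm-threshold-create`
# output table; a real run would pipe the command's output instead.
CREATE_OUT='| alarm_id | 16fd2706-8baf-433b-82eb-8c7fada847da |'

# Fields split on whitespace: $1='|', $2='alarm_id', $3='|', $4=<uuid>.
ALARM_ID=$(printf '%s\n' "$CREATE_OUT" | awk '/alarm_id/ { print $4 }')
echo "$ALARM_ID"
```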

2. Update the alarm:

  ceilometer alarm-update --threshold 75.0 -a $ALARM_ID

3. Wait a while, then delete the alarm:

  ceilometer alarm-delete -a $ALARM_ID

4. Ensure that the alarm-history reports the following events:

  * creation
  * rule change
  * state transition
  * deletion

  ceilometer alarm-history -a $ALARM_ID
 | Type             | Timestamp                  | Detail                               |
 | creation         | 2013-10-01T16:20:29.238000 | name: cpu_high                       |
 |                  |                            | description: instance running hot    |
 |                  |                            | type: threshold                      |
 |                  |                            | rule: cpu_util > 0.01 during 1 x 60s |
 | state transition | 2013-10-01T16:20:40.626000 | state: alarm                         |
 | rule change      | 2013-10-01T16:22:40.718000 | rule: cpu_util > 75.0 during 1 x 60s |
 | deletion         | 2013-10-01T16:20:29.238000 | name: cpu_high                       |
 |                  |                            | description: instance running hot    |
 |                  |                            | type: threshold                      |
 |                  |                            | rule: cpu_util > 75.0 during 1 x 60s |
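Step 4's check can be scripted along the lines below. The sample data is an invented stand-in for real alarm-history output (event type in the first field), since the CLI's exact table padding varies:

```shell
# Stand-in for `ceilometer alarm-history -a $ALARM_ID` output, reduced to
# 'type|timestamp' lines; timestamps here are illustrative only.
HISTORY=$(cat <<'EOF'
creation|2013-10-01T16:20:29.238000
state transition|2013-10-01T16:20:40.626000
rule change|2013-10-01T16:22:40.718000
deletion|2013-10-01T16:24:12.000000
EOF
)

# Verify that every expected event type appears at least once.
for event in 'creation' 'state transition' 'rule change' 'deletion'; do
    if ! printf '%s\n' "$HISTORY" | grep -q "^$event|"; then
        echo "missing event: $event"
    fi
done
echo "check complete"
```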

Comment 5 Ami Jeain 2013-10-28 07:48:01 EDT
QANAK'ing due to QE capacity
Comment 11 errata-xmlrpc 2013-12-19 19:14:24 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

