Bug 1381154 - Composite alarm uses last value from the evaluation for alarm evaluation
Summary: Composite alarm uses last value from the evaluation for alarm evaluation
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-aodh
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 12.0 (Pike)
Assignee: Mehdi ABAAKOUK
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-03 09:19 UTC by Yurii Prokulevych
Modified: 2017-06-28 12:51 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:




Links
System     ID       Private  Priority  Status  Summary  Last Updated
Launchpad  1629808  0        None      None    None     2016-10-03 09:24:37 UTC

Description Yurii Prokulevych 2016-10-03 09:19:27 UTC
Description of problem:
-----------------------
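The composite alarm uses only the last value from the evaluation periods, causing a false-positive alarm transition. In the log below, rule2 compares three datapoints (3.0, 3.5 and 4.0) against a threshold of 4.0 with operator "ge"; only the last one crosses the threshold, yet the alarm transitions to alarm.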

    2016-10-03 08:49:35.884 11794 DEBUG aodh.evaluator [-] evaluating alarm c9891f7b-42ac-40fc-8b30-9631d21d228e _evaluate_alarm /usr/lib/python2.7/site-packages/aodh/evaluator/__init__.py:257
    2016-10-03 08:49:35.885 11794 DEBUG aodh.evaluator.composite [-] Evaluating composite rule alarm c9891f7b-42ac-40fc-8b30-9631d21d228e ... evaluate /usr/lib/python2.7/site-packages/aodh/evaluator/composite.py:213
    2016-10-03 08:49:35.885 11794 DEBUG aodh.evaluator.composite [-] Evaluating gnocchi_aggregation_by_metrics_threshold rule: {u'evaluation_periods': 3, u'metrics': [u'b6ba3db7-78d5-4e66-a592-d999a2988a91', u'7f16dccc-92e6-43a9-a87e-009f078b9b55'], u'threshold': 6.0, u'granularity': 60, u'aggregation_method': u'mean', u'type': u'gnocchi_aggregation_by_metrics_threshold', u'comparison_operator': u'ge'} evaluate /usr/lib/python2.7/site-packages/aodh/evaluator/composite.py:45
    2016-10-03 08:49:35.885 11794 DEBUG aodh.evaluator.threshold [-] query stats from 2016-10-03 08:45:35.885525 to 2016-10-03 08:49:35.885525 _bound_duration /usr/lib/python2.7/site-packages/aodh/evaluator/threshold.py:89
    2016-10-03 08:49:36.097 11794 DEBUG aodh.evaluator.gnocchi [-] sanitize stats [] _sanitize /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:48
    2016-10-03 08:49:36.098 11794 DEBUG aodh.evaluator.gnocchi [-] pruned statistics to 0 _sanitize /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:52
    2016-10-03 08:49:36.098 11794 DEBUG aodh.evaluator.composite [-] Evaluating gnocchi_aggregation_by_resources_threshold rule: {u'evaluation_periods': 3, u'metric': u'radosgw.api.request', u'threshold': 4.0, u'granularity': 60, u'aggregation_method': u'mean', u'query': u'{"or":[{"=":{"id":"alarm-resource-3"}},{"=":{"id":"alarm-resource-4"}}]}', u'type': u'gnocchi_aggregation_by_resources_threshold', u'comparison_operator': u'ge', u'resource_type': u'ceph_account'} evaluate /usr/lib/python2.7/site-packages/aodh/evaluator/composite.py:45
    2016-10-03 08:49:36.098 11794 DEBUG aodh.evaluator.threshold [-] query stats from 2016-10-03 08:45:36.098557 to 2016-10-03 08:49:36.098557 _bound_duration /usr/lib/python2.7/site-packages/aodh/evaluator/threshold.py:89
    2016-10-03 08:49:36.598 11794 DEBUG aodh.evaluator.gnocchi [-] sanitize stats [[u'2016-10-03T08:00:00+00:00', 3600.0, 3.1875], [u'2016-10-03T08:45:00+00:00', 900.0, 3.3], [u'2016-10-03T08:45:00+00:00', 300.0, 3.3], [u'2016-10-03T08:45:00+00:00', 60.0, 3.0], [u'2016-10-03T08:46:00+00:00', 60.0, 3.0], [u'2016-10-03T08:47:00+00:00', 60.0, 3.0], [u'2016-10-03T08:48:00+00:00', 60.0, 3.5], [u'2016-10-03T08:49:00+00:00', 60.0, 4.0]] _sanitize /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:48
    2016-10-03 08:49:36.599 11794 DEBUG aodh.evaluator.gnocchi [-] pruned statistics to 3 _sanitize /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:52
    2016-10-03 08:49:36.599 11794 DEBUG aodh.evaluator.threshold [-] comparing value 3.0 against threshold 4.0 _compare /usr/lib/python2.7/site-packages/aodh/evaluator/threshold.py:175
    2016-10-03 08:49:36.600 11794 DEBUG aodh.evaluator.threshold [-] comparing value 3.5 against threshold 4.0 _compare /usr/lib/python2.7/site-packages/aodh/evaluator/threshold.py:175
    2016-10-03 08:49:36.600 11794 DEBUG aodh.evaluator.threshold [-] comparing value 4.0 against threshold 4.0 _compare /usr/lib/python2.7/site-packages/aodh/evaluator/threshold.py:175
    2016-10-03 08:49:36.601 11794 INFO aodh.evaluator [-] alarm c9891f7b-42ac-40fc-8b30-9631d21d228e transitioning to alarm because Composite rule alarm with composition form: (rule1 or rule2) transition to alarm, due to rules: rule2 outside their threshold.
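
To see why this transition is premature, the following Python sketch models the data that the reproducer loop (step 2 of "Steps to Reproduce") feeds to rule2. It assumes that both samples of iteration i land in the same 60-second bucket and that nothing else writes to radosgw.api.request, so the per-minute mean is (i + 5) / 2; the values 3.0, 3.5 and 4.0 compared in the log are consistent with iterations 1 to 3. The function names are hypothetical and only model the "ge" comparison.

```python
# Hypothetical model of the reproducer loop: in iteration i, alarm-resource-3
# reports i and alarm-resource-4 reports 5. Assumes each iteration fills its own
# 60 s granularity bucket, so the "mean" aggregate is (i + 5) / 2.
THRESHOLD = 4.0
EVALUATION_PERIODS = 3


def per_minute_means(iterations=9):
    return [(i + 5) / 2 for i in range(1, iterations + 1)]


def first_iteration(means, rule):
    """Return the first 1-based iteration whose last EVALUATION_PERIODS
    means satisfy `rule`, or None if no window does."""
    for end in range(EVALUATION_PERIODS, len(means) + 1):
        window = means[end - EVALUATION_PERIODS:end]
        if rule(window):
            return end
    return None


means = per_minute_means()
# Every datapoint in the window must be >= threshold (comparison_operator "ge").
all_cross = first_iteration(means, lambda w: all(v >= THRESHOLD for v in w))
# Only the newest datapoint is checked, as in the behaviour reported here.
last_crosses = first_iteration(means, lambda w: w[-1] >= THRESHOLD)

print(means[:3])     # [3.0, 3.5, 4.0]
print(last_crosses)  # 3
print(all_cross)     # 5
```

Under these assumptions, all three periods are at or above 4.0 only from iteration 5 onward, whereas a decision based on the last datapoint fires at iteration 3.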



Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-aodh-evaluator-3.0.0-0.20160907221145.3990c5b.el7ost.noarch
openstack-aodh-notifier-3.0.0-0.20160907221145.3990c5b.el7ost.noarch
openstack-aodh-common-3.0.0-0.20160907221145.3990c5b.el7ost.noarch
puppet-aodh-9.2.0-0.20160902115754.16ea22a.el7ost.noarch
openstack-aodh-listener-3.0.0-0.20160907221145.3990c5b.el7ost.noarch
python-aodhclient-0.6.0-0.20160826150744.65d2e62.el7ost.noarch
openstack-aodh-api-3.0.0-0.20160907221145.3990c5b.el7ost.noarch


How reproducible:
100%

Steps to Reproduce:
1. Create a composite alarm
aodh --debug alarm create \
--type composite \
--name Composite-OR-Alarm \
--description 'Composite OR Alarm' \
--severity critical \
--enabled True \
--alarm-action 'log://' \
--ok-action 'log://' \
--insufficient-data-action 'log://' \
--evaluation-periods 3 \
--composite-rule '{"or": [
    {"type": "gnocchi_aggregation_by_metrics_threshold",
     "threshold": 6,
     "metrics": ["b6ba3db7-78d5-4e66-a592-d999a2988a91", "7f16dccc-92e6-43a9-a87e-009f078b9b55"],
     "evaluation_periods": 3, "granularity": 60,
     "comparison_operator": "ge", "aggregation_method": "mean"},
    {"type": "gnocchi_aggregation_by_resources_threshold",
     "query": "{\"or\":[{\"=\":{\"id\":\"alarm-resource-3\"}},{\"=\":{\"id\":\"alarm-resource-4\"}}]}",
     "metric": "radosgw.api.request",
     "evaluation_periods": 3, "granularity": 60,
     "comparison_operator": "ge", "threshold": "4",
     "resource_type": "ceph_account", "aggregation_method": "mean"}]}'

2. Trigger an alarm transition

for i in {1..9}
do
    ceilometer sample-create --resource-id alarm-resource-3 --meter-name radosgw.api.request --meter-type gauge --meter-unit unit1 --sample-volume ${i}
    ceilometer sample-create --resource-id alarm-resource-4 --meter-name radosgw.api.request --meter-type gauge --meter-unit unit1 --sample-volume 5
    sleep 60
done

3. Assert that the alarm transitions to a new state
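
The "query" field of the gnocchi_aggregation_by_resources_threshold rule is itself a JSON document embedded as a string inside the composite rule, which is why the shell command in step 1 needs escaped quotes. As a sketch, the same --composite-rule argument can be generated with Python's json module instead of hand-escaping; build_composite_rule is a hypothetical helper, and the field values are copied from the command in step 1 (including "threshold": "4" as a string).

```python
import json
import shlex


def build_composite_rule():
    """Hypothetical helper: rebuild the --composite-rule payload of step 1."""
    # The resource query is serialized to a *string*; compact separators
    # reproduce the exact form shown in the evaluator log.
    resource_query = json.dumps(
        {"or": [{"=": {"id": "alarm-resource-3"}},
                {"=": {"id": "alarm-resource-4"}}]},
        separators=(",", ":"),
    )
    rule1 = {
        "type": "gnocchi_aggregation_by_metrics_threshold",
        "threshold": 6,
        "metrics": ["b6ba3db7-78d5-4e66-a592-d999a2988a91",
                    "7f16dccc-92e6-43a9-a87e-009f078b9b55"],
        "evaluation_periods": 3,
        "granularity": 60,
        "comparison_operator": "ge",
        "aggregation_method": "mean",
    }
    rule2 = {
        "type": "gnocchi_aggregation_by_resources_threshold",
        "query": resource_query,
        "metric": "radosgw.api.request",
        "evaluation_periods": 3,
        "granularity": 60,
        "comparison_operator": "ge",
        "threshold": "4",
        "resource_type": "ceph_account",
        "aggregation_method": "mean",
    }
    return json.dumps({"or": [rule1, rule2]})


if __name__ == "__main__":
    # shlex.quote makes the payload safe to paste after --composite-rule.
    print("--composite-rule " + shlex.quote(build_composite_rule()))
```

Letting json.dumps handle the inner string avoids the nested-escaping mistakes that are easy to make when editing the one-line shell form by hand.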

Comment 1 Mehdi ABAAKOUK 2016-10-04 14:41:16 UTC
When individual alarms are evaluated and the previous state is unknown, we compute a trending state. If we have enough datapoints but not all of them cross the threshold, the real state is changed to this trending state.

However, the "trending state" is currently calculated from the last datapoint only. Since we do have enough datapoints, it should be possible to compute a better trending state.
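
The behaviour described above can be modelled in a few lines of Python. This is a simplified sketch, not aodh's evaluator code: the State enum, the evaluate function and its trend parameter are hypothetical names, and only the "ge" operator is modelled. Fed the three datapoints from the log, the last-datapoint rule yields alarm. The majority function is a hypothetical example of a "better trending state", not something proposed upstream.

```python
from enum import Enum


class State(str, Enum):
    OK = "ok"
    ALARM = "alarm"
    UNKNOWN = "insufficient data"


def last_datapoint(compared):
    # Current behaviour per this comment: only the newest comparison counts.
    return compared[-1]


def majority(compared):
    # Hypothetical alternative: alarm if more than half of the datapoints cross.
    return sum(compared) * 2 > len(compared)


def evaluate(values, threshold, previous_state, trend=last_datapoint):
    """Simplified model of the evaluation described in Comment 1."""
    compared = [v >= threshold for v in values]  # comparison_operator "ge"
    if all(compared):
        return State.ALARM
    if not any(compared):
        return State.OK
    # Mixed result: derive a trending state from the comparisons.
    trending = State.ALARM if trend(compared) else State.OK
    # The trending state only replaces the real state when it was unknown.
    return trending if previous_state is State.UNKNOWN else previous_state


values = [3.0, 3.5, 4.0]  # datapoints compared in the log
print(evaluate(values, 4.0, State.UNKNOWN).value)                  # alarm
print(evaluate(values, 4.0, State.UNKNOWN, trend=majority).value)  # ok
```

Any trend function that looks at more than the last element would avoid the false positive in this bug; the majority rule is just one such choice.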

Comment 2 Mehdi ABAAKOUK 2017-06-28 12:51:24 UTC
Upstream has no plans to fix this. It is a corner case that occurs when a metric does not have enough data yet.

