Bug 1381154

Summary: Composite alarm uses only the last value from the evaluation periods for alarm evaluation
Product: Red Hat OpenStack
Reporter: Yurii Prokulevych <yprokule>
Component: openstack-aodh
Assignee: Mehdi ABAAKOUK <mabaakou>
Status: CLOSED CANTFIX
QA Contact: Sasha Smolyak <ssmolyak>
Severity: unspecified
Priority: unspecified
Version: 10.0 (Newton)
CC: apevec, jschluet, lhh, pkilambi, tvignaud
Target Milestone: ---
Keywords: Triaged
Target Release: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Type: Bug

Description Yurii Prokulevych 2016-10-03 09:19:27 UTC
Description of problem:
-----------------------
A composite alarm uses only the last value from the evaluation periods, causing a false-positive alarm transition.

    2016-10-03 08:49:35.884 11794 DEBUG aodh.evaluator [-] evaluating alarm c9891f7b-42ac-40fc-8b30-9631d21d228e _evaluate_alarm /usr/lib/python2.7/site-packages/aodh/evaluator/__init__.py:257
    2016-10-03 08:49:35.885 11794 DEBUG aodh.evaluator.composite [-] Evaluating composite rule alarm c9891f7b-42ac-40fc-8b30-9631d21d228e ... evaluate /usr/lib/python2.7/site-packages/aodh/evaluator/composite.py:213
    2016-10-03 08:49:35.885 11794 DEBUG aodh.evaluator.composite [-] Evaluating gnocchi_aggregation_by_metrics_threshold rule: {u'evaluation_periods': 3, u'metrics': [u'b6ba3db7-78d5-4e66-a592-d999a2988a91', u'7f16dccc-92e6-43a9-a87e-009f078b9b55'], u'threshold': 6.0, u'granularity': 60, u'aggregation_method': u'mean', u'type': u'gnocchi_aggregation_by_metrics_threshold', u'comparison_operator': u'ge'} evaluate /usr/lib/python2.7/site-packages/aodh/evaluator/composite.py:45
    2016-10-03 08:49:35.885 11794 DEBUG aodh.evaluator.threshold [-] query stats from 2016-10-03 08:45:35.885525 to 2016-10-03 08:49:35.885525 _bound_duration /usr/lib/python2.7/site-packages/aodh/evaluator/threshold.py:89
    2016-10-03 08:49:36.097 11794 DEBUG aodh.evaluator.gnocchi [-] sanitize stats [] _sanitize /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:48
    2016-10-03 08:49:36.098 11794 DEBUG aodh.evaluator.gnocchi [-] pruned statistics to 0 _sanitize /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:52
    2016-10-03 08:49:36.098 11794 DEBUG aodh.evaluator.composite [-] Evaluating gnocchi_aggregation_by_resources_threshold rule: {u'evaluation_periods': 3, u'metric': u'radosgw.api.request', u'threshold': 4.0, u'granularity': 60, u'aggregation_method': u'mean', u'query': u'{"or":[{"=":{"id":"alarm-resource-3"}},{"=":{"id":"alarm-resource-4"}}]}', u'type': u'gnocchi_aggregation_by_resources_threshold', u'comparison_operator': u'ge', u'resource_type': u'ceph_account'} evaluate /usr/lib/python2.7/site-packages/aodh/evaluator/composite.py:45
    2016-10-03 08:49:36.098 11794 DEBUG aodh.evaluator.threshold [-] query stats from 2016-10-03 08:45:36.098557 to 2016-10-03 08:49:36.098557 _bound_duration /usr/lib/python2.7/site-packages/aodh/evaluator/threshold.py:89
    2016-10-03 08:49:36.598 11794 DEBUG aodh.evaluator.gnocchi [-] sanitize stats [[u'2016-10-03T08:00:00+00:00', 3600.0, 3.1875], [u'2016-10-03T08:45:00+00:00', 900.0, 3.3], [u'2016-10-03T08:45:00+00:00', 300.0, 3$
    3], [u'2016-10-03T08:45:00+00:00', 60.0, 3.0], [u'2016-10-03T08:46:00+00:00', 60.0, 3.0], [u'2016-10-03T08:47:00+00:00', 60.0, 3.0], [u'2016-10-03T08:48:00+00:00', 60.0, 3.5], [u'2016-10-03T08:49:00+00:00', 60.$
    , 4.0]] _sanitize /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:48
    2016-10-03 08:49:36.599 11794 DEBUG aodh.evaluator.gnocchi [-] pruned statistics to 3 _sanitize /usr/lib/python2.7/site-packages/aodh/evaluator/gnocchi.py:52
    2016-10-03 08:49:36.599 11794 DEBUG aodh.evaluator.threshold [-] comparing value 3.0 against threshold 4.0 _compare /usr/lib/python2.7/site-packages/aodh/evaluator/threshold.py:175
    2016-10-03 08:49:36.600 11794 DEBUG aodh.evaluator.threshold [-] comparing value 3.5 against threshold 4.0 _compare /usr/lib/python2.7/site-packages/aodh/evaluator/threshold.py:175
    2016-10-03 08:49:36.600 11794 DEBUG aodh.evaluator.threshold [-] comparing value 4.0 against threshold 4.0 _compare /usr/lib/python2.7/site-packages/aodh/evaluator/threshold.py:175
    2016-10-03 08:49:36.601 11794 INFO aodh.evaluator [-] alarm c9891f7b-42ac-40fc-8b30-9631d21d228e transitioning to alarm because Composite rule alarm with composition form: (rule1 or rule2) transition to alarm, $
    ue to rules: rule2 outside their threshold.



Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-aodh-evaluator-3.0.0-0.20160907221145.3990c5b.el7ost.noarch
openstack-aodh-notifier-3.0.0-0.20160907221145.3990c5b.el7ost.noarch
openstack-aodh-common-3.0.0-0.20160907221145.3990c5b.el7ost.noarch
puppet-aodh-9.2.0-0.20160902115754.16ea22a.el7ost.noarch
openstack-aodh-listener-3.0.0-0.20160907221145.3990c5b.el7ost.noarch
python-aodhclient-0.6.0-0.20160826150744.65d2e62.el7ost.noarch
openstack-aodh-api-3.0.0-0.20160907221145.3990c5b.el7ost.noarch


How reproducible:
100%

Steps to Reproduce:
1. Create composite alarm
aodh --debug alarm create \
--type composite \
--name Composite-OR-Alarm \
--description 'Composite OR Alarm' \
--severity critical \
--enabled True \
--alarm-action 'log://' \
--ok-action 'log://' \
--insufficient-data-action 'log://' \
--evaluation-periods 3 \
--composite-rule  '{"or": [{"type":"gnocchi_aggregation_by_metrics_threshold","threshold": 6, "metrics":["b6ba3db7-78d5-4e66-a592-d999a2988a91", "7f16dccc-92e6-43a9-a87e-009f078b9b55"], "evaluation_periods": 3, "granularity": 60, "comparison_operator": "ge", "aggregation_method":"mean"}, { "type":"gnocchi_aggregation_by_resources_threshold", "query": "{\"or\":[{\"=\":{\"id\":\"alarm-resource-3\"}},{\"=\":{\"id\":\"alarm-resource-4\"}}]}", "metric": "radosgw.api.request", "evaluation_periods":3, "granularity":60, "comparison_operator": "ge", "threshold":"4", "resource_type":"ceph_account", "aggregation_method":"mean"}]}'

2. Trigger alarm transition

for i in {1..9}
do
    ceilometer sample-create --resource-id alarm-resource-3 --meter-name radosgw.api.request --meter-type gauge --meter-unit unit1 --sample-volume ${i}
    ceilometer sample-create --resource-id alarm-resource-4 --meter-name radosgw.api.request --meter-type gauge --meter-unit unit1 --sample-volume 5
    sleep 60
done

3. Assert alarm transitions to new state
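
For reference, the values the evaluator should see can be worked out from the loop in step 2. The following is only a rough sketch: it assumes that the two samples of each iteration land in the same 60-second bucket and that the mean is taken across both resources. The names `expected_means` and `first_full_breach` are illustrative and are not part of aodh or Gnocchi.

```python
# Sketch: expected per-minute mean of radosgw.api.request across the two
# resources, under the assumptions stated above (not aodh or Gnocchi code).
THRESHOLD = 4.0          # threshold of the by-resources rule in the composite rule
EVALUATION_PERIODS = 3   # evaluation_periods of the same rule


def expected_means(iterations=9, fixed_volume=5):
    """Mean of (i, fixed_volume) for each loop iteration i = 1..iterations."""
    return [(i + fixed_volume) / 2.0 for i in range(1, iterations + 1)]


def first_full_breach(means, threshold=THRESHOLD, periods=EVALUATION_PERIODS):
    """1-based iteration at which the last `periods` means are all >= threshold."""
    for end in range(periods, len(means) + 1):
        if all(v >= threshold for v in means[end - periods:end]):
            return end
    return None


means = expected_means()
print(means[:3])                 # first three buckets
print(first_full_breach(means))  # first iteration with three breaching periods
```

Under these assumptions the first three means are 3.0, 3.5 and 4.0, which matches the values compared in the log above, and all three evaluation periods reach the threshold only from the fifth iteration onwards.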

Comment 1 Mehdi ABAAKOUK 2016-10-04 14:41:16 UTC
When (individual) alarms are evaluated, we compute a trending state when the previous state is unknown, and we change the real state to this trending state if we got enough datapoints but not all of them cross the threshold.

Currently, however, the "trending state" is calculated from the last datapoint only. Because we have enough datapoints, it should be possible to compute a better trending state.
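
The behaviour described in this comment can be modelled roughly as follows. This is a simplified sketch written for illustration, not the aodh source: the function name `decide_state`, its arguments, and the handling of an empty datapoint list are assumptions.

```python
import operator

ALARM, OK, UNKNOWN = "alarm", "ok", "insufficient data"


def decide_state(values, threshold, previous_state, op=operator.ge):
    """Return the new state for one threshold rule (simplified model)."""
    if not values:
        return UNKNOWN          # assumption: no datapoints means unknown
    compared = [op(v, threshold) for v in values]
    if all(compared):
        return ALARM            # every datapoint crosses the threshold
    if not any(compared):
        return OK               # no datapoint crosses the threshold
    # Mixed result: only the last datapoint decides the trending state.
    trending_state = ALARM if compared[-1] else OK
    # The trending state is applied only when the previous state is unknown.
    return trending_state if previous_state == UNKNOWN else previous_state


# Values and threshold taken from the log in the description.
print(decide_state([3.0, 3.5, 4.0], 4.0, UNKNOWN))
```

With the three values from the log, only the last one satisfies the `ge 4.0` comparison, yet the model moves to the alarm state because the previous state was unknown. With a previous state of ok, the same datapoints would leave the state unchanged.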

Comment 2 Mehdi ABAAKOUK 2017-06-28 12:51:24 UTC
Upstream doesn't have any plan to fix this. It is a corner case that occurs when the metric doesn't have enough data yet.
