Bug 741414 - alerts with compound AND conditions can incorrectly fire when one of the conditions goes from true to false within 30 seconds of the other condition going from false to true
Summary: alerts with compound AND conditions can incorrectly fire when one of the cond...
Keywords:
Status: NEW
Alias: None
Product: RHQ Project
Classification: Other
Component: Alerts
Version: 4.1
Hardware: All
OS: All
Priority: low
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nobody
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-09-26 18:22 UTC by Ian Springer
Modified: 2024-03-04 13:35 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 735262 0 high CLOSED alert def with multiple conditions using the same metric fires alerts at wrong times 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 737565 0 high CLOSED do not allow user to pick the same metric in multiple measurement conditionals in same ALL alert def 2021-02-22 00:41:40 UTC

Internal Links: 735262 737565

Description Ian Springer 2011-09-26 18:22:51 UTC
If the following alert is defined:

(metricA > 10) AND (metricB < 20)

And a metric report comes in containing:

metricA = 11, metricB = 21

followed by a report containing:

metricA = 9, metricB = 19

The alert can incorrectly fire after the 2nd report is processed if the matched conditions happen to be processed by the condition consumer MDB in the order: (metricB < 20) then (metricA > 10).

This can happen because AbstractConditionCache.processCacheElements() processes the metric datums from a given metric report as follows:

- for each datum:
--- for each cached condition for the metric def corresponding to the datum:
----- a) evaluate the datum value against the condition
----- b) publish either a positive or negative condition to the JMS condition queue

As each condition is published to the queue, the AlertConditionConsumerBean MDB does one of two things: a) if the condition is positive, it creates or updates an alert condition log and then checks whether the full condition set is true; or b) if the condition is negative, it deletes any existing alert condition log (i.e. an invalidated condition log).

In our example, if the (metricB = 19) datum is processed first, the alert will fire even though metricA is no longer greater than 10, because the (metricA = 9) datum has not been processed yet and the condition log for (metricA > 10) from the first report has not been deleted.
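
The ordering problem can be illustrated with a small, self-contained Java sketch. This is a simplified model, not RHQ code: the class and method names are hypothetical, the JMS queue is replaced by a direct method call, and a set of metric names stands in for the alert condition logs.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.DoublePredicate;

// Simplified, hypothetical model of the behavior described above.
// None of these names are RHQ classes; the "queue" is a direct method call.
class CompoundAlertRaceDemo {

    // One condition per metric: (metricA > 10) AND (metricB < 20).
    private final Map<String, DoublePredicate> conditions = new LinkedHashMap<>();
    // Stand-in for alert condition logs: conditions currently believed to be true.
    private final Set<String> conditionLogs = new HashSet<>();
    int alertsFired = 0;

    CompoundAlertRaceDemo() {
        conditions.put("metricA", v -> v > 10);
        conditions.put("metricB", v -> v < 20);
    }

    // Stand-in for the consumer: a positive condition records a log and checks
    // the full condition set; a negative condition deletes the log.
    private void consume(String metric, boolean positive) {
        if (positive) {
            conditionLogs.add(metric);
            if (conditionLogs.containsAll(conditions.keySet())) {
                alertsFired++;
            }
        } else {
            conditionLogs.remove(metric);
        }
    }

    // Per-datum behavior: evaluate and publish each datum as it is encountered.
    void processDatum(String metric, double value) {
        consume(metric, conditions.get(metric).test(value));
    }

    // Second report processed as (metricB = 19) then (metricA = 9).
    static int firedWhenMetricBProcessedFirst() {
        CompoundAlertRaceDemo demo = new CompoundAlertRaceDemo();
        demo.processDatum("metricA", 11);
        demo.processDatum("metricB", 21);
        demo.processDatum("metricB", 19); // stale metricA log still present here
        demo.processDatum("metricA", 9);
        return demo.alertsFired;
    }

    // Second report processed as (metricA = 9) then (metricB = 19).
    static int firedWhenMetricAProcessedFirst() {
        CompoundAlertRaceDemo demo = new CompoundAlertRaceDemo();
        demo.processDatum("metricA", 11);
        demo.processDatum("metricB", 21);
        demo.processDatum("metricA", 9);
        demo.processDatum("metricB", 19);
        return demo.alertsFired;
    }
}
```

In this model the alert fires once when the metricB datum of the second report is handled first, and not at all when the metricA datum is handled first, which matches the order dependence described above.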

This bug can only occur if the same metric report contains a datum that would cause the first condition to go from true to false, and a datum that would cause the second condition to go from false to true. Since each Agent sends metric reports every 30 seconds, the bug can only occur if one condition goes to true and the other to false within a 30 second window on the Agent.

I think the fix would be to rewrite AbstractConditionCache.processCacheElements() to do the following:

1) for each datum:
--- for each cached condition for the metric def corresponding to the datum:
----- a) evaluate the datum value against the condition
----- b) store the condition eval result in one of two temporary lists, one containing all the conditions that were positive and the other containing all the conditions that were negative
2) for each condition in the list of negative conditions, publish a negative condition to the JMS condition queue
3) for each condition in the list of positive conditions, publish a positive condition to the JMS condition queue

Publishing all of the negative conditions before publishing any of the positive conditions will ensure that any invalidated condition logs are deleted prior to the positive conditions being published and potentially causing the alert to fire.
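
A minimal Java sketch of this two-phase approach follows. It is an illustration only, not a patch against AbstractConditionCache: all names are hypothetical, the consumer is modeled as a synchronous method call, and it assumes conditions are consumed in the same order they are published.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.DoublePredicate;

// Hypothetical model of the proposed fix; not RHQ code.
class NegativesFirstPublishDemo {

    // One condition per metric: (metricA > 10) AND (metricB < 20).
    private static final Map<String, DoublePredicate> CONDITIONS = new LinkedHashMap<>();
    static {
        CONDITIONS.put("metricA", v -> v > 10);
        CONDITIONS.put("metricB", v -> v < 20);
    }

    // Stand-in for alert condition logs.
    private final Set<String> conditionLogs = new HashSet<>();
    int alertsFired = 0;

    // Stand-in for publishing to the queue and consuming the message
    // (assumption: messages are consumed in publish order).
    private void publish(String metric, boolean positive) {
        if (positive) {
            conditionLogs.add(metric);
            if (conditionLogs.containsAll(CONDITIONS.keySet())) {
                alertsFired++;
            }
        } else {
            conditionLogs.remove(metric);
        }
    }

    // Step 1: evaluate every datum and sort the results into two lists.
    // Step 2: publish all negatives. Step 3: publish all positives.
    void processReport(String[] metrics, double[] values) {
        List<String> negatives = new ArrayList<>();
        List<String> positives = new ArrayList<>();
        for (int i = 0; i < metrics.length; i++) {
            boolean matched = CONDITIONS.get(metrics[i]).test(values[i]);
            (matched ? positives : negatives).add(metrics[i]);
        }
        for (String metric : negatives) {
            publish(metric, false);
        }
        for (String metric : positives) {
            publish(metric, true);
        }
    }

    // The two reports from the description, with metricB listed first in the second.
    static int firedForBugScenario() {
        NegativesFirstPublishDemo demo = new NegativesFirstPublishDemo();
        demo.processReport(new String[] {"metricA", "metricB"}, new double[] {11, 21});
        demo.processReport(new String[] {"metricB", "metricA"}, new double[] {19, 9});
        return demo.alertsFired;
    }

    // Sanity check: a report in which both conditions hold should still fire.
    static int firedWhenBothConditionsTrue() {
        NegativesFirstPublishDemo demo = new NegativesFirstPublishDemo();
        demo.processReport(new String[] {"metricA", "metricB"}, new double[] {11, 19});
        return demo.alertsFired;
    }
}
```

In this model the scenario from the description no longer fires regardless of datum order within the report, while a report in which both conditions are true still fires once. Whether the real queue and MDB pool preserve publish order is not covered by this sketch.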

Comment 1 Ian Springer 2011-09-26 18:31:43 UTC
As of [master 8ada3a7], the pattern-generator plugin can be used to reproduce this bug as follows:

- define an alert with the following condition: 
  (Pattern 1 Metric = 0) AND (Pattern 2 Metric = 0)

The plugin always reports either:

  Pattern 1 Metric = 0, Pattern 2 Metric = 1

or:

  Pattern 1 Metric = 1, Pattern 2 Metric = 0

so the alert should never fire. However, due to this bug, it will fire.
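
The alternating pattern can also be walked through in a small Java simulation. This is a hypothetical model rather than the pattern-generator plugin or RHQ code: two booleans stand in for the condition logs, and the negativesFirst flag switches between per-datum publishing and publishing all negative results before any positive ones.

```java
// Hypothetical simulation of (Pattern 1 Metric = 0) AND (Pattern 2 Metric = 0)
// evaluated against alternating reports; not RHQ or plugin code.
class PatternGeneratorReproDemo {

    static int countFirings(int reports, boolean negativesFirst) {
        boolean[] log = new boolean[2]; // log[i]: condition i currently has a condition log
        int fired = 0;
        for (int r = 0; r < reports; r++) {
            // Even reports carry (0, 1); odd reports carry (1, 0).
            int[] values = (r % 2 == 0) ? new int[] {0, 1} : new int[] {1, 0};
            if (negativesFirst) {
                // Delete invalidated logs before recording any positive result.
                for (int i = 0; i < 2; i++) {
                    if (values[i] != 0) {
                        log[i] = false;
                    }
                }
                for (int i = 0; i < 2; i++) {
                    if (values[i] == 0) {
                        log[i] = true;
                        if (log[0] && log[1]) {
                            fired++;
                        }
                    }
                }
            } else {
                // Per-datum publishing: Pattern 1 is handled before Pattern 2.
                for (int i = 0; i < 2; i++) {
                    if (values[i] == 0) {
                        log[i] = true;
                        if (log[0] && log[1]) {
                            fired++;
                        }
                    } else {
                        log[i] = false;
                    }
                }
            }
        }
        return fired;
    }
}
```

With per-datum publishing, countFirings(6, false) reports two firings in this model even though the two metrics are never 0 in the same report; with negatives published first, countFirings(6, true) reports none.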

Comment 2 John Mazzitelli 2011-09-26 18:39:05 UTC
see bug #735262 for fixing a specialized form of this issue (that is, with the same metric used in multiple conditions)

bug #737565 forces the user to pick different metrics per condition, but this issue shows that even conditions on different metrics can still fire incorrectly under rare timing conditions.

