Red Hat Bugzilla – Bug 670607
RFE: Make baseline alert condition definitions less confusing
Last modified: 2011-09-30 19:26:31 EDT
Description of problem:
An alert definition that should fire when a metric reaches a set percentage of its baseline does not appear to be evaluated, so the alert condition is never met and no alert is fired.
Version-Release number of selected component (if applicable):
3.0.0 (JON 2.4.0 GA)
Steps to Reproduce:
1. Navigate to RHQ Server's RHQDS resource
2. Set "Active Connections" (Monitor -> Schedules) to 1 minute interval
3. Create new Alert Definition (Alert -> Definition) "New Definition"
* Name: High connection percentage
* Description: Current active database connections exceeds the specified threshold percentage.
* Expression: All
* If Condition: Active Connections
* is Greater than 0 % of Max Value
Actual results:
No alert is fired (Alert -> History), even though Active Connections is 1 and max connections is 1 (100%).
Expected results:
An alert should be fired every minute and be seen on the Alert -> History page as "High connection percentage".
Additional info:
In the test case, the baseline for RHQDS shows 1 active connection with a max active connections of 1 (min and avg active connections are also 1). The alert criterion, <Current Active Connections> > 0% of Max Active Connections, should therefore be true (1/1 * 100 = 100%, and 100% > 0%). However, the condition never seems to be evaluated: the debug log output shows no indication of it, unlike with absolute criteria.
I retested using 10% instead of 0% (to rule out 0% as the cause) and the problem still exists, so this does not appear to depend on the actual value of the percentage.
This comes down to the UI being confusing. I was experiencing the same behavior and did some investigation. It turns out that this is the normal behavior when no baseline is set for the metric in question: the alert condition is not even evaluated, and it is effectively invalid until a baseline exists.
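To make the behavior described above concrete, here is a minimal sketch of how a "percentage of baseline" condition could be evaluated. It is not RHQ's actual code: the function name, its parameters, and the use of None to mean "no baseline, condition skipped" are all assumptions made for illustration.

```python
from typing import Optional


def baseline_percentage_condition_met(
    current_value: float,
    baseline_value: Optional[float],
    threshold_percent: float,
) -> Optional[bool]:
    """Hypothetical evaluation of "value is Greater than N % of <baseline value>".

    Returns None when there is no baseline, modelling the observed behavior
    that the condition is not evaluated at all in that case.
    """
    if baseline_value is None:
        # No baseline yet: the condition is skipped, so no alert can fire.
        return None
    # Compare the current value against the given percentage of the baseline.
    return current_value > baseline_value * threshold_percent / 100.0


# Reporter's scenario, assuming a baseline max of 1 exists: 1 > 0% of 1.
with_baseline = baseline_percentage_condition_met(1.0, 1.0, 0.0)
# Same scenario with no baseline: nothing is evaluated.
without_baseline = baseline_percentage_condition_met(1.0, None, 0.0)
```

Under these assumptions the first call is true and the second is skipped, which would match the symptom of seeing nothing in the debug log.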
There are a couple of things that make this confusing:
1) We allow a user to specify the condition to begin with. It's fair that we do
because it could be a template, or just intentional. But perhaps we could do
something more here.
2) It's easy to miss that this condition relates to baseline values.
Min/Average/Max really means BaselineMin/BaselineAverage/BaselineMax. It's
confusing because the Tables subtab shows Min/Average/Max and a user may
think those are the values being tested against. They aren't; those are just
calculated values for the Date Range being applied to the table.
3) It's not very easy to actually see if a Baseline is set (or when it may get
set if it isn't already). And as a doc-note, baseline generation is currently
only covered in the FAQ.
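The difference in point 2 can be sketched as follows. This is only an illustration with made-up sample data: table_stats, the (timestamp, value) tuples, and baseline_max are hypothetical names, not anything from the RHQ code base.

```python
from statistics import mean


def table_stats(samples, start, end):
    """Min/Average/Max over (timestamp, value) samples within [start, end].

    This models what the Tables subtab displays for the selected Date Range.
    """
    values = [value for timestamp, value in samples if start <= timestamp <= end]
    return min(values), mean(values), max(values)


# Made-up measurements: (seconds, active connections).
samples = [(0, 1.0), (60, 4.0), (120, 2.0), (180, 1.0)]

# Values shown in the table for the chosen date range.
display_min, display_avg, display_max = table_stats(samples, 60, 180)

# Hypothetical baseline state: nothing has been calculated yet.
baseline_max = None
```

Here the table would show a Max of 4.0 even though no baseline max exists, so a condition written against "Max Value" would have nothing to be tested against.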
One more technical note: if this type of condition exists and the baseline is generated afterward, it's not clear to me that the agent condition cache gets refreshed in any timely manner to ensure that the new baseline value is picked up. We need to understand what happens in this scenario.
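The scenario in question can be modelled with a toy cache. To be clear, this is not a claim about how the agent condition cache actually works; ConditionCache, reload, and baseline_for are hypothetical names, and the sketch only shows what would go wrong if the cache snapshots baselines and is not reloaded.

```python
class ConditionCache:
    """Hypothetical cache that snapshots baseline values when (re)loaded."""

    def __init__(self, baseline_store):
        self._store = baseline_store
        self._snapshot = {}
        self.reload()

    def reload(self):
        # Copy the current baselines; later changes to the store are not
        # visible until reload() is called again.
        self._snapshot = dict(self._store)

    def baseline_for(self, metric):
        # Returns None when the snapshot has no baseline for the metric.
        return self._snapshot.get(metric)


store = {}                          # no baseline when the definition is created
cache = ConditionCache(store)
store["Active Connections"] = 1.0   # baseline is generated later

stale = cache.baseline_for("Active Connections")  # snapshot predates the baseline
cache.reload()
fresh = cache.baseline_for("Active Connections")  # visible only after a reload
```

If the real cache behaves anything like this, the condition would stay unevaluated after the baseline is generated until something triggers a reload.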
I'm changing the title of this to be an RFE for RHQ4's GUI.