Bug 870561 - Peak and Low metric values are incorrectly aggregated from compressed avg. value instead of minvalue and maxvalue
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Monitoring -- Other
Version: JON 3.1.1
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: JON 3.1.2
Assignee: John Sanda
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On: 870564
Blocks: 1024533
 
Reported: 2012-10-26 20:56 UTC by Larry O'Leary
Modified: 2018-11-29 19:39 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 870564 (view as bug list)
Environment:
Last Closed: 2013-09-11 10:58:27 UTC
Type: Bug
Embargoed:




Links
System ID                                 Private  Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution) 240563  0        None      None    None     2012-10-26 21:13:05 UTC

Description Larry O'Leary 2012-10-26 20:56:36 UTC
Description of problem:
When aggregated metric values are returned from the compressed tables, the value column is used for the low and peak values instead of the minvalue and maxvalue columns. This results in integer metrics returning decimal values for minimum and maximum. Additionally, the returned values are simply wrong because they are based on the averaged (compressed) value over the time period instead of the true min and max values for that same period.
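
A minimal sketch of the effect, using the three Fri Jul 20 sample rows from comment 1 below: aggregating the compressed avg column understates the peak and overstates the low.

    import java.util.Arrays;

    // Illustration only, using the bucket 59 rows from comment 1:
    // hourly (value, minvalue, maxvalue) = (5,2,9), (5,4,6), (3,3,3).
    public class WrongAggregationExample {
        public static void main(String[] args) {
            double[] value    = {5, 5, 3};  // compressed hourly averages
            double[] minvalue = {2, 4, 3};  // true hourly minimums
            double[] maxvalue = {9, 6, 3};  // true hourly maximums

            System.out.println("buggy peak: " + Arrays.stream(value).max().getAsDouble());    // 5.0
            System.out.println("true peak:  " + Arrays.stream(maxvalue).max().getAsDouble()); // 9.0
            System.out.println("buggy low:  " + Arrays.stream(value).min().getAsDouble());    // 3.0
            System.out.println("true low:   " + Arrays.stream(minvalue).min().getAsDouble()); // 2.0
        }
    }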

Version-Release number of selected component (if applicable):
4.4.0.JON311GA

How reproducible:
Always

Steps to Reproduce:
1. Review the RHQ Agent's Agent-Server Clock Difference metric graph
2. Set the time interval to previous 8 days
 
Actual results:
The min and/or max value is a decimal value instead of an integer

Expected results:
The min and max values should be integers

Additional info:
This appears to be a result of the database query that aggregates the data from the compressed tables:

    SELECT timestamp, max(av), max(peak), max(low)
    FROM ((
        SELECT timestamp, avg(value) AS av, max(value) AS peak, min(value) AS low
        FROM ((
            SELECT beginTS AS timestamp, value
            FROM (
                SELECT 1341847301298 + (15840000 * i) AS beginTS, i
                FROM RHQ_numbers WHERE i < 60) n, RHQ_MEASUREMENT_DATA_NUM_1H d
            WHERE time_stamp BETWEEN beginTS AND (beginTS + 15840000) AND
                d.schedule_id IN (
                    SELECT innerSchedule.id
                    FROM rhq_measurement_sched innerSchedule
                    WHERE innerSchedule.definition = 10003 AND
                        innerSchedule.resource_id = 10003
                )
            )
        ) data GROUP BY timestamp)
        UNION ALL (
            SELECT 1341847301298 + (15840000 * i) AS timestamp, NULL AS av, NULL AS peak, NULL AS low
            FROM RHQ_numbers
            WHERE i < 60)
    ) alldata
    GROUP BY timestamp ORDER BY timestamp


Instead of "avg(value) as av, max(value) as peak, min(value) as low", we should probably be using "avg(value) as av, max(maxvalue) as peak, min(minvalue) as low".

Comment 1 John Sanda 2012-10-26 21:28:39 UTC
Here is an example to illustrate everything going on with this query. Suppose we are querying against the rhq_measurement_data_num_1h table, and for the date range specified we have the following rows that will be included in the aggregation/data points query:


      time_stamp     | schedule_id | value | minvalue | maxvalue 
---------------------+-------------+-------+----------+----------
 Mon Jul 09 11:48:35 |       10013 |     2 |        1 |        3
 Mon Jul 09 12:48:35 |       10013 |     5 |        4 |        6
 Mon Jul 09 13:48:35 |       10013 |     3 |        3 |        3
 Fri Jul 20 07:24:35 |       10013 |     5 |        2 |        9
 Fri Jul 20 08:24:35 |       10013 |     5 |        4 |        6
 Fri Jul 20 09:24:35 |       10013 |     3 |        3 |        3


The data points query takes the time range, which is 11 days in this example, and divides it into 60 buckets. Each of the above rows will fall into one of the buckets. A bucket can have zero or more rows in it. The first three rows fall into bucket 0 while the last three rows fall into bucket 59.
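
A minimal sketch of that bucket arithmetic (hypothetical names), using the begin timestamp and 15840000 ms bucket width from the query in the description:

    // Minimal sketch (hypothetical names): the time range is split into 60
    // equal slices and each row's time_stamp is mapped to one of them.
    // Constants come from the query in the description.
    public class BucketIndexExample {
        static final long BEGIN = 1341847301298L;  // start of the range
        static final long WIDTH = 15840000L;       // (end - begin) / 60 in ms
        static final int NUM_BUCKETS = 60;

        // Returns the bucket a row falls into, or -1 if outside the range.
        static int bucketIndex(long timeStamp) {
            long offset = timeStamp - BEGIN;
            if (offset < 0 || offset >= WIDTH * NUM_BUCKETS) {
                return -1;
            }
            return (int) (offset / WIDTH);
        }

        public static void main(String[] args) {
            System.out.println(bucketIndex(BEGIN + 1000));               // 0
            System.out.println(bucketIndex(BEGIN + 59 * WIDTH + 1000));  // 59
        }
    }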

The aggregation/data points query then generates the average, max, and min of the value column of the rows in each bucket. We then wind up with the following results,

Bucket[0] - {avg: 3.333, min: 2, max: 5}
Bucket[1..58] - {avg: Double.NaN, min: Double.NaN, max: Double.NaN}
Bucket[59] - {avg: 4.333, min: 3, max: 5}

Note that the minvalue and maxvalue columns are not used at all. This query supports the candlestick graphs and provides a mechanism to summarize the data, which is helpful when there are a lot of data points. As I found this afternoon, it does not look like we provide an API to allow users to get at the unsummarized data. A bigger potential problem with this query is that if the date range changes slightly, it can yield different results even when that range covers the same data points.
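
To make that last point concrete, a hedged illustration with made-up timestamps: shifting the range slightly moves a bucket boundary between two rows, so the same data points aggregate differently.

    // Hedged illustration (made-up timestamps): the same two rows share a
    // bucket for one range but not for a slightly shifted range, so the
    // per-bucket avg/min/max values change.
    public class BoundarySensitivityExample {
        static int bucketIndex(long timeStamp, long begin, long end) {
            long width = (end - begin) / 60;
            return (int) ((timeStamp - begin) / width);
        }

        public static void main(String[] args) {
            long row1 = 1_000_000L, row2 = 1_050_000L;  // two rows 50 seconds apart

            // Range A (0 .. 6,000,000): both rows land in bucket 10.
            System.out.println(bucketIndex(row1, 0, 6_000_000L));       // 10
            System.out.println(bucketIndex(row2, 0, 6_000_000L));       // 10

            // Range B, begin shifted by 30s: the rows now straddle a boundary.
            System.out.println(bucketIndex(row1, 30_000L, 6_030_000L)); // 9
            System.out.println(bucketIndex(row2, 30_000L, 6_030_000L)); // 10
        }
    }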

Comment 2 John Sanda 2012-11-15 02:26:59 UTC
I have updated the code that generates the data points query so that when the query runs against tables other than the raw tables, the minvalue and maxvalue columns are used to calculate the high and low values for each data point/bucket.
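
A minimal sketch of that change (hypothetical method, not the actual RHQ query builder): pick the select expressions based on which table is being queried.

    // Hedged sketch of the fix: against the aggregate tables, peak and low
    // must come from maxvalue/minvalue rather than the already-averaged
    // value column.
    static String aggregateExpressions(boolean rawTable) {
        if (rawTable) {
            // raw tables only carry a value column
            return "avg(value) as av, max(value) as peak, min(value) as low";
        }
        // 1h/6h/1d tables carry true per-period extremes alongside the average
        return "avg(value) as av, max(maxvalue) as peak, min(minvalue) as low";
    }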

master commit hash: c8886fcb9b8

Comment 3 John Sanda 2012-11-15 14:40:27 UTC
Changes have been pushed to the release/jon3.1.x branch.

release/jon3.1.x commit hash: 4741af9d36f

Comment 4 John Sanda 2012-11-15 16:02:31 UTC
There are two key things I want to point out for testing. The first is determining which table you are going to query. Use the following to determine which table(s) will be queried (a sketch of this selection logic follows the list):

start date less than 7 days ago   --> raw tables
start date 7 to 14 days ago       --> 1 hour table
start date 14 to 31 days ago      --> 6 hour table
start date 31 or more days ago    --> 1 day table
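
A minimal sketch of that cascading selection, assuming millisecond timestamps; only RHQ_MEASUREMENT_DATA_NUM_1H appears in this bug, and the 6 hour and 1 day table names below are assumptions modeled on it:

    import java.util.concurrent.TimeUnit;

    // Hedged sketch of the cascading table selection described above.
    public class TableSelectionExample {
        static String tableFor(long beginTime, long now) {
            long ageInDays = TimeUnit.MILLISECONDS.toDays(now - beginTime);
            if (ageInDays < 7)  return "raw tables";
            if (ageInDays < 14) return "RHQ_MEASUREMENT_DATA_NUM_1H"; // seen in the query above
            if (ageInDays < 31) return "RHQ_MEASUREMENT_DATA_NUM_6H"; // assumed name
            return "RHQ_MEASUREMENT_DATA_NUM_1D";                     // assumed name
        }
    }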

As described in comment 1, the query divides the time range into 60 buckets. The second thing to consider for testing is determining into which bucket each data point (i.e., row from the metric table) goes. In my commits, there is a class I used for this very purpose named Buckets. You can find it at the following URL.

http://git.fedorahosted.org/cgit/rhq/rhq.git/diff/modules/enterprise/server/jar/src/test/java/org/rhq/enterprise/server/measurement/Buckets.java?h=release/jon3.1.x

It calculates the time slice for each of the buckets. This can be difficult to test, so I hope this helps.
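
For orientation, a hypothetical helper in the same spirit (not the actual Buckets API) that prints the inclusive time slice of each of the 60 buckets, matching the BETWEEN bounds in the query from the description:

    // Hypothetical helper in the spirit of the Buckets class (not its actual
    // API): computes the time slice for each of the 60 buckets.
    public class BucketSlicesExample {
        public static void main(String[] args) {
            long begin = 1341847301298L;        // begin timestamp from the query above
            long end = begin + 60 * 15840000L;  // the 11-day range from comment 1
            long width = (end - begin) / 60;
            for (int i = 0; i < 60; i++) {
                long sliceBegin = begin + i * width;
                System.out.printf("bucket %2d: %d .. %d%n", i, sliceBegin, sliceBegin + width);
            }
        }
    }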

Comment 5 Simeon Pinder 2012-11-21 21:55:52 UTC
Moving to ON_QA as available for testing with build: https://brewweb.devel.redhat.com//buildinfo?buildID=244662.

Comment 6 Armine Hovsepyan 2012-12-04 15:35:48 UTC
Verified.

Thanks John!

