Bug 1114199

Summary:	Data from late measurement reports should get aggregated
Product:	[Other] RHQ Project	Reporter:	John Sanda <jsanda>
Component:	Core Server, Storage Node	Assignee:	Nobody <nobody>
Status:	ON_QA ---	QA Contact:
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	4.11	CC:	hrupp
Target Milestone:	---
Target Release:	RHQ 4.13
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:
Clones:	1114200		Environment:
Last Closed:		Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1114202, 1133605, 1114200, 1114203

Description John Sanda 2014-06-28 15:04:37 UTC

Description of problem:
First I need to define a "late measurement report". The easiest way to explain it is with an example. Suppose the current time is 15:00. The data purge job runs and aggregates the data from 14:00 to 15:00, and the metrics aggregation finishes at 15:05. The server then receives a measurement report at 15:06 that contains timestamps from the 14:00 hour. This is a late measurement report because aggregation has already run for the time slice that the reported data falls in.
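A minimal sketch of this definition, assuming we know the start of the most recent hour that has already been aggregated. `LateReportCheck` and `isLate` are hypothetical names, not RHQ APIs, and timestamps are modeled as `LocalDateTime` for simplicity. Note that what makes a data point late is the hour it belongs to, not the time the report arrives.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

/**
 * Hypothetical helper (not RHQ code): a raw data point is "late" if it
 * falls in an hour time slice whose aggregation has already run.
 */
public class LateReportCheck {

    /**
     * @param timestamp          timestamp of the raw data point
     * @param lastAggregatedHour start of the most recent hour that was aggregated
     * @return true if the data point's hour was already aggregated
     */
    static boolean isLate(LocalDateTime timestamp, LocalDateTime lastAggregatedHour) {
        // The 1 hr time slice the data point belongs to
        LocalDateTime slice = timestamp.truncatedTo(ChronoUnit.HOURS);
        // Late if that slice is at or before the last aggregated slice
        return !slice.isAfter(lastAggregatedHour);
    }

    public static void main(String[] args) {
        // Aggregation of the 14:00 - 15:00 hour finished at 15:05
        LocalDateTime lastAggregated = LocalDateTime.of(2014, 6, 28, 14, 0);
        // Report arriving at 15:06 carrying a 14:30 timestamp: late
        System.out.println(isLate(LocalDateTime.of(2014, 6, 28, 14, 30), lastAggregated)); // true
        // A 15:02 timestamp belongs to an hour not yet aggregated
        System.out.println(isLate(LocalDateTime.of(2014, 6, 28, 15, 2), lastAggregated));  // false
    }
}
```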

That raw data will get persisted, but in general it will not get aggregated, meaning we will not generate 1 hr, 6 hr, and 24 hr metrics for it. I say "in general" because it is possible for the data to get aggregated as a side effect of a server restart.

We need to handle late data because late measurements are not only possible, they are probably fairly common. Work has already been done in master to address late data. https://docs.jboss.org/author/display/RHQ/Aggregation+Schema+Changes describes the changes.

Here is the squash commit in master for this work:

commit hash: fabe6f71458
 
Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 John Sanda 2014-08-25 15:01:58 UTC

I am re-targeting this for RHQ 4.13 due to issues found in 4.12.

Comment 2 John Sanda 2014-09-26 20:39:09 UTC

A number of changes have been made to simplify and improve the data aggregation code. Data aggregation now runs hourly as part of the DataCalcJob (previously it ran as part of the DataPurgeJob). Raw data is aggregated every hour, 1 hr data every six hours, and 6 hr data every 24 hours. Changes were introduced in RHQ 4.12 to handle late data, but they did not cover all scenarios. See comment 2 in bug 1114202 for details.
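One way to read this schedule, sketched under the assumption that the job fires at the top of each hour: raw data always has a newly finished 1 hr slice, 1 hr data only gains a newly finished 6 hr slice at 00:00, 06:00, 12:00, and 18:00, and 6 hr data only gains a newly finished 24 hr slice at 00:00. `AggregationSchedule`, `Bucket`, and `bucketsWithNewlyFinishedSlice` are hypothetical names for illustration, not RHQ code.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not RHQ code): which buckets gain a newly finished
 * time slice when the hourly job runs at the start of a given hour.
 */
public class AggregationSchedule {

    enum Bucket { RAW, ONE_HOUR, SIX_HOUR }

    static List<Bucket> bucketsWithNewlyFinishedSlice(LocalDateTime runTime) {
        int hour = runTime.getHour();
        List<Bucket> buckets = new ArrayList<>();
        // The previous 1 hr slice of raw data just finished: every run
        buckets.add(Bucket.RAW);
        // A 6 hr slice of 1 hr data just finished: every six hours
        if (hour % 6 == 0) {
            buckets.add(Bucket.ONE_HOUR);
        }
        // A 24 hr slice of 6 hr data just finished: once a day
        if (hour == 0) {
            buckets.add(Bucket.SIX_HOUR);
        }
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(bucketsWithNewlyFinishedSlice(LocalDateTime.of(2014, 9, 26, 17, 0))); // [RAW]
        System.out.println(bucketsWithNewlyFinishedSlice(LocalDateTime.of(2014, 9, 26, 18, 0))); // [RAW, ONE_HOUR]
        System.out.println(bucketsWithNewlyFinishedSlice(LocalDateTime.of(2014, 9, 27, 0, 0)));  // [RAW, ONE_HOUR, SIX_HOUR]
    }
}
```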

Here is how things work now. Every time the job runs, we aggregate all raw, 1 hr, and 6 hr data for finished time slices within a time range that is determined by the raw data retention period. The date ranges used are:

raw_end = start of current hour
raw_start = raw_end - raw_retention + 1 hr

1hr_end = start of 6 hr time slice for raw_end
1hr_start = 1hr_end - raw_retention

6hr_end = start of 24 hr time slice for raw_end
6hr_start = 6hr_end - raw_retention

Note that raw_retention is 7 days, and currently it is not configurable. The reason for adding back an hour onto raw_start is to avoid recomputing aggregate metrics for an hour in which some or all of the raw data has already expired.
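To illustrate why an hour is added back onto raw_start, here is a small sketch that assumes each raw data point expires exactly raw_retention after its timestamp (the actual expiry depends on how the storage node applies TTLs). `RawExpiryCheck` and `mayHaveExpiredData` are hypothetical names. Because the job runs at or after raw_end, the hour starting at raw_end - raw_retention has already begun losing data, whereas the hour starting one hour later has not.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

/** Hypothetical sketch (not RHQ code) of the raw data expiry reasoning. */
public class RawExpiryCheck {

    // raw_retention: 7 days, not configurable
    static final Duration RAW_RETENTION = Duration.ofDays(7);

    /**
     * ASSUMPTION: a raw data point expires exactly RAW_RETENTION after its
     * timestamp. The oldest data in an hour slice is at sliceStart, so the
     * slice may be missing data once sliceStart + retention <= now.
     */
    static boolean mayHaveExpiredData(LocalDateTime sliceStart, LocalDateTime now) {
        return !sliceStart.plus(RAW_RETENTION).isAfter(now);
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2014, 9, 26, 17, 20);
        LocalDateTime rawEnd = now.truncatedTo(ChronoUnit.HOURS);      // start of current hour
        LocalDateTime noExtraHour = rawEnd.minus(RAW_RETENTION);      // raw_end - raw_retention
        LocalDateTime rawStart = noExtraHour.plusHours(1);            // raw_end - raw_retention + 1 hr

        System.out.println(mayHaveExpiredData(noExtraHour, now)); // true
        System.out.println(mayHaveExpiredData(rawStart, now));    // false
    }
}
```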

The 1hr_end and 6hr_end variables are best explained with some background and an example. The 6 hr time slices are fixed and are as follows:

(inclusive - exclusive)
00:00 - 06:00
06:00 - 12:00
12:00 - 18:00
18:00 - 24:00

The 24 hr time slice is also fixed:

(inclusive - exclusive)
00:00 - 24:00

If we are aggregating data at 17:00, then the start of the current 6 hr time slice is 12:00, and the start of the 24 hr time slice is 00:00. So for a run at 17:00, 1hr_end = 12:00 and 6hr_end = 00:00.
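The date range formulas can be sketched with `java.time` as follows. This is an illustration, not RHQ's implementation: `AggregationRanges` is a hypothetical class, the run date is arbitrary, and times are modeled as `LocalDateTime` (ignoring time zones) for simplicity.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

/** Hypothetical sketch (not RHQ code) of the aggregation date ranges. */
public class AggregationRanges {

    // raw_retention: 7 days, not configurable
    static final Duration RAW_RETENTION = Duration.ofDays(7);

    final LocalDateTime rawStart, rawEnd, oneHourStart, oneHourEnd, sixHourStart, sixHourEnd;

    AggregationRanges(LocalDateTime now) {
        // raw_end = start of current hour
        rawEnd = now.truncatedTo(ChronoUnit.HOURS);
        // raw_start = raw_end - raw_retention + 1 hr
        rawStart = rawEnd.minus(RAW_RETENTION).plusHours(1);
        // 1hr_end = start of the 6 hr time slice containing raw_end (00, 06, 12 or 18)
        oneHourEnd = rawEnd.withHour(rawEnd.getHour() - rawEnd.getHour() % 6);
        // 1hr_start = 1hr_end - raw_retention
        oneHourStart = oneHourEnd.minus(RAW_RETENTION);
        // 6hr_end = start of the 24 hr time slice containing raw_end (00:00)
        sixHourEnd = rawEnd.truncatedTo(ChronoUnit.DAYS);
        // 6hr_start = 6hr_end - raw_retention
        sixHourStart = sixHourEnd.minus(RAW_RETENTION);
    }

    public static void main(String[] args) {
        AggregationRanges r = new AggregationRanges(LocalDateTime.of(2014, 9, 26, 17, 0));
        System.out.println("raw:  " + r.rawStart + " - " + r.rawEnd);         // 2014-09-19T18:00 - 2014-09-26T17:00
        System.out.println("1 hr: " + r.oneHourStart + " - " + r.oneHourEnd); // 2014-09-19T12:00 - 2014-09-26T12:00
        System.out.println("6 hr: " + r.sixHourStart + " - " + r.sixHourEnd); // 2014-09-19T00:00 - 2014-09-26T00:00
    }
}
```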

We first aggregate raw data for all time slices between raw_start and raw_end, then do the same for 1 hr and 6 hr data respectively.
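The ordering can be sketched as below. The hypothetical `AggregationOrder.plan` method records one step per time slice instead of actually computing aggregates; the slice sizes (1 hr, 6 hr, 24 hr) are those of the bucket being produced. The ranges passed in `main` are the ones for a run at 17:00 on an arbitrary date.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not RHQ code) of the aggregation order. */
public class AggregationOrder {

    /** Start of every time slice of the given size in [start, end). */
    static List<LocalDateTime> slices(LocalDateTime start, LocalDateTime end, Duration size) {
        List<LocalDateTime> result = new ArrayList<>();
        for (LocalDateTime t = start; t.isBefore(end); t = t.plus(size)) {
            result.add(t);
        }
        return result;
    }

    static List<String> plan(LocalDateTime rawStart, LocalDateTime rawEnd,
                             LocalDateTime oneHourStart, LocalDateTime oneHourEnd,
                             LocalDateTime sixHourStart, LocalDateTime sixHourEnd) {
        List<String> steps = new ArrayList<>();
        // 1. Raw data -> 1 hr aggregates, one step per finished 1 hr slice
        for (LocalDateTime t : slices(rawStart, rawEnd, Duration.ofHours(1))) {
            steps.add("raw " + t);
        }
        // 2. 1 hr data -> 6 hr aggregates, one step per finished 6 hr slice
        for (LocalDateTime t : slices(oneHourStart, oneHourEnd, Duration.ofHours(6))) {
            steps.add("1hr " + t);
        }
        // 3. 6 hr data -> 24 hr aggregates, one step per finished 24 hr slice
        for (LocalDateTime t : slices(sixHourStart, sixHourEnd, Duration.ofDays(1))) {
            steps.add("6hr " + t);
        }
        return steps;
    }

    public static void main(String[] args) {
        List<String> steps = plan(
                LocalDateTime.of(2014, 9, 19, 18, 0), LocalDateTime.of(2014, 9, 26, 17, 0),
                LocalDateTime.of(2014, 9, 19, 12, 0), LocalDateTime.of(2014, 9, 26, 12, 0),
                LocalDateTime.of(2014, 9, 19, 0, 0), LocalDateTime.of(2014, 9, 26, 0, 0));
        System.out.println(steps.size()); // 202 (167 raw + 28 1hr + 7 6hr)
        System.out.println(steps.get(0)); // raw 2014-09-19T18:00
    }
}
```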