Bug 1114203 - Data aggregation should be fault tolerant

Summary: Data aggregation should be fault tolerant

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	JBoss Operations Network
Classification:	JBoss
Component:	Core Server, Storage Node
Sub Component:
Version:	JON 3.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	ER03
Target Release:	JON 3.3.0
Assignee:	John Sanda
QA Contact:	Armine Hovsepyan
Docs Contact:
URL:
Whiteboard:
Depends On:	1114199 1114202
Blocks:	1133609

Reported:	2014-06-28 15:23 UTC by John Sanda
Modified:	2015-09-03 00:02 UTC
CC List:	5 users
Fixed In Version:
Clone Of:	1114202
Environment:
Last Closed:	2014-12-11 14:03:44 UTC
Type:	Bug
Embargoed:

Description John Sanda 2014-06-28 15:23:12 UTC

+++ This bug was initially created as a clone of Bug #1114202 +++

Description of problem:
In RHQ 4.9, if an error occurs during aggregation, the entire job is essentially aborted and the remaining data is not aggregated. The aggregation code was reimplemented in RHQ 4.10, where data is processed in batches. We fetch the data for 5 schedules (the number is configurable) in parallel, and then perform the aggregation for multiple batches concurrently. If an exception occurs, the aggregation for that batch is aborted, but we continue aggregating data for the other batches. In terms of fault tolerance, this is an improvement over the 4.9 implementation; however, we do not retry the aggregation for any batch that fails.
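
The batching behavior described above can be illustrated with a minimal Java sketch. It is not the RHQ code. It assumes schedules are identified by integer ids, that a batch is aggregated by a caller-supplied `Consumer`, and that a failure surfaces as an exception. The class and method names (`BatchAggregationSketch`, `partition`, `aggregateAll`, `aggregateWithRetry`), the thread-pool size, and the retry loop are all hypothetical. `aggregateAll` reproduces the 4.10 behavior, where a failed batch does not stop the others. `aggregateWithRetry` shows the missing piece: failed batches are re-run. It does not claim to show how the fix tracked under bug 1114199 works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class BatchAggregationSketch {

    // Hypothetical default, mirroring the "5 schedules" per batch in the report.
    static final int DEFAULT_BATCH_SIZE = 5;

    /** Splits schedule ids into fixed-size batches (the last batch may be smaller). */
    static List<List<Integer>> partition(List<Integer> scheduleIds, int batchSize) {
        List<List<Integer>> batches = new ArrayList<>();
        for (int i = 0; i < scheduleIds.size(); i += batchSize) {
            batches.add(scheduleIds.subList(i, Math.min(i + batchSize, scheduleIds.size())));
        }
        return batches;
    }

    /**
     * Aggregates all batches concurrently. A failing batch is recorded rather than
     * aborting the whole job, so the other batches still complete (4.10-style isolation).
     * Returns the batches that failed.
     */
    static List<List<Integer>> aggregateAll(List<List<Integer>> batches,
                                            Consumer<List<Integer>> aggregateBatch,
                                            ExecutorService pool) throws InterruptedException {
        List<Future<?>> futures = new ArrayList<>();
        for (List<Integer> batch : batches) {
            futures.add(pool.submit(() -> aggregateBatch.accept(batch)));
        }
        List<List<Integer>> failed = new ArrayList<>();
        for (int i = 0; i < futures.size(); i++) {
            try {
                futures.get(i).get();
            } catch (ExecutionException e) {
                failed.add(batches.get(i)); // isolate the failure; keep going
            }
        }
        return failed;
    }

    /** Hypothetical retry policy: re-run only failed batches, up to maxAttempts passes. */
    static List<List<Integer>> aggregateWithRetry(List<Integer> scheduleIds, int batchSize,
                                                  int maxAttempts,
                                                  Consumer<List<Integer>> aggregateBatch)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<List<Integer>> pending = partition(scheduleIds, batchSize);
            for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
                pending = aggregateAll(pending, aggregateBatch, pool);
            }
            return pending; // batches still failing after every attempt
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 12; i++) ids.add(i);
        // Simulated transient fault: the batch containing schedule 7 fails only once.
        Set<Integer> failedOnce = ConcurrentHashMap.newKeySet();
        Consumer<List<Integer>> aggregate = batch -> {
            if (batch.contains(7) && failedOnce.add(7)) {
                throw new RuntimeException("simulated storage node timeout");
            }
        };
        List<List<Integer>> unrecovered = aggregateWithRetry(ids, DEFAULT_BATCH_SIZE, 2, aggregate);
        System.out.println("Unrecovered batches: " + unrecovered);
    }
}
```

With a single attempt, the sketch behaves like the 4.10 code as described: the batch containing schedule 7 is reported as failed and its data stays unaggregated. With two attempts, the transient fault is absorbed and `main` prints `Unrecovered batches: []`. A real implementation would also need to persist which batches are still pending so they survive a server restart. That is a design assumption here, not a description of the schema changes linked below.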

Work has already been done in master to address these failures. https://docs.jboss.org/author/display/RHQ/Aggregation+Schema+Changes describes the changes. The work done for bug 1114199 addresses the failures. I decided to open a separate BZ, though, because there are different scenarios to test.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 John Sanda 2014-06-28 17:50:51 UTC

I want to point out that we cannot backport the changes to 3.2.x because they are substantial and include schema changes. If we want to do this in 3.2.x, we need a separate BZ to track that effort.

Comment 2 John Sanda 2014-09-12 02:30:19 UTC

Changes have been pushed to the release/jon3.3.x branch. See bug 1114202 for details.

commit hashes:
2ee9abb58
05dbaec9b
db066d9863
874addb583
dff81ed514

Comment 3 Simeon Pinder 2014-09-17 02:49:10 UTC

Moving to ON_QA as available for test with the following brew build:
https://brewweb.devel.redhat.com//buildinfo?buildID=385149

Comment 6 Armine Hovsepyan 2014-10-09 11:30:54 UTC

verified in JON 3.3 ER04