Bug 1114203

Summary: Data aggregation should be fault tolerant
Product: [JBoss] JBoss Operations Network Reporter: John Sanda <jsanda>
Component: Core Server, Storage Node Assignee: John Sanda <jsanda>
Status: CLOSED CURRENTRELEASE QA Contact: Armine Hovsepyan <ahovsepy>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: JON 3.2 CC: ahovsepy, hrupp, jshaughn, loleary, mfoley
Target Milestone: ER03   
Target Release: JON 3.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1114202 Environment:
Last Closed: 2014-12-11 14:03:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1114199, 1114202    
Bug Blocks: 1133609    

Description John Sanda 2014-06-28 15:23:12 UTC
+++ This bug was initially created as a clone of Bug #1114202 +++

Description of problem:
In RHQ 4.9, if an error occurs during aggregation, the entire job is essentially aborted and the remaining data is not aggregated. The aggregation code was reimplemented in RHQ 4.10, and data is now processed in batches. We fetch the data for 5 schedules (that number is configurable) in parallel, and then perform the aggregation for multiple batches concurrently. If an exception occurs, the aggregation for that batch is aborted, but we continue aggregating data for the other batches. In terms of fault tolerance, this is an improvement over the implementation in 4.9; however, we do not retry the aggregation for a batch that fails.

Work has already been done in master to address all failures. https://docs.jboss.org/author/display/RHQ/Aggregation+Schema+Changes describes the changes. The work done for bug 1114199 covers addressing these failures. I decided to open a separate BZ, though, because there are different scenarios to test.
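
To make the batch behavior described above concrete, here is a minimal sketch of per-batch failure isolation with a simple retry. It is not the RHQ code and not necessarily how the fix in master works (that involves schema changes, see the wiki page). The class name BatchAggregationSketch, the methods partition and aggregateAll, the pool size of 4, and the aggregator callback are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names come from RHQ.
public class BatchAggregationSketch {

    /** Splits schedule ids into batches of at most batchSize (e.g. 5). */
    static List<List<Integer>> partition(List<Integer> scheduleIds, int batchSize) {
        List<List<Integer>> batches = new ArrayList<>();
        for (int i = 0; i < scheduleIds.size(); i += batchSize) {
            int end = Math.min(i + batchSize, scheduleIds.size());
            batches.add(new ArrayList<>(scheduleIds.subList(i, end)));
        }
        return batches;
    }

    /**
     * Aggregates all batches concurrently. A failing batch does not stop the
     * others. Failed batches are retried until maxAttempts is reached.
     * Returns the batches that still failed after the last attempt.
     */
    static List<List<Integer>> aggregateAll(List<Integer> scheduleIds, int batchSize,
            int maxAttempts, Consumer<List<Integer>> aggregator) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // assumed pool size
        try {
            List<List<Integer>> pending = partition(scheduleIds, batchSize);
            for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
                // Submit every pending batch; each runs independently.
                List<Future<?>> futures = new ArrayList<>();
                for (List<Integer> batch : pending) {
                    futures.add(pool.submit(() -> aggregator.accept(batch)));
                }
                // Collect the batches whose aggregation threw an exception.
                List<List<Integer>> failed = new ArrayList<>();
                for (int i = 0; i < futures.size(); i++) {
                    try {
                        futures.get(i).get();
                    } catch (ExecutionException e) {
                        failed.add(pending.get(i));
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        failed.add(pending.get(i));
                    }
                }
                pending = failed; // only failed batches go into the next attempt
            }
            return pending;
        } finally {
            pool.shutdown();
        }
    }
}
```

With maxAttempts set to 1, the sketch mirrors the 4.10 behavior described above: a failed batch is skipped while the other batches are still aggregated. A larger value adds the missing retry. Whatever aggregateAll returns is the set of batches that would still need handling in a later run.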

Comment 1 John Sanda 2014-06-28 17:50:51 UTC
I want to point out that we cannot backport the changes to 3.2.x because they are substantial and include schema changes. If we want to do this in 3.2.x, then we need a separate BZ to track that effort.

Comment 2 John Sanda 2014-09-12 02:30:19 UTC
Changes have been pushed to the release/jon3.3.x branch. See bug 1114202 for details.

commit hashes:
2ee9abb58
05dbaec9b
db066d9863
874addb583
dff81ed514

Comment 3 Simeon Pinder 2014-09-17 02:49:10 UTC
Moving to ON_QA as available for test with the following brew build:
https://brewweb.devel.redhat.com//buildinfo?buildID=385149

Comment 6 Armine Hovsepyan 2014-10-09 11:30:54 UTC
verified in JON 3.3 ER04