Bug 536303 (RHQ-668)

Summary:	alert condition cache segmentation
Product:	[Other] RHQ Project	Reporter:	Joseph Marques <jmarques>
Component:	Alerts	Assignee:	Joseph Marques <jmarques>
Status:	CLOSED NEXTRELEASE	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	unspecified	Keywords:	SubTask
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	All
URL:	http://jira.rhq-project.org/browse/RHQ-668
Whiteboard:
Fixed In Version:	1.1	Doc Type:	Bug Fix
Bug Blocks:	536277

Description Joseph Marques 2008-07-14 15:40:00 UTC

today, the AlertConditionCache loads conditions for all alert definitions across the system.  it also loads all OOB conditions for all MeasurementBaselines in the system.

the HA version needs to load ONLY the alert conditions and OOB conditions for the agents that are currently attached/connected to it
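
A minimal sketch of the segmentation idea, assuming a simple in-memory mapping from agents to the server they are connected to. The class name, the maps, and the integer ids are all hypothetical stand-ins for illustration; they are not RHQ's actual entities or cache API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are RHQ's real classes.
final class SegmentedCacheSketch {
    // agentId -> id of the server the agent is currently connected to
    private final Map<Integer, Integer> agentToServer = new HashMap<>();
    // agentId -> ids of the alert/OOB conditions that belong to that agent
    private final Map<Integer, List<Integer>> conditionsByAgent = new HashMap<>();

    void connect(int agentId, int serverId) {
        agentToServer.put(agentId, serverId);
    }

    void addCondition(int agentId, int conditionId) {
        conditionsByAgent.computeIfAbsent(agentId, k -> new ArrayList<>()).add(conditionId);
    }

    // Load conditions agent by agent, skipping agents attached to other servers.
    List<Integer> loadForServer(int serverId) {
        List<Integer> loaded = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : agentToServer.entrySet()) {
            if (entry.getValue() == serverId) {
                loaded.addAll(conditionsByAgent.getOrDefault(
                        entry.getKey(), Collections.<Integer>emptyList()));
            }
        }
        return loaded;
    }
}
```

With two agents connected to different servers, loadForServer returns only the conditions of the agent attached to the requested server, instead of every condition in the system.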

Comment 1 Joseph Marques 2008-08-06 01:26:43 UTC

rev1180 - start of alert condition cache segmentation work;
cache now only loads elements into it on an agent-by-agent basis; 
for now, the cache will load data from all agents until the agent-server comm is reworked so that the HA tables are updated when HA events (fail over, new agent connection, etc) take place; 
to keep RHQ development smooth without requiring explicit use of the HA installer, if the server starts up and sees no records in the RHQ_SERVER table it will create a default one and use that for identity computations; 
started ClusterManager and ClusterIdentityManager SLSBs;

Comment 2 Joseph Marques 2008-08-07 17:33:48 UTC

rev1190 - refactor the baseline recalculation work directly into the SLSB instead of in the AutoBaselineCalculationJob class; 
instead, the AutoBaselineCalculationJob now calls into the new MeasurementBaselineManager SLSB method;

Comment 3 Joseph Marques 2008-08-07 17:48:22 UTC

rev1191 - move the identity manager into a separate instance sub-package;
this draws a better organizational distinction between the SLSBs that manage all/any HA entities versus the ones that manage the entities referring to this server instance's identity;

Comment 4 Joseph Marques 2008-08-07 19:04:20 UTC

rev1193 - the entry point from the quartz job, since it calls out to methods that can't be executed in a transaction, must also be annotated to not be called within one; (fixes ejb exception you'll see if you run on 1190-1192)

Comment 5 Joseph Marques 2008-08-07 21:55:48 UTC

rev1196 - in order to provide more info for testing / verification / QA purposes, i decided to scrap the boolean dirty field in favor of a slightly richer numeric field, representing a mask of Agent.Status elements;
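
A rough sketch of how a numeric mask can replace a boolean dirty flag. The flag names and bit values below are assumptions made up for illustration; the actual Agent.Status elements are not listed in this bug.

```java
// Hypothetical sketch: the flag names and values are invented for illustration.
final class AgentStatusSketch {
    enum Status {
        BASELINES_CALCULATED(1),
        ALERT_DEFINITION_CHANGED(2);

        final int mask;

        Status(int mask) {
            this.mask = mask;
        }
    }

    // 0 means clean; any non-zero value records which kinds of change occurred.
    private int status;

    void add(Status s) {
        status |= s.mask;
    }

    boolean has(Status s) {
        return (status & s.mask) != 0;
    }

    // Equivalent of the old boolean dirty field.
    boolean isDirty() {
        return status != 0;
    }

    void clear() {
        status = 0;
    }

    int raw() {
        return status;
    }
}
```

The mask still answers the yes/no "is it dirty" question, but the raw value also shows which kinds of update were recorded, which is the extra information useful for testing and QA.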

Comment 6 Joseph Marques 2008-08-08 04:42:30 UTC

rev1200 - added explicit lazy-load params to AffinityGroup, Server, and Agent entities; 
changed all instances of ClusterIdentity (class names, variable names, comments) to Server; 
added new ServerSchedulerBean, which schedules server-specific jobs (just like ServerManagerBean gets server-specific info from the HA tables); 
wrote ReloadServerCacheIfNeededJob, which uses the ServerSchedulerBean to ensure execution on each instance in the cloud; 
removed updateConditions method overloads from AlertConditionCacheManagerLocal, new semantics require out-of-band update via full-cache reload job; 
added AgentStatusManager which gets and sets the appropriate status on the appropriate agent according to various updates; 
changed all instances of AlertConditionCacheManager.updateConditions to set the appropriate status on the agent for the data being updated; 
added CacheConsistencyManager which reloads the internal caches for a specific server;
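
The flow described above (updates set a status on the agent, and a periodic per-server job reloads the cache out-of-band) could look roughly like the following. This is a sketch under stated assumptions: the class and method names are hypothetical, the reload is reduced to a counter, and resetting the status after a reload is an assumption not confirmed by this bug.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not the real AgentStatusManager or CacheConsistencyManager.
final class ReloadIfNeededSketch {
    // agentId -> status mask, for the agents attached to this server only
    private final Map<Integer, Integer> statusByAgent = new HashMap<>();
    private int reloadCount;

    void registerAgent(int agentId) {
        statusByAgent.put(agentId, 0);
    }

    // Called where code used to update the cache directly: only record the change.
    void markChanged(int agentId, int mask) {
        statusByAgent.merge(agentId, mask, (oldMask, newMask) -> oldMask | newMask);
    }

    // Body of the periodic job: reload only if some agent has a non-zero status.
    boolean reloadIfNeeded() {
        boolean dirty = false;
        for (int status : statusByAgent.values()) {
            if (status != 0) {
                dirty = true;
                break;
            }
        }
        if (!dirty) {
            return false;
        }
        reloadCount++; // stand-in for a full reload of this server's caches
        statusByAgent.replaceAll((agentId, status) -> 0); // assumption: reload clears status
        return true;
    }

    int reloadCount() {
        return reloadCount;
    }
}
```

Several changes recorded between two job runs collapse into a single reload, which is the trade-off of moving from direct cache updates to a status check plus full reload.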

Comment 7 Joseph Marques 2008-08-08 04:47:56 UTC

resolving this as the cache is segmented, tested, and works.  however, the baseline job still runs against the entire db and recalculates all baselines.  this should (or at least could) be partitioned, so that each server only updates baselines for the agents it's currently managing...but this isn't absolutely necessary for proper operation.  so, i'll be moving on to other RHQ-644 tasks, and if there is time left in this dev cycle i'll revisit this.

Comment 8 Red Hat Bugzilla 2009-11-10 21:14:23 UTC

This bug was previously known as http://jira.rhq-project.org/browse/RHQ-668