today, the AlertConditionCache loads conditions for all alert definitions across the system. it also loads all OOB conditions for all MeasurementBaselines in the system. the HA version needs to load ONLY the alert conditions and OOB conditions for the agents that are currently attached/connected to it
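A minimal sketch of the segmentation idea described above, written against in-memory data rather than the real RHQ entities. `ConditionRow` and `SegmentedCacheLoader` are hypothetical names used only for illustration; the actual cache would presumably query the database per agent instead of filtering a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for an alert/OOB condition; not the RHQ entity.
final class ConditionRow {
    final int conditionId;
    final int agentId; // the agent that reports data for this condition

    ConditionRow(int conditionId, int agentId) {
        this.conditionId = conditionId;
        this.agentId = agentId;
    }
}

final class SegmentedCacheLoader {
    /** Keeps only the conditions whose owning agent is connected to this server. */
    static List<ConditionRow> loadForConnectedAgents(List<ConditionRow> all,
                                                     Set<Integer> connectedAgentIds) {
        List<ConditionRow> result = new ArrayList<>();
        for (ConditionRow row : all) {
            if (connectedAgentIds.contains(row.agentId)) {
                result.add(row);
            }
        }
        return result;
    }
}
```

With conditions belonging to agents 10 and 20 and only agent 10 connected, the loader returns just the agent-10 conditions, which is the behavior the HA version is after.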
rev1180 - start of alert condition cache segmentation work; cache now only loads elements into it on an agent-by-agent basis; for now, the cache will load data from all agents until the agent-server comm is reworked so that the HA tables are updated when HA events (fail over, new agent connection, etc) take place; to keep RHQ development smooth without requiring explicit use of the HA installer, if the server starts up and sees no records in the RHQ_SERVER table it will create a default one, and use that for identity computations; started ClusterManager and ClusterIdentityManager SLSBs;
rev1190 - refactor the baseline recalculation work out of the AutoBaselineCalculationJob class and directly into the SLSB; the AutoBaselineCalculationJob now calls into the new MeasurementBaselineManager SLSB method;
rev1191 - move the identity manager into a separate instance sub-package; this draws a better organizational distinction between the SLSBs that manage all/any HA entities versus ones that manage the entities referring to this server instance's identity;
rev1193 - the entry point from the quartz job, since it calls out to methods that can't be executed in a transaction, must also be annotated to not be called within one; (fixes ejb exception you'll see if you run on 1190-1192)
rev1196 - in order to provide more info for testing / verification / QA purposes, i decided to scrap the boolean dirty field in favor of a slightly richer numeric field, representing a mask of Agent.Status elements;
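A rough sketch of how a numeric status mask can carry more information than a boolean dirty flag. The flag names and bit values below are assumptions made up for illustration; this report does not list the actual Agent.Status elements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical flags; the real Agent.Status elements are not listed in this report.
enum AgentStatusFlag {
    BASELINES_CHANGED(1),
    ALERT_DEFINITIONS_CHANGED(2),
    INVENTORY_CHANGED(4);

    final int mask; // one distinct bit per flag

    AgentStatusFlag(int mask) {
        this.mask = mask;
    }
}

final class AgentStatusMask {
    /** Returns the status with the given flag's bit turned on. */
    static int set(int status, AgentStatusFlag flag) {
        return status | flag.mask;
    }

    /** True if the given flag's bit is on in the status. */
    static boolean isSet(int status, AgentStatusFlag flag) {
        return (status & flag.mask) != 0;
    }

    /** Equivalent of the old boolean dirty field: any bit set means dirty. */
    static boolean isDirty(int status) {
        return status != 0;
    }

    /** Lists which flags are set, which is the extra detail useful for QA. */
    static List<AgentStatusFlag> describe(int status) {
        List<AgentStatusFlag> flags = new ArrayList<>();
        for (AgentStatusFlag flag : AgentStatusFlag.values()) {
            if (isSet(status, flag)) {
                flags.add(flag);
            }
        }
        return flags;
    }
}
```

The old boolean is still recoverable as "status != 0", while `describe` shows why the agent was marked, which is the kind of detail a tester would want.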
rev1200 - added explicit lazy-load params to AffinityGroup, Server, and Agent entities; changed all instances of ClusterIdentity (class names, variable names, comments) to Server; added new ServerSchedulerBean, which schedules server-specific jobs (just like ServerManagerBean gets server-specific info from the HA tables); wrote ReloadServerCacheIfNeededJob, which uses the ServerSchedulerBean to ensure execution on each instance in the cloud; removed updateConditions method overloads from AlertConditionCacheManagerLocal, new semantics require out-of-band update via full-cache reload job; added AgentStatusManager which gets and sets the appropriate status on the appropriate agent according to various updates; changed all instances of AlertConditionCacheManager.updateConditions to set the appropriate status on the agent for the data being updated; added CacheConsistencyManager which reloads the internal caches for a specific server;
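The flow in rev1200 (updates mark an agent's status, and a periodic per-server job reloads the cache only if something is marked) might look roughly like the sketch below. It is an in-memory illustration only: `ReloadIfNeededSketch` and its methods are hypothetical names, the reload itself is reduced to a counter, and resetting the status after a reload is an assumption rather than something stated in this report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model; the real work is done by SLSBs and a quartz job.
final class ReloadIfNeededSketch {
    // agentId -> status mask; 0 means nothing changed since the last reload
    private final Map<Integer, Integer> agentStatus = new HashMap<>();
    private int reloadCount = 0;

    /** Stand-in for what replaced updateConditions: just mark the agent. */
    void markAgent(int agentId, int flagMask) {
        agentStatus.merge(agentId, flagMask, (oldMask, newMask) -> oldMask | newMask);
    }

    /** Stand-in for the periodic job: reload only if some agent is marked. */
    boolean reloadIfNeeded() {
        boolean dirty = false;
        for (int status : agentStatus.values()) {
            if (status != 0) {
                dirty = true;
                break;
            }
        }
        if (!dirty) {
            return false;
        }
        reloadCount++; // placeholder for the full-cache reload
        agentStatus.replaceAll((agentId, status) -> 0); // assumed: clear marks after reload
        return true;
    }

    int reloadCount() {
        return reloadCount;
    }
}
```

Several updates between two job runs collapse into a single reload, which is the trade-off of moving from in-band updateConditions calls to an out-of-band full reload.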
resolving this as the cache is segmented, tested, and works. however, the baseline job still runs against the entire db and recalculates all baselines. this should (or at least could) be partitioned, so that each server only updates baselines for the agents it's currently managing...but this isn't absolutely necessary for proper operation. so, i'll be moving on to other RHQ-644 tasks, and if there is time left in this dev cycle i'll revisit this.
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-668