Support a highly available infrastructure. Separate subtasks should be created to track development progress, but the high-level requirements are roughly as follows:
* support multi-server setup
** make sure server-cached structures "just work" (such as the alerting engine)
* support agent failover when a server goes down
** agents need to re-position themselves to talk to another server endpoint
** load should be balanced by overall data throughput, and not by the number of connected agents
See the development forum link below for details:
http://support.rhq-project.org/display/RHQ/High+Availability+-+Agent+Failover
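
To make the failover and balancing requirements above more concrete, here is a rough sketch, not a proposed implementation. It assumes each agent is handed a failover list of server endpoints ordered by current data throughput (lowest first) and connects to the first one that is up. The names `Server`, `build_failover_list`, and `connect` are hypothetical and do not refer to existing RHQ code.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical server endpoint; `throughput` is the data rate it currently handles."""
    name: str
    up: bool = True
    throughput: float = 0.0


def build_failover_list(servers):
    """Order servers by current data throughput (lowest first),
    not by how many agents are connected to them."""
    return sorted(servers, key=lambda s: s.throughput)


def connect(agent_rate, failover_list):
    """Walk the failover list and attach to the first server that is up."""
    for server in failover_list:
        if server.up:
            server.throughput += agent_rate
            return server
    raise ConnectionError("no server endpoint is reachable")


servers = [
    Server("s1", throughput=900.0),
    Server("s2", throughput=200.0),
    Server("s3", throughput=500.0),
]

# Initial connection: the agent lands on the least-loaded server.
primary = connect(50.0, build_failover_list(servers))

# Simulate a downed server: the agent's load leaves it and the agent
# re-positions itself on the next endpoint in a fresh failover list.
primary.up = False
primary.throughput -= 50.0
fallback = connect(50.0, build_failover_list(servers))
```

In this sketch the agent first connects to `s2`, then fails over to `s3` once `s2` is marked down. How the failover list is actually delivered to and refreshed on the agent is left open here.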
Nice-to-have for 1.1 (but might not make it until 1.2):
* use a load-balancing algorithm: the highly available infrastructure should be able to optimistically analyze the load coming from each agent and repartition the agents so that load is distributed evenly across the infrastructure
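
One possible shape for the repartitioning step, as a sketch only: the ticket does not choose an algorithm, so this uses a simple greedy heuristic (heaviest agents first, each placed on the server with the lowest total throughput so far). The function name `repartition` and the sample rates are made up for illustration.

```python
import heapq


def repartition(agent_rates, server_names):
    """Greedy heuristic: place the heaviest agents first, each on the
    server with the lowest accumulated throughput so far."""
    heap = [(0.0, name) for name in server_names]
    heapq.heapify(heap)
    assignment = {}
    by_rate = sorted(agent_rates.items(), key=lambda kv: kv[1], reverse=True)
    for agent, rate in by_rate:
        load, name = heapq.heappop(heap)       # least-loaded server
        assignment[agent] = name
        heapq.heappush(heap, (load + rate, name))
    loads = {name: load for load, name in heap}
    return assignment, loads


# Hypothetical per-agent data rates observed by the infrastructure.
rates = {"a1": 80.0, "a2": 50.0, "a3": 40.0, "a4": 30.0}
assignment, loads = repartition(rates, ["s1", "s2"])
# assignment: a1 and a4 on s1, a2 and a3 on s2
# loads: s1 carries 110.0, s2 carries 90.0
```

A greedy pass like this is not guaranteed to be optimal, and a real version would also need to limit how many agents are moved at once so that repartitioning itself does not disrupt monitoring.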