Support highly available infrastructure. Separate subtasks should be created to track development progress, but the high-level requirements are roughly as follows:

* support multi-server setup
** make sure server-cached structures (such as the alerting engine) "just work"
* support agent failover upon a downed-server event
** agents need to re-position themselves to talk to another server endpoint
** load should be balanced by overall data throughput, not by the number of connected agents

See the development forum link below for details:
http://support.rhq-project.org/display/RHQ/High+Availability+-+Agent+Failover

Nice-to-have for 1.1 (but might not make it until 1.2):

* using some load-balancing algorithm: the highly available infrastructure should be able to optimistically analyze the load coming from each agent and repartition the agents to distribute load evenly across the infrastructure
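
To make the "balance by throughput, not by agent count" requirement concrete, here is a minimal sketch of one possible assignment policy. It is not taken from the RHQ code base: the class name `ThroughputBalancer`, its methods, and the idea of tracking a single throughput figure per server are all assumptions made for illustration. A simple greedy rule is used (each agent goes to the server with the lowest accumulated throughput), which may or may not match what the subtasks end up implementing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: none of these names come from RHQ.
final class ThroughputBalancer {
    // Accumulated throughput (e.g. bytes/sec) attributed to each server endpoint.
    private final Map<String, Double> serverLoad = new LinkedHashMap<>();

    ThroughputBalancer(Iterable<String> servers) {
        for (String server : servers) {
            serverLoad.put(server, 0.0);
        }
    }

    // Assign an agent to the server with the lowest total throughput,
    // regardless of how many agents that server already has.
    // Ties go to the server that was registered first.
    String assign(double agentThroughput) {
        String best = null;
        for (Map.Entry<String, Double> entry : serverLoad.entrySet()) {
            if (best == null || entry.getValue() < serverLoad.get(best)) {
                best = entry.getKey();
            }
        }
        if (best == null) {
            throw new IllegalStateException("no servers available");
        }
        serverLoad.merge(best, agentThroughput, Double::sum);
        return best;
    }

    // Current accumulated throughput for a server (0.0 if the server is unknown).
    double loadOf(String server) {
        return serverLoad.getOrDefault(server, 0.0);
    }
}
```

With two servers, one heavy agent (100 units) followed by three light agents (10 units each) would place the heavy agent alone on the first server and all three light agents on the second, which is the opposite of what balancing by agent count would do.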
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-644
closing this tracker