If you try to set up a monitoring environment with a server and multiple agents that are NATed behind firewalls, each agent needs its own port for the "callback" communication from server to client.

A better solution for this case (maybe for every case) would be for the client to poll the server. Every connection would then be initiated by the agent, so traffic only has to be allowed in one direction, which is much easier to maintain from a firewalling point of view.

Cons:
- UI data shown to the user may be inaccurate, because it is cached until the agent reports its newest state.
- The server cannot check whether the agent is still alive.
- Resources on the server could get exhausted if data queued for an agent is never picked up by that agent.

Possible solutions:
- For inaccurate data in the server UI:
  - Add a generic agent poll parameter which defaults to the shortest period of any watch configured for the agent. Let's call it agent.polltime.
  - While a user is viewing any resource of one server in the UI, agent.polltime should automatically be reduced to e.g. 5 seconds. This would simulate responsiveness of the agents.
- For determining whether the agent is still alive: set up a heartbeat from every agent to the server. If the agent does not report within a defined time, we could raise a warning, because something must be wrong.
- For exhausted resources: limit the command queue for the agents. E.g. if an agent does not respond within 30 days, its queue gets dropped, since the agent seems to be down for good.
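The three proposed mitigations can be sketched roughly as follows. This is only an illustration in Python, not RHQ code: the names effective_polltime, AgentState, record_poll and check_agent are hypothetical, the 5 second and 30 day values are the examples from the description, and the heartbeat is simply assumed to be the agent's regular poll.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Sequence

UI_POLLTIME = 5.0                  # seconds; the "e.g. 5 seconds" from the description
QUEUE_MAX_AGE = 30 * 24 * 3600.0   # seconds; the "e.g. 30 days" from the description


def effective_polltime(watch_periods: Sequence[float], user_viewing: bool,
                       ui_polltime: float = UI_POLLTIME) -> float:
    """Return agent.polltime: the shortest configured watch period,
    reduced to ui_polltime while a user is looking at the agent's resources.
    Assumes at least one watch is configured."""
    base = min(watch_periods)
    return min(base, ui_polltime) if user_viewing else base


@dataclass
class AgentState:
    """Hypothetical per-agent bookkeeping kept on the server."""
    last_seen: float                                   # time of the last poll
    queue: Deque[str] = field(default_factory=deque)   # commands waiting for pickup


def record_poll(state: AgentState, now: float) -> List[str]:
    """Called when the agent polls: the poll doubles as the heartbeat,
    and all pending commands are handed over."""
    state.last_seen = now
    commands = list(state.queue)
    state.queue.clear()
    return commands


def check_agent(state: AgentState, now: float, heartbeat_timeout: float,
                queue_max_age: float = QUEUE_MAX_AGE) -> str:
    """Return 'ok', 'warning' (heartbeat overdue) or 'dropped'
    (agent silent for so long that its command queue was discarded)."""
    silent_for = now - state.last_seen
    if silent_for > queue_max_age:
        state.queue.clear()
        return "dropped"
    if silent_for > heartbeat_timeout:
        return "warning"
    return "ok"
```

A server-side timer could call check_agent for every known agent, with heartbeat_timeout set to a small multiple of that agent's current agent.polltime so that a single late poll does not raise a warning.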
One possible way to work around the asynchronicity issue could be to use HTTP push. It could work like this:
- A user requests an operation on an agent.
- With the next agent/client request, HTTP push is activated.
- The server is now able to communicate synchronously with the client whenever the server needs to. No polling from the client would be necessary.

This leaves only one pain point: the delay between the user's first request to update agent data and the moment HTTP push becomes active. Everything else should behave as it does now.
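The hand-over from polling to push described in this comment can be modelled as a small state machine. The sketch below is an assumption-laden illustration in Python, not an implementation: AgentChannel and its methods are hypothetical names, and the push connection itself is represented only by a flag and a list of delivered operations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentChannel:
    """Hypothetical server-side view of the connection to one agent."""
    push_requested: bool = False   # a user asked for something while still polling
    push_active: bool = False      # the agent keeps a connection open for push
    pending: List[str] = field(default_factory=list)    # waiting for the next poll
    delivered: List[str] = field(default_factory=list)  # already sent to the agent

    def user_requests(self, operation: str) -> str:
        """A user triggers an operation on the agent from the UI."""
        if self.push_active:
            # Push is already on: the server can send immediately.
            self.delivered.append(operation)
            return "sent"
        # Still in polling mode: queue the operation and ask for push.
        self.pending.append(operation)
        self.push_requested = True
        return "queued"

    def agent_polls(self) -> List[str]:
        """The agent's next regular request reaches the server."""
        if self.push_requested:
            # Use this request to switch the agent over to push.
            self.push_active = True
            self.push_requested = False
        batch, self.pending = self.pending, []
        self.delivered.extend(batch)
        return batch
```

The first operation is only "queued" until the next poll arrives, which is the delay named as the remaining pain point; every later operation is "sent" directly.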
Here's a blog post by Joseph discussing switching to unidirectional communication: http://josephmarques.wordpress.com/2008/10/16/the-software-dinner-party/
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2436
Mass add of the keyword FutureFeature to help with tracking.
This is still on the wish-list.