Bug 535775 (RHQ-2436)

Summary: RFE: unidirectional communication from agent to server
Product: [Other] RHQ Project
Reporter: Darko Palic <darko.palic>
Component: Communications Subsystem
Assignee: RHQ Project Maintainer <rhq-maint>
Status: NEW
Severity: medium
Priority: medium
Version: unspecified
CC: jshaughn, tao
Target Milestone: ---
Keywords: FutureFeature, Improvement
Target Release: ---
Hardware: All
OS: All
URL: http://jira.rhq-project.org/browse/RHQ-2436
Whiteboard: agent bidi bidirectional communication
Doc Type: Enhancement
Environment: environment neutral

Description Darko Palic 2009-09-18 09:52:00 EDT
If you try to set up a monitoring environment with a server and multiple agents that are NATed behind firewalls, each agent needs its own port for the "callback" communication from server to client.

A better solution for this case (and maybe for every case) would be for the client to poll the server. Communication would then be unidirectional, which is much easier to maintain from a firewalling point of view.

This approach has some drawbacks:
- UI data shown to the user may be inaccurate, because it is cached until the agent reports its newest state.
- The server cannot check whether the agent is still alive.
- Resources on the server could be exhausted if data queued for the agent is never picked up by the agent.

Possible solutions:
- A possible solution for the inaccurate data which the user may see in the server UI would be:
   - A generic agent poll parameter, which defaults to the shortest period of any watch configured for the agent. Let's call it agent.polltime.
   - While a user is viewing any resource of a server in the UI, agent.polltime should automatically be reduced to e.g. 5 seconds. This would simulate responsive agents.

- To determine whether the agent is still alive, one option would be for every agent to send a heartbeat to the server. If the agent does not report within a defined time, we could raise a warning, because something must be wrong.

- For the issue of exhausted resources, the command queue for each agent could be limited. E.g. if an agent does not respond within 30 days, the queue gets dropped, since the agent seems to be down for good.
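
The three ideas above can be sketched together as server-side bookkeeping for one polling agent. This is only an illustration under stated assumptions, not RHQ code: the class AgentState, its fields, and the constants DEFAULT_POLLTIME and HEARTBEAT_TIMEOUT are hypothetical. Only agent.polltime, the 5 second interactive interval, and the 30 day limit come from the description.

```python
from collections import deque
from dataclasses import dataclass, field

DEFAULT_POLLTIME = 60.0           # hypothetical fallback when no watch is configured
INTERACTIVE_POLLTIME = 5.0        # the "e.g. 5 seconds" from the description
HEARTBEAT_TIMEOUT = 180.0         # hypothetical: warn after this much silence
QUEUE_MAX_AGE = 30 * 24 * 3600.0  # the "e.g. 30 days" from the description


@dataclass
class AgentState:
    """Server-side view of one polling agent (hypothetical model)."""
    watch_periods: list        # seconds, one entry per configured watch
    last_seen: float           # timestamp of the agent's last poll
    ui_active: bool = False    # True while a user views this agent's resources
    commands: deque = field(default_factory=deque)

    def polltime(self) -> float:
        """agent.polltime: shortest watch period, reduced while the UI is in use."""
        base = min(self.watch_periods, default=DEFAULT_POLLTIME)
        return min(base, INTERACTIVE_POLLTIME) if self.ui_active else base

    def on_poll(self, now: float) -> list:
        """Each poll doubles as a heartbeat and hands over all queued commands."""
        self.last_seen = now
        pending = list(self.commands)
        self.commands.clear()
        return pending

    def is_alive(self, now: float) -> bool:
        """False once the agent has been silent longer than the timeout."""
        return now - self.last_seen <= HEARTBEAT_TIMEOUT

    def expire_queue(self, now: float) -> bool:
        """Drop queued commands if the agent has been silent for too long."""
        if self.commands and now - self.last_seen > QUEUE_MAX_AGE:
            self.commands.clear()
            return True
        return False


agent = AgentState(watch_periods=[30.0, 120.0], last_seen=0.0)
agent.commands.append("some-operation")
print(agent.polltime())            # shortest configured watch period
agent.ui_active = True
print(agent.polltime())            # reduced while a user is watching
print(agent.is_alive(now=200.0))   # silent for longer than HEARTBEAT_TIMEOUT
```

In this sketch the poll itself serves as the heartbeat, so no separate heartbeat message is needed. Whether that is acceptable depends on how long agent.polltime can get.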
Comment 1 Darko Palic 2009-09-30 14:09:30 EDT
One possible way to work around the asynchronicity issue could be to use HTTP push.
A solution could be:
- A user requests an operation on an agent.
- With the next agent/client request, HTTP push is activated.
- The server would now be able to communicate synchronously with the client at whatever interval the server needs. No polling from the client would be necessary.

This leaves only one pain point: the delay between a user's first request to update agent data and the moment the HTTP push is established. Everything else should behave as it does now.
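
A minimal sketch of this idea, assuming "http-push" means the server holds the agent's request open until it has something to send (often called long polling). That reading is an assumption, and the class PushChannel and its methods are hypothetical names, not part of RHQ. An in-memory queue stands in for the HTTP connection.

```python
import queue
import threading


class PushChannel:
    """Hypothetical per-agent channel on the server side."""

    def __init__(self):
        self._commands = queue.Queue()

    def submit(self, command):
        """Called when a user requests an operation on the agent."""
        self._commands.put(command)

    def wait_for_command(self, timeout):
        """Models the held-open agent request.

        Blocks until a command is available or the timeout expires, so the
        server can answer the moment it has something to say.
        """
        try:
            return self._commands.get(timeout=timeout)
        except queue.Empty:
            return None  # nothing to do; the agent simply asks again


channel = PushChannel()
received = []

# The agent's request is already waiting when the user triggers the operation.
agent = threading.Thread(
    target=lambda: received.append(channel.wait_for_command(timeout=2.0))
)
agent.start()
channel.submit("some-operation")
agent.join()
print(received)
```

All connections are still opened by the agent, so the firewall setup stays unidirectional, while the server no longer has to wait for the next poll interval once a request is being held.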
Comment 2 Ian Springer 2009-10-08 11:45:56 EDT
Here's a blog post by Joseph discussing switching to unidirectional comm:
Comment 3 Red Hat Bugzilla 2009-11-10 16:04:25 EST
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2436
Comment 4 wes hayutin 2010-02-16 12:10:21 EST
Mass add of keyword FutureFeature to help with tracking.
Comment 5 Jay Shaughnessy 2014-05-09 11:53:58 EDT
This is still on the wish-list.