Bug 535885 (RHQ-292)
| Summary: | re-evaluate persistent fifo storage mechanisms for server->agent comm | ||
|---|---|---|---|
| Product: | [Other] RHQ Project | Reporter: | John Mazzitelli <mazz> | 
| Component: | Performance | Assignee: | John Mazzitelli <mazz> | 
| Status: | CLOSED NEXTRELEASE | QA Contact: | Corey Welton <cwelton> | 
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | unspecified | Keywords: | SubFeature | 
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | All | ||
| URL: | http://jira.rhq-project.org/browse/RHQ-292 | ||
| Whiteboard: | |||
| Fixed In Version: | 1.2 | Doc Type: | Enhancement | 
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | --- | |
| Regression: | --- | Mount Type: | --- | 
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 536277 | ||
| 
 
        
Description (John Mazzitelli, 2008-04-14 16:06:00 UTC)

As far as I can see, the only server->agent APIs that are declared as guaranteed are in DiscoveryAgentService:
    * void synchronizeInventory(int resourceId, EnumSet<SynchronizationType> synchronizationTypes);
    * void removeResource(int resourceId);
I'm not fully sure when synchronizeInventory would be called, but it is the only one I'd worry could get bogged down regularly enough to cause the persistent FIFO to kick in.
This topic has been revisited with the introduction of multi-server HA support:
Design Decision
The decision is to remove the use of server->agent reliable messaging and to add minimal comments in the code indicating that the @Asynchronous(guaranteedDelivery="true") annotation should not be used for AgentService services (i.e. services defined for server->agent communication, org.rhq.core.clientapi.agent.*.*AgentService.java). This annotation will be removed in:
    * DiscoveryAgentService.synchronizeInventory() : notification of newly committed resources in AD portlet
    * DiscoveryAgentService.removeResource() : notification of uninventoried resources
      In both of these cases the new, more robust synchronization algorithms for 1.1 will ensure proper synch on agent startup regardless of the delivery of these messages.
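A minimal sketch of how one might audit an interface for the option being removed. The annotation and ExampleAgentService below are hypothetical stand-ins written for this example (assuming a boolean guaranteedDelivery attribute); they are not RHQ's actual classes:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for RHQ's @Asynchronous annotation; the real one lives in
// org.rhq.core.communications.command.annotation and its attributes may differ.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Asynchronous {
    boolean guaranteedDelivery() default false;
}

// Hypothetical server->agent interface that still carries the disallowed option.
interface ExampleAgentService {
    @Asynchronous(guaranteedDelivery = true)
    void removeResource(int resourceId);

    @Asynchronous
    void synchronizeInventory(int resourceId);
}

class GuaranteedDeliveryAudit {
    /** Returns the names of methods on the interface that request guaranteed delivery. */
    static List<String> findViolations(Class<?> agentServiceInterface) {
        List<String> violations = new ArrayList<>();
        for (Method method : agentServiceInterface.getMethods()) {
            Asynchronous async = method.getAnnotation(Asynchronous.class);
            if (async != null && async.guaranteedDelivery()) {
                violations.add(method.getName());
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // Lists every method that would violate the "no reliable messaging" policy.
        System.out.println(findViolations(ExampleAgentService.class));
    }
}
```

A check like this could run as a unit test over every *AgentService interface, so a reintroduced guaranteedDelivery option is caught at build time rather than at runtime.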
A third scenario discussed was notification of updated (measurement) schedules. It turns out that reliable messaging was not in place for these updates and the agent must be up for the server-side update to succeed. So, it was a non-issue. If this behavior needed to change it could be handled by adding a getLatestSchedulesForResourceId for modified resources (see InventoryManager.synchInventory), or if that is too coarse, a new update time specific to schedule update.
In that case, or in general, if we need to reintroduce reliable server->agent messaging, we can revisit the options listed above, particularly RHQ-292.
r1272 Enforcing the new policy of no reliable messaging on calls from server->agent. Removed the guaranteedDelivery option where it was used and added the ability for a ClientRemotePojoFactory to disable the option for any proxies generated by the factory. If the option is found at runtime, it is forced to false and an error is written to the log (to notify devs that they've made a mistake or need to revisit the solution).
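The runtime behavior described in r1272 can be sketched roughly as follows. This is an illustration only: SketchPojoFactory, its method names, the DemoAgentService interface, and the boolean-valued annotation are all hypothetical and do not reproduce the real ClientRemotePojoFactory API. The proxy records what it would have sent instead of doing any real remoting:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in for RHQ's @Asynchronous annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Asynchronous {
    boolean guaranteedDelivery() default false;
}

// Hypothetical agent service that (mistakenly) still asks for guaranteed delivery.
interface DemoAgentService {
    @Asynchronous(guaranteedDelivery = true)
    void removeResource(int resourceId);
}

// Hypothetical factory; not the real ClientRemotePojoFactory.
class SketchPojoFactory {
    private static final Logger LOG = Logger.getLogger(SketchPojoFactory.class.getName());

    private boolean guaranteedDeliveryDisabled = false;

    // Records what each call would have sent, in place of real remoting.
    final List<String> sent = new ArrayList<>();

    void setGuaranteedDeliveryDisabled(boolean disabled) {
        this.guaranteedDeliveryDisabled = disabled;
    }

    <T> T getRemotePojo(Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            Asynchronous async = method.getAnnotation(Asynchronous.class);
            boolean guaranteed = async != null && async.guaranteedDelivery();
            if (guaranteed && guaranteedDeliveryDisabled) {
                // Force the option off and log loudly so developers notice the mistake.
                LOG.severe("guaranteedDelivery is not allowed on " + method.getName()
                        + "; forcing it to false");
                guaranteed = false;
            }
            sent.add(method.getName() + " guaranteed=" + guaranteed);
            return null; // sufficient for the void methods used in this sketch
        };
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }
}
```

With the option disabled on the factory, a call through the proxy goes out as a non-guaranteed send and leaves an error in the log, which matches the intent of making policy violations visible without failing the call.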
Moving to 1.2, since we should at least evaluate the options and pick one (assuming we ever want server->agent comm); for now, the choices made for 1.1 make that discussion moot.

Even if we have no server->agent guaranteed messages (not sure if this is true today), I think some code has to be changed to ensure the server never writes this command spool .dat file (I believe just a change to the comm configuration file on the server will turn this off).

As of today, there is no guaranteed delivery for any server->agent messages (to confirm, search for all usages of org.rhq.core.communications.command.annotation.Asynchronous; there should be none in any *AgentService interfaces). The server will no longer create the spool files. To test this, simply start a server that has one or more agents connected to it. You should no longer see any command_spool.dat files in the jbossas/server/default/data directory.

In the following, 32163 is the pid of the JON Server JVM:
    -bash-3.00$ ls -la /proc/32163/fd > /tmp/jon-files.txt
    -bash-3.00$ cat /tmp/jon-files.txt | wc -l
    1027
    -bash-3.00$ cat /tmp/jon-files.txt | grep command-spool | wc -l
    806
So roughly 80% of the server's open files (806 of 1027) were coming from these command spool files.

QA Verified, spool files no longer occur.

This bug was previously known as http://jira.rhq-project.org/browse/RHQ-292

This bug relates to RHQ-1408
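
The open-file measurement quoted above (grep plus wc -l over a file descriptor listing) can be reproduced in code. This is a small sketch under stated assumptions: the class and method names are made up for the example, and the marker string is whatever appears in the spool file names on the system being checked:

```java
import java.util.List;

class SpoolShare {
    /** Counts entries whose text contains the marker, like `grep marker | wc -l`. */
    static long countMatching(List<String> listing, String marker) {
        return listing.stream().filter(line -> line.contains(marker)).count();
    }

    /** Fraction of entries that match, from 0.0 to 1.0 (0.0 for an empty listing). */
    static double share(long matching, long total) {
        return total == 0 ? 0.0 : (double) matching / total;
    }

    public static void main(String[] args) {
        // Figures quoted in the transcript: 806 matching lines out of 1027.
        System.out.printf("%.1f%%%n", 100 * share(806, 1027));
    }
}
```

For the quoted figures this works out to about 78.5%, consistent with the "roughly 80%" stated in the report.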