Bug 858829

Summary: Client fails to send measurements - ClientCommandSenderTask - always times out with error
Product: [Other] RHQ Project
Reporter: Elias Ross <genman>
Component: Agent
Assignee: RHQ Project Maintainer <rhq-maint>
Status: NEW ---
QA Contact: Mike Foley <mfoley>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.2
CC: hrupp
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---

Description Elias Ross 2012-09-19 14:51:35 EDT
This is on RHQ 4.2, so it might already have been addressed in a later release.

Still, I see this on my agent:

2012-09-19 18:40:47,365 ERROR [ClientCommandSenderTask Thread #4] (enterprise.communications.command.client.ClientCommandSenderTask)- {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.agent-name=app001, rhq.externalizable-strategy=AGENT, rhq.security-token=, rhq.guaranteed-delivery=true, rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[mergeMeasurementReport], targetInterfaceName=org.rhq.core.clientapi.server.measurement.MeasurementServerService}]]. Cause: java.util.concurrent.TimeoutException:null. Cause: java.util.concurrent.TimeoutException
2012-09-19 18:40:47,365 WARN  [ClientCommandSenderTask Thread #4] (enterprise.communications.command.client.ClientCommandSenderTask)- {ClientCommandSenderTask.queuing-failed-command}The command that failed has its guaranteed-delivery flag set so it is being queued again

When the agent starts to report this, it gets into a state where measurements fail to be reported. Restarting the agent DOES NOT WORK.
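
To picture what the two log lines above describe, here is a minimal sketch of a sender that requeues a timed-out command when its guaranteed-delivery flag is set. This is not the actual ClientCommandSenderTask code; the names RequeueSketch, Command, Transport and drain are hypothetical, and the real agent's queueing is certainly more involved. It only shows why a command that always times out never leaves the queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeoutException;

public class RequeueSketch {
    /** Hypothetical stand-in for a queued agent command. */
    static final class Command {
        final String name;
        final boolean guaranteedDelivery;

        Command(String name, boolean guaranteedDelivery) {
            this.name = name;
            this.guaranteedDelivery = guaranteedDelivery;
        }
    }

    /** Hypothetical transport; throws when the server does not answer in time. */
    interface Transport {
        void send(Command command) throws TimeoutException;
    }

    /**
     * Makes at most maxAttempts send attempts and returns how many commands
     * were delivered. A failed command goes back on the queue only if its
     * guaranteed-delivery flag is set; otherwise it is dropped.
     */
    static int drain(Deque<Command> queue, Transport transport, int maxAttempts) {
        int delivered = 0;
        for (int attempt = 0; attempt < maxAttempts && !queue.isEmpty(); attempt++) {
            Command command = queue.pollFirst();
            try {
                transport.send(command);
                delivered++;
            } catch (TimeoutException e) {
                if (command.guaranteedDelivery) {
                    queue.addLast(command); // queued again, as in the WARN line
                }
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        Deque<Command> queue = new ArrayDeque<>();
        queue.add(new Command("mergeMeasurementReport", true));

        // Simulate a server that never answers in time.
        Transport alwaysTimesOut = command -> {
            throw new TimeoutException();
        };

        int delivered = drain(queue, alwaysTimesOut, 5);
        System.out.println("delivered=" + delivered + ", still queued=" + queue.size());
    }
}
```

With a transport that always times out, nothing is delivered and the report stays queued no matter how many attempts are made, which matches measurements never reaching the server.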

The fix is to clean out the container using the

--purgeplugins --purgedata

options.

I need to gather more details.

Comment 1 Elias Ross 2012-09-19 15:11:31 EDT
Note: Other commands (like 'get live data' etc.) work from the server side.

I do see this a lot:

2012-09-19 19:09:23,464 DEBUG [ClientCommandSenderTask Timer Thread #0] (enterprise.communications.command.client.JBossRemotingRemoteCommunicator)- {CommandService.remote-pojo-execute-not-permitted}Command not permitted - server reached its limit of concurrent invocations [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{... rhq.guaranteed-delivery=true, rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[mergeMeasurementReport], targetInterfaceName=org.rhq.core.clientapi.server.measurement.MeasurementServerService}]]. Retry in [6,000]ms

Might need to tune the server a bit.
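
The "server reached its limit of concurrent invocations ... Retry in [6,000]ms" message reads like a concurrency limit on the server side. A rough sketch of that idea, using a plain java.util.concurrent.Semaphore, is below. This is an assumption about the general mechanism, not the RHQ server's actual implementation: the class name ConcurrencyLimitSketch, the Permit type and the limit of 2 are hypothetical, and only the 6,000 ms retry delay comes from the log.

```java
import java.util.concurrent.Semaphore;

public class ConcurrencyLimitSketch {
    /** Outcome of asking for a slot: granted, or a hint for when to retry. */
    static final class Permit {
        final boolean granted;
        final long retryAfterMillis;

        Permit(boolean granted, long retryAfterMillis) {
            this.granted = granted;
            this.retryAfterMillis = retryAfterMillis;
        }
    }

    private final Semaphore slots;
    private final long retryAfterMillis;

    ConcurrencyLimitSketch(int maxConcurrent, long retryAfterMillis) {
        this.slots = new Semaphore(maxConcurrent);
        this.retryAfterMillis = retryAfterMillis;
    }

    /** Non-blocking: takes a slot if one is free, otherwise tells the caller to retry later. */
    Permit tryEnter() {
        if (slots.tryAcquire()) {
            return new Permit(true, 0L);
        }
        return new Permit(false, retryAfterMillis);
    }

    /** Frees a slot once the invocation has finished. */
    void exit() {
        slots.release();
    }

    public static void main(String[] args) {
        // Hypothetical limit of 2 concurrent invocations.
        ConcurrencyLimitSketch limiter = new ConcurrencyLimitSketch(2, 6000L);
        limiter.tryEnter();
        limiter.tryEnter();

        Permit third = limiter.tryEnter();
        System.out.println("granted=" + third.granted + ", retry in " + third.retryAfterMillis + "ms");

        limiter.exit(); // one invocation finishes
        System.out.println("granted after release=" + limiter.tryEnter().granted);
    }
}
```

If each invocation holds its slot for a long time, for example while waiting on a slow backend, the slots rarely free up and most agents keep being told to retry.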

Comment 2 Elias Ross 2012-12-20 14:13:56 EST
The cause appears to be a very busy database. You can increase the number of active threads that write metrics to the database, but this may or may not help.