Right now, when downloading a patch from the server, the agent is allowed 10 minutes to complete the streaming of the file before its request times out and the following exception is seen:

2009-01-23 12:45:53,453 ERROR [ResourceContainer.invoker.nonDaemon-5] (enterprise.communications.command.client.ClientCommandSenderTask)- {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.security-token=1232638162187-1204607085-8541727374551626108, rhq.send-throttle=true}]; params=[{targetInterfaceName=org.rhq.core.clientapi.server.content.ContentServerService, invocation=NameBasedInvocation[downloadPackageBitsGivenResource]}]]. Cause: java.util.concurrent.TimeoutException:null
java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:211)
    at java.util.concurrent.FutureTask.get(FutureTask.java:85)
    at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.run(ClientCommandSenderTask.java:143)
    at org.rhq.enterprise.communications.command.client.ClientCommandSender.sendSynch(ClientCommandSender.java:616)
    at org.rhq.enterprise.communications.command.client.ClientRemotePojoFactory$RemotePojoProxyHandler.invoke(ClientRemotePojoFactory.java:407)
    at $Proxy9.downloadPackageBitsGivenResource(Unknown Source)
    at org.rhq.core.pc.content.ContentManager.downloadPackageBits(ContentManager.java:265)
    at com.jboss.jbossnetwork.product.jbpm.handlers.JONServerDownloadActionHandler.downloadBits(JONServerDownloadActionHandler.java:68)
    at com.jboss.jbossnetwork.product.jbpm.handlers.JONServerDownloadActionHandler.run(JONServerDownloadActionHandler.java:48)
    at com.jboss.jbossnetwork.product.jbpm.handlers.BaseHandler.execute(BaseHandler.java:130)
    at org.jbpm.graph.def.Action.execute(Action.java:123)
    at org.jbpm.graph.def.Node.execute(Node.java:328)
    at org.jbpm.graph.def.Node.enter(Node.java:316)
    at org.jbpm.graph.def.Transition.take(Transition.java:119)
    at org.jbpm.graph.def.Node.leave(Node.java:382)
    at org.jbpm.graph.node.StartState.leave(StartState.java:70)
    at org.jbpm.graph.exe.Token.signal(Token.java:174)
    at org.jbpm.graph.exe.Token.signal(Token.java:123)
    at org.jbpm.graph.exe.ProcessInstance.signal(ProcessInstance.java:217)
    at org.rhq.plugins.jbossas.JBPMWorkflowManager.run(JBPMWorkflowManager.java:149)
    at org.rhq.plugins.jbossas.JBossASServerComponent.deployPackages(JBossASServerComponent.java:382)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.rhq.core.pc.inventory.ResourceContainer$ComponentInvocationThread.call(ResourceContainer.java:450)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
    at java.util.concurrent.FutureTask.run(FutureTask.java:123)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
    at java.lang.Thread.run(Thread.java:595)
2009-01-23 12:45:53,453 DEBUG [ResourceContainer.invoker.nonDaemon-5] (enterprise.communications.command.client.ClientRemotePojoFactory)- {ClientRemotePojoFactory.execution-failure}Failed to execute remote POJO method [downloadPackageBitsGivenResource]. Cause: java.util.concurrent.TimeoutException:null

As discussed below, we should increase this timeout, probably to 45 minutes, in case the server is connected to the CSP over a slow connection or a very large patch is being downloaded.

(12:55:46 PM) ccrouch: so the agent makes a call to the server which then goes and downloads the patch from the CSP, and then streams it back to the agent
(12:55:56 PM) ccrouch: at least that's how it used to work
(12:56:05 PM) mazz: yeah, that makes sense from this stack
(12:56:45 PM) mazz: well, we can use my comm layer's @Timeout annotation on this download method - since 10 minutes in general might not be enough for this kind of thing (we could be downloading very large binaries)
(12:57:12 PM) ccrouch: yeah i'll raise a jira for this
(12:57:28 PM) ccrouch: for right now though, assuming they have a fast pipe, they should be ok
(12:57:33 PM) mazz: [{targetInterfaceName=org.rhq.core.clientapi.server.content.ContentServerService, invocation=NameBasedInvocation[downloadPackageBitsGivenResource]}]]
(12:57:40 PM) mazz: that's the interface that needs a new @Timeout
(12:58:19 PM) mazz: there are other "download" methods around here too - might need to look at these also
(12:59:42 PM) ccrouch: the txn timeout is 45mins
(12:59:53 PM) ccrouch: @TransactionTimeout(45 * 60)
(12:59:53 PM) ccrouch: public long outputPackageVersionBitsGivenResource(int resourceId, PackageDetailsKey packageDetailsKey,
(12:59:53 PM) ccrouch:
(1:00:04 PM) ccrouch: so that may not be a method call timeout
(1:00:23 PM) mazz: this is the JPA timeout - what you hit was the agent comm timeout
(1:00:38 PM) mazz: it makes sense to make them the same here - 45 minutes
(1:00:58 PM) mazz: so, I would put @Timeout(45 * 60 * 1000) annotation on that comm interface
(1:01:27 PM) ccrouch: sorry, what i meant to say was
(1:01:27 PM) ccrouch: "so that may not be a *bad* method call timeout *to use too*"
we definitely don't want to make the product less usable for people trying to connect to our CSP over a slow connection - marking for 1.2 inclusion.
will make sure the timeouts on the ContentServerService methods match the ContentManagerBean and ContentSourceManagerBean SLSB tx timeouts.
added comm annotation @Timeout(45 * 60 * 1000) to ContentServerService interface's methods related to downloading bits
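
For reference, the unit arithmetic behind the two 45-minute values can be sketched as follows. This is a minimal, self-contained illustration and not the actual RHQ code: the `Timeout` annotation and the `ContentServerServiceSketch` interface below are hypothetical stand-ins defined only for this example, and it assumes (as the chat log implies) that the transaction timeout is expressed in seconds while the comm-layer timeout is expressed in milliseconds.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class TimeoutUnitsSketch {

    /** Hypothetical stand-in for the comm layer's @Timeout; value assumed to be in milliseconds. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Timeout {
        long value();
    }

    /** Hypothetical stand-in for the remote interface; only the method name comes from the log above. */
    interface ContentServerServiceSketch {
        @Timeout(45 * 60 * 1000) // 45 minutes, in milliseconds
        long downloadPackageBitsGivenResource(int resourceId);
    }

    /** The transaction timeout quoted in the chat, assumed to be in seconds: 45 minutes. */
    static final int TX_TIMEOUT_SECONDS = 45 * 60;

    /** Reads the annotated comm timeout back via reflection. */
    static long commTimeoutMillis() {
        try {
            Method m = ContentServerServiceSketch.class
                    .getMethod("downloadPackageBitsGivenResource", int.class);
            return m.getAnnotation(Timeout.class).value();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long commMillis = commTimeoutMillis();
        long txMillis = TX_TIMEOUT_SECONDS * 1000L;
        System.out.println("comm timeout (ms): " + commMillis);
        System.out.println("tx timeout (ms):   " + txMillis);
        System.out.println("match: " + (commMillis == txMillis));
    }
}
```

The point of the sketch is only that `45 * 60` and `45 * 60 * 1000` describe the same duration when the first is read as seconds and the second as milliseconds; dropping the `* 1000` on a millisecond-based timeout would give 2.7 seconds rather than 45 minutes.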
Testing notes: A suitable test would be to use iptables to reduce throughput across an ethernet device and slow data transfer to a trickle. We've been unsuccessful in doing this so far. That said, this looks to be a simple code change (adding '* 1000' to the timeout formula). Given this, and given that QA does not currently have the resources/knowledge to test it, dev and QA have agreed it can be closed.
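
As a side note on the failure mode itself: the stack trace shows the timeout surfacing from `FutureTask.get`, and that behaviour can be illustrated in isolation without throttling a network. The sketch below is not the agent's code and is not a substitute for the iptables test; the class and method names are hypothetical, and `Thread.sleep` merely stands in for a slow download.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class SlowTransferTimeoutSketch {

    /**
     * Runs a simulated transfer that takes transferMillis and waits at most
     * timeoutMillis for it. Returns true if the wait timed out.
     */
    static boolean timesOut(long transferMillis, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Long> transfer = executor.submit(() -> {
                Thread.sleep(transferMillis); // stand-in for streaming the patch
                return transferMillis;
            });
            try {
                transfer.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return false; // finished within the allowed time
            } catch (TimeoutException e) {
                transfer.cancel(true); // give up on the slow transfer
                return true;
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A transfer slower than the allowed wait times out...
        System.out.println("slow transfer timed out: " + timesOut(500, 50));
        // ...while the same kind of transfer with a generous wait completes.
        System.out.println("fast transfer timed out: " + timesOut(10, 2000));
    }
}
```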
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1396