Bug 1053922 - Agent does not set HTTP connect or read timeout in JBoss remoting
Summary: Agent does not set HTTP connect or read timeout in JBoss remoting
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Agent
Version: 4.9
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: RHQ 4.10
Assignee: Thomas Segismont
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-01-16 01:28 UTC by Elias Ross
Modified: 2014-04-23 12:30 UTC
2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-04-23 12:30:28 UTC
Embargoed:


Attachments
Patch for RHQ_4_9_0 (1.40 KB, patch)
2014-01-28 19:17 UTC, Elias Ross
no flags

Description Elias Ross 2014-01-16 01:28:51 UTC
Description of problem:

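I've observed agents hanging during failover. RHQ does not appear to set any connect or read timeout on the agent's HTTP transport, so a request to an unresponsive server can block indefinitely. In the thread dump below, thread #4489 holds the failover lock <0x00000000e0164af8> inside failoverToNewServer while it is stuck in a socket read, and the other sender thread and the server polling thread are blocked waiting for that same lock.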


"ClientCommandSenderTask Timer Thread #4489" daemon prio=10 tid=0x0000000045514000 nid=0x6ba7 runnable [0x0000000046ce3000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        - locked <0x00000000e1389748> (a java.io.BufferedInputStream)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:697)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
        - locked <0x00000000e13897f0> (a sun.net.www.protocol.http.HttpURLConnection)
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
        at org.jboss.remoting.transport.http.HTTPClientInvoker.getResponseCode(HTTPClientInvoker.java:1325)
        at org.jboss.remoting.transport.http.HTTPClientInvoker.useHttpURLConnection(HTTPClientInvoker.java:372)
        at org.jboss.remoting.transport.http.HTTPClientInvoker.makeInvocation(HTTPClientInvoker.java:253)
        at org.jboss.remoting.transport.http.HTTPClientInvoker.transport(HTTPClientInvoker.java:176)
        at org.jboss.remoting.MicroRemoteClientInvoker.invoke(MicroRemoteClientInvoker.java:169)
        at org.jboss.remoting.Client.invoke(Client.java:2084)
        at org.jboss.remoting.Client.invoke(Client.java:879)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.rawSend(JBossRemotingRemoteCommunicator.java:514)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.sendWithoutCallbacks(JBossRemotingRemoteCommunicator.java:456)
        at org.rhq.enterprise.agent.AgentMain.sendConnectRequestToServer(AgentMain.java:2114)
        at org.rhq.enterprise.agent.AgentMain.switchCommServer(AgentMain.java:2049)
        at org.rhq.enterprise.agent.AgentMain.failoverToNewServer(AgentMain.java:2007)
        - locked <0x00000000e0164af8> (a [J)
        at org.rhq.enterprise.agent.FailoverFailureCallback.failureDetected(FailoverFailureCallback.java:104)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.invokeFailureCallbackIfNeeded(JBossRemotingRemoteCommunicator.java:625)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.sendWithoutInitializeCallback(JBossRemotingRemoteCommunicator.java:478)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.send(JBossRemotingRemoteCommunicator.java:496)
        at org.rhq.enterprise.communications.command.client.AbstractCommandClient.invoke(AbstractCommandClient.java:143)
        at org.rhq.enterprise.communications.command.client.ClientCommandSender.send(ClientCommandSender.java:1084)
        at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.send(ClientCommandSenderTask.java:229)
        at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.call(ClientCommandSenderTask.java:107)
        at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.call(ClientCommandSenderTask.java:55)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

...

"ClientCommandSenderTask Timer Thread #4488" daemon prio=10 tid=0x000000004436d000 nid=0x6ba6 waiting for monitor entry [0x0000000046ee6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.rhq.enterprise.agent.AgentMain.failoverToNewServer(AgentMain.java:1992)
        - waiting to lock <0x00000000e0164af8> (a [J)
        at org.rhq.enterprise.agent.FailoverFailureCallback.failureDetected(FailoverFailureCallback.java:104)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.invokeFailureCallbackIfNeeded(JBossRemotingRemoteCommunicator.java:625)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.sendWithoutInitializeCallback(JBossRemotingRemoteCommunicator.java:478)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.send(JBossRemotingRemoteCommunicator.java:496)
        at org.rhq.enterprise.communications.command.client.AbstractCommandClient.invoke(AbstractCommandClient.java:143)
        at org.rhq.enterprise.communications.command.client.ClientCommandSender.send(ClientCommandSender.java:1084)
        at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.send(ClientCommandSenderTask.java:229)
        at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.call(ClientCommandSenderTask.java:107)
        at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.call(ClientCommandSenderTask.java:55)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)


"RHQ Server Polling Thread" daemon prio=10 tid=0x00002aaab0d5c000 nid=0x5f56 waiting for monitor entry [0x0000000043de8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.rhq.enterprise.agent.AgentMain.failoverToNewServer(AgentMain.java:1992)
        - waiting to lock <0x00000000e0164af8> (a [J)
        at org.rhq.enterprise.agent.FailoverFailureCallback.failureDetected(FailoverFailureCallback.java:104)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.invokeFailureCallbackIfNeeded(JBossRemotingRemoteCommunicator.java:625)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.sendWithoutInitializeCallback(JBossRemotingRemoteCommunicator.java:478)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.send(JBossRemotingRemoteCommunicator.java:496)
        at org.rhq.enterprise.communications.command.client.AbstractCommandClient.invoke(AbstractCommandClient.java:143)
        at org.rhq.enterprise.communications.command.client.ClientCommandSender.send(ClientCommandSender.java:1084)
        at org.rhq.enterprise.communications.command.client.ServerPollingThread.run(ServerPollingThread.java:100)
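
The cascade in the dump (one thread holding a monitor across blocking I/O, every other caller parked in BLOCKED) can be reproduced in isolation. This is a minimal sketch using only the JDK: `failoverHoldingLock` is a hypothetical stand-in for `AgentMain.failoverToNewServer`, a `long[]` is used as the monitor only to match the `[J` in the dump, and `Thread.sleep` plays the role of the socket read that has no timeout.

```java
public class MonitorCascadeDemo {

    // Stand-in for the long[] monitor ("a [J") seen in the thread dump.
    private static final long[] FAILOVER_LOCK = new long[1];

    // Hypothetical stand-in for failoverToNewServer: the sleep simulates a
    // network read with no timeout, performed while holding the lock.
    static void failoverHoldingLock(long blockMillis) {
        synchronized (FAILOVER_LOCK) {
            try {
                Thread.sleep(blockMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
            }
        }
    }

    // Starts a lock holder and a second caller; returns the second caller's state.
    static Thread.State observeSecondCaller() throws InterruptedException {
        Thread holder = new Thread(() -> failoverHoldingLock(10_000), "holder");
        holder.setDaemon(true);
        holder.start();
        // The holder only sleeps inside the synchronized block, so TIMED_WAITING
        // means it now owns the monitor.
        while (holder.getState() != Thread.State.TIMED_WAITING) {
            Thread.sleep(10);
        }

        Thread waiter = new Thread(() -> failoverHoldingLock(0), "waiter");
        waiter.setDaemon(true);
        waiter.start();
        // Give the waiter up to about 2 s to reach the monitor and block on it.
        for (int i = 0; i < 200 && waiter.getState() != Thread.State.BLOCKED; i++) {
            Thread.sleep(10);
        }
        Thread.State observed = waiter.getState();

        holder.interrupt(); // release the lock so both threads can finish
        holder.join();
        waiter.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second caller state: " + observeSecondCaller());
    }
}
```

The second caller should report BLOCKED for as long as the holder is stuck, which is why a single hung read stalls both the sender threads and the polling thread.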



Version-Release number of selected component (if applicable): 4.9


How reproducible: sometimes


Steps to Reproduce:
1. Create an HTTP listener that either never accepts the socket or accepts it but never sends a response.
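
A minimal standalone reproduction of the "accepts but never responds" case, written against plain `java.net.HttpURLConnection` rather than through JBoss Remoting. `HttpURLConnection` normally defaults both timeouts to 0, which means wait forever, matching the `socketRead0` frame in the dump; once `setConnectTimeout`/`setReadTimeout` are set, `getResponseCode()` fails with `SocketTimeoutException` instead. The class and method names are hypothetical and the timeout values are arbitrary.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.URI;

public class HungListenerDemo {

    // Returns true if the request was abandoned with SocketTimeoutException.
    static boolean requestTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            // Accept the connection but never write a byte back,
            // mimicking a hung server during failover.
            Thread silent = new Thread(() -> {
                try (Socket ignored = server.accept()) {
                    Thread.sleep(Long.MAX_VALUE);
                } catch (IOException | InterruptedException e) {
                    // listener closed or thread interrupted: nothing to do
                }
            });
            silent.setDaemon(true);
            silent.start();

            HttpURLConnection conn = (HttpURLConnection) URI
                    .create("http://127.0.0.1:" + server.getLocalPort() + "/")
                    .toURL()
                    .openConnection(Proxy.NO_PROXY);
            // Without these two calls, getResponseCode() blocks indefinitely.
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            try {
                conn.getResponseCode();
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            } finally {
                conn.disconnect();
                silent.interrupt();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        boolean timedOut = requestTimesOut(2000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("timed out: " + timedOut + " after ~" + elapsedMs + " ms");
    }
}
```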


Actual results:
The agent hangs and does not fail over to the working server.

Expected results:
The agent fails over to a working server.

Additional info:

JBoss Remoting appears to have a 'timeout' option that can be set. The patch below sets it; it probably works but hasn't been tested.


diff --git a/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentMain.java b/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentMain.java
index b46fb4b..3274a8e 100644
--- a/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentMain.java
+++ b/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentMain.java
@@ -2692,6 +2692,13 @@ private RemoteCommunicator createServerRemoteCommunicator(String uri, boolean wi
             config.put(HTTPSClientInvoker.IGNORE_HTTPS_HOST, "true");
         }
 
+        // The HTTP transport can hang for a long, long time
+        // If for example the server is hung, this ensures we do not wait forever and can failover
+        long timeout = m_configuration.getClientSenderCommandTimeout() / 1000;
+        if (timeout > 0) {
+            config.put("timeout", Long.toString(timeout));
+        }
+
         RemoteCommunicator remote_comm = new JBossRemotingRemoteCommunicator(uri, config);
         if (withFailover) {
             remote_comm.setFailureCallback(new FailoverFailureCallback(this));
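
The guard in the patch (add the option only when the configured command timeout is positive) can be sketched without RHQ or Remoting on the classpath. This is a hypothetical helper, not RHQ code. It assumes, without confirmation from this report, that Remoting interprets `timeout` in milliseconds, the unit used by `HttpURLConnection.setConnectTimeout`/`setReadTimeout`; under that assumption the value is passed through unchanged instead of being divided by 1000 as in the diff above.

```java
import java.util.HashMap;
import java.util.Map;

public class RemotingTimeoutConfig {

    // Option key used by the patch above.
    static final String TIMEOUT_KEY = "timeout";

    // Hypothetical helper mirroring the patch's guard. ASSUMPTION: the value is
    // in milliseconds; a non-positive command timeout leaves the config untouched.
    static Map<String, String> withTransportTimeout(Map<String, String> config, long commandTimeoutMillis) {
        Map<String, String> result = new HashMap<>(config);
        if (commandTimeoutMillis > 0) {
            result.put(TIMEOUT_KEY, Long.toString(commandTimeoutMillis));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        System.out.println(withTransportTimeout(base, 600_000L)); // {timeout=600000}
        System.out.println(withTransportTimeout(base, 0L));       // {}
    }
}
```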

Comment 1 Elias Ross 2014-01-16 01:31:34 UTC
Note that the default timeout is 10 minutes, which is far too long for a connect or read timeout. I'm not sure whether it's worth adding a separate setting for this, however.

Comment 2 Elias Ross 2014-01-28 19:17:42 UTC
Created attachment 856768
Patch for RHQ_4_9_0

Note that this patch differs from the one shown in the description.

Comment 3 Thomas Segismont 2014-01-29 16:04:50 UTC
Merged in master

commit 9302f90bc0d81905bf62649666b7df0913acdac7
Author: Elias Ross <genman>
Date:   Wed Jan 29 17:02:42 2014 +0100

Comment 4 Heiko W. Rupp 2014-04-23 12:30:28 UTC
Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.

