Bug 1053922
| Summary: | Agent does not set HTTP connect or read timeout in JBoss remoting | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Other] RHQ Project | Reporter: | Elias Ross <genman> | ||||
| Component: | Agent | Assignee: | Thomas Segismont <tsegismo> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 4.9 | CC: | hrupp, tsegismo | ||||
| Target Milestone: | --- | ||||||
| Target Release: | RHQ 4.10 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | | Doc Type: | Bug Fix | ||||
| Doc Text: | | Story Points: | --- | ||||
| Clone Of: | | Environment: | | ||||
| Last Closed: | 2014-04-23 12:30:28 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | | Category: | --- | ||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |

Note that the default timeout is 10 minutes, which is really too long for a reasonable connect or read timeout. I'm not sure whether it's worth adding a separate setting for it, however.

Created attachment 856768
Patch for RHQ_4_9_0

Note that the patch differs from what is shown in the bug.

Merged in master: commit 9302f90bc0d81905bf62649666b7df0913acdac7, Author: Elias Ross <genman>, Date: Wed Jan 29 17:02:42 2014 +0100.

Bulk closing of 4.10 issues. If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.
Description of problem: I've observed agents hanging at failover. It seems that RHQ does not set any connect or read timeout on the agent's remoting connection.

```
"ClientCommandSenderTask Timer Thread #4489" daemon prio=10 tid=0x0000000045514000 nid=0x6ba7 runnable [0x0000000046ce3000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        - locked <0x00000000e1389748> (a java.io.BufferedInputStream)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:697)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
        - locked <0x00000000e13897f0> (a sun.net.www.protocol.http.HttpURLConnection)
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
        at org.jboss.remoting.transport.http.HTTPClientInvoker.getResponseCode(HTTPClientInvoker.java:1325)
        at org.jboss.remoting.transport.http.HTTPClientInvoker.useHttpURLConnection(HTTPClientInvoker.java:372)
        at org.jboss.remoting.transport.http.HTTPClientInvoker.makeInvocation(HTTPClientInvoker.java:253)
        at org.jboss.remoting.transport.http.HTTPClientInvoker.transport(HTTPClientInvoker.java:176)
        at org.jboss.remoting.MicroRemoteClientInvoker.invoke(MicroRemoteClientInvoker.java:169)
        at org.jboss.remoting.Client.invoke(Client.java:2084)
        at org.jboss.remoting.Client.invoke(Client.java:879)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.rawSend(JBossRemotingRemoteCommunicator.java:514)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.sendWithoutCallbacks(JBossRemotingRemoteCommunicator.java:456)
        at org.rhq.enterprise.agent.AgentMain.sendConnectRequestToServer(AgentMain.java:2114)
        at org.rhq.enterprise.agent.AgentMain.switchCommServer(AgentMain.java:2049)
        at org.rhq.enterprise.agent.AgentMain.failoverToNewServer(AgentMain.java:2007)
        - locked <0x00000000e0164af8> (a [J)
        at org.rhq.enterprise.agent.FailoverFailureCallback.failureDetected(FailoverFailureCallback.java:104)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.invokeFailureCallbackIfNeeded(JBossRemotingRemoteCommunicator.java:625)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.sendWithoutInitializeCallback(JBossRemotingRemoteCommunicator.java:478)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.send(JBossRemotingRemoteCommunicator.java:496)
        at org.rhq.enterprise.communications.command.client.AbstractCommandClient.invoke(AbstractCommandClient.java:143)
        at org.rhq.enterprise.communications.command.client.ClientCommandSender.send(ClientCommandSender.java:1084)
        at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.send(ClientCommandSenderTask.java:229)
        at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.call(ClientCommandSenderTask.java:107)
        at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.call(ClientCommandSenderTask.java:55)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
...
"ClientCommandSenderTask Timer Thread #4488" daemon prio=10 tid=0x000000004436d000 nid=0x6ba6 waiting for monitor entry [0x0000000046ee6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.rhq.enterprise.agent.AgentMain.failoverToNewServer(AgentMain.java:1992)
        - waiting to lock <0x00000000e0164af8> (a [J)
        at org.rhq.enterprise.agent.FailoverFailureCallback.failureDetected(FailoverFailureCallback.java:104)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.invokeFailureCallbackIfNeeded(JBossRemotingRemoteCommunicator.java:625)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.sendWithoutInitializeCallback(JBossRemotingRemoteCommunicator.java:478)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.send(JBossRemotingRemoteCommunicator.java:496)
        at org.rhq.enterprise.communications.command.client.AbstractCommandClient.invoke(AbstractCommandClient.java:143)
        at org.rhq.enterprise.communications.command.client.ClientCommandSender.send(ClientCommandSender.java:1084)
        at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.send(ClientCommandSenderTask.java:229)
        at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.call(ClientCommandSenderTask.java:107)
        at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.call(ClientCommandSenderTask.java:55)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

"RHQ Server Polling Thread" daemon prio=10 tid=0x00002aaab0d5c000 nid=0x5f56 waiting for monitor entry [0x0000000043de8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.rhq.enterprise.agent.AgentMain.failoverToNewServer(AgentMain.java:1992)
        - waiting to lock <0x00000000e0164af8> (a [J)
        at org.rhq.enterprise.agent.FailoverFailureCallback.failureDetected(FailoverFailureCallback.java:104)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.invokeFailureCallbackIfNeeded(JBossRemotingRemoteCommunicator.java:625)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.sendWithoutInitializeCallback(JBossRemotingRemoteCommunicator.java:478)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.send(JBossRemotingRemoteCommunicator.java:496)
        at org.rhq.enterprise.communications.command.client.AbstractCommandClient.invoke(AbstractCommandClient.java:143)
        at org.rhq.enterprise.communications.command.client.ClientCommandSender.send(ClientCommandSender.java:1084)
        at org.rhq.enterprise.communications.command.client.ServerPollingThread.run(ServerPollingThread.java:100)
```

Version-Release number of selected component (if applicable): 4.9

How reproducible: Sometimes

Steps to Reproduce:

1. Create an HTTP listener that does not accept a socket, or that accepts it but never sends any response.

Actual results: The agent hangs and does not fail over to the working server.

Expected results: The agent fails over to the correct server.

Additional info: It appears JBoss Remoting has a 'timeout' option which can be set. This probably works but hasn't been tested...
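The thread dump shows why one slow server stalls the whole agent: the thread doing the failover holds the lock on `<0x00000000e0164af8>` while blocked in `socketRead0` under `HttpURLConnection.getResponseCode()`, and every other thread that detects a failure waits on that same lock. The following is a minimal, JDK-only sketch of step 1, assuming (as the stack trace indicates) that the agent's HTTP transport ends up in `java.net.HttpURLConnection`. `HangingServerDemo` and `requestTimesOut` are hypothetical names. The listener accepts the connection but never answers, and the client sets explicit connect and read timeouts so the call fails with `SocketTimeoutException` instead of hanging.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;

public class HangingServerDemo {

    /**
     * Starts a loopback listener that accepts one connection and never answers,
     * then requests it with the given connect/read timeout (milliseconds).
     * Returns true if the request failed with SocketTimeoutException.
     */
    public static boolean requestTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"))) {
            Thread listener = new Thread(() -> {
                try (Socket socket = server.accept();
                     InputStream in = socket.getInputStream()) {
                    // Read and discard the request; never write a response.
                    while (in.read() != -1) {
                        // keep the connection open until the client closes it
                    }
                } catch (IOException ignored) {
                    // listener closed or client disconnected
                }
            });
            listener.setDaemon(true);
            listener.start();

            URL url = URI.create("http://127.0.0.1:" + server.getLocalPort() + "/").toURL();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            // By default both timeouts are 0, i.e. wait forever: the state the
            // agent thread in the dump was stuck in.
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            try {
                conn.getResponseCode();
                return false; // not expected: the listener never responds
            } catch (SocketTimeoutException expected) {
                return true;
            } finally {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        boolean timedOut = requestTimesOut(500);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("timed out: " + timedOut + " after about " + elapsedMillis + " ms");
    }
}
```

Passing 0 to `setReadTimeout` instead reproduces the hang from the thread dump (the call blocks indefinitely), so that variant should not be run unattended.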
```diff
diff --git a/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentMain.java b/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentMain.java
index b46fb4b..3274a8e 100644
--- a/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentMain.java
+++ b/modules/enterprise/agent/src/main/java/org/rhq/enterprise/agent/AgentMain.java
@@ -2692,6 +2692,13 @@ private RemoteCommunicator createServerRemoteCommunicator(String uri, boolean wi
             config.put(HTTPSClientInvoker.IGNORE_HTTPS_HOST, "true");
         }
 
+        // The HTTP transport can hang for a long, long time
+        // If for example the server is hung, this ensures we do not wait forever and can failover
+        long timeout = m_configuration.getClientSenderCommandTimeout() / 1000;
+        if (timeout > 0) {
+            config.put("timeout", Long.toString(timeout));
+        }
+
         RemoteCommunicator remote_comm = new JBossRemotingRemoteCommunicator(uri, config);
         if (withFailover) {
             remote_comm.setFailureCallback(new FailoverFailureCallback(this));
```
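The diff above forwards the agent's command timeout to JBoss Remoting through the `"timeout"` configuration entry. The sketch below isolates that step in a hypothetical helper, `RemotingTimeoutConfig.withTimeout`, so the mapping can be checked on its own. It assumes the remoting `timeout` value is interpreted in milliseconds (and ends up in `HttpURLConnection.setConnectTimeout`/`setReadTimeout`); under that assumption, dividing by 1000 as the diff does would turn a 10-minute (600000 ms) command timeout into 600 ms, so the sketch passes the value through unchanged. The unit should be verified against the JBoss Remoting version the agent ships with.

```java
import java.util.HashMap;
import java.util.Map;

public final class RemotingTimeoutConfig {

    // Configuration key used in the diff; assumed to be read by the HTTP client invoker.
    public static final String TIMEOUT_KEY = "timeout";

    private RemotingTimeoutConfig() {
    }

    /**
     * Hypothetical helper: returns a copy of the invoker configuration with a
     * "timeout" entry taken from the agent's command timeout. Both values are
     * assumed to be in milliseconds, so no unit conversion is applied. A
     * non-positive command timeout leaves the configuration unchanged, matching
     * the "if (timeout > 0)" guard in the diff.
     */
    public static Map<String, String> withTimeout(Map<String, String> config, long commandTimeoutMillis) {
        Map<String, String> result = new HashMap<>(config);
        if (commandTimeoutMillis > 0) {
            result.put(TIMEOUT_KEY, Long.toString(commandTimeoutMillis));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        base.put("some-existing-key", "value"); // hypothetical pre-existing entry
        System.out.println(withTimeout(base, 600_000L).get(TIMEOUT_KEY)); // prints 600000
        System.out.println(withTimeout(base, 0L).containsKey(TIMEOUT_KEY)); // prints false
    }
}
```

Returning a copy instead of mutating the argument keeps the helper easy to test in isolation; if a separate, shorter setting were added (as the comments wonder), it could be passed in as `commandTimeoutMillis` without changing the rest of the helper.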