Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 903453

Summary: Remoting "Read timed out" when starting multiple servers on a host of single CPU
Product: [JBoss] JBoss Enterprise Application Platform 6
Component: jbossas
Reporter: Osamu Nagano <onagano>
Assignee: Fernando Nasser <fnasser>
Status: CLOSED NEXTRELEASE
Severity: high
Priority: unspecified    
Version: 6.0.1   
Target Milestone: CR1   
Target Release: EAP 6.1.0   
Hardware: All   
OS: All   
Doc Type: Bug Fix
Last Closed: 2013-01-28 02:59:27 UTC
Type: Bug
Bug Blocks: 903472

Description Osamu Nagano 2013-01-24 02:40:21 UTC
Description of problem:
On a machine with one or only a few CPUs, launching many servers (about 8 or more) at the same time, as happens at domain start-up, results in 'org.xnio.channels.ReadTimeoutException: Read timed out'.  In EAP 6.0.0, this can be avoided by setting a longer time-out via the 'jboss.host.server.connection.timeout' and 'jboss.host.domain.connection.timeout' system properties.  However, those time-out settings no longer take effect in EAP 6.0.1 because of a change in the DC<->HC communication mechanism.
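
For illustration only, the following is a minimal sketch of how a Java process might resolve such a time-out from a system property, falling back to a default when the property is absent or invalid. The property names come from the description above; the class name, the helper method, and the fallback value are hypothetical and are not taken from the EAP code base, and the report does not state the unit or the real default.

```java
// Hypothetical helper, NOT JBoss code: resolves a time-out from a system property.
final class ConnectionTimeoutSketch {
    // Property names as quoted in the bug description.
    static final String SERVER_TIMEOUT_PROP = "jboss.host.server.connection.timeout";
    static final String DOMAIN_TIMEOUT_PROP = "jboss.host.domain.connection.timeout";

    // Assumed fallback, for illustration only; the real default is not given in this report.
    static final int ASSUMED_DEFAULT = 5000;

    // Returns the positive integer value of the property, or the assumed default
    // when the property is unset, non-numeric, or not positive.
    static int resolveTimeout(String propertyName) {
        String raw = System.getProperty(propertyName);
        if (raw == null) {
            return ASSUMED_DEFAULT;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : ASSUMED_DEFAULT;
        } catch (NumberFormatException e) {
            return ASSUMED_DEFAULT;
        }
    }

    public static void main(String[] args) {
        System.out.println(SERVER_TIMEOUT_PROP + " = " + resolveTimeout(SERVER_TIMEOUT_PROP));
        System.out.println(DOMAIN_TIMEOUT_PROP + " = " + resolveTimeout(DOMAIN_TIMEOUT_PROP));
    }
}
```

Running the sketch with, for example, `java -Djboss.host.server.connection.timeout=60000 ConnectionTimeoutSketch` prints the overridden value for the first property and the assumed default for the second. A property set this way only has an effect if the consuming code still reads it, which according to the description is no longer the case in EAP 6.0.1.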

How reproducible:
Always in the customer's environment.

Steps to Reproduce:
1. In domain mode on EAP 6.0.1, define about 10 servers in host.xml.
2. Start the domain.  All managed servers start at the same time.
3. The following exception occurs in the host controller.
  
Actual results:
[Host Controller] 16:15:58,036 ERROR [org.jboss.remoting.remote.connection] (Remoting "nceaptint03:MANAGEMENT" read-1) JBREM000200: Remote connection failed: org.xnio.channels.ReadTimeoutException: Read timed out
[Server:ib-demo-server1-group2] 16:16:00,236 ERROR [org.jboss.remoting.remote.connection] (Remoting "int03master:ib-demo-server1-group2:MANAGEMENT" read-1) JBREM000200: Remote connection failed: java.io.IOException: JBREM000201: Received invalid message on Remoting connection 111bda67 to nceaptint03/172.16.139.60:9999
[Server:ib-demo-server1-group2] 16:16:00,704 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-1) MSC00001: Failed to start service jboss.host.controller.client: org.jboss.msc.service.StartException in service jboss.host.controller.client: java.net.ConnectException: JBAS012174: Could not connect to remote://172.16.139.60:9999. The connection failed
[Server:ib-demo-server1-group2] 	at org.jboss.as.server.mgmt.domain.HostControllerServerClient.start(HostControllerServerClient.java:172) [jboss-as-server-7.1.3.Final.jar:7.1.3.Final]
[Server:ib-demo-server1-group2] 	at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1811) [jboss-msc-1.0.2.GA.jar:1.0.2.GA]
[Server:ib-demo-server1-group2] 	at org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1746) [jboss-msc-1.0.2.GA.jar:1.0.2.GA]
[Server:ib-demo-server1-group2] 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_30]
[Server:ib-demo-server1-group2] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_30]
[Server:ib-demo-server1-group2] 	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_30]
[Server:ib-demo-server1-group2] Caused by: java.net.ConnectException: JBAS012174: Could not connect to remote://172.16.139.60:9999. The connection failed
[Server:ib-demo-server1-group2] 	at org.jboss.as.protocol.ProtocolConnectionUtils.connectSync(ProtocolConnectionUtils.java:118) [jboss-as-protocol-7.1.3.Final.jar:7.1.3.Final]
[Server:ib-demo-server1-group2] 	at org.jboss.as.protocol.ProtocolChannelClient.connectSync(ProtocolChannelClient.java:84) [jboss-as-protocol-7.1.3.Final.jar:7.1.3.Final]
[Server:ib-demo-server1-group2] 	at org.jboss.as.server.mgmt.domain.HostControllerServerConnection.openChannel(HostControllerServerConnection.java:158) [jboss-as-server-7.1.3.Final.jar:7.1.3.Final]
[Server:ib-demo-server1-group2] 	at org.jboss.as.server.mgmt.domain.HostControllerServerConnection.connect(HostControllerServerConnection.java:86) [jboss-as-server-7.1.3.Final.jar:7.1.3.Final]
[Server:ib-demo-server1-group2] 	at org.jboss.as.server.mgmt.domain.HostControllerServerClient.start(HostControllerServerClient.java:148) [jboss-as-server-7.1.3.Final.jar:7.1.3.Final]
[Server:ib-demo-server1-group2] 	... 5 more
[Server:ib-demo-server1-group2] Caused by: java.io.IOException: JBREM000201: Received invalid message on Remoting connection 111bda67 to nceaptint03/172.16.139.60:9999
[Server:ib-demo-server1-group2] 	at org.jboss.remoting3.remote.ClientConnectionOpenListener$Capabilities.handleEvent(ClientConnectionOpenListener.java:424) [jboss-remoting-3.2.8.SP1.jar:3.2.8.SP1]
[Server:ib-demo-server1-group2] 	at org.jboss.remoting3.remote.ClientConnectionOpenListener$Capabilities.handleEvent(ClientConnectionOpenListener.java:226) [jboss-remoting-3.2.8.SP1.jar:3.2.8.SP1]
[Server:ib-demo-server1-group2] 	at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:72) [xnio-api-3.0.6.GA.jar:3.0.6.GA]
[Server:ib-demo-server1-group2] 	at org.xnio.channels.TranslatingSuspendableChannel.handleReadable(TranslatingSuspendableChannel.java:189) [xnio-api-3.0.6.GA.jar:3.0.6.GA]
[Server:ib-demo-server1-group2] 	at org.xnio.channels.TranslatingSuspendableChannel$1.handleEvent(TranslatingSuspendableChannel.java:103) [xnio-api-3.0.6.GA.jar:3.0.6.GA]
[Server:ib-demo-server1-group2] 	at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:72) [xnio-api-3.0.6.GA.jar:3.0.6.GA]
[Server:ib-demo-server1-group2] 	at org.xnio.channels.TranslatingSuspendableChannel.handleReadable(TranslatingSuspendableChannel.java:189) [xnio-api-3.0.6.GA.jar:3.0.6.GA]
[Server:ib-demo-server1-group2] 	at org.xnio.ssl.JsseConnectedSslStreamChannel.handleReadable(JsseConnectedSslStreamChannel.java:180) [xnio-api-3.0.6.GA.jar:3.0.6.GA]
[Server:ib-demo-server1-group2] 	at org.xnio.channels.TranslatingSuspendableChannel$1.handleEvent(TranslatingSuspendableChannel.java:103) [xnio-api-3.0.6.GA.jar:3.0.6.GA]
[Server:ib-demo-server1-group2] 	at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:72) [xnio-api-3.0.6.GA.jar:3.0.6.GA]
[Server:ib-demo-server1-group2] 	at org.xnio.nio.NioHandle.run(NioHandle.java:90)
[Server:ib-demo-server1-group2] 	at org.xnio.nio.WorkerThread.run(WorkerThread.java:187)
[Server:ib-demo-server1-group2] 	at ...asynchronous invocation...(Unknown Source)
[Server:ib-demo-server1-group2] 	at org.jboss.remoting3.EndpointImpl.doConnect(EndpointImpl.java:270) [jboss-remoting-3.2.8.SP1.jar:3.2.8.SP1]
[Server:ib-demo-server1-group2] 	at org.jboss.remoting3.EndpointImpl.doConnect(EndpointImpl.java:251) [jboss-remoting-3.2.8.SP1.jar:3.2.8.SP1]
[Server:ib-demo-server1-group2] 	at org.jboss.remoting3.EndpointImpl.connect(EndpointImpl.java:349) [jboss-remoting-3.2.8.SP1.jar:3.2.8.SP1]
[Server:ib-demo-server1-group2] 	at org.jboss.remoting3.EndpointImpl.connect(EndpointImpl.java:337) [jboss-remoting-3.2.8.SP1.jar:3.2.8.SP1]
[Server:ib-demo-server1-group2] 	at org.jboss.as.protocol.ProtocolConnectionUtils.connect(ProtocolConnectionUtils.java:74) [jboss-as-protocol-7.1.3.Final.jar:7.1.3.Final]
[Server:ib-demo-server1-group2] 	at org.jboss.as.protocol.ProtocolConnectionUtils.connectSync(ProtocolConnectionUtils.java:88) [jboss-as-protocol-7.1.3.Final.jar:7.1.3.Final]
[Server:ib-demo-server1-group2] 	... 9 more

Expected results:
All managed servers should start normally regardless of the number of servers.

Comment 1 Osamu Nagano 2013-01-24 02:53:31 UTC
In addition to case 00756264, the customer discussed this issue with a Red Hat developer on the community thread [1].  The developer introduced a new system property, 'org.jboss.as.host.start.servers.sequential', in the code [2]; it makes the host controller start the servers sequentially, which avoids the exception.  This feature works well for the customer, and I identified the 6 commits from his branch that need to be applied on top of EAP_6.0.1.GA:

pick 793ba23 [AS7-5556] Where we convert to a String ensure we set the charset to UTF-8 for both processes.
pick 25bfd87 Fix wrong unmarshalling order of process inventory data.
pick d3cdd74 [AS7-5887] reconnect servers automatically
pick f3e0b72 add managed server std.in state
pick 1ed0175 [AS7-6230] wait until the managed server opens it's mgmt channel by default
pick 2d956ec [AS7-6230] add a blocking start property for the host-controller

[1] https://community.jboss.org/thread/215769
[2] https://github.com/jbossas/jboss-as/pull/3794
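
To illustrate the idea behind 'org.jboss.as.host.start.servers.sequential', here is a minimal sketch of a launcher that starts servers one at a time when the property is set to true and concurrently otherwise. Only the property name comes from this comment; the ManagedServer interface, the startAll method, and the thread-pool approach are hypothetical stand-ins and do not reproduce the code in the pull request [2].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, NOT the host controller implementation.
final class SequentialStartSketch {
    // Property name as quoted in the comment above.
    static final String SEQUENTIAL_PROP = "org.jboss.as.host.start.servers.sequential";

    // Hypothetical stand-in for a managed server; start() blocks until the server is up.
    interface ManagedServer {
        void start() throws Exception;
    }

    static void startAll(List<? extends ManagedServer> servers) throws Exception {
        if (Boolean.getBoolean(SEQUENTIAL_PROP)) {
            // Sequential mode: each server finishes starting before the next begins,
            // so start-up work never competes for the same CPU.
            for (ManagedServer server : servers) {
                server.start();
            }
            return;
        }
        if (servers.isEmpty()) {
            return; // a fixed thread pool cannot be created with zero threads
        }
        // Concurrent mode: all servers start at once, one thread per server.
        ExecutorService pool = Executors.newFixedThreadPool(servers.size());
        try {
            List<Future<Void>> pending = new ArrayList<>();
            for (ManagedServer server : servers) {
                Callable<Void> task = () -> {
                    server.start();
                    return null;
                };
                pending.add(pool.submit(task));
            }
            for (Future<Void> future : pending) {
                future.get(); // propagate any start failure
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<ManagedServer> servers = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            servers.add(() -> {
                // Record how many simulated servers are starting at the same time.
                peak.accumulateAndGet(active.incrementAndGet(), Math::max);
                Thread.sleep(50); // simulated start-up work
                active.decrementAndGet();
            });
        }
        startAll(servers);
        System.out.println("peak concurrent starts: " + peak.get());
    }
}
```

Running the sketch with `-Dorg.jboss.as.host.start.servers.sequential=true` keeps the peak at one simulated server starting at a time. Sequential start trades a longer total start-up time for less contention, which matters on a host with a single CPU.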

Comment 2 Osamu Nagano 2013-01-28 02:59:27 UTC
The fix is already included upstream via AS7-6230 in 7.2.0.CR1, so I am closing this bug as targeted to EAP 6.1.0.CR1 as well.