Setup: 4 node cluster, one node at at time is shutdown, while standalone clients keep calling the application. When server is gracefully shutting down, the following error was seen on the client side: 16:15:32,082 INFO [org.jboss.ejb.client.remoting] (Remoting "config-based-ejb-client-endpoint" task-4) EJBCLIENT000016: Channel Channel ID abd07b66 (outbound) of Remoting connection 73efe4ac to perf18/10.16.90.54:4447 can no longer process messages 16:15:32,100 ERROR [org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver] (Remoting "config-based-ejb-client-endpoint" task-4) Failed to open channel for context EJBReceiverContext{clientContext=org.jboss.ejb.client.EJBClientContext@26d7ad36, receiver=Remoting connection EJB receiver [connection=org.jboss.ejb.client.remoting.ConnectionPool$PooledConnection@35908558,channel=jboss.ejb,nodename=perf18]} org.jboss.remoting3.NotOpenException: Cannot open new channel because close was initiated at org.jboss.remoting3.remote.RemoteConnectionHandler.handleOutboundChannelOpen(RemoteConnectionHandler.java:188) at org.jboss.remoting3.remote.RemoteConnectionHandler.open(RemoteConnectionHandler.java:314) at org.jboss.remoting3.ConnectionImpl.openChannel(ConnectionImpl.java:75) at org.jboss.ejb.client.remoting.ConnectionPool$PooledConnection.openChannel(ConnectionPool.java:223) at org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver.associate(RemotingConnectionEJBReceiver.java:136) at org.jboss.ejb.client.EJBClientContext.registerEJBReceiver(EJBClientContext.java:375) at org.jboss.ejb.client.EJBClientContext.registerEJBReceiver(EJBClientContext.java:327) at org.jboss.ejb.client.remoting.EJBClientContextConnectionReconnectHandler.reconnect(EJBClientContextConnectionReconnectHandler.java:66) at org.jboss.ejb.client.EJBClientContext$ReconnectAttempt.run(EJBClientContext.java:1287) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) Server log: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-ejb-ejbremote-shutdown-dist-async/20/artifact/report/config/jboss-perf18/server.log Client log: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-ejb-ejbremote-shutdown-dist-async/20/console-perf17/consoleText Server setup: https://jenkins.mw.lab.eng.bos.redhat.com/job/eap-6x-failover-ejb-ejbremote-shutdown-dist-async/20/artifact/report/config/jboss-perf18/standalone-ha.xml
Assigning jpai EJB issues to david.lloyd. Please re-assign to Cheng or others as needed.
From what I see in the logs, I'm not sure if this is actually an error. The client tries an invocation and picks a cluster node which is shutting down at that moment. The outbound remoting channel to that node gets closed in the meantime, so remoting tries to re-create it, but it fails because the server refuses it. The client logs an error and then picks another node. The EJB invocation proceeds correctly, doesn't it? I don't see any error message suggesting that the invocation failed. Or am I mistaken?
Hi Jan. From my point of view this is a bug. What is happening here is the following: 1) One end close the connection (the shutdown of the connection starts) 2) the other end detects the broken channel and registers a reconnect handler. 3) The ConnectionPool class inside the ejb client has two ways of removing connection from the cache a) through close listener in the connection pool https://github.com/elguardian/jboss-ejb-client/blob/master/src/main/java/org/jboss/ejb/client/remoting/ConnectionPool.java#L273 b) through the release operation: through close listener in the connection pool https://github.com/elguardian/jboss-ejb-client/blob/master/src/main/java/org/jboss/ejb/client/remoting/ConnectionPool.java#L92 These two ways are not thread safe. The attempt to reconnect happens before the 3.a (through the broken channel event). This race condition leads to a situation where the ConnectionPool returns a connection is going to be closed. For avoiding this situation the reconnect handler needs to ensure that is executed *only* after the connection has been closed. This mean serialize the events. Whenever the connection is broken, add a close listener to the channel for registering the reconnect handler. This ensure the events are sequential (it's not going to happen that a close channel tries to reconnect till the connection is cleaned up) avoiding the situation of a closing connection. I added the solution for this in another BZ#1020074 (ejb async reconnection) https://github.com/elguardian/jboss-ejb-client/blob/44982ae7a5e19c76ad96040dccddff8cfee2f265/src/main/java/org/jboss/ejb/client/EJBClientContext.java#L565
commit from BZ#1020074 was reverted. creating a new upstream jira for this issue.
component upstream merged: https://github.com/jbossas/jboss-ejb-client/pull/112
reverted in upstream. https://github.com/jbossas/jboss-ejb-client/pull/115