Bug 989879

Summary:	(6.4.z) NotOpenException: Cannot open new channel because close was initiated on server shutdown
Product:	[JBoss] JBoss Enterprise Application Platform 6	Reporter:	Jitka Kozana <jkudrnac>
Component:	EJB	Assignee:	Radovan STANCEL <rstancel>
Status:	CLOSED WONTFIX	QA Contact:	Michal Vinkler <mvinkler>
Severity:	unspecified	Docs Contact:	Russell Dickenson <rdickens>
Priority:	unspecified
Version:	6.1.1	CC:	bbaranow, bmaxwell, cdewolf, egonzale, jmartisk, rstancel
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-07-21 16:41:15 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Jitka Kozana 2013-07-30 06:23:20 UTC

Setup: 4 node cluster, one node at at time is shutdown, while standalone clients keep calling the application. 

When server is gracefully shutting down, the following error was seen on the client side:
16:15:32,082 INFO  [org.jboss.ejb.client.remoting] (Remoting "config-based-ejb-client-endpoint" task-4) EJBCLIENT000016: Channel Channel ID abd07b66 (outbound) of Remoting connection 73efe4ac to perf18/10.16.90.54:4447 can no longer process messages
16:15:32,100 ERROR [org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver] (Remoting "config-based-ejb-client-endpoint" task-4) Failed to open channel for context EJBReceiverContext{clientContext=org.jboss.ejb.client.EJBClientContext@26d7ad36, receiver=Remoting connection EJB receiver [connection=org.jboss.ejb.client.remoting.ConnectionPool$PooledConnection@35908558,channel=jboss.ejb,nodename=perf18]}
org.jboss.remoting3.NotOpenException: Cannot open new channel because close was initiated
	at org.jboss.remoting3.remote.RemoteConnectionHandler.handleOutboundChannelOpen(RemoteConnectionHandler.java:188)
	at org.jboss.remoting3.remote.RemoteConnectionHandler.open(RemoteConnectionHandler.java:314)
	at org.jboss.remoting3.ConnectionImpl.openChannel(ConnectionImpl.java:75)
	at org.jboss.ejb.client.remoting.ConnectionPool$PooledConnection.openChannel(ConnectionPool.java:223)
	at org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver.associate(RemotingConnectionEJBReceiver.java:136)
	at org.jboss.ejb.client.EJBClientContext.registerEJBReceiver(EJBClientContext.java:375)
	at org.jboss.ejb.client.EJBClientContext.registerEJBReceiver(EJBClientContext.java:327)
	at org.jboss.ejb.client.remoting.EJBClientContextConnectionReconnectHandler.reconnect(EJBClientContextConnectionReconnectHandler.java:66)
	at org.jboss.ejb.client.EJBClientContext$ReconnectAttempt.run(EJBClientContext.java:1287)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)

Server log:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-ejb-ejbremote-shutdown-dist-async/20/artifact/report/config/jboss-perf18/server.log

Client log:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-ejb-ejbremote-shutdown-dist-async/20/console-perf17/consoleText

Server setup:
https://jenkins.mw.lab.eng.bos.redhat.com/job/eap-6x-failover-ejb-ejbremote-shutdown-dist-async/20/artifact/report/config/jboss-perf18/standalone-ha.xml

Comment 2 Dimitris Andreadis 2013-10-24 18:28:24 UTC

Assigning jpai EJB issues to david.lloyd. Please re-assign to Cheng or others as needed.

Comment 3 Jan Martiska 2014-07-30 13:38:56 UTC

From what I see in the logs, I'm not sure if this is actually an error. The client tries an invocation and picks a cluster node which is shutting down at that moment. The outbound remoting channel to that node gets closed in the meantime, so remoting tries to re-create it, but it fails because the server refuses it. The client logs an error and then picks another node. The EJB invocation proceeds correctly, doesn't it? I don't see any error message suggesting that the invocation failed. Or am I mistaken?

Comment 4 Enrique Gonzalez Martinez 2015-02-11 13:23:18 UTC

Hi Jan. 
From my point of view this is a bug. What is happening here is the following:

1) One end close the connection (the shutdown of the connection starts)
2) the other end detects the broken channel and registers a reconnect handler.
3) The ConnectionPool class inside the ejb client has two ways of removing connection from the cache 
   a) through close listener in the connection pool
    https://github.com/elguardian/jboss-ejb-client/blob/master/src/main/java/org/jboss/ejb/client/remoting/ConnectionPool.java#L273
   b) through the release operation:
      through close listener in the connection pool
    https://github.com/elguardian/jboss-ejb-client/blob/master/src/main/java/org/jboss/ejb/client/remoting/ConnectionPool.java#L92

These two ways are not thread safe.

The attempt to reconnect happens before the 3.a (through the broken channel event). This race condition leads to a situation where the ConnectionPool returns a connection is going to be closed.

For avoiding this situation the reconnect handler needs to ensure that is executed *only* after the connection has been closed. This mean serialize the events.  

Whenever the connection is broken, add a close listener to the channel for registering the reconnect handler. This ensure the events are sequential (it's not going to happen that a close channel tries to reconnect till the connection is cleaned up) avoiding the situation of a closing connection.

I added the solution for this in another BZ#1020074 (ejb async reconnection)
https://github.com/elguardian/jboss-ejb-client/blob/44982ae7a5e19c76ad96040dccddff8cfee2f265/src/main/java/org/jboss/ejb/client/EJBClientContext.java#L565

Comment 5 Enrique Gonzalez Martinez 2015-03-30 08:35:25 UTC

commit from BZ#1020074 was reverted. creating a new upstream jira for this issue.

Comment 6 Enrique Gonzalez Martinez 2015-04-08 12:39:30 UTC

component upstream merged:
https://github.com/jbossas/jboss-ejb-client/pull/112

Comment 7 Enrique Gonzalez Martinez 2016-09-01 11:14:11 UTC

reverted in upstream.

https://github.com/jbossas/jboss-ejb-client/pull/115