This service will be undergoing maintenance at 00:00 UTC, 2017-10-23 It is expected to last about 30 minutes
Bug 1299844 - [QA](6.4.z) Deadlock in EJB client after fix for bug 1266112
[QA](6.4.z) Deadlock in EJB client after fix for bug 1266112
Status: CLOSED CURRENTRELEASE
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: Clustering, EJB (Show other bugs)
6.4.6
Unspecified Unspecified
unspecified Severity high
: CR1
: EAP 6.4.8
Assigned To: Enrique Gonzalez Martinez
Ladislav Thon
:
Depends On:
Blocks: eap648-payload 1299983
  Show dependency treegraph
 
Reported: 2016-01-19 06:49 EST by Ladislav Thon
Modified: 2017-01-24 13:21 EST (History)
9 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-17 07:32:38 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
JBoss Issue Tracker EJBCLIENT-152 Major Resolved Deadlock in EJB client 2017-04-11 02:52 EDT
JBoss Issue Tracker JBEAP-3114 Major Closed Deadlock in EJB client 2017-04-11 02:52 EDT
Red Hat Knowledge Base (Solution) 2887731 None None None 2017-01-24 13:21 EST

  None (edit)
Description Ladislav Thon 2016-01-19 06:49:16 EST
Description of problem:

This is the same test that found bug 1266112. Specifically, the tests starts a cluster of 2 servers, deploys a simple application and calls a remote EJB from a standalone client (actually 10 clients in 10 threads). During the test, both servers are repeatedly shut down and then brought back to life, but only one at a time.

Around the time one of the servers is going down, a deadlock in the EJB client library occurs. This was found by investigating why the test appears to hang -- a simple `jstack -l $PID` reveals

Found one Java-level deadlock:
=============================
"Remoting "config-based-ejb-client-endpoint" read-1":
  waiting to lock monitor 0x00007fc2786a53f8 (object 0x000000078609c840, a java.util.Collections$SynchronizedSet),
  which is held by "Client09"
"Client09":
  waiting to lock monitor 0x00007fc2788e4710 (object 0x000000078608c238, a java.util.IdentityHashMap),
  which is held by "Remoting "config-based-ejb-client-endpoint" read-1"

Java stack information for the threads listed above:
===================================================
"Remoting "config-based-ejb-client-endpoint" read-1":
	at java.util.Collections$SynchronizedCollection.remove(Collections.java:1580)
	- waiting to lock <0x000000078609c840> (a java.util.Collections$SynchronizedSet)
	at org.jboss.ejb.client.ClusterContext.receiverContextClosed(ClusterContext.java:414)
	at org.jboss.ejb.client.EJBClientContext.unregisterEJBReceiver(EJBClientContext.java:443)
	- locked <0x000000078608c238> (a java.util.IdentityHashMap)
	at org.jboss.ejb.client.EJBReceiverContext.close(EJBReceiverContext.java:59)
	at org.jboss.ejb.client.remoting.ChannelAssociation.notifyBrokenChannel(ChannelAssociation.java:407)
	at org.jboss.ejb.client.remoting.ChannelAssociation.access$100(ChannelAssociation.java:59)
	at org.jboss.ejb.client.remoting.ChannelAssociation$1.handleClose(ChannelAssociation.java:118)
	at org.jboss.ejb.client.remoting.ChannelAssociation$1.handleClose(ChannelAssociation.java:110)
	at org.jboss.remoting3.spi.SpiUtils.safeHandleClose(SpiUtils.java:54)
	at org.jboss.remoting3.spi.AbstractHandleableCloseable$CloseHandlerTask.run(AbstractHandleableCloseable.java:514)
	at org.jboss.remoting3.spi.AbstractHandleableCloseable.runCloseTask(AbstractHandleableCloseable.java:419)
	at org.jboss.remoting3.spi.AbstractHandleableCloseable.closeComplete(AbstractHandleableCloseable.java:290)
	at org.jboss.remoting3.remote.RemoteConnectionChannel.closeAction(RemoteConnectionChannel.java:531)
	at org.jboss.remoting3.spi.AbstractHandleableCloseable.closeAsync(AbstractHandleableCloseable.java:372)
	at org.jboss.remoting3.remote.RemoteConnectionHandler.closeAllChannels(RemoteConnectionHandler.java:437)
	at org.jboss.remoting3.remote.RemoteConnectionHandler.receiveCloseRequest(RemoteConnectionHandler.java:217)
	at org.jboss.remoting3.remote.RemoteReadListener.handleEvent(RemoteReadListener.java:113)
	at org.jboss.remoting3.remote.RemoteReadListener.handleEvent(RemoteReadListener.java:46)
	at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:72)
	at org.xnio.channels.TranslatingSuspendableChannel.handleReadable(TranslatingSuspendableChannel.java:189)
	at org.xnio.channels.TranslatingSuspendableChannel$1.handleEvent(TranslatingSuspendableChannel.java:103)
	at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:72)
	at org.xnio.nio.NioHandle.run(NioHandle.java:90)
	at org.xnio.nio.WorkerThread.run(WorkerThread.java:198)
"Client09":
	at org.jboss.ejb.client.EJBClientContext.getNodeEJBReceiver(EJBClientContext.java:860)
	- waiting to lock <0x000000078608c238> (a java.util.IdentityHashMap)
	at org.jboss.ejb.client.EJBClientContext.getNodeEJBReceiver(EJBClientContext.java:846)
	at org.jboss.ejb.client.EJBClientContext.getNodeEJBReceiverContext(EJBClientContext.java:888)
	at org.jboss.ejb.client.ClusterContext.isNodeConnectedAndDeployed(ClusterContext.java:251)
	at org.jboss.ejb.client.ClusterContext.getConnectedAndDeployedNodes(ClusterContext.java:266)
	- locked <0x000000078609c840> (a java.util.Collections$SynchronizedSet)
	at org.jboss.ejb.client.ClusterContext.getEJBReceiverContext(ClusterContext.java:125)
	at org.jboss.ejb.client.ClusterContext.getEJBReceiverContext(ClusterContext.java:96)
	at org.jboss.ejb.client.EJBClientContext.getClusterEJBReceiverContext(EJBClientContext.java:958)
	at org.jboss.ejb.client.EJBClientContext.getClusterEJBReceiverContext(EJBClientContext.java:938)
	at org.jboss.ejb.client.ReceiverInterceptor.handleInvocation(ReceiverInterceptor.java:108)
	at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:186)
	at org.jboss.ejb.client.EJBInvocationHandler.sendRequestWithPossibleRetries(EJBInvocationHandler.java:255)
	at org.jboss.ejb.client.EJBInvocationHandler.doInvoke(EJBInvocationHandler.java:200)
	at org.jboss.ejb.client.EJBInvocationHandler.doInvoke(EJBInvocationHandler.java:183)
	at org.jboss.ejb.client.EJBInvocationHandler.invoke(EJBInvocationHandler.java:146)
	at com.sun.proxy.$Proxy23.test(Unknown Source)
	at org.jboss.as.test.clustering.lodh.june2013.tools.ClientThread.doCall(ClientThread.java:75)
	at org.jboss.as.test.clustering.lodh.june2013.tools.ClientThread.run(ClientThread.java:56)

This appears to be caused by the fix for bug 1266112, which is commit https://github.com/jbossas/jboss-ejb-client/commit/66cda3bcd2 The fix itself is arguably correct, but it reveals another concurrency issue.

Version-Release number of selected component (if applicable):

EAP 6.4.6.CP.CR2

JBoss EJB Client 1.0.32.Final-redhat-1

Turns out that this version of EJB client also appears in EAP 6.4.5, so EAP 6.4.5 is most likely also affected. This wasn't discovered when verifying the fix for bug 1266112, because that bug appeared in scenarios where the EAP process was `kill -9`-ed, so I only used these "processKill" scenarios for verifying. On the other hand, this bug appears in the scenarios where EAP is shut down "gracefully".

How reproducible:

Often, but not 100% of the time.

Steps to Reproduce:

See a private comment below.

Actual results:

Deadlock occurs.

Expected results:

No deadlock occurs.
Comment 3 Enrique Gonzalez Martinez 2016-01-22 03:02:05 EST
The deadlock is simple to reproduce. It happens when
Comment 4 Enrique Gonzalez Martinez 2016-02-01 02:19:05 EST
For some reason my last commet is just half of it. The bug can be reproduced easily with a stateless ejb and 2 instances of the app server.

If you shutdown one of the servers during an ejb-client invocation you get this error (it is a race condition). Reducing the scope of the connectedAndDeployed in the function getConnectedAndDeployedNodes solves the problem.

PR upstream 7.0.0: https://github.com/jbossas/jboss-ejb-client/pull/138
PR 6.4.z: https://github.com/jbossas/jboss-ejb-client/pull/137
Comment 5 Enrique Gonzalez Martinez 2016-02-02 03:08:58 EST
merged

6.4.z: 1.0.x https://github.com/jbossas/jboss-ejb-client/commit/3a19ab2113cce7026847c70058195b2c5d8dce28 

@Brad this got merged in 6.4.z branch in ejb-client. this will get in the next ejb-client upgrade
Comment 8 Jiří Bílek 2016-05-12 09:49:51 EDT
Verified with EAP 6.4.8.CP.CR2
Comment 9 JBoss JIRA Server 2016-05-17 06:41:40 EDT
Enrique González Martínez <elguardian@gmail.com> updated the status of jira JBEAP-3114 to Resolved
Comment 10 JBoss JIRA Server 2016-08-23 07:38:33 EDT
Jiri Pallich <jpallich@redhat.com> updated the status of jira JBEAP-3114 to Closed
Comment 11 Petr Penicka 2017-01-17 07:32:38 EST
Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.

Note You need to log in before you can comment on or make changes to this bug.