Description of problem:
https://issues.jboss.org/browse/REM3-204

When the connection is closed while we are sending some content, we have a deadlock happening between the RemoteConnection.RemoteWriteListener.queue and the BufferPipeOutputStream, for example:

"Remoting read-1":
    at org.jboss.remoting3.remote.OutboundMessage.cancel(OutboundMessage.java:288)
    - waiting to lock <0xdd0ae4c0> (a org.xnio.streams.BufferPipeOutputStream)
    at org.jboss.remoting3.remote.RemoteConnectionChannel.closeMessages(RemoteConnectionChannel.java:560)
    at org.jboss.remoting3.remote.RemoteConnectionChannel.closeAction(RemoteConnectionChannel.java:542)
    at org.jboss.remoting3.spi.AbstractHandleableCloseable.closeAsync(AbstractHandleableCloseable.java:372)
    at org.jboss.remoting3.remote.RemoteConnectionHandler.closeAllChannels(RemoteConnectionHandler.java:429)
    at org.jboss.remoting3.remote.RemoteConnectionHandler.sendCloseRequest(RemoteConnectionHandler.java:233)
    at org.jboss.remoting3.remote.RemoteConnectionHandler.handleConnectionClose(RemoteConnectionHandler.java:113)
    at org.jboss.remoting3.remote.RemoteReadListener.handleEvent(RemoteReadListener.java:81)
    - locked <0xdd29f670> (a java.util.ArrayDeque)
    at org.jboss.remoting3.remote.RemoteReadListener.handleEvent(RemoteReadListener.java:45)
    at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:72)
    at org.xnio.channels.TranslatingSuspendableChannel.handleReadable(TranslatingSuspendableChannel.java:189)
    at org.xnio.channels.TranslatingSuspendableChannel$1.handleEvent(TranslatingSuspendableChannel.java:103)
    at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:72)
    at org.xnio.channels.TranslatingSuspendableChannel.handleReadable(TranslatingSuspendableChannel.java:189)
    at org.xnio.ssl.JsseConnectedSslStreamChannel.handleReadable(JsseConnectedSslStreamChannel.java:183)
    at org.xnio.channels.TranslatingSuspendableChannel$1.handleEvent(TranslatingSuspendableChannel.java:103)
    at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:72)
    at org.xnio.nio.NioHandle.run(NioHandle.java:90)
    at org.xnio.nio.WorkerThread.run(WorkerThread.java:198)

"Remoting task-2":
    at org.jboss.remoting3.remote.RemoteConnection$RemoteWriteListener.send(RemoteConnection.java:294)
    - waiting to lock <0xdd29f670> (a java.util.ArrayDeque)
    at org.jboss.remoting3.remote.RemoteConnection.send(RemoteConnection.java:122)
    at org.jboss.remoting3.remote.OutboundMessage$1.accept(OutboundMessage.java:154)
    at org.xnio.streams.BufferPipeOutputStream.send(BufferPipeOutputStream.java:126)
    at org.xnio.streams.BufferPipeOutputStream.send(BufferPipeOutputStream.java:114)
    at org.xnio.streams.BufferPipeOutputStream.flush(BufferPipeOutputStream.java:143)
    - locked <0xdd0ae4c0> (a org.xnio.streams.BufferPipeOutputStream)
    at org.xnio.streams.BufferPipeOutputStream.close(BufferPipeOutputStream.java:161)
    - locked <0xdd0ae4c0> (a org.xnio.streams.BufferPipeOutputStream)
    at org.jboss.remoting3.remote.OutboundMessage.close(OutboundMessage.java:283)
    - locked <0xdd0ae4c0> (a org.xnio.streams.BufferPipeOutputStream)
    at org.jboss.as.ejb3.remote.protocol.versionone.ChannelAssociation.releaseChannelMessageOutputStream(ChannelAssociation.java:85)
    at org.jboss.as.ejb3.remote.EJBRemoteConnectorService.sendVersionMessage(EJBRemoteConnectorService.java:184)
    at org.jboss.as.ejb3.remote.EJBRemoteConnectorService.access$000(EJBRemoteConnectorService.java:73)
    at org.jboss.as.ejb3.remote.EJBRemoteConnectorService$ChannelOpenListener.channelOpened(EJBRemoteConnectorService.java:211)
    at org.jboss.remoting3.spi.SpiUtils$ServiceOpenTask.run(SpiUtils.java:126)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Version-Release number of selected component (if applicable): 3.3.5
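
For readers skimming the trace: the two threads take the same two locks in opposite order (read thread: queue, then stream; task thread: stream, then queue). Below is a minimal standalone sketch of that inversion, not the Remoting code itself. The class and variable names are hypothetical, and it deliberately swaps the real `synchronized` monitors for `ReentrantLock` with a timed `tryLock` so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; none of these names exist in jboss-remoting.
final class LockOrderInversionSketch {

    /** Returns how many of the two threads could not obtain their second lock. */
    static int run() {
        ReentrantLock queueLock = new ReentrantLock(); // stands in for the ArrayDeque monitor
        ReentrantLock pipeLock = new ReentrantLock();  // stands in for the BufferPipeOutputStream monitor
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // Read thread: queue first, then stream (as in OutboundMessage.cancel).
        Thread reader = new Thread(
            () -> acquire(queueLock, pipeLock, bothHoldFirst, bothTried, blocked), "read-thread");
        // Task thread: stream first, then queue (as in RemoteWriteListener.send).
        Thread writer = new Thread(
            () -> acquire(pipeLock, queueLock, bothHoldFirst, bothTried, blocked), "task-thread");

        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked.get();
    }

    private static void acquire(ReentrantLock first, ReentrantLock second,
                                CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                                AtomicInteger blocked) {
        first.lock();
        try {
            // Wait until the other thread also holds its first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // With plain monitors this acquisition would block forever.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();
            }
            // Keep holding the first lock until both threads have tried.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads blocked on second lock: " + run());
    }
}
```

Because each thread keeps its first lock until both have attempted the second, neither timed attempt can succeed, which is the situation the thread dump captures with real monitors.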
Hi Aaron: The problem is that when an endpoint receives a CLOSE signal it tries to send another CLOSE to the other end, resulting in this behaviour. IMO the problem is here (sending is async, and closing the channel is also async, which causes the race condition):

This is the entry point when the endpoint receives the CLOSE signal:
https://github.com/jboss-remoting/jboss-remoting/blob/3.3.5.Final/src/main/java/org/jboss/remoting3/remote/RemoteConnectionHandler.java#L112

This is the entry point when the endpoint sends the CLOSE signal:
https://github.com/jboss-remoting/jboss-remoting/blob/3.3.5.Final/src/main/java/org/jboss/remoting3/remote/RemoteConnectionHandler.java#L409

IMO the offender is:
https://github.com/jboss-remoting/jboss-remoting/blob/3.3.5.Final/src/main/java/org/jboss/remoting3/remote/RemoteConnectionHandler.java#L113
It should be executed only when the endpoint is closed (closeAction) and not when it is receiving the CLOSE signal (handling close).
Thinking on Enrique's suggestion, I'm not sure it explains the deadlock, because sendCloseRequest would only be called after the connection is flagged as closing in RemoteReadListener.handleEvent. So anything sent as a result of sendCloseRequest should not actually be sent, because of the closing check added to RemoteConnection$RemoteWriteListener.send.

But I've noticed that the possibility for this deadlock is introduced by the changes for the RejectedExecutionException bug:
https://github.com/jboss-remoting/jboss-remoting/commit/97bddc0d16deb421f1dea0baa61aeaeaa7c504b4

We removed altogether the executor that used to handle the message cancels during channel close. As a result, on remoting 3.3.5.Final the RemoteConnectionHandler.closeAllChannels call runs the cancels inline on the read thread, where it can block and deadlock in OutboundMessage.cancel. Thus, this deadlock is avoidable on EAP 6.4.1 and earlier for now.
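
To illustrate why handing the cancels to an executor matters, here is a rough standalone sketch under stated assumptions. It is not the Remoting implementation: the class, the two plain `Object` monitors (standing in for the queue and the BufferPipeOutputStream), and the single-thread executor are all hypothetical. The read thread holds the queue monitor but only submits the cancel; it never waits for the stream monitor itself, so the cycle cannot form.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; none of these names exist in jboss-remoting.
final class ExecutorCancelSketch {

    /** Returns true if both the send and the offloaded cancel completed. */
    static boolean run() {
        final Object queue = new Object(); // stands in for the write listener's queue
        final Object pipe = new Object();  // stands in for the BufferPipeOutputStream
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        AtomicBoolean sent = new AtomicBoolean();
        AtomicBoolean cancelled = new AtomicBoolean();
        ExecutorService executor = Executors.newSingleThreadExecutor();

        Thread reader = new Thread(() -> {
            synchronized (queue) {
                meet(bothHoldFirst);
                // Offload the cancel: the read thread never blocks on `pipe`.
                executor.execute(() -> {
                    synchronized (pipe) {
                        cancelled.set(true);
                    }
                });
            } // queue monitor released here, so the writer can proceed
        }, "read-thread");

        Thread writer = new Thread(() -> {
            synchronized (pipe) {
                meet(bothHoldFirst);
                synchronized (queue) {
                    sent.set(true);
                }
            }
        }, "task-thread");

        reader.setDaemon(true);
        writer.setDaemon(true);
        reader.start();
        writer.start();
        try {
            reader.join(5000);
            writer.join(5000);
            executor.shutdown(); // already-submitted cancel task still runs
            boolean drained = executor.awaitTermination(5, TimeUnit.SECONDS);
            return drained && sent.get() && cancelled.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Both threads wait here until each holds its first monitor.
    private static void meet(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed without deadlock: " + run());
    }
}
```

If the cancel body were executed inline inside `synchronized (queue)` instead of being submitted, this sketch would reproduce the same cycle as the thread dump.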
So to avoid the deadlock, we'll need to revert changes that removed the executor. David suggested reverting:

#1 https://github.com/jboss-remoting/jboss-remoting/commit/97bddc0d16deb421f1dea0baa61aeaeaa7c504b4
#2 https://github.com/jboss-remoting/jboss-remoting/commit/055f0c4b133a3cb361d8d7c9ba7825e000ee1386
#3 https://github.com/jboss-remoting/jboss-remoting/commit/eff6cb787f567d2a917844c189bb4fa565e0257b

And then implement a new fix to avoid https://bugzilla.redhat.com/show_bug.cgi?id=1238420. Perhaps:
https://github.com/jboss-remoting/jboss-remoting/compare/3.3...dmlloyd:3.3-tracking-exec
3.3 PR to revert prior changes: https://github.com/jboss-remoting/jboss-remoting/pull/46
Commit that also avoids RejectedExecutionExceptions, this time without removing the executor (the removal is what introduced these deadlocks): https://github.com/jboss-remoting/jboss-remoting/commit/61c32c01c7b9f893a50842f08d3ebe9d3ef81797
Hi Aaron: if your fix for this issue is reverting things, the reverts are already in:
PR 3.3: https://github.com/jboss-remoting/jboss-remoting/pull/46
Upstream: not required.

Your comment https://bugzilla.redhat.com/show_bug.cgi?id=1262114#c8 is related to another BZ (https://bugzilla.redhat.com/show_bug.cgi?id=1238420)
Verified. For the record, I used the reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=1264927 and tried both with the reproducer.btm present there and with a reproducer.btm containing the following rule:

    RULE trigger deadlock two
    CLASS org.jboss.remoting3.remote.OutboundMessage
    METHOD cancel
    AT ENTRY
    IF TRUE
    DO Thread.sleep(3000)
    ENDRULE
Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.