Bug 1262114 - [GSS](6.4.z) Deadlock when connection is closing while we are writing
Summary: [GSS](6.4.z) Deadlock when connection is closing while we are writing
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: Remoting
Version: 6.4.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: CR1
Target Release: EAP 6.4.5
Assignee: Aaron Ogburn
QA Contact: Jitka Kozana
URL:
Whiteboard:
Depends On:
Blocks: 1235745 1253482 1262449 1264927 1265008 1266518 1278889
Reported: 2015-09-10 20:14 UTC by Aaron Ogburn
Modified: 2019-09-12 08:54 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:




Links:
Red Hat Issue Tracker REM3-204 (Private: 0, Priority: Major, Status: Resolved, Last Updated: 2020-08-12 16:13:51 UTC): Deadlock when connection is closing while we are writing

Description Aaron Ogburn 2015-09-10 20:14:08 UTC
Description of problem:

https://issues.jboss.org/browse/REM3-204

When the connection is closed while content is being sent, a deadlock can occur between the RemoteConnection.RemoteWriteListener.queue and the BufferPipeOutputStream. For example:

"Remoting read-1":
	at org.jboss.remoting3.remote.OutboundMessage.cancel(OutboundMessage.java:288)
	- waiting to lock <0xdd0ae4c0> (a org.xnio.streams.BufferPipeOutputStream)
	at org.jboss.remoting3.remote.RemoteConnectionChannel.closeMessages(RemoteConnectionChannel.java:560)
	at org.jboss.remoting3.remote.RemoteConnectionChannel.closeAction(RemoteConnectionChannel.java:542)
	at org.jboss.remoting3.spi.AbstractHandleableCloseable.closeAsync(AbstractHandleableCloseable.java:372)
	at org.jboss.remoting3.remote.RemoteConnectionHandler.closeAllChannels(RemoteConnectionHandler.java:429)
	at org.jboss.remoting3.remote.RemoteConnectionHandler.sendCloseRequest(RemoteConnectionHandler.java:233)
	at org.jboss.remoting3.remote.RemoteConnectionHandler.handleConnectionClose(RemoteConnectionHandler.java:113)
	at org.jboss.remoting3.remote.RemoteReadListener.handleEvent(RemoteReadListener.java:81)
	- locked <0xdd29f670> (a java.util.ArrayDeque)
	at org.jboss.remoting3.remote.RemoteReadListener.handleEvent(RemoteReadListener.java:45)
	at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:72)
	at org.xnio.channels.TranslatingSuspendableChannel.handleReadable(TranslatingSuspendableChannel.java:189)
	at org.xnio.channels.TranslatingSuspendableChannel$1.handleEvent(TranslatingSuspendableChannel.java:103)
	at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:72)
	at org.xnio.channels.TranslatingSuspendableChannel.handleReadable(TranslatingSuspendableChannel.java:189)
	at org.xnio.ssl.JsseConnectedSslStreamChannel.handleReadable(JsseConnectedSslStreamChannel.java:183)
	at org.xnio.channels.TranslatingSuspendableChannel$1.handleEvent(TranslatingSuspendableChannel.java:103)
	at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:72)
	at org.xnio.nio.NioHandle.run(NioHandle.java:90)
	at org.xnio.nio.WorkerThread.run(WorkerThread.java:198)
"Remoting task-2":
	at org.jboss.remoting3.remote.RemoteConnection$RemoteWriteListener.send(RemoteConnection.java:294)
	- waiting to lock <0xdd29f670> (a java.util.ArrayDeque)
	at org.jboss.remoting3.remote.RemoteConnection.send(RemoteConnection.java:122)
	at org.jboss.remoting3.remote.OutboundMessage$1.accept(OutboundMessage.java:154)
	at org.xnio.streams.BufferPipeOutputStream.send(BufferPipeOutputStream.java:126)
	at org.xnio.streams.BufferPipeOutputStream.send(BufferPipeOutputStream.java:114)
	at org.xnio.streams.BufferPipeOutputStream.flush(BufferPipeOutputStream.java:143)
	- locked <0xdd0ae4c0> (a org.xnio.streams.BufferPipeOutputStream)
	at org.xnio.streams.BufferPipeOutputStream.close(BufferPipeOutputStream.java:161)
	- locked <0xdd0ae4c0> (a org.xnio.streams.BufferPipeOutputStream)
	at org.jboss.remoting3.remote.OutboundMessage.close(OutboundMessage.java:283)
	- locked <0xdd0ae4c0> (a org.xnio.streams.BufferPipeOutputStream)
	at org.jboss.as.ejb3.remote.protocol.versionone.ChannelAssociation.releaseChannelMessageOutputStream(ChannelAssociation.java:85)
	at org.jboss.as.ejb3.remote.EJBRemoteConnectorService.sendVersionMessage(EJBRemoteConnectorService.java:184)
	at org.jboss.as.ejb3.remote.EJBRemoteConnectorService.access$000(EJBRemoteConnectorService.java:73)
	at org.jboss.as.ejb3.remote.EJBRemoteConnectorService$ChannelOpenListener.channelOpened(EJBRemoteConnectorService.java:211)
	at org.jboss.remoting3.spi.SpiUtils$ServiceOpenTask.run(SpiUtils.java:126)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
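
The two traces above show a classic lock-order inversion: "Remoting read-1" holds the ArrayDeque monitor and waits for the BufferPipeOutputStream monitor, while "Remoting task-2" holds the stream monitor and waits for the queue. The following is a minimal, self-contained sketch of that pattern only. It does not use any Remoting or XNIO classes; the class name, thread names, and the two plain Object locks are hypothetical stand-ins. It forces the interleaving with a latch and then asks the JVM whether it sees a monitor deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class LockOrderInversionDemo {

    /** Runs two daemon threads that take two monitors in opposite order; true if the JVM reports them deadlocked. */
    static boolean reproduce() {
        final Object queue = new Object();   // hypothetical stand-in for the ArrayDeque monitor
        final Object stream = new Object();  // hypothetical stand-in for the BufferPipeOutputStream monitor
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Plays "Remoting read-1": queue first, then stream.
        Thread reader = new Thread(() -> {
            synchronized (queue) {
                meet(bothHoldFirstLock);
                synchronized (stream) { /* would cancel the outbound message here */ }
            }
        }, "demo-read");

        // Plays "Remoting task-2": stream first, then queue.
        Thread writer = new Thread(() -> {
            synchronized (stream) {
                meet(bothHoldFirstLock);
                synchronized (queue) { /* would enqueue the flushed buffer here */ }
            }
        }, "demo-task");

        // Daemon threads so the JVM can still exit although they never finish.
        reader.setDaemon(true);
        writer.setDaemon(true);
        reader.start();
        writer.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            for (int i = 0; i < 100; i++) {
                long[] ids = mx.findMonitorDeadlockedThreads();
                if (ids != null && contains(ids, reader.getId()) && contains(ids, writer.getId())) {
                    return true;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    // Each thread announces that it holds its first lock, then waits for the other one.
    private static void meet(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + reproduce());
    }
}
```

Because each thread only requests its second monitor after both hold their first, neither can ever proceed, which mirrors the "waiting to lock" / "locked" pairs in the dump.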


Version-Release number of selected component (if applicable):

3.3.5

Comment 4 Enrique Gonzalez Martinez 2015-09-18 09:07:28 UTC
Hi Aaron:

The problem is that when an endpoint receives a CLOSE signal, it tries to send another CLOSE to the other end, which results in this behaviour. IMO the problem is here (sending is asynchronous and closing the channel is also asynchronous, which causes the race condition):

This is the entry point when the endpoint receives the CLOSE signal
https://github.com/jboss-remoting/jboss-remoting/blob/3.3.5.Final/src/main/java/org/jboss/remoting3/remote/RemoteConnectionHandler.java#L112

This is the entry point when the endpoint sends the CLOSE signal
https://github.com/jboss-remoting/jboss-remoting/blob/3.3.5.Final/src/main/java/org/jboss/remoting3/remote/RemoteConnectionHandler.java#L409

IMO the offender is:
https://github.com/jboss-remoting/jboss-remoting/blob/3.3.5.Final/src/main/java/org/jboss/remoting3/remote/RemoteConnectionHandler.java#L113

It should be executed only when the endpoint is closed (closeAction), not when it is receiving the CLOSE signal (handling close).

Comment 5 Aaron Ogburn 2015-09-21 20:00:41 UTC
Thinking about Enrique's suggestion, I'm not sure it explains the deadlock, because sendCloseRequest is only called after the connection is flagged as closing in RemoteReadListener.handleEvent. So anything sent as a result of sendCloseRequest should not actually be sent, because of the closing check added to RemoteConnection$RemoteWriteListener.send.
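
To make the "closing check" argument concrete, here is a rough sketch of a send path that drops messages once a closing flag is set. It is not the Remoting source: the class, field, and method names are hypothetical, and it assumes the flag is read under the queue's monitor, which the thread dump suggests but does not prove (the dump only shows send waiting for that monitor).

```java
import java.util.ArrayDeque;

final class ClosingAwareWriteQueue {

    private final ArrayDeque<String> queue = new ArrayDeque<>();
    private boolean closing; // guarded by the queue's monitor (assumption of this sketch)

    /** Enqueues the message unless the connection is already closing; returns whether it was accepted. */
    boolean send(String message) {
        synchronized (queue) {
            if (closing) {
                return false; // dropped: nothing is sent after the close has been flagged
            }
            queue.add(message);
            return true;
        }
    }

    /** Flags the connection as closing; later sends are rejected. */
    void markClosing() {
        synchronized (queue) {
            closing = true;
        }
    }

    int pending() {
        synchronized (queue) {
            return queue.size();
        }
    }

    public static void main(String[] args) {
        ClosingAwareWriteQueue q = new ClosingAwareWriteQueue();
        System.out.println(q.send("before close"));
        q.markClosing();
        System.out.println(q.send("after close"));
    }
}
```

Under that assumption, a check of this shape stops messages from being queued after the flag is set, but a sender still has to obtain the queue's monitor before it can reach the check, so the check by itself would not keep a thread from blocking on that monitor.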

But I've noticed that the possibility of this deadlock was introduced by the changes for the RejectedExecutionException bug:

https://github.com/jboss-remoting/jboss-remoting/commit/97bddc0d16deb421f1dea0baa61aeaeaa7c504b4


We removed the executor that used to handle the message cancels during channel close. Thus the RemoteConnectionHandler.closeAllChannels call now blocks, and can deadlock in OutboundMessage.cancel, on remoting 3.3.5.Final.

So, for now, this deadlock can be avoided by staying on EAP 6.4.1 or earlier.
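
As an illustration of why handing the cancel to an executor sidesteps the problem, the sketch below has the "closing" thread schedule the cancel while it holds the queue monitor, rather than taking the stream monitor itself. This is a simplified model under stated assumptions, not the Remoting code: the class and method names, the two plain Object locks, and the single-thread executor are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class AsyncCancelSketch {

    /** Returns true if both the writer and the deferred cancel finish, i.e. no deadlock occurred. */
    static boolean closeWhileWriting() {
        final Object queue = new Object();   // hypothetical stand-in for the write queue monitor
        final Object stream = new Object();  // hypothetical stand-in for the output stream monitor
        final CountDownLatch writerHoldsStream = new CountDownLatch(1);
        final CountDownLatch cancelScheduled = new CountDownLatch(1);
        final CountDownLatch cancelDone = new CountDownLatch(1);
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // Writer: holds the stream monitor, then needs the queue monitor to enqueue.
        Thread writer = new Thread(() -> {
            synchronized (stream) {
                writerHoldsStream.countDown();
                awaitQuietly(cancelScheduled);
                synchronized (queue) { /* enqueue the flushed buffer */ }
            }
        }, "demo-task");
        writer.setDaemon(true);
        writer.start();

        try {
            writerHoldsStream.await();
            // Closing thread: holds the queue monitor but only schedules the cancel,
            // so it never waits for the stream monitor while holding the queue.
            synchronized (queue) {
                executor.execute(() -> {
                    synchronized (stream) { /* cancel the outbound message */ }
                    cancelDone.countDown();
                });
                cancelScheduled.countDown();
            }
            boolean cancelled = cancelDone.await(5, TimeUnit.SECONDS);
            writer.join(5000);
            return cancelled && !writer.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed without deadlock: " + closeWhileWriting());
    }
}
```

The closing thread releases the queue monitor as soon as the task is submitted, so the writer can enqueue and release the stream, after which the deferred cancel runs. The cost of this design is the one the earlier change was trying to address: an executor can reject tasks, so that case needs separate handling.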

Comment 7 Aaron Ogburn 2015-09-21 21:44:41 UTC
3.3 PR to revert prior changes:

https://github.com/jboss-remoting/jboss-remoting/pull/46

Comment 8 Aaron Ogburn 2015-09-21 22:12:51 UTC
Commit that also avoids the RejectedExecutionExceptions, without removing the executor and thereby introducing these deadlocks:

https://github.com/jboss-remoting/jboss-remoting/commit/61c32c01c7b9f893a50842f08d3ebe9d3ef81797

Comment 9 Enrique Gonzalez Martinez 2015-09-22 07:22:45 UTC
Hi Aaron:

If your fix for this issue is to revert the earlier changes, the revert is already in:

PR 3.3: https://github.com/jboss-remoting/jboss-remoting/pull/46
Upstream: not required.

Your comment https://bugzilla.redhat.com/show_bug.cgi?id=1262114#c8 is related to another BZ (https://bugzilla.redhat.com/show_bug.cgi?id=1238420)

Comment 11 Richard Janík 2015-11-05 15:09:46 UTC
Verified.

For the record, I used the reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=1264927 and tried it both with the reproducer.btm provided there and with a reproducer.btm containing the following rule:

RULE trigger deadlock two
CLASS org.jboss.remoting3.remote.OutboundMessage
METHOD cancel
AT ENTRY
IF TRUE
DO Thread.sleep(3000)
ENDRULE

Comment 12 Petr Penicka 2017-01-17 11:43:32 UTC
Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.

Comment 13 Petr Penicka 2017-01-17 11:43:37 UTC
Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.

