Bug 1262114

Summary: [GSS](6.4.z) Deadlock when connection is closing while we are writing
Product: [JBoss] JBoss Enterprise Application Platform 6
Component: Remoting
Version: 6.4.3
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Aaron Ogburn <aogburn>
Assignee: Aaron Ogburn <aogburn>
QA Contact: Jitka Kozana <jkudrnac>
CC: bmaxwell, cdewolf, david.lloyd, egonzale, rjanik
Target Milestone: CR1
Target Release: EAP 6.4.5
Doc Type: Bug Fix
Type: Bug
Bug Blocks: 1235745, 1253482, 1262449, 1264927, 1265008, 1266518, 1278889

Description Aaron Ogburn 2015-09-10 20:14:08 UTC
Description of problem:

https://issues.jboss.org/browse/REM3-204

When the connection is closed while we are sending content, a deadlock occurs between the lock on RemoteConnection.RemoteWriteListener.queue and the monitor of the BufferPipeOutputStream. For example:

"Remoting read-1":
	at org.jboss.remoting3.remote.OutboundMessage.cancel(OutboundMessage.java:288)
	- waiting to lock <0xdd0ae4c0> (a org.xnio.streams.BufferPipeOutputStream)
	at org.jboss.remoting3.remote.RemoteConnectionChannel.closeMessages(RemoteConnectionChannel.java:560)
	at org.jboss.remoting3.remote.RemoteConnectionChannel.closeAction(RemoteConnectionChannel.java:542)
	at org.jboss.remoting3.spi.AbstractHandleableCloseable.closeAsync(AbstractHandleableCloseable.java:372)
	at org.jboss.remoting3.remote.RemoteConnectionHandler.closeAllChannels(RemoteConnectionHandler.java:429)
	at org.jboss.remoting3.remote.RemoteConnectionHandler.sendCloseRequest(RemoteConnectionHandler.java:233)
	at org.jboss.remoting3.remote.RemoteConnectionHandler.handleConnectionClose(RemoteConnectionHandler.java:113)
	at org.jboss.remoting3.remote.RemoteReadListener.handleEvent(RemoteReadListener.java:81)
	- locked <0xdd29f670> (a java.util.ArrayDeque)
	at org.jboss.remoting3.remote.RemoteReadListener.handleEvent(RemoteReadListener.java:45)
	at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:72)
	at org.xnio.channels.TranslatingSuspendableChannel.handleReadable(TranslatingSuspendableChannel.java:189)
	at org.xnio.channels.TranslatingSuspendableChannel$1.handleEvent(TranslatingSuspendableChannel.java:103)
	at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:72)
	at org.xnio.channels.TranslatingSuspendableChannel.handleReadable(TranslatingSuspendableChannel.java:189)
	at org.xnio.ssl.JsseConnectedSslStreamChannel.handleReadable(JsseConnectedSslStreamChannel.java:183)
	at org.xnio.channels.TranslatingSuspendableChannel$1.handleEvent(TranslatingSuspendableChannel.java:103)
	at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:72)
	at org.xnio.nio.NioHandle.run(NioHandle.java:90)
	at org.xnio.nio.WorkerThread.run(WorkerThread.java:198)
"Remoting task-2":
	at org.jboss.remoting3.remote.RemoteConnection$RemoteWriteListener.send(RemoteConnection.java:294)
	- waiting to lock <0xdd29f670> (a java.util.ArrayDeque)
	at org.jboss.remoting3.remote.RemoteConnection.send(RemoteConnection.java:122)
	at org.jboss.remoting3.remote.OutboundMessage$1.accept(OutboundMessage.java:154)
	at org.xnio.streams.BufferPipeOutputStream.send(BufferPipeOutputStream.java:126)
	at org.xnio.streams.BufferPipeOutputStream.send(BufferPipeOutputStream.java:114)
	at org.xnio.streams.BufferPipeOutputStream.flush(BufferPipeOutputStream.java:143)
	- locked <0xdd0ae4c0> (a org.xnio.streams.BufferPipeOutputStream)
	at org.xnio.streams.BufferPipeOutputStream.close(BufferPipeOutputStream.java:161)
	- locked <0xdd0ae4c0> (a org.xnio.streams.BufferPipeOutputStream)
	at org.jboss.remoting3.remote.OutboundMessage.close(OutboundMessage.java:283)
	- locked <0xdd0ae4c0> (a org.xnio.streams.BufferPipeOutputStream)
	at org.jboss.as.ejb3.remote.protocol.versionone.ChannelAssociation.releaseChannelMessageOutputStream(ChannelAssociation.java:85)
	at org.jboss.as.ejb3.remote.EJBRemoteConnectorService.sendVersionMessage(EJBRemoteConnectorService.java:184)
	at org.jboss.as.ejb3.remote.EJBRemoteConnectorService.access$000(EJBRemoteConnectorService.java:73)
	at org.jboss.as.ejb3.remote.EJBRemoteConnectorService$ChannelOpenListener.channelOpened(EJBRemoteConnectorService.java:211)
	at org.jboss.remoting3.spi.SpiUtils$ServiceOpenTask.run(SpiUtils.java:126)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
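
The two traces above show a lock-order inversion: "Remoting read-1" holds the ArrayDeque monitor and waits for the BufferPipeOutputStream monitor, while "Remoting task-2" holds the BufferPipeOutputStream monitor and waits for the ArrayDeque. A minimal sketch of that pattern follows. It does not use any Remoting or XNIO classes; the class name, method names, thread names and the two lock objects are hypothetical stand-ins, and the threads are daemon threads so the JVM can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class LockOrderInversionDemo {

    // Starts a daemon thread that takes `first`, waits until both threads hold
    // their first monitor, and only then tries to take `second`.
    static Thread lockInOrder(String name, Object first, Object second,
                              CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Not reached when the other thread locks in the opposite order.
                }
            }
        }, name);
        t.setDaemon(true);
        t.start();
        return t;
    }

    static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) {
                return true;
            }
        }
        return false;
    }

    // Returns true once the JVM reports both threads as monitor-deadlocked.
    static boolean deadlocks() {
        Object queueLock = new Object();   // stand-in for the ArrayDeque monitor
        Object streamLock = new Object();  // stand-in for the BufferPipeOutputStream monitor
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread reader = lockInOrder("read-like", queueLock, streamLock, bothHoldFirst);
        Thread writer = lockInOrder("task-like", streamLock, queueLock, bothHoldFirst);
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            for (int attempt = 0; attempt < 100; attempt++) {
                long[] ids = mx.findMonitorDeadlockedThreads();
                if (ids != null && contains(ids, reader.getId())
                        && contains(ids, writer.getId())) {
                    return true;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + deadlocks());
    }
}
```

The latch only makes the interleaving deterministic for the demonstration. In the reported case the same interleaving depends on timing, which is why the problem shows up as an intermittent hang.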


Version-Release number of selected component (if applicable):

3.3.5

Comment 4 Enrique Gonzalez Martinez 2015-09-18 09:07:28 UTC
Hi Aaron:

The problem is that when an endpoint receives a CLOSE signal, it tries to send another CLOSE to the other end, which results in this behaviour. IMO the problem is here (sending is async, and closing the channel is also async, which causes the race condition):

This is the entry point when the endpoint receives the CLOSE signal
https://github.com/jboss-remoting/jboss-remoting/blob/3.3.5.Final/src/main/java/org/jboss/remoting3/remote/RemoteConnectionHandler.java#L112

This is the entry point when the endpoint sends the CLOSE signal
https://github.com/jboss-remoting/jboss-remoting/blob/3.3.5.Final/src/main/java/org/jboss/remoting3/remote/RemoteConnectionHandler.java#L409

IMO the offender is:
https://github.com/jboss-remoting/jboss-remoting/blob/3.3.5.Final/src/main/java/org/jboss/remoting3/remote/RemoteConnectionHandler.java#L113

It should be executed only when the endpoint is closed (closeAction), not when it is receiving the CLOSE signal (handling close).

Comment 5 Aaron Ogburn 2015-09-21 20:00:41 UTC
Thinking about Enrique's suggestion, I'm not sure it explains the deadlock, because sendCloseRequest would only be called after the connection is flagged as closing in RemoteReadListener.handleEvent. So anything sent as a result of sendCloseRequest should not actually be sent, because of the closing check added to RemoteConnection$RemoteWriteListener.send.

But I've noticed that the possibility for this deadlock is introduced by the changes for the RejectedExecutionException bug:

https://github.com/jboss-remoting/jboss-remoting/commit/97bddc0d16deb421f1dea0baa61aeaeaa7c504b4


We removed altogether the executor that used to handle the message cancels during channel close. As a result, the RemoteConnectionHandler.closeAllChannels call now blocks and can deadlock in OutboundMessage.cancel on Remoting 3.3.5.Final.

Thus, this deadlock is avoidable on EAP 6.4.1 and earlier for now.
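
To illustrate why handing the cancel to an executor matters, here is a rough sketch using the same two stand-in locks as the thread dump. It is not the Remoting code: the class name, method names, thread names and the single-thread executor are assumptions made for the example. The read-like thread submits the cancel work instead of running it inline, so it never waits for the stream lock while still holding the queue lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AsyncCancelDemo {

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static Thread daemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true);
        return t;
    }

    // Returns true if the reader, the writer and the deferred cancel all finish.
    static boolean completesWithoutDeadlock() {
        final Object queueLock = new Object();   // stand-in for the ArrayDeque monitor
        final Object streamLock = new Object();  // stand-in for the stream monitor
        final ExecutorService executor =
                Executors.newSingleThreadExecutor(r -> daemon("cancel-executor", r));
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);
        final CountDownLatch done = new CountDownLatch(3);

        daemon("read-like", () -> {
            synchronized (queueLock) {
                bothHoldFirst.countDown();
                awaitQuietly(bothHoldFirst);
                // Hand the cancel off; the stream lock is taken later, on the
                // executor thread, which does not hold the queue lock.
                executor.execute(() -> {
                    synchronized (streamLock) {
                        // cancel work would go here
                    }
                    done.countDown();
                });
            }
            done.countDown();
        }).start();

        daemon("task-like", () -> {
            synchronized (streamLock) {
                bothHoldFirst.countDown();
                awaitQuietly(bothHoldFirst);
                synchronized (queueLock) {
                    // enqueue work would go here
                }
            }
            done.countDown();
        }).start();

        try {
            return done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed: " + completesWithoutDeadlock());
    }
}
```

The trade-off is that deferred work depends on the executor still accepting tasks at close time, which is presumably where the RejectedExecutionException mentioned above came from.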

Comment 7 Aaron Ogburn 2015-09-21 21:44:41 UTC
3.3 PR to revert prior changes:

https://github.com/jboss-remoting/jboss-remoting/pull/46

Comment 8 Aaron Ogburn 2015-09-21 22:12:51 UTC
Commit that also avoids the RejectedExecutionExceptions, without removing the executor and thereby introducing these deadlocks:

https://github.com/jboss-remoting/jboss-remoting/commit/61c32c01c7b9f893a50842f08d3ebe9d3ef81797

Comment 9 Enrique Gonzalez Martinez 2015-09-22 07:22:45 UTC
Hi Aaron:

If your fix for this issue is to revert the earlier changes, the revert is already in:

PR 3.3: https://github.com/jboss-remoting/jboss-remoting/pull/46
Upstream: not required.

Your comment https://bugzilla.redhat.com/show_bug.cgi?id=1262114#c8 is related to another BZ (https://bugzilla.redhat.com/show_bug.cgi?id=1238420)

Comment 11 Richard Janík 2015-11-05 15:09:46 UTC
Verified.

For the record, I used the reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=1264927 and tried it both with the reproducer.btm provided there and with a reproducer.btm containing the following rule:

RULE trigger deadlock two
CLASS org.jboss.remoting3.remote.OutboundMessage
METHOD cancel
AT ENTRY
IF TRUE
DO Thread.sleep(3000)
ENDRULE

Comment 12 Petr Penicka 2017-01-17 11:43:32 UTC
Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.

Comment 13 Petr Penicka 2017-01-17 11:43:37 UTC
Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.