Bug 962734 - IOException during EJB invocation causes a hang
Summary: IOException during EJB invocation causes a hang
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: EJB, Remoting
Version: 6.1.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ER1
: EAP 6.2.0
Assignee: David M. Lloyd
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-05-14 10:53 UTC by Jan Martiska
Modified: 2015-11-16 12:12 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-12-15 16:16:02 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
reproducer (36.35 KB, application/zip)
2013-05-14 13:08 UTC, Jan Martiska
no flags Details

Description Jan Martiska 2013-05-14 10:53:26 UTC
I hit this issue when reproducing #957170. I used the reproducer described in #957171 and added byteman into it - it throws an IOException at the beginning of MethodInvocationMessageWriter.writeMessage method, that means it reproduces exactly the scenario in #957170 and leads to a RequestSendFailedException.

However, the client is unable to invoke the same EJB on the same server node after that. It seems that the IOException caused the underlying remoting channel to break somehow and subsequent attempts to invoke a method cause the client to hang forever.

How to reproduce:
- use any EJB client connecting to a EAP instance (doesn't have to be a cluster, you can use a standalone server)
- add a byteman rule to throw an IOException at the beginning of MethodInvocationMessageWriter.writeMessage at some point in time
- have the EJB client repeatedly invoke a session bean on the server

The first time this exception occurs, the EJB invocation will just fail with something like java.lang.IllegalStateException: EJBCLIENT000025: No EJB receiver available for handling [appName:appone, moduleName:ejb, distinctName:] combination for invocation context org.jboss.ejb.client.EJBClientInvocationContext@515f9553, which seems OK (if you invoke a bean in cluster, the client will correctly retry on another node, as per #957170). However, subsequent attempts to invoke the bean on this node will cause the client to hang indefinitely (if you don't specify a invocation.timeout property on client). 

Log of such unsuccessful invocation attempt (from client side):

May 14, 2013 10:27:25 AM org.jboss.ejb.client.EJBClientContext getEJBReceiver
DEBUG: org.jboss.ejb.client.RandomDeploymentNodeSelector@289f21ed deployment node selector selected master:app-oneA node for appname=appone,modulename=ejb,distinctname=
May 14, 2013 10:27:25 AM org.jboss.remoting3.remote.RemoteConnectionChannel casState
TRACE: CAS Channel ID a9069502 (outbound) of Remoting connection 629abc40 to localhost/127.0.0.1:4547
	old: RS=false WS=false IM=0 OM=0
	new: RS=false WS=false IM=0 OM=1
May 14, 2013 10:27:25 AM org.jboss.remoting3.remote.RemoteConnectionChannel openOutboundMessage
TRACE: Opened outbound message on Channel ID a9069502 (outbound) of Remoting connection 629abc40 to localhost/127.0.0.1:4547
May 14, 2013 10:27:25 AM org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver wrapMessageOutputStream
TRACE: Hints are disabled. Ignoring any CompressionHint on methods being invoked on view interface org.jboss.as.quickstarts.ejb.multi.server.app.AppOne
May 14, 2013 10:27:25 AM org.jboss.remoting3.remote.OutboundMessage$1 accept
TRACE: Message window is open, proceeding with send
May 14, 2013 10:27:25 AM org.xnio.channels.FramedMessageChannel send
TRACE: Accepting java.nio.HeapByteBuffer[pos=0 lim=133 cap=8192] into java.nio.HeapByteBuffer[pos=0 lim=8196 cap=8196]
May 14, 2013 10:27:25 AM org.xnio.channels.FramedMessageChannel send
TRACE: Accepted a message into java.nio.HeapByteBuffer[pos=137 lim=8196 cap=8196]
May 14, 2013 10:27:25 AM org.xnio.channels.FramedMessageChannel doFlushBuffer
TRACE: Fully flushed org.xnio.channels.FramedMessageChannel around TCP socket channel (NIO) <47387c07>
May 14, 2013 10:27:25 AM org.jboss.remoting3.remote.OutboundMessage$1 flush
TRACE: Flushing message channel
May 14, 2013 10:27:25 AM org.xnio.channels.FramedMessageChannel doFlushBuffer
TRACE: Fully flushed org.xnio.channels.FramedMessageChannel around TCP socket channel (NIO) <47387c07>
May 14, 2013 10:27:25 AM org.jboss.remoting3.remote.OutboundMessage$1 accept
TRACE: Sending message (with EOF) (java.nio.HeapByteBuffer[pos=0 lim=8 cap=8192]) to org.xnio.channels.FramedMessageChannel around TCP socket channel (NIO) <47387c07>
May 14, 2013 10:27:25 AM org.jboss.remoting3.remote.OutboundMessage$1 accept
TRACE: Message window is open, proceeding with send
May 14, 2013 10:27:25 AM org.xnio.channels.FramedMessageChannel send
TRACE: Accepting java.nio.HeapByteBuffer[pos=0 lim=8 cap=8192] into java.nio.HeapByteBuffer[pos=0 lim=8196 cap=8196]
May 14, 2013 10:27:25 AM org.xnio.channels.FramedMessageChannel send
TRACE: Accepted a message into java.nio.HeapByteBuffer[pos=12 lim=8196 cap=8196]
May 14, 2013 10:27:25 AM org.xnio.channels.FramedMessageChannel doFlushBuffer
TRACE: Fully flushed org.xnio.channels.FramedMessageChannel around TCP socket channel (NIO) <47387c07>
May 14, 2013 10:27:25 AM org.jboss.remoting3.remote.RemoteConnectionChannel free
TRACE: Removed Outbound message ID 9116 on Channel ID a9069502 (outbound) of Remoting connection 629abc40 to localhost/127.0.0.1:4547
May 14, 2013 10:27:25 AM org.jboss.remoting3.remote.RemoteConnectionChannel incrementState
TRACE: CAS Channel ID a9069502 (outbound) of Remoting connection 629abc40 to localhost/127.0.0.1:4547
	old: RS=false WS=false IM=0 OM=1
	new: RS=false WS=false IM=0 OM=0
May 14, 2013 10:27:25 AM org.jboss.remoting3.remote.RemoteConnectionChannel closeOutboundMessage
TRACE: Closed outbound message on Channel ID a9069502 (outbound) of Remoting connection 629abc40 to localhost/127.0.0.1:4547
May 14, 2013 10:27:25 AM org.jboss.remoting3.remote.OutboundMessage$1 flush
TRACE: Flushing message channel
May 14, 2013 10:27:25 AM org.xnio.channels.FramedMessageChannel doFlushBuffer
TRACE: Fully flushed org.xnio.channels.FramedMessageChannel around TCP socket channel (NIO) <47387c07>

The application thread which called the EJB is now stuck forever in the EJBClientInvocationContext.awaitResponse method, like this:

java.lang.Object.wait(Native Method)
org.jboss.ejb.client.EJBClientInvocationContext.awaitResponse(EJBClientInvocationContext.java:401)
org.jboss.ejb.client.EJBInvocationHandler.doInvoke(EJBInvocationHandler.java:202)
org.jboss.ejb.client.EJBInvocationHandler.doInvoke(EJBInvocationHandler.java:181)
org.jboss.ejb.client.EJBInvocationHandler.invoke(EJBInvocationHandler.java:144)
com.sun.proxy.$Proxy8.getJBossNodeName(Unknown Source)
org.jboss.as.quickstarts.ejb.multi.server.ClientTest.dotest(ClientTest.java:149)
(...)

This is not a blocker for 6.1.0 release, because the probability of this happening in real world seems very small.

Comment 1 Jaikiran Pai 2013-05-14 12:02:07 UTC
That wait looks "normal" since it's awaiting a response. What would be interesting though is to see if the server side has received the request and doing something with it.

Can you attach the app (with the byteman rules) to reproduce this? That'll reduce the time I'll have to spend to setup the test.

Comment 2 Jan Martiska 2013-05-14 13:08:21 UTC
Created attachment 747693 [details]
reproducer

Comment 3 Jan Martiska 2013-05-14 13:09:20 UTC
The server doesn't receive the whole request.. on server side this exception occurs:

15:06:46,092 WARN  [org.jboss.as.ejb3] (Remoting "dhcp-4-126" task-4) JBAS014254: Received unsupported message header 0xffffffff on channel Channel ID 0c51fad8 (inbound) of Remoting connection 4589adff to /127.0.0.1:33329

I added a reproducer zip, instructions included.

Comment 4 Jimmy Wilson 2013-05-14 20:27:28 UTC
@Jaikiran: Can you comment on how likely this scenario is?  Presumptively, customers won't be using ByteMan to create IOExceptions in their applications, so I see how it's rare for that point of view.  However, how likely are we to see IOExceptions (or others?) at this level that could cause the same behavior?  A hang for clients is a fairly severe consequence, so I would like to be sure of the likelihood of this occurring...

Comment 5 Jaikiran Pai 2013-05-15 03:21:17 UTC
@Jimmy, I am going to look into this today and update this bugzilla.

Comment 6 Jaikiran Pai 2013-05-15 06:16:38 UTC
Quick update - this is a bug and I've a simple one liner fix (*doesn't* require any sub project release). I'll send a PR for this soon and will update this bugzilla further on what I think might be the scenario where this can cause an issue.

Comment 8 Jaikiran Pai 2013-05-15 12:40:21 UTC
>> However, how likely are we to see IOExceptions (or others?) at this level that could cause the same behavior? 

To summarize, this hang happens if the following occurs:

1) A client sends out a message on the EJB channel with a message header which isn't recognized by the server side channel receiver for the EJB protocol.
2) The server side EJB protocol receiver then logs a message and completes its job. However, it forgets to re-enroll itself to receive further messages on that channel (and that's the bug).
3) Subsequently, the client sends message(s) to the server over the EJB channel, but the server side protocol receiver hasn't enrolled itself for the messages. This leaves the client in a state where it is waiting for a response from the server, but the server never sends one, because it didn't handle the request in first place.

The test here happened to send out a message with (an unrecognized) 0xffff header (I think it represents a EOF) to the server side by triggering the IOExcpetion while sending out the message and as a result exposed this bug.

The PR that has been sent fixes the issue but re-enrolling the protocol receiver for messages on the channel.

As to how frequently this can happen in a real world scenario, I'm not really sure. In theory its possible that the client side fails with some issue even before it has written out the correct message header to the message inputstream that has been opened. In practice, it's a rare scenario that this will happen. 

Note that this isn't specific to EAP 6.1. I believe EAP 6.0.x too has the same problem.

Comment 10 Jan Martiska 2013-05-17 07:57:16 UTC
I would change the release notes text a bit.

No workaround is known for this issue

=>

There is a partial workaround, it will not let the client to continue EJB invocations, but at least the client will not hang forever. This is accomplished by setting an invocation.timeout property when creating the EJB client context.

Comment 11 Scott Mumford 2013-08-21 03:48:14 UTC
Thanks for the extra info Jan.

The Docs text has been updated accordingly.

Comment 12 Jan Martiska 2013-09-11 13:35:22 UTC
Verified in EAP 6.2.0.ER1. Thanks!

Comment 17 Dimitris Andreadis 2013-10-24 18:28:27 UTC
Assigning jpai EJB issues to david.lloyd. Please re-assign to Cheng or others as needed.

Comment 19 Russell Dickenson 2013-12-03 13:19:02 UTC
'Requires doc text' flag cleared as it is too late to include this issue in release notes for JBoss EAP 6.2.0.


Note You need to log in before you can comment on or make changes to this bug.