Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 901135 (JBPAPP6-1172)

Summary: ChannelBusyException: Too many open outbound writes
Product: [JBoss] JBoss Enterprise Application Platform 6 Reporter: Jitka Kozana <jkudrnac>
Component: EJB, RemotingAssignee: David M. Lloyd <david.lloyd>
Status: CLOSED NEXTRELEASE QA Contact:
Severity: urgent Docs Contact:
Priority: urgent    
Version: 6.0.0CC: asaji, atangrin, cdewolf, david.lloyd, dosoudil, fnasser, jkudrnac, jpai, lthon, mnovak, ochaloup
Target Milestone: ---   
Target Release: EAP 6.0.1   
Hardware: Unspecified   
OS: Unspecified   
URL: http://jira.jboss.org/jira/browse/JBPAPP6-1172
Whiteboard: Regression
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-11-29 09:21:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
reproducer.zip
none
client-maven-project.zip
none
JBPAPP6-1172-test-app.zip none

Description Jitka Kozana 2012-10-31 10:51:10 UTC
project_key: JBPAPP6

After node failover, we are seeing on the client side: 
{code}
2012/10/24 08:01:52:369 EDT [ERROR][Runner - 368] HOST perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Error getting response. <java.lang.reflect.UndeclaredThrowableException>
        java.lang.reflect.UndeclaredThrowableException
        at $Proxy0.getSerialAndIncrement(Unknown Source)
        at org.jboss.smartfrog.clustering.ejb3.StatefulSBProcessorFactoryImpl$EJB3RequestProcessor.processRequest(StatefulSBProcessorFactoryImpl.java:75)
        at org.jboss.smartfrog.loaddriver.CompoundRequestProcessorFactoryImpl$CompoundRequestProcessor.processRequest(CompoundRequestProcessorFactoryImpl.java:52)
        at org.jboss.smartfrog.loaddriver.Runner.run(Runner.java:87)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.jboss.remoting3.ChannelBusyException: Too many open outbound writes
        at org.jboss.remoting3.remote.RemoteConnectionChannel.openOutboundMessage(RemoteConnectionChannel.java:112)
        at org.jboss.remoting3.remote.RemoteConnectionChannel.writeMessage(RemoteConnectionChannel.java:297)
        at org.jboss.ejb.client.remoting.ChannelAssociation.acquireChannelMessageOutputStream(ChannelAssociation.java:249)
        at org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver.processInvocation(RemotingConnectionEJBReceiver.java:196)
        at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:181)
        at org.jboss.ejb.client.TransactionInterceptor.handleInvocation(TransactionInterceptor.java:42)
        at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:183)
        at org.jboss.ejb.client.ReceiverInterceptor.handleInvocation(ReceiverInterceptor.java:125)
        at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:183)
        at org.jboss.ejb.client.EJBInvocationHandler.doInvoke(EJBInvocationHandler.java:136)
        at org.jboss.ejb.client.EJBInvocationHandler.doInvoke(EJBInvocationHandler.java:121)
        at org.jboss.ejb.client.EJBInvocationHandler.invoke(EJBInvocationHandler.java:104)

{code}

We have 2000 clients.

Here is the jboss-ejb-client.properties:
{code}
remote.connection.perf18.host=perf18
remote.connection.perf18.port=4447
remote.connection.perf18.connect.options.org.xnio.Options.SASL_POLICY_NOANONYMOUS=false
remote.connection.perf18.channel.options.org.jboss.remoting3.RemotingOptions.MAX_OUTBOUND_MESSAGES=30000
{code}

Comment 1 Jaikiran Pai 2012-10-31 11:01:39 UTC
2000 clients running simultaneously? What do each of those 2000 clients do? And how many messages/operations do they do simultaneously?


Comment 2 Jitka Kozana 2012-10-31 11:52:35 UTC
Here is the link to the job:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-Failover-EJB/job/eap-6x-failover-ejb-ejbremote-shutdown-repl-async/29/console-perf17/

The clients are just calling a statefull EJB via remote invocation. One client = one call to the EJB's method getSerialAndIncrement().

I saw similar issues: JBPAPP-8721, AS7-3796, can this be a regression? 



Comment 3 Jitka Kozana 2012-10-31 11:54:08 UTC
Also, we did not see this issue during ER2 testing. Can this issue be introduced in remoting upgrade?

Comment 4 Jitka Kozana 2012-11-02 07:29:24 UTC
- All our functional test scenarios involving EJB calls are failing because of this issue (there were passing in ER2)
- Performance was dropped significantly: in ER2 4 node cluster was able to serve 5200 concurrent clients, in ER3 it is only 1600 clients
- the environment was the same in ER2 and ER3 testing

This may have big impact on our customers.


Comment 5 Jaikiran Pai 2012-11-02 07:37:43 UTC
Can you add the exact client side configurations being used and also the server side configurations for EJB3 and remoting subsystem? Are you sure the client side uses the correct set of jars (we had this issue with performance setups earlier where after spending a lot of time, we realized it was a client side jar inconsistency)?

A stateful session creation is 1 outbound message and a method invocation is another outbound message. Are you sure that the test setup doesn't exceed the *simultaneous* message limit that's been configured? 



Comment 6 Jitka Kozana 2012-11-02 09:26:11 UTC
$cat jboss-ejb-client.properties
{code}
remote.connectionprovider.create.options.org.xnio.Options.SSL_ENABLED=false

remote.connections=perf18,perf19,perf20,perf21

invocation.timeout=120000

remote.connection.perf18.host=perf18
remote.connection.perf18.port=4447
remote.connection.perf18.connect.options.org.xnio.Options.SASL_POLICY_NOANONYMOUS=false
remote.connection.perf18.channel.options.org.jboss.remoting3.RemotingOptions.MAX_OUTBOUND_MESSAGES=30000

remote.connection.perf19.host=perf19
remote.connection.perf19.port=4447
remote.connection.perf19.connect.options.org.xnio.Options.SASL_POLICY_NOANONYMOUS=false
remote.connection.perf19.channel.options.org.jboss.remoting3.RemotingOptions.MAX_OUTBOUND_MESSAGES=30000

remote.connection.perf20.host=perf20
remote.connection.perf20.port=4447
remote.connection.perf20.connect.options.org.xnio.Options.SASL_POLICY_NOANONYMOUS=false
remote.connection.perf20.channel.options.org.jboss.remoting3.RemotingOptions.MAX_OUTBOUND_MESSAGES=30000

remote.connection.perf21.host=perf21
remote.connection.perf21.port=4447
remote.connection.perf21.connect.options.org.xnio.Options.SASL_POLICY_NOANONYMOUS=false
remote.connection.perf21.channel.options.org.jboss.remoting3.RemotingOptions.MAX_OUTBOUND_MESSAGES=30000
{code}

server side config: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-Performance-Clustering-4node/job/eap-6x-stress-ejbremote-repl-sync/7/artifact/report/config/perf18/standalone-ha.xml

classpath:
{code}
clustering/target/dependency/xml-apis-1.3.04.jar
clustering/target/dependency/xercesImpl-2.9.1.jar
clustering/target/dependency/validation-api-1.0.0.GA.jar
clustering/target/dependency/sf-www-3.17.014.jar
clustering/target/dependency/odfdom-java-0.8.6.jar
clustering/target/dependency/log4j-1.2.16.jar
clustering/target/dependency/junit-3.8.jar
clustering/target/dependency/jfreechart-1.0.0.jar
clustering/target/dependency/jcommon-1.0.0.jar
clustering/target/dependency/jboss-transaction-api_1.1_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-smartfrog-stats-0.2.0.jar
clustering/target/dependency/jboss-smartfrog-loaddriver-0.2.1.CR1.jar
clustering/target/dependency/jboss-smartfrog-httpd-0.1.0.jar
clustering/target/dependency/jboss-smartfrog-extension-0.2.0.jar
clustering/target/dependency/jboss-smartfrog-ejb3-0.2.1-SNAPSHOT.jar
clustering/target/dependency/jboss-smartfrog-controller-0.2.1.CR1.jar
clustering/target/dependency/jboss-smartfrog-clustering-0.2.1-SNAPSHOT.jar
clustering/target/dependency/jboss-smartfrog-as7-0.2.1-SNAPSHOT.jar
clustering/target/dependency/jboss-smartfrog-as-0.1.1.jar
clustering/target/dependency/jboss-servlet-api_3.0_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-qa-utils-0.1.0.jar
clustering/target/dependency/jboss-jstl-api_1.2_spec-1.0.1.Final.jar
clustering/target/dependency/jboss-jsp-api_2.2_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-jsf-api_2.0_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-jaxrs-api_1.1_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-jaxb-api_2.2_spec-1.0.2.Final.jar
clustering/target/dependency/jboss-jaspi-api_1.0_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-interceptors-api_1.1_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-el-api_2.2_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-ejb-api_3.1_spec-1.0.1.Final.jar
clustering/target/dependency/jboss-connector-api_1.6_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-annotations-api_1.1_spec-1.0.0.Final.jar
clustering/target/dependency/javax.inject-1.jar
clustering/target/dependency/hibernate-jpa-2.0-api-1.0.1.Final.jar
clustering/target/dependency/eap-jboss-threads-6.0.1.ER3.jar
clustering/target/dependency/eap-jboss-dmr-6.0.1.ER3.jar
clustering/target/dependency/eap-jboss-client-6.0.1.ER3.jar
clustering/target/dependency/eap-jboss-as-protocol-6.0.1.ER3.jar
clustering/target/dependency/eap-jboss-as-controller-client-6.0.1.ER3.jar
clustering/target/dependency/commons-logging-1.0.4.jar
clustering/target/dependency/commons-lang-2.4.jar
clustering/target/dependency/commons-httpclient-3.1.jar
clustering/target/dependency/commons-csv-1.0-mlinhard.jar
clustering/target/dependency/commons-codec-1.2.jar
clustering/target/dependency/commons-beanutils-1.7.0.jar
clustering/target/dependency/clusterbench-ee6-ejb-1.0.1.jar
clustering/target/dependency/clusterbench-common-1.0.1.jar
clustering/target/dependency/cdi-api-1.0-SP4.jar
{code}

eap-jboss-threads-6.0.1.ER3.jar = jboss-threads-7.1.3.Final-redhat-3.jar
eap-jboss-dmr-6.0.1.ER3.jar = jboss-dmr-1.1.1.Final-redhat-2.jar
eap-jboss-client-6.0.1.ER3.jar = jboss-client.jar, v. 7.1.3.Final-redhat-3
eap-jboss-as-protocol-6.0.1.ER3.jar = jboss-as-protocol-7.1.3.Final-redhat-3.jar
eap-jboss-as-controller-client-6.0.1.ER3.jar = jboss-as-controller-client-7.1.3.Final-redhat-3.jar

Comment 7 Misty Stanley-Jones 2012-11-05 04:35:37 UTC
Release Notes Docs Status: Added: Not Required


Comment 8 Miroslav Novak 2012-11-06 09:26:20 UTC
This issue blocks performance and SOAK testing of JMS subsytem. Problem is that Specjms2007 creates load using standalone JMS clients which are making remote jndi lookups for connection factories and destinations. When there is too many clients this issue occur.

Console log:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-HornetQ/job/eap-60-hornetq-performance-specjjms2007-horizontal-nio-journal-with-non-blocking-io/38/console-messaging-17/

Comment 9 Fernando Nasser 2012-11-06 15:31:40 UTC
Have you tried using the remoting JAR from ER2?

Comment 10 Miroslav Novak 2012-11-06 15:44:05 UTC
Job was started again with EAP 6.0.1.ER2 and it's running for more than 6 hours now without failures. For sure this is regression. 

Hudson job:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-HornetQ/job/eap-60-hornetq-performance-specjjms2007-horizontal-nio-journal-with-non-blocking-io/

Comment 11 Anne-Louise Tangring 2012-11-06 18:46:16 UTC
Labels: Added: Regression


Comment 13 Jaikiran Pai 2012-11-08 15:59:45 UTC
I'm still trying to narrow this down, but in the meantime, could someone from the perf team who has access to these systems and knows how they are setup, please use this new remoting jar https://repository.jboss.org/nexus/content/repositories/snapshots/org/jboss/remoting3/jboss-remoting/3.2.13.GA-SNAPSHOT/jboss-remoting-3.2.13.GA-20121108.153303-1.jar which just has a minor exception logging related change and reproduce this issue? Update the *client* classpath to get rid of the older remoting version jar and use this new snapshot jar instead. Updating the server side (for now) isn't necessary. I'll continue with the investigation in the meantime.


Comment 14 Jaikiran Pai 2012-11-08 16:01:40 UTC
{quote}
This issue blocks performance and SOAK testing of JMS subsytem. Problem is that Specjms2007 creates load using standalone JMS clients which are making remote jndi lookups for connection factories and destinations. When there is too many clients this issue occur.

Console log:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-HornetQ/job/eap-60-hornetq-performance-specjjms2007-horizontal-nio-journal-with-non-blocking-io/38/console-messaging-17/
{quote}
That job is using incorrect client jar versions. For example:

jboss-hornetq-testsuite/target/dependency/jboss-remoting-3.2.7.GA.jar

It shouldn't be using 3.2.7 version of remoting on the client side and instead be using the version that's specified in the client BOM for 6.0.1.ER3. 


Comment 15 Jaikiran Pai 2012-11-08 16:58:05 UTC
This does look like some regression or some bug which we (re)exposed again, but based on the data so far, I haven't been able to confirm either way. I'll be taking a look at this later today as soon as I find some time (am away in Brno for a AS meeting). The data that I asked for in one of my previous replies, is going to help in the meantime.



Comment 16 Miroslav Novak 2012-11-08 20:00:47 UTC
Nice catch Jaikiran! This jar should not be there. It got there because of jboss-as-arquillian-container-managed dependency in pom.

But I can see that jboss-as-client-6.0.1.ER3.jar is before jboss-remoting-3.2.7.GA.jar in classpath when sattelite was started. It seems that this could not cause this issue. 

Strange thing is that when performance test is started with *EAP 6.0.1.ER2* and jboss-as-client-6.0.1.ER3.jar on the client side (on satellite side) then *no* exception occur. When it's started with *EAP 6.0.1.ER3* and jboss-as-client-6.0.1.ER3.jar on the client side then there are ChannelBusyExceptions. It seems that problem is on the server.

Probably the best way how to reproduce this issue is to create many threads where each thread creates InitialContext and does jndi lookup to remote server. This is exactly what the satellite does. It creates many standalone jms clients which do remote jndi lookup for queue.

Comment 17 Miroslav Novak 2012-11-08 21:46:30 UTC
Attached reproducer.zip should help to investigate. It simply creates many threads where each thread create one InitialContext to remote EAP 6.0.1.ER3 server and do jndi lookup for "jms/RemoteConnectionFactory".

{code}
How to use the reproducer:
1. Download and unzip reproducer.zip. Following scripts run in unzipped directory.
2. Prepare EAP 6.0.1.ER3 server - "sh prepare.sh" (it downloads EAP 6.0.1.ER3 zip distribution)
3. Start server - "sh start-server1.sh localhost" (parameter "hostname")
4. Start ManyJndiLookupTest test - "sh start-lookups.sh localhost 200" (paraters "hostname_of_eap_server" "number_of_threads")
5. See exceptions in log.
{code}

Step 4. is somewhat tricky. Hard to say how many threads to set. I used 200 threads and tested with EAP 6.0.1.ER2 and ER3. With ER2 there are no exceptions and with ER3 I can see ChannelBusyException. In both of the cases I used jboss-as-client.jar from EAP 6.0.1.ER3.

I've attached also maven project with test - client-maven-project.zip. It's class org.jboss.jms.client.ManyJndiLookupTest. I hope I did not miss something.



Comment 18 Miroslav Novak 2012-11-08 21:46:57 UTC
Attachment: Added: reproducer.zip
Attachment: Added: client-maven-project.zip


Comment 19 Jaikiran Pai 2012-11-08 22:23:11 UTC
Miroslav, thanks a lot for creating a reproducible application for this. I've always had trouble trying to reproduce this in the performance environment and having a simple reproducer helps. It's a bit too late in the night here and I'm traveling tomorrow, but as soon as I get some time in front of my laptop, I'm going to take a look at this.


Comment 20 Miroslav Novak 2012-11-09 10:54:59 UTC
You're welcome Jaikiran and thanks for looking at this issue. 

Note:
jboss-as-client-6.0.1.ER3.jar = jboss-client.jar from EAP 6.0.1.ER3. I've just noticed that this could be confusing.
client-maven-project.zip is having dependency to QA repo. I hope it'll work for you but probably you'll need to change it. Sorry for this.

Comment 21 Anne-Louise Tangring 2012-11-13 20:43:47 UTC
Release Notes Docs Status: Removed: Not Required 
Docs QE Status: Removed: NEW 


Comment 23 Jaikiran Pai 2012-11-15 06:53:25 UTC
I've figured out what's going wrong. This is a JBoss Remoting specific semantic going wrong. I'll be discussing with David Llyod about this today and will update this JIRA with what can be done to fix this and how long it might take to fix this.


Comment 24 Chao Wang 2012-11-15 09:42:12 UTC
I just followed "reproducer" and used 200 threads to test with "EAP 6.0.1.ER2", can also see "Caused by: org.jboss.remoting3.ChannelBusyException: Too many open outbound writes"

Comment 25 Jaikiran Pai 2012-11-15 09:53:26 UTC
Actually, the reproducer isn't the right one to test this problem since it doesn't involve the flow control that's done by the EJB client API. The EJB client API makes sure that it doesn't send more messages than what is configured. The reproducer isn't using EJBs so that doesn't come into picture. So although the exceptions are the same, the root cause isn't really identified in the reproducer.

The real issue lies in JBoss Remoting where it doesn't really decrement the open message count on a message close and instead expects the count to be decremented asynchronously which results in the EJB client API letting more messages to be written out while JBoss Remoting has still not decremented the previous message close. I have to discuss this with David today.



Comment 26 Jaikiran Pai 2012-11-15 17:08:27 UTC
Update: I've discussed this with David and David's working on a fix for this in JBoss Remoting library. 


Comment 27 David M. Lloyd 2012-11-15 23:11:27 UTC
Link: Added: This issue Cloned to JBPAPP6-1633


Comment 28 David M. Lloyd 2012-11-15 23:14:56 UTC
I have a possible fix here: http://github.com/dmlloyd/jboss-remoting/commit/ea08b8b

I'll check the reproducer when I can (if someone else gets to it before me, all the better).

Comment 29 Jaikiran Pai 2012-11-16 03:39:47 UTC
{quote}
I'll check the reproducer when I can (if someone else gets to it before me, all the better).
{quote}
The reproducer doesn't have flow control implemented on the client side. So it's always going to hit the max allowed message count and run into the exception. I've a different (modified quickstart) application which uses EJB invocations and I've been using it to reproduce this. I'll give it a try with that application.


Comment 30 Jaikiran Pai 2012-11-16 05:59:39 UTC
I've attached a test application (JBPAPP6-1172-test-app.zip) which can be used to test this fix (and reproduce the issue against EAP 6.0.1.ER3).

To use the application, follow these steps:

1) Build JBoss AS/EAP with JBoss Remoting 3.2.13.GA-SNAPSHOT (you'll have to build remoting containing the fix which David pointed to)
2) Extract the JBPAPP6-1172-test-app.zip to a location of your choice. Let's call it APP_HOME
3) cd APP_HOME
4) mvn clean install
5) cp server-side/target/jboss-as-ejb-remote-app.jar JBOSS_HOME/standalone/deployments
6) Start the server (./standalone.sh)
7) Add a user under ApplicationRealm using the add-user script in JBOSS_HOME/bin. The user name is expected to be jai and the password is expected to be 1234# and realm is ApplicationRealm
8) cd client (from APP_HOME folder)
9) mvn exec:exec

The client application triggers 200 threads which all invoke on the deployed EJB simultaneously, 10 times each. If the run completes successfully (i.e. if all 200 threads are able to invoke 10 times each) then you will see the following message:

{code}
******* 200 out of 200 threads completed successfully ********
{code}

Comment 31 Jaikiran Pai 2012-11-16 05:59:39 UTC
Attachment: Added: JBPAPP6-1172-test-app.zip


Comment 32 Jaikiran Pai 2012-11-16 06:18:17 UTC
By the way, the test application that I attached passes against David's fix.

Comment 33 Fernando Nasser 2012-11-19 17:21:50 UTC
David, so you have a PR we can cherry-pick for the product branch?

Comment 34 David M. Lloyd 2012-11-19 18:33:43 UTC
No, I haven't yet released this fix.  I'll do it this week though.

Comment 35 Anne-Louise Tangring 2012-11-19 18:43:39 UTC
David, any chance we can get it sooner? We are holding the build for this one and we are now at risk for the planned GA. Thanks, Anne-Louise


Comment 36 David M. Lloyd 2012-11-19 18:51:53 UTC
OK, I can push up the release today and submit the PR.  Hopefully there will be no more issues reported then :)

Comment 37 Vladimir Dosoudil 2012-11-21 14:33:11 UTC
JBoss Remoting 3.2.13.GA resolving related issue REM3-165 has been [productized|https://brewweb.devel.redhat.com/buildinfo?buildID=244430] and [EAP BOM updated|https://brewweb.devel.redhat.com/buildinfo?buildID=244432] to include the fixed/upgraded JBoss Remoting in EAP 6.0.1 ER 4.

Comment 38 Fernando Nasser 2012-11-21 14:33:32 UTC
Vladimir built jboss-remoting3-3.2.13-1.GA_redhat_1.ep6.el6  and it is included in the ER4

Comment 39 Fernando Nasser 2012-11-28 21:06:21 UTC
We had to build a 3.2.14.GA as there was a loop on the 13 one.

Comment 40 Jitka Kozana 2012-11-29 09:21:51 UTC
Not seen neither in clustering tests nor in HornetQ tests. 

Verified, closing.