Bug 901135 (JBPAPP6-1172)
| Summary: | ChannelBusyException: Too many open outbound writes | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [JBoss] JBoss Enterprise Application Platform 6 | Reporter: | Jitka Kozana <jkudrnac> | ||||||||
| Component: | EJB, Remoting | Assignee: | David M. Lloyd <david.lloyd> | ||||||||
| Status: | CLOSED NEXTRELEASE | QA Contact: | |||||||||
| Severity: | urgent | Docs Contact: | |||||||||
| Priority: | urgent | ||||||||||
| Version: | 6.0.0 | CC: | asaji, atangrin, cdewolf, david.lloyd, dosoudil, fnasser, jkudrnac, jpai, lthon, mnovak, ochaloup | ||||||||
| Target Milestone: | --- | ||||||||||
| Target Release: | EAP 6.0.1 | ||||||||||
| Hardware: | Unspecified | ||||||||||
| OS: | Unspecified | ||||||||||
| URL: | http://jira.jboss.org/jira/browse/JBPAPP6-1172 | ||||||||||
| Whiteboard: | Regression | ||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2012-11-29 09:21:51 UTC | Type: | Bug | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Attachments: |
|
||||||||||
2000 clients running simultaneously? What do each of those 2000 clients do? And how many messages/operations do they do simultaneously? Here is the link to the job: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-Failover-EJB/job/eap-6x-failover-ejb-ejbremote-shutdown-repl-async/29/console-perf17/ The clients are just calling a statefull EJB via remote invocation. One client = one call to the EJB's method getSerialAndIncrement(). I saw similar issues: JBPAPP-8721, AS7-3796, can this be a regression? Also, we did not see this issue during ER2 testing. Can this issue be introduced in remoting upgrade? - All our functional test scenarios involving EJB calls are failing because of this issue (there were passing in ER2) - Performance was dropped significantly: in ER2 4 node cluster was able to serve 5200 concurrent clients, in ER3 it is only 1600 clients - the environment was the same in ER2 and ER3 testing This may have big impact on our customers. Can you add the exact client side configurations being used and also the server side configurations for EJB3 and remoting subsystem? Are you sure the client side uses the correct set of jars (we had this issue with performance setups earlier where after spending a lot of time, we realized it was a client side jar inconsistency)? A stateful session creation is 1 outbound message and a method invocation is another outbound message. Are you sure that the test setup doesn't exceed the *simultaneous* message limit that's been configured? $cat jboss-ejb-client.properties
{code}
remote.connectionprovider.create.options.org.xnio.Options.SSL_ENABLED=false
remote.connections=perf18,perf19,perf20,perf21
invocation.timeout=120000
remote.connection.perf18.host=perf18
remote.connection.perf18.port=4447
remote.connection.perf18.connect.options.org.xnio.Options.SASL_POLICY_NOANONYMOUS=false
remote.connection.perf18.channel.options.org.jboss.remoting3.RemotingOptions.MAX_OUTBOUND_MESSAGES=30000
remote.connection.perf19.host=perf19
remote.connection.perf19.port=4447
remote.connection.perf19.connect.options.org.xnio.Options.SASL_POLICY_NOANONYMOUS=false
remote.connection.perf19.channel.options.org.jboss.remoting3.RemotingOptions.MAX_OUTBOUND_MESSAGES=30000
remote.connection.perf20.host=perf20
remote.connection.perf20.port=4447
remote.connection.perf20.connect.options.org.xnio.Options.SASL_POLICY_NOANONYMOUS=false
remote.connection.perf20.channel.options.org.jboss.remoting3.RemotingOptions.MAX_OUTBOUND_MESSAGES=30000
remote.connection.perf21.host=perf21
remote.connection.perf21.port=4447
remote.connection.perf21.connect.options.org.xnio.Options.SASL_POLICY_NOANONYMOUS=false
remote.connection.perf21.channel.options.org.jboss.remoting3.RemotingOptions.MAX_OUTBOUND_MESSAGES=30000
{code}
server side config: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-Performance-Clustering-4node/job/eap-6x-stress-ejbremote-repl-sync/7/artifact/report/config/perf18/standalone-ha.xml
classpath:
{code}
clustering/target/dependency/xml-apis-1.3.04.jar
clustering/target/dependency/xercesImpl-2.9.1.jar
clustering/target/dependency/validation-api-1.0.0.GA.jar
clustering/target/dependency/sf-www-3.17.014.jar
clustering/target/dependency/odfdom-java-0.8.6.jar
clustering/target/dependency/log4j-1.2.16.jar
clustering/target/dependency/junit-3.8.jar
clustering/target/dependency/jfreechart-1.0.0.jar
clustering/target/dependency/jcommon-1.0.0.jar
clustering/target/dependency/jboss-transaction-api_1.1_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-smartfrog-stats-0.2.0.jar
clustering/target/dependency/jboss-smartfrog-loaddriver-0.2.1.CR1.jar
clustering/target/dependency/jboss-smartfrog-httpd-0.1.0.jar
clustering/target/dependency/jboss-smartfrog-extension-0.2.0.jar
clustering/target/dependency/jboss-smartfrog-ejb3-0.2.1-SNAPSHOT.jar
clustering/target/dependency/jboss-smartfrog-controller-0.2.1.CR1.jar
clustering/target/dependency/jboss-smartfrog-clustering-0.2.1-SNAPSHOT.jar
clustering/target/dependency/jboss-smartfrog-as7-0.2.1-SNAPSHOT.jar
clustering/target/dependency/jboss-smartfrog-as-0.1.1.jar
clustering/target/dependency/jboss-servlet-api_3.0_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-qa-utils-0.1.0.jar
clustering/target/dependency/jboss-jstl-api_1.2_spec-1.0.1.Final.jar
clustering/target/dependency/jboss-jsp-api_2.2_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-jsf-api_2.0_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-jaxrs-api_1.1_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-jaxb-api_2.2_spec-1.0.2.Final.jar
clustering/target/dependency/jboss-jaspi-api_1.0_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-interceptors-api_1.1_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-el-api_2.2_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-ejb-api_3.1_spec-1.0.1.Final.jar
clustering/target/dependency/jboss-connector-api_1.6_spec-1.0.0.Final.jar
clustering/target/dependency/jboss-annotations-api_1.1_spec-1.0.0.Final.jar
clustering/target/dependency/javax.inject-1.jar
clustering/target/dependency/hibernate-jpa-2.0-api-1.0.1.Final.jar
clustering/target/dependency/eap-jboss-threads-6.0.1.ER3.jar
clustering/target/dependency/eap-jboss-dmr-6.0.1.ER3.jar
clustering/target/dependency/eap-jboss-client-6.0.1.ER3.jar
clustering/target/dependency/eap-jboss-as-protocol-6.0.1.ER3.jar
clustering/target/dependency/eap-jboss-as-controller-client-6.0.1.ER3.jar
clustering/target/dependency/commons-logging-1.0.4.jar
clustering/target/dependency/commons-lang-2.4.jar
clustering/target/dependency/commons-httpclient-3.1.jar
clustering/target/dependency/commons-csv-1.0-mlinhard.jar
clustering/target/dependency/commons-codec-1.2.jar
clustering/target/dependency/commons-beanutils-1.7.0.jar
clustering/target/dependency/clusterbench-ee6-ejb-1.0.1.jar
clustering/target/dependency/clusterbench-common-1.0.1.jar
clustering/target/dependency/cdi-api-1.0-SP4.jar
{code}
eap-jboss-threads-6.0.1.ER3.jar = jboss-threads-7.1.3.Final-redhat-3.jar
eap-jboss-dmr-6.0.1.ER3.jar = jboss-dmr-1.1.1.Final-redhat-2.jar
eap-jboss-client-6.0.1.ER3.jar = jboss-client.jar, v. 7.1.3.Final-redhat-3
eap-jboss-as-protocol-6.0.1.ER3.jar = jboss-as-protocol-7.1.3.Final-redhat-3.jar
eap-jboss-as-controller-client-6.0.1.ER3.jar = jboss-as-controller-client-7.1.3.Final-redhat-3.jar
Release Notes Docs Status: Added: Not Required This issue blocks performance and SOAK testing of JMS subsytem. Problem is that Specjms2007 creates load using standalone JMS clients which are making remote jndi lookups for connection factories and destinations. When there is too many clients this issue occur. Console log: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-HornetQ/job/eap-60-hornetq-performance-specjjms2007-horizontal-nio-journal-with-non-blocking-io/38/console-messaging-17/ Have you tried using the remoting JAR from ER2? Job was started again with EAP 6.0.1.ER2 and it's running for more than 6 hours now without failures. For sure this is regression. Hudson job: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-HornetQ/job/eap-60-hornetq-performance-specjjms2007-horizontal-nio-journal-with-non-blocking-io/ Labels: Added: Regression I'm still trying to narrow this down, but in the meantime, could someone from the perf team who has access to these systems and knows how they are setup, please use this new remoting jar https://repository.jboss.org/nexus/content/repositories/snapshots/org/jboss/remoting3/jboss-remoting/3.2.13.GA-SNAPSHOT/jboss-remoting-3.2.13.GA-20121108.153303-1.jar which just has a minor exception logging related change and reproduce this issue? Update the *client* classpath to get rid of the older remoting version jar and use this new snapshot jar instead. Updating the server side (for now) isn't necessary. I'll continue with the investigation in the meantime. {quote}
This issue blocks performance and SOAK testing of JMS subsytem. Problem is that Specjms2007 creates load using standalone JMS clients which are making remote jndi lookups for connection factories and destinations. When there is too many clients this issue occur.
Console log:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-HornetQ/job/eap-60-hornetq-performance-specjjms2007-horizontal-nio-journal-with-non-blocking-io/38/console-messaging-17/
{quote}
That job is using incorrect client jar versions. For example:
jboss-hornetq-testsuite/target/dependency/jboss-remoting-3.2.7.GA.jar
It shouldn't be using 3.2.7 version of remoting on the client side and instead be using the version that's specified in the client BOM for 6.0.1.ER3.
This does look like some regression or some bug which we (re)exposed again, but based on the data so far, I haven't been able to confirm either way. I'll be taking a look at this later today as soon as I find some time (am away in Brno for a AS meeting). The data that I asked for in one of my previous replies, is going to help in the meantime. Nice catch Jaikiran! This jar should not be there. It got there because of jboss-as-arquillian-container-managed dependency in pom. But I can see that jboss-as-client-6.0.1.ER3.jar is before jboss-remoting-3.2.7.GA.jar in classpath when sattelite was started. It seems that this could not cause this issue. Strange thing is that when performance test is started with *EAP 6.0.1.ER2* and jboss-as-client-6.0.1.ER3.jar on the client side (on satellite side) then *no* exception occur. When it's started with *EAP 6.0.1.ER3* and jboss-as-client-6.0.1.ER3.jar on the client side then there are ChannelBusyExceptions. It seems that problem is on the server. Probably the best way how to reproduce this issue is to create many threads where each thread creates InitialContext and does jndi lookup to remote server. This is exactly what the satellite does. It creates many standalone jms clients which do remote jndi lookup for queue. Attached reproducer.zip should help to investigate. It simply creates many threads where each thread create one InitialContext to remote EAP 6.0.1.ER3 server and do jndi lookup for "jms/RemoteConnectionFactory".
{code}
How to use the reproducer:
1. Download and unzip reproducer.zip. Following scripts run in unzipped directory.
2. Prepare EAP 6.0.1.ER3 server - "sh prepare.sh" (it downloads EAP 6.0.1.ER3 zip distribution)
3. Start server - "sh start-server1.sh localhost" (parameter "hostname")
4. Start ManyJndiLookupTest test - "sh start-lookups.sh localhost 200" (paraters "hostname_of_eap_server" "number_of_threads")
5. See exceptions in log.
{code}
Step 4. is somewhat tricky. Hard to say how many threads to set. I used 200 threads and tested with EAP 6.0.1.ER2 and ER3. With ER2 there are no exceptions and with ER3 I can see ChannelBusyException. In both of the cases I used jboss-as-client.jar from EAP 6.0.1.ER3.
I've attached also maven project with test - client-maven-project.zip. It's class org.jboss.jms.client.ManyJndiLookupTest. I hope I did not miss something.
Attachment: Added: reproducer.zip Attachment: Added: client-maven-project.zip Miroslav, thanks a lot for creating a reproducible application for this. I've always had trouble trying to reproduce this in the performance environment and having a simple reproducer helps. It's a bit too late in the night here and I'm traveling tomorrow, but as soon as I get some time in front of my laptop, I'm going to take a look at this. You're welcome Jaikiran and thanks for looking at this issue. Note: jboss-as-client-6.0.1.ER3.jar = jboss-client.jar from EAP 6.0.1.ER3. I've just noticed that this could be confusing. client-maven-project.zip is having dependency to QA repo. I hope it'll work for you but probably you'll need to change it. Sorry for this. Release Notes Docs Status: Removed: Not Required Docs QE Status: Removed: NEW I've figured out what's going wrong. This is a JBoss Remoting specific semantic going wrong. I'll be discussing with David Llyod about this today and will update this JIRA with what can be done to fix this and how long it might take to fix this. I just followed "reproducer" and used 200 threads to test with "EAP 6.0.1.ER2", can also see "Caused by: org.jboss.remoting3.ChannelBusyException: Too many open outbound writes" Actually, the reproducer isn't the right one to test this problem since it doesn't involve the flow control that's done by the EJB client API. The EJB client API makes sure that it doesn't send more messages than what is configured. The reproducer isn't using EJBs so that doesn't come into picture. So although the exceptions are the same, the root cause isn't really identified in the reproducer. The real issue lies in JBoss Remoting where it doesn't really decrement the open message count on a message close and instead expects the count to be decremented asynchronously which results in the EJB client API letting more messages to be written out while JBoss Remoting has still not decremented the previous message close. I have to discuss this with David today. Update: I've discussed this with David and David's working on a fix for this in JBoss Remoting library. Link: Added: This issue Cloned to JBPAPP6-1633 I have a possible fix here: http://github.com/dmlloyd/jboss-remoting/commit/ea08b8b I'll check the reproducer when I can (if someone else gets to it before me, all the better). {quote}
I'll check the reproducer when I can (if someone else gets to it before me, all the better).
{quote}
The reproducer doesn't have flow control implemented on the client side. So it's always going to hit the max allowed message count and run into the exception. I've a different (modified quickstart) application which uses EJB invocations and I've been using it to reproduce this. I'll give it a try with that application.
I've attached a test application (JBPAPP6-1172-test-app.zip) which can be used to test this fix (and reproduce the issue against EAP 6.0.1.ER3).
To use the application, follow these steps:
1) Build JBoss AS/EAP with JBoss Remoting 3.2.13.GA-SNAPSHOT (you'll have to build remoting containing the fix which David pointed to)
2) Extract the JBPAPP6-1172-test-app.zip to a location of your choice. Let's call it APP_HOME
3) cd APP_HOME
4) mvn clean install
5) cp server-side/target/jboss-as-ejb-remote-app.jar JBOSS_HOME/standalone/deployments
6) Start the server (./standalone.sh)
7) Add a user under ApplicationRealm using the add-user script in JBOSS_HOME/bin. The user name is expected to be jai and the password is expected to be 1234# and realm is ApplicationRealm
8) cd client (from APP_HOME folder)
9) mvn exec:exec
The client application triggers 200 threads which all invoke on the deployed EJB simultaneously, 10 times each. If the run completes successfully (i.e. if all 200 threads are able to invoke 10 times each) then you will see the following message:
{code}
******* 200 out of 200 threads completed successfully ********
{code}
Attachment: Added: JBPAPP6-1172-test-app.zip By the way, the test application that I attached passes against David's fix. David, so you have a PR we can cherry-pick for the product branch? No, I haven't yet released this fix. I'll do it this week though. David, any chance we can get it sooner? We are holding the build for this one and we are now at risk for the planned GA. Thanks, Anne-Louise OK, I can push up the release today and submit the PR. Hopefully there will be no more issues reported then :) JBoss Remoting 3.2.13.GA resolving related issue REM3-165 has been [productized|https://brewweb.devel.redhat.com/buildinfo?buildID=244430] and [EAP BOM updated|https://brewweb.devel.redhat.com/buildinfo?buildID=244432] to include the fixed/upgraded JBoss Remoting in EAP 6.0.1 ER 4. Vladimir built jboss-remoting3-3.2.13-1.GA_redhat_1.ep6.el6 and it is included in the ER4 We had to build a 3.2.14.GA as there was a loop on the 13 one. Not seen neither in clustering tests nor in HornetQ tests. Verified, closing. |
project_key: JBPAPP6 After node failover, we are seeing on the client side: {code} 2012/10/24 08:01:52:369 EDT [ERROR][Runner - 368] HOST perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Error getting response. <java.lang.reflect.UndeclaredThrowableException> java.lang.reflect.UndeclaredThrowableException at $Proxy0.getSerialAndIncrement(Unknown Source) at org.jboss.smartfrog.clustering.ejb3.StatefulSBProcessorFactoryImpl$EJB3RequestProcessor.processRequest(StatefulSBProcessorFactoryImpl.java:75) at org.jboss.smartfrog.loaddriver.CompoundRequestProcessorFactoryImpl$CompoundRequestProcessor.processRequest(CompoundRequestProcessorFactoryImpl.java:52) at org.jboss.smartfrog.loaddriver.Runner.run(Runner.java:87) at java.lang.Thread.run(Thread.java:662) Caused by: org.jboss.remoting3.ChannelBusyException: Too many open outbound writes at org.jboss.remoting3.remote.RemoteConnectionChannel.openOutboundMessage(RemoteConnectionChannel.java:112) at org.jboss.remoting3.remote.RemoteConnectionChannel.writeMessage(RemoteConnectionChannel.java:297) at org.jboss.ejb.client.remoting.ChannelAssociation.acquireChannelMessageOutputStream(ChannelAssociation.java:249) at org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver.processInvocation(RemotingConnectionEJBReceiver.java:196) at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:181) at org.jboss.ejb.client.TransactionInterceptor.handleInvocation(TransactionInterceptor.java:42) at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:183) at org.jboss.ejb.client.ReceiverInterceptor.handleInvocation(ReceiverInterceptor.java:125) at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:183) at org.jboss.ejb.client.EJBInvocationHandler.doInvoke(EJBInvocationHandler.java:136) at org.jboss.ejb.client.EJBInvocationHandler.doInvoke(EJBInvocationHandler.java:121) at org.jboss.ejb.client.EJBInvocationHandler.invoke(EJBInvocationHandler.java:104) {code} We have 2000 clients. Here is the jboss-ejb-client.properties: {code} remote.connection.perf18.host=perf18 remote.connection.perf18.port=4447 remote.connection.perf18.connect.options.org.xnio.Options.SASL_POLICY_NOANONYMOUS=false remote.connection.perf18.channel.options.org.jboss.remoting3.RemotingOptions.MAX_OUTBOUND_MESSAGES=30000 {code}