Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1098226

Summary: Inaccessible JBoss EWS container after a few REST requests
Product: [Retired] JBoss BPMS Platform 6 Reporter: Ivo Bek <ibek>
Component: Business CentralAssignee: Marco Rietveld <mrietvel>
Status: CLOSED CURRENTRELEASE QA Contact: Ivo Bek <ibek>
Severity: urgent Docs Contact:
Priority: high    
Version: 6.0.2CC: ibek, kverlaen, mbaluch, mrietvel, rrajasek, smcgowan, zkrejcov
Target Milestone: ER3   
Target Release: 6.0.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-08-06 19:50:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
catalina.out
none
threaddump
none
list of REST requests
none
start state charts
none
end state charts
none
GC details
none
heapdump screenshot
none
catalina.out from bpms gui tests
none
patched kie-services-remote jar
none
patched kie-services-remote jar (sources)
none
1st patched - end state chart
none
patched kie-services-jaxb jar
none
1st patched - end state chart
none
Screenshot of heap dump graph (largest dominators)
none
Screenshot showing SessionFactoryImpl instances as main culprits
none
patched EMF producer classes
none
2nd patched - end state chart none

Description Ivo Bek 2014-05-15 14:07:38 UTC
Description of problem:

I have configured JBoss EWS with BPMS 6.0.2.ER2 and PostgreSQL DB 9.2. When I run my test suite where I execute REST requests upon business central, after a while (45 REST requests):

 - Use of all processors goes up to 100%
 - EWS is inaccessible and every request timeouts
 - The server runs out of memory (OutOfMemoryError)

Maximum heap space was extended to 1024m but it only prolonged the issue to appear.

Please contact me for configured BPMS 6.0.2.ER2 on JBoss EWS and I will send it to you.

I will also give you settings files and other instructions including PostgreSQL setting to your own local instance.

Meanwhile you can look inside the attachments. You will find there:

 - threaddump which was created in the time when the server was not accessible
 - list of REST commands which were executed upon business central
 - server logs which contain information about java.lang.OutOfMemoryError: GC overhead limit exceeded
 - screenshots from visualvm where you can see start state of JBoss EWS, graphs of GC, the end state of JBoss EWS (CPU usage, Heap, Classes, and Threads), and part of heapdump sorted by the classes which have the biggest number of instances. Surprisingly it is HashMap$Entry and also others might seem suspiciously.

The problem is always reproducible and it was reproduced on different machines (my own laptop - Fedora19, 8GB RAM, performance machine, ...) with different databases as well (PostgreSQL, Oracle11gR1, ...).

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Contact me, I will upload you the configured BPMS 6.0.2.ER2 on JBoss EWS
2. Configure database in conf/resources.properties according to your PostgreSQL DB details
3. Start BPMS
4. Clone git://git.app.eng.bos.redhat.com/jbossqe/brms.git
5. In the brms directory, run "mvn clean install -Pjbpm-integration"
6. Watch the java process of JBoss EWS in visualvm, jconsole or your preferred monitoring tool.
7. When the tests start to timeout while performing REST requests and the all your processors are used at 100%, you can kill the tests and see that the server is inaccessible and that the use of processors has not changed (they remain being used for 100% though there are no more requests).

Actual results:


Expected results:


Additional info:

Comment 1 Ivo Bek 2014-05-15 14:13:20 UTC
Created attachment 895948 [details]
catalina.out

Comment 2 Ivo Bek 2014-05-15 14:13:47 UTC
Created attachment 895949 [details]
threaddump

Comment 3 Ivo Bek 2014-05-15 14:14:16 UTC
Created attachment 895950 [details]
list of REST requests

Comment 4 Ivo Bek 2014-05-15 14:14:49 UTC
Created attachment 895951 [details]
start state charts

Comment 5 Ivo Bek 2014-05-15 14:15:23 UTC
Created attachment 895952 [details]
end state charts

Comment 6 Ivo Bek 2014-05-15 14:15:52 UTC
Created attachment 895953 [details]
GC details

Comment 7 Ivo Bek 2014-05-15 14:16:42 UTC
Created attachment 895954 [details]
heapdump screenshot

Comment 8 Zuzana Krejčová 2014-05-16 08:14:26 UTC
The same or very similar happened with BPMS gui tests - after 28 test classes, the business-central simply failed to load. There is a bunch of 'java.lang.OutOfMemoryError: Java heap space' in catalina.out. Didn't happen when testing BRMS.

Comment 9 Zuzana Krejčová 2014-05-16 08:18:06 UTC
Created attachment 896245 [details]
catalina.out from bpms gui tests

Comment 10 Maciej Swiderski 2014-05-16 09:15:11 UTC
I was able to reproduce the memory leak with REST operations but not with UI, would it be possible to know what sort of tests where executed?

Comment 12 Maciej Swiderski 2014-05-16 11:44:34 UTC
Marco, made a fix in rest api that according to my tests removes the memory leak when using REST api. With this fix I am not able to reproduce any memory leak any more so would it be possible to get retest by QE with a patched jar (I can attach it here)?

Comment 13 Marco Rietveld 2014-05-16 14:22:40 UTC
Created attachment 896419 [details]
patched kie-services-remote jar

Added the patched jar, which was created using the 6.0.x branch. 

We're not completely sure that this fixes it, but it dramatically improves performance on tomcat. Ivo/Zuzanna, please test this and let us know if it fixes it. 

What was happening (AFAICT) is that the ProcessRequestBean was being created for every request. However, (I think that) Tomcat REST calls each have their own session because we use form-based auth on Tomcat. This would result in the injected EMF in the ProcessRequestBean being created for every request, along with a lot of other (per-session) class instances. Inspecting the heap showed that there were at least 110 hibernate SessionFactoryImpl instances, which are part of hibernate's EMF implementation. 

Changing the ProcessRequestBean to ApplicationScoped (which it probably should have been from the start), means that the EMF (and other injected/underlying class instances) is only created once for that bean, instead of every request. 

Commits: 

6.0.x: 
https://github.com/droolsjbpm/droolsjbpm-integration/commit/a8ca1caf

master: 
https://github.com/droolsjbpm/droolsjbpm-integration/commit/c42f4d33

Comment 14 Marco Rietveld 2014-05-16 14:23:21 UTC
Created attachment 896420 [details]
patched kie-services-remote jar (sources)

Comment 17 Ivo Bek 2014-05-19 07:42:09 UTC
Created attachment 897034 [details]
1st patched - end state chart

Marco, I retested BPMS 6.0.2.ER2 with your patch. It behaves different but I still see some more memory leaks in the attached chart where the heap is still exceeded soon though it is little better now.

Exception in thread "bitronix-recovery-thread" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at bitronix.tm.utils.Uid.arrayToString(Uid.java:113)
	at bitronix.tm.utils.Uid.<init>(Uid.java:35)
	at bitronix.tm.journal.TransactionLogCursor.readLog(TransactionLogCursor.java:139)
	at bitronix.tm.journal.TransactionLogCursor.readLog(TransactionLogCursor.java:71)
	at bitronix.tm.journal.DiskJournal.collectDanglingRecords(DiskJournal.java:444)
	at bitronix.tm.journal.DiskJournal.collectDanglingRecords(DiskJournal.java:270)
	at bitronix.tm.recovery.Recoverer.run(Recoverer.java:129)
	at java.lang.Thread.run(Thread.java:744)

Also I get the following exception after most of REST request:

May 19, 2014 9:00:28 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [Resteasy] in context with path [/business-central] threw exception
javax.enterprise.event.ObserverException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at java.lang.Class.newInstance(Class.java:374)
	at org.jboss.weld.util.reflection.SecureReflections$16.work(SecureReflections.java:344)
	at org.jboss.weld.util.reflection.SecureReflectionAccess.run(SecureReflectionAccess.java:52)
	at org.jboss.weld.util.reflection.SecureReflectionAccess.runAsInstantiation(SecureReflectionAccess.java:173)
	at org.jboss.weld.util.reflection.SecureReflections.newInstance(SecureReflections.java:341)
	at org.jboss.weld.injection.Exceptions.rethrowException(Exceptions.java:33)
	at org.jboss.weld.injection.Exceptions.rethrowException(Exceptions.java:73)
	at org.jboss.weld.injection.MethodInjectionPoint.invokeOnInstanceWithSpecialValue(MethodInjectionPoint.java:162)
	at org.jboss.weld.event.ObserverMethodImpl.sendEvent(ObserverMethodImpl.java:245)
	at org.jboss.weld.event.ObserverMethodImpl.sendEvent(ObserverMethodImpl.java:233)
	at org.jboss.weld.event.ObserverMethodImpl.notify(ObserverMethodImpl.java:213)
	at org.jboss.weld.event.ObserverNotifier.notifyObserver(ObserverNotifier.java:117)
	at org.jboss.weld.event.ObserverNotifier.notifyObservers(ObserverNotifier.java:85)
	at org.jboss.weld.event.ObserverNotifier.fireEvent(ObserverNotifier.java:75)
	at org.jboss.weld.event.ObserverNotifier.fireEvent(ObserverNotifier.java:70)
	at org.jboss.weld.manager.BeanManagerImpl.fireEvent(BeanManagerImpl.java:562)
	at org.jboss.solder.servlet.exception.CatchExceptionFilter.doFilter(CatchExceptionFilter.java:74)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.jboss.solder.servlet.event.ServletEventBridgeFilter.doFilter(ServletEventBridgeFilter.java:74)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.jbpm.designer.web.filter.impl.PluggableFilter.doFilter(PluggableFilter.java:70)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.uberfire.security.server.UberFireSecurityFilter.doFilter(UberFireSecurityFilter.java:266)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at org.kie.integration.tomcat.JACCValve.invoke(JACCValve.java:101)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1008)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: javax.servlet.ServletException: Servlet execution threw an exception
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:328)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.jboss.solder.servlet.exception.CatchExceptionFilter.doFilter(CatchExceptionFilter.java:65)
	... 26 more
Caused by: java.lang.NoSuchFieldError: KIE_JAXB_CLASS_SET
	at org.kie.remote.services.rest.jaxb.JaxbContextManager.getJaxbContext(JaxbContextManager.java:96)
	at org.kie.remote.services.rest.jaxb.JaxbContextManager$Proxy$_$$_WeldClientProxy.getJaxbContext(JaxbContextManager$Proxy$_$$_WeldClientProxy.java)
	at org.kie.remote.services.rest.jaxb.JaxbContextResolver.getContext(JaxbContextResolver.java:33)
	at org.kie.remote.services.rest.jaxb.JaxbContextResolver.getContext(JaxbContextResolver.java:19)
	at org.kie.remote.services.rest.jaxb.JaxbContextResolver$Proxy$_$$_WeldClientProxy.getContext(JaxbContextResolver$Proxy$_$$_WeldClientProxy.java)
	at org.jboss.resteasy.plugins.providers.jaxb.AbstractJAXBContextFinder.findProvidedJAXBContext(AbstractJAXBContextFinder.java:76)
	at org.jboss.resteasy.plugins.providers.jaxb.XmlJAXBContextFinder.findCachedContext(XmlJAXBContextFinder.java:37)
	at org.jboss.resteasy.plugins.providers.jaxb.AbstractJAXBProvider.findJAXBContext(AbstractJAXBProvider.java:69)
	at org.jboss.resteasy.plugins.providers.jaxb.AbstractJAXBProvider.getMarshaller(AbstractJAXBProvider.java:166)
	at org.jboss.resteasy.plugins.providers.jaxb.AbstractJAXBProvider.writeTo(AbstractJAXBProvider.java:142)
	at org.jboss.resteasy.core.interception.MessageBodyWriterContextImpl.proceed(MessageBodyWriterContextImpl.java:117)
	at org.jboss.resteasy.plugins.providers.jackson.JacksonJsonpInterceptor.write(JacksonJsonpInterceptor.java:112)
	at org.jboss.resteasy.core.interception.MessageBodyWriterContextImpl.proceed(MessageBodyWriterContextImpl.java:123)
	at org.jboss.resteasy.plugins.interceptors.encoding.GZIPEncodingInterceptor.write(GZIPEncodingInterceptor.java:104)
	at org.jboss.resteasy.core.interception.MessageBodyWriterContextImpl.proceed(MessageBodyWriterContextImpl.java:123)
	at org.jboss.resteasy.core.ServerResponse.writeTo(ServerResponse.java:250)
	at org.jboss.resteasy.core.SynchronousDispatcher.writeJaxrsResponse(SynchronousDispatcher.java:602)
	at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:528)
	at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:126)
	at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:208)
	at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:55)
	at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:50)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
	... 28 more

Comment 18 Marco Rietveld 2014-05-19 10:15:35 UTC
Created attachment 897090 [details]
patched kie-services-jaxb jar

Comment 19 Ivo Bek 2014-05-19 11:43:04 UTC
Created attachment 897114 [details]
1st patched - end state chart

Marco, I updated the "1st patched - end state chart". Unfortunately, it does not work either when I added the second jar file. As you can see in the chart, it is slightly better but the memory leaks are still there and I get OutOfMemoryError. I have heapdump if you want it. Did the patch worked for you as you said that you were able to reproduce the problem?

Comment 20 Maciej Swiderski 2014-05-19 12:28:24 UTC
Ivo, would it be possible to execute exact same test cases as you do locally so we can analyze the issue why test suite runs?

Comment 21 Ivo Bek 2014-05-19 12:53:11 UTC
Hi Maciej, 

yes, you can use the exact steps I wrote in the description of the BZ:

4. Clone git://git.app.eng.bos.redhat.com/jbossqe/brms.git
5. In the brms directory, run "mvn clean install -Pjbpm-integration"

Comment 22 Marco Rietveld 2014-05-20 08:08:51 UTC
Created attachment 897491 [details]
Screenshot of heap dump graph (largest dominators)

It looks like the main culprit is (hibernate impl's of) EMF instances. 

In the most recent heap dump (May 19th, from Ivo), the largest dominator is the (tomcat) WebappClassLodaer. If we drill down into which class instances are being held by the WebappClassLoader, we see that that ~95% of that (and thus more than 80% of the total heap) are SessionFactoryImpl class instances. These are the main implementation class for the EntityManagerFactory. 

The most recent commit (a8ca1caf on 6.0.x) improves performance by changing the ProcessRequestBean from ApplicationScoped to RequestScoped -- and one of the main side effects of this seems to be that the EMF in the ProcessRequestBean is not recreated for every request. 

What's odd is that it looks like Tomcat creates a new EMF for every injection point where JBoss (AS/EAP) create's one in memory and then injects that where it's needed. 

And actually, that's not odd at all. As mentioned on stackoverflow[1]: 

"EntityManager's life cycle depends on its type, which can be:

    container-managed transactional,
    container-managed extended,
    application-managed.

The first two are only available in a full-blown application server. So if you're going to stick with a servlet container you're narrowed to the 3rd option."

EMF management on kie-wb on tomcat is of course application-managed -- and it looks like there are a number of places where we're not doing it right. :/ 

[1] http://stackoverflow.com/questions/20166218/how-to-manage-entitymanager-life-cycle-in-cdi-environement-using-tomcat

Comment 23 Marco Rietveld 2014-05-20 08:10:55 UTC
Created attachment 897494 [details]
Screenshot showing SessionFactoryImpl instances as main culprits

Comment 25 Marco Rietveld 2014-05-21 02:23:27 UTC
The commit above moves EMF generation from an non-annotated class to a class annotated with an @ApplicationScoped annotation. 

What's interesting is that JBoss (AS 7 and EAP 6) is smart enough to *not* recreate a new EMF per request, even though the producer class is not @ApplicationScoped. 

On the other hand, Tomcat works as follows: 
1. If a producer class is not annotated with a scope, use @RequestScoped
2. If an EMF instance is produced by the class, create a new EMF instance per request. 

I was able to verify this behaviour here: 
https://github.com/mrietveld/emf-mgmt-example

Comment 26 Marco Rietveld 2014-05-21 02:28:55 UTC
Created attachment 897786 [details]
patched EMF producer classes

The new attached classes (patchedEmfProducerClasses.tgz) should be extracted in the root directory of the exploded business-central.war.

Comment 27 Ivo Bek 2014-05-21 09:36:33 UTC
Created attachment 897889 [details]
2nd patched - end state chart

As you can see in the attached chart, the 2nd patch fixed the problem. There are no more serious memory leaks and the use of processes is very low (under 20%).

Comment 28 Marek Baluch 2014-06-06 14:27:13 UTC
Verified on 6.0.2.ER3.

The tests do not timeout anymore so marking this issue as VERIFIED.