Created attachment 832008
JBoss Web build containing the patch for BZ#1032552 (http://anonsvn.jboss.org/repos/jbossweb/branches/7.3.x/) with an updated module.xml
Description of problem:
When multiple CLI operations are run from Java, memory consumption grows with each operation and eventually ends in an OutOfMemoryError (OOM).
Version-Release number of selected component (if applicable):
How reproducible: always
Steps to Reproduce:
1. download and unpack EAP and, ideally, apply the patch for BZ#1032552
2. git clone https://github.com/psakar/eap-ws-management-tests.git
3. mvn clean verify -Djboss.home=$JBOSS_HOME -Dit.test=org/jboss/qa/management/ws/cli/**/ReloadIT.java -DreadCommandCount=5000 2>&1 | tee log.txt, where $JBOSS_HOME points to the installed EAP
4. the run ends with an OOM; alternatively, watch the heap grow with each CLI operation invocation
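To watch the heap grow per invocation (step 4) without a profiler, one option is to sample used heap after a GC request every N operations. The sketch below is a minimal, hypothetical probe: the class name HeapGrowthProbe and the deliberately leaky stand-in operation are assumptions, not part of the reproducer; in the real test the Runnable would wrap one CLI invocation. Note that System.gc() is only a hint, so samples are approximate.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

public class HeapGrowthProbe {

    /** Used heap in bytes, sampled after requesting a GC (System.gc() is only a hint). */
    static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.gc();
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Runs op `iterations` times and samples used heap every `sampleEvery` runs. */
    static List<Long> sample(Runnable op, int iterations, int sampleEvery) {
        List<Long> samples = new ArrayList<>();
        for (int i = 1; i <= iterations; i++) {
            op.run();
            if (i % sampleEvery == 0) {
                samples.add(usedHeapAfterGc());
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for one CLI operation that leaks on purpose:
        // each run keeps a 64 KiB array reachable from a long-lived list.
        List<byte[]> retained = new ArrayList<>();
        Runnable leakyOp = () -> retained.add(new byte[64 * 1024]);

        List<Long> samples = sample(leakyOp, 500, 100);
        for (int k = 0; k < samples.size(); k++) {
            System.out.printf("after %d ops: %d KiB used%n", (k + 1) * 100, samples.get(k) / 1024);
        }
    }
}
```

A steadily rising series across samples points to retained objects; a flat series after warm-up suggests the operation releases what it allocates.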
Actual results: Memory used by previous CLI commands is not released, which eventually leads to an OOM.
Expected results: Memory usage does not grow continuously with each CLI command invocation.
I have uploaded a heap dump to nfs-01.eng.brq.redhat.com:/exports/scratch/rhatlapa/oomInCLI.hprof
The issue might have been fixed in https://github.com/jboss-remoting/jboss-remoting/commit/a93b2454683cd1224be2
This has to be tested against the current WildFly master. Unfortunately, so far I haven't been able to update the test's dependencies (mostly Arquillian-related) to run against WildFly: some of them were missing in WildFly and couldn't be satisfied. As a result, the WildFly container would start, but the connection couldn't be established and the test wouldn't run.
Here is the error I see in the log:
ERROR [org.jboss.remoting.remote.connection] (Remoting "fedorka:MANAGEMENT" I/O-1) JBREM000200: Remote connection failed: java.io.IOException: XNIO000804: Received an invalid message length of 1195725856
It looks like a dependency issue, but I haven't been able to figure it out so far.
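One observation about the logged value: 1195725856 is 0x47455420, and those four bytes are the ASCII text "GET ". Assuming the length prefix is read as a 4-byte big-endian integer (the ByteBuffer default), this suggests the server received the start of an HTTP request (for example an HTTP Upgrade handshake) where it expected native Remoting framing, which would fit a client/server protocol mismatch. That is an inference from the number alone. The hypothetical helper below just performs the decoding:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FrameLengthDecoder {

    /** Reinterprets a 4-byte big-endian length prefix as ASCII text. */
    static String asAscii(int length) {
        byte[] bytes = ByteBuffer.allocate(4).putInt(length).array(); // big-endian by default
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    /** True if every byte is printable ASCII, i.e. the "length" looks like text. */
    static boolean looksLikeText(int length) {
        for (byte b : ByteBuffer.allocate(4).putInt(length).array()) {
            // bytes are signed, so values >= 0x80 are negative and fail the first check
            if (b < 0x20 || b > 0x7E) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int reported = 1195725856; // value from the XNIO000804 log line above
        System.out.printf("0x%08X -> \"%s\" (text-like: %b)%n",
                reported, asAscii(reported), looksLikeText(reported));
        // prints: 0x47455420 -> "GET " (text-like: true)
    }
}
```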
I have updated the CLI reproducer to work with WildFly from master (WildFly 8.0.0.Beta2-SNAPSHOT) and pushed the changes to the reproducer repository git://git.app.eng.bos.redhat.com/jbossqe-eap-tests-management.git, branch cliOOMIssueReproducer.
With this version I am still hitting the OOM error.
According to my test runs, this https://github.com/dmlloyd/xnio/commit/266d9116d6c9f1d290099ca68c6162fb8c0f3759 fixes the issue.
After more testing, I found one more leak, this time in the CLI.
There is a class called CLIShutdownHook with a static list referencing all the created user contexts (or sessions). Its purpose is to make sure they are all properly closed when the JVM exits. But at runtime the contexts only accumulate and are never removed, even if a context was properly closed by the user. Normally that happens only at the end of an interactive session, but in a test like this one it can lead to an OOME.
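The leak pattern and the obvious remedy can be illustrated with a small registry: contexts register themselves so a JVM shutdown hook can close any still open, and closing a context deregisters it so the registry does not grow without bound. This is a hypothetical sketch (ShutdownRegistry, Context and their methods are invented names), not the actual CLIShutdownHook code or the fix in the pull request.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration, not the real CLIShutdownHook implementation.
public class ShutdownRegistry {

    /** Hypothetical stand-in for a CLI command context. */
    public static final class Context implements AutoCloseable {
        private volatile boolean closed;

        Context() {
            register(this); // tracked so the shutdown hook can close it
        }

        public boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                deregister(this); // without this step, closed contexts stay reachable forever
            }
        }
    }

    // Concurrent set: iteration in the hook tolerates removals made by close().
    private static final Set<Context> OPEN = ConcurrentHashMap.newKeySet();

    static {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            for (Context ctx : OPEN) {
                ctx.close();
            }
        }));
    }

    static void register(Context ctx) {
        OPEN.add(ctx);
    }

    static void deregister(Context ctx) {
        OPEN.remove(ctx);
    }

    static int openCount() {
        return OPEN.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5000; i++) {
            try (Context ctx = new Context()) {
                // one CLI-like operation would run here
            }
        }
        System.out.println("still registered: " + openCount()); // prints "still registered: 0"
    }
}
```

Dropping the deregister call from close() reproduces the reported behavior: after the loop, all 5000 contexts would still be referenced by the static set.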
Pull request with the fix for the CLI sent (specified on the WFLY issue).
Alexey Loubyansky <email@example.com> updated the status of jira WFLY-2644 to Resolved
The issue is still not resolved; see https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-WS/job/eap-6.x-count-of-open-files/jdk=jdk1.6.unlimited.BC,label=RHEL6/17/console
The number of threads before reloading the server is 38; after the reload it is 41. So there is still a leak, although you fixed some. I am not saying it is in the CLI; you can find out which threads are new by comparing the thread dumps in the archived artifacts created by the job.
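Comparing the two thread dumps mostly comes down to diffing thread names. A minimal sketch, assuming names are stable enough across a reload to be matched: count live threads per name in two snapshots and report names whose count grew. Here the snapshots come from the current JVM via Thread.getAllStackTraces(); for the server, the same diff would apply to names parsed from the two archived dumps (parsing not shown). ThreadDiff and the "leaked-worker" thread are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ThreadDiff {

    /** Thread name -> number of live threads with that name. */
    static Map<String, Integer> snapshot() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getName(), 1, Integer::sum);
        }
        return counts;
    }

    /** Names whose thread count is higher in `after` than in `before`. */
    static Set<String> grown(Map<String, Integer> before, Map<String, Integer> after) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : after.entrySet()) {
            if (e.getValue() > before.getOrDefault(e.getKey(), 0)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    static int total(Map<String, Integer> counts) {
        return counts.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        Map<String, Integer> before = snapshot();

        // Hypothetical leak: a thread that outlives the operation that started it.
        Thread leaked = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // exit quietly when interrupted
            }
        }, "leaked-worker");
        leaked.setDaemon(true);
        leaked.start();

        Map<String, Integer> after = snapshot();
        System.out.println("threads before=" + total(before) + " after=" + total(after));
        System.out.println("new or grown: " + grown(before, after));
    }
}
```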
The original issue was fixed in EAP 6.3.0.DR4. For the issue mentioned in #c9 I have created a separate BZ: BZ#1079362
=> changing status to verified