Bug 806060

Summary: OutOfMemory Error in tomcat6 when comparing config file revisions
Product: [Community] Spacewalk Reporter: Stephen Herr <sherr>
Component: WebUI Assignee: Stephen Herr <sherr>
Status: CLOSED CURRENTRELEASE QA Contact: Red Hat Satellite QA List <satqe-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 1.8 CC: akarlsso, cperry, mhuth, tpapaioa, xdmoon
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 729210 Environment:
Last Closed: 2012-11-01 16:18:04 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 729210    
Bug Blocks: 871344    

Description Stephen Herr 2012-03-22 19:32:19 UTC
+++ This bug was initially created as a clone of Bug #729210 +++

Description of problem:
When trying to compare the revisions of a config file, satellite encounters an Internal Server Error, and an OutOfMemory error is recorded in the tomcat catalina.out log.

<catalina.out>
Caused by: 
java.lang.OutOfMemoryError
	at com.redhat.rhn.common.filediff.Trace.fork(Trace.java:108)
	at com.redhat.rhn.common.filediff.Trace.step(Trace.java:170)
	at com.redhat.rhn.common.filediff.Differ.step(Differ.java:76)
	at com.redhat.rhn.common.filediff.Differ.diff(Differ.java:49)
	at com.redhat.rhn.common.filediff.Diff.diffFiles(Diff.java:92)
	at com.redhat.rhn.common.filediff.Diff.htmlDiff(Diff.java:51)
	at com.redhat.rhn.frontend.action.configuration.files.DiffAction.performFileDiff(DiffAction.java:100)
	at com.redhat.rhn.frontend.action.configuration.files.DiffAction.execute(DiffAction.java:70)
</catalina.out>

Version-Release number of selected component (if applicable):
Satellite 5.4.1 on RHEL6.1 64bit
tomcat6-6.0.24-33.el6.noarch
spacewalk-java-1.2.39-89.el6sat.noarch

How reproducible:
Always, when comparing the config files in the link below. (Not all config file comparisons cause the problem, but comparing these two files always does.)

Steps to Reproduce:
1. Create a config file
2. Upload the files listed in the attachment as newer versions (sequentially)
3. Click on the "Compare File" button
4. Click on the 'View Comparison' button
  
Actual results:
After about 30 seconds an Internal Server Error appears in the WebUI.  In catalina.out, a 'java.lang.OutOfMemoryError' appears.

Expected results:
No out of memory error.

Additional info:
Increasing the tomcat6 heap size doesn't resolve the issue.  That is, I modified the following in /etc/tomcat6/tomcat6.conf:

JAVA_OPTS="$JAVA_OPTS -ea -Xms2048m -Xmx2048m -Djava.awt.headless=true -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser -XX:MaxNewSize=2048 -XX:-UseConcMarkSweepGC"

With this change tomcat no longer threw an OutOfMemoryError, but the output of top showed that it quickly consumed its 2G of heap space (see the RES column) and the java process stayed pegged at 100% CPU utilization indefinitely, until I restarted satellite.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12513 tomcat    20   0 3328m 2.1g  13m S 99.2 27.7   2:07.50 java

--- Additional comment from mhuth on 2011-08-09 04:01:43 EDT ---

I forgot to mention in the 'Additional info' section of my previous update that, with the increased java heap size, an ISE appears in the satellite UI about a minute after clicking on the 'View Comparison' button.  No OutOfMemory errors appear in catalina.out this time, but the httpd error logs show ajp timeout errors while waiting for a response from tomcat.  This happens together with the 100% CPU usage of the java/tomcat process and the full use of the increased heap space that I mentioned previously.

Cheers,
Mark


--- Additional comment from mhuth on 2011-08-26 03:21:45 EDT ---

Produced the OutOfMemoryError again with a different set of configuration files.  These are my own this time, so I've attached them to the ticket for reproducing.

java.lang.OutOfMemoryError 
 at com.redhat.rhn.common.filediff.Edit.copy(Edit.java:82)        
 at com.redhat.rhn.common.filediff.Trace.fork(Trace.java:106)
 at com.redhat.rhn.common.filediff.Trace.step(Trace.java:170)
 at com.redhat.rhn.common.filediff.Differ.step(Differ.java:85)
 at com.redhat.rhn.common.filediff.Differ.diff(Differ.java:49)
 at com.redhat.rhn.common.filediff.Diff.diffFiles(Diff.java:92)
 at com.redhat.rhn.common.filediff.Diff.htmlDiff(Diff.java:51)


In this new situation, as with the original reproducer, there are quite a number of differences between the files being compared, and the differences occur throughout the file.

The problem seems to be in the recursive step method of the Trace class in com/redhat/rhn/common/filediff/Trace.java.

--- Additional comment from mhuth on 2011-08-26 03:27:24 EDT ---

Created attachment 520021 [details]
Reproducer file v1

Add this file to the configuration channel.

--- Additional comment from mhuth on 2011-08-26 03:29:26 EDT ---

Created attachment 520022 [details]
Reproducer file v2

Then upload this file as a revision of the first file.  Then try comparing the 2 files.  This will reproduce the tomcat out of memory error.

Comment 1 Stephen Herr 2012-03-22 19:57:03 UTC
The crux of the issue is that the diff algorithm we use branches and explores every possible resolution when it encounters a difference between the two files. These branches are explored concurrently so that we can choose not to bother exploring a branch if we know that there is a simpler explanation for the difference. However, we only *know* that a branch is sub-optimal once a better branch has made it all the way to the end of the file.

This means that for large files with many changes we can end up branching many, many times before any of the branches actually reaches the end of the file. The number of branches grows exponentially with the number of changes we encounter, resulting in the out of memory error.
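
To give a rough feel for that growth, here is a small illustrative calculation. It assumes a simplified model in which every difference forks each live branch into exactly two candidates (an insertion and a deletion) and no branch has reached the end of the file yet. The class and method names are made up for this example and are not part of the filediff package.

```java
import java.math.BigInteger;

class BranchGrowth {
    // Simplified model (an assumption, not the Spacewalk code): each difference
    // forks every live branch into two candidates, an insertion and a deletion.
    static BigInteger branchesAfter(int differences) {
        return BigInteger.valueOf(2).pow(differences);
    }

    public static void main(String[] args) {
        for (int d : new int[] {10, 20, 30, 40}) {
            System.out.println(d + " differences -> " + branchesAfter(d) + " live branches");
        }
    }
}
```

Under this model, 40 unresolved differences already mean roughly 1.1 trillion live branches, which is consistent with a larger heap only delaying the failure rather than avoiding it.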

The fix that I have implemented is to simply limit the number of branches to our best guess at the 1000 most efficient branches. This should still give us the optimal diff between the two files the vast majority of the time, while greatly cutting back on the number of branches we have to explore. The resulting diff is still guaranteed to be *correct*, but not necessarily *optimal*.
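
The idea of capping the number of live branches can be sketched as follows. This is a minimal, self-contained illustration and not the code from the commit: the class BoundedDiff, the Branch type, the maxBranches parameter, and the use of edit count as the "best guess" ranking are all assumptions made for the example. It shows why the result stays correct (every surviving branch is a valid edit script) even when pruning discards the optimal one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class BoundedDiff {
    /** One candidate explanation of the differences seen so far (hypothetical type). */
    static final class Branch {
        final int i;          // next unread line of the old file
        final int j;          // next unread line of the new file
        final int cost;       // insertions plus deletions so far
        final String script;  // '=' keep, '-' delete old line, '+' insert new line
        Branch(int i, int j, int cost, String script) {
            this.i = i; this.j = j; this.cost = cost; this.script = script;
        }
    }

    /** Returns an edit script turning oldLines into newLines, keeping at most maxBranches alive. */
    static String diff(List<String> oldLines, List<String> newLines, int maxBranches) {
        if (maxBranches < 1) throw new IllegalArgumentException("maxBranches must be >= 1");
        List<Branch> live = new ArrayList<>();
        live.add(new Branch(0, 0, 0, ""));
        Branch best = null;
        while (!live.isEmpty()) {
            List<Branch> next = new ArrayList<>();
            for (Branch b : live) {
                // A branch is only known to be sub-optimal once another branch has finished.
                if (best != null && b.cost >= best.cost) continue;
                boolean oldDone = b.i == oldLines.size();
                boolean newDone = b.j == newLines.size();
                if (oldDone && newDone) { best = b; continue; }
                if (!oldDone && !newDone && oldLines.get(b.i).equals(newLines.get(b.j))) {
                    next.add(new Branch(b.i + 1, b.j + 1, b.cost, b.script + "="));
                    continue;
                }
                // A difference: fork into "line was deleted" and "line was inserted".
                if (!oldDone) next.add(new Branch(b.i + 1, b.j, b.cost + 1, b.script + "-"));
                if (!newDone) next.add(new Branch(b.i, b.j + 1, b.cost + 1, b.script + "+"));
            }
            // The cap: keep only the cheapest maxBranches candidates (a heuristic guess).
            if (next.size() > maxBranches) {
                next.sort(Comparator.comparingInt((Branch c) -> c.cost));
                next = new ArrayList<>(next.subList(0, maxBranches));
            }
            live = next;
        }
        return best.script;
    }

    /** Replays a script against the old file; used to check that a diff is correct. */
    static List<String> apply(List<String> oldLines, List<String> newLines, String script) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        for (char op : script.toCharArray()) {
            if (op == '=') { out.add(oldLines.get(i)); i++; j++; }
            else if (op == '-') { i++; }
            else { out.add(newLines.get(j)); j++; }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> oldLines = Arrays.asList("a", "b", "c");
        List<String> newLines = Arrays.asList("a", "x", "c");
        System.out.println(diff(oldLines, newLines, 1000)); // prints =-+=
        System.out.println(diff(oldLines, newLines, 1));    // prints =--++
    }
}
```

With a generous cap the sketch finds the two-edit script; with a cap of 1 it returns a four-edit script that still transforms the old file into the new one, which is the *correct* but not *optimal* trade-off described above.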

Committed to spacewalk master: 623df35a7ee271e713f908dced5f04d0ce06d434

Comment 3 Jan Pazdziora 2012-10-30 19:23:01 UTC
Moving ON_QA. Packages that address this bugzilla should now be available in yum repos at http://yum.spacewalkproject.org/nightly/

Comment 4 Jan Pazdziora 2012-11-01 16:18:04 UTC
Spacewalk 1.8 has been released: https://fedorahosted.org/spacewalk/wiki/ReleaseNotes18