Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1074365

Summary:

App server runs out of memory after deploying zanata.war about 5 times

Product:

[Retired] Zanata

Reporter:

Sean Flanigan <sflaniga>

Component:

Maintainability

Assignee:

Sean Flanigan <sflaniga>

Status:

CLOSED UPSTREAM

QA Contact:

Damian Jansen <djansen>

Severity:

medium

Docs Contact:

Priority:

unspecified

Version:

development

CC:

camunoz, dchen, djansen, zanata-bugs

Target Milestone:

---

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2015-07-31 01:25:30 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
Thread dump	none
Test run log file	none

Description Sean Flanigan 2014-03-10 02:39:12 UTC

Description of problem:

If the webapp is deployed multiple times (eg during development), it causes a PermGen leak, eventually leading to java.lang.OutOfMemoryError: PermGen space

Version-Release number of selected component (if applicable):
3.4.0-SNAPSHOT

How reproducible:
100%

Steps to Reproduce:
1. Deploy zanata.war [touch zanata.war.dodeploy and wait about 25 sec]
2. Undeploy zanata.war [touch zanata.war.undeploy and wait about 5 sec]
3. Generate a thread dump using Ctrl-\
4. Repeat 5 or 6 times

Actual results:
 - Thread dump shows "log4j mail appender" and "GWTEventService-UserActivityScheduler" still running after each undeploy.
 - After about 5 iterations: java.lang.OutOfMemoryError: PermGen space, server unresponsive

Expected results:
 - Thread dump does not show "log4j mail appender" and "GWTEventService-UserActivityScheduler" running.
 - No OutOfMemoryError.

Additional info:
https://github.com/mjiderhamn/classloader-leak-prevention looks promising

Comment 1 Sean Flanigan 2014-03-10 02:44:32 UTC

PR: https://github.com/zanata/zanata-server/pull/388

Comment 2 Carlos Munoz 2014-03-10 04:08:26 UTC

We should take look at removing the shutdown hook or the @Destroy invocation since it seems both are being called and one them is unnecessary. My vote is for removing the shutdown hook. In the meanwhile, it can act as a double level of assurance that memory leaks are being prevented.

Comment 3 Sean Flanigan 2014-03-10 04:59:01 UTC

Shutdown hook removed.

Comment 5 Damian Jansen 2014-03-11 00:14:47 UTC

How to:

1) Run JBoss
2) touch $JBOSS_HOME/standalone/deployments/zanata.war.dodeploy
3) Repeat 2) 5 - 10 times

Comment 6 Damian Jansen 2014-03-11 00:24:00 UTC

Created attachment 872908 [details]
Thread dump

Comment 7 Damian Jansen 2014-03-11 00:24:55 UTC

Created attachment 872909 [details]
Test run log file

Comment 8 Damian Jansen 2014-03-11 00:26:08 UTC

Dodeploy may potentially be a different scenario, but still causes the same error.

Comment 9 Sean Flanigan 2014-03-11 07:43:20 UTC

Another "how to" for maximum turnaround:

1. Run jboss
2. Run the following:

    while true; do if test ! -f ROOT.war.isdeploying; then echo -n 'Deploying: '; date; touch ROOT.war.dodeploy; sleep 5; else sleep 1; fi; done

Deploying: Tue Mar 11 15:42:38 EST 2014
Deploying: Tue Mar 11 15:43:07 EST 2014
Deploying: Tue Mar 11 15:43:34 EST 2014
Deploying: Tue Mar 11 15:43:59 EST 2014
Deploying: Tue Mar 11 15:44:27 EST 2014
Deploying: Tue Mar 11 15:44:56 EST 2014
Deploying: Tue Mar 11 15:45:21 EST 2014
Deploying: Tue Mar 11 15:45:47 EST 2014
Deploying: Tue Mar 11 15:46:19 EST 2014
Deploying: Tue Mar 11 15:46:43 EST 2014
Deploying: Tue Mar 11 15:47:09 EST 2014
Deploying: Tue Mar 11 15:47:36 EST 2014
Deploying: Tue Mar 11 15:48:01 EST 2014
Deploying: Tue Mar 11 15:48:27 EST 2014
Deploying: Tue Mar 11 15:48:56 EST 2014
Deploying: Tue Mar 11 15:49:20 EST 2014
Deploying: Tue Mar 11 15:49:46 EST 2014
Deploying: Tue Mar 11 15:50:15 EST 2014
Deploying: Tue Mar 11 15:50:39 EST 2014
Deploying: Tue Mar 11 15:51:04 EST 2014
Deploying: Tue Mar 11 15:51:34 EST 2014
Deploying: Tue Mar 11 15:51:59 EST 2014
Deploying: Tue Mar 11 15:52:26 EST 2014
Deploying: Tue Mar 11 15:52:54 EST 2014


I did several more (unlogged) successful deployments while getting the above one-liner worked out, plus the ones above, so "it works on my machine".  (Incidentally, it takes 25-30 seconds for each cycle.)

Possibilities:
- I may have more heap (but we are probably both using the EAP defaults)
- threading/concurrency (perhaps quick redeploys don't give the leaking thread a chance to execute)
- different version of EAP (I'm using 6.2.0).


This version waits at least 60 seconds each cycle (thus about 30 seconds sitting idle between deploys):

    while true; do if test ! -f ROOT.war.isdeploying; then echo -n 'Deploying: '; date; touch ROOT.war.dodeploy; sleep 60; else sleep 1; fi; done

Deploying: Tue Mar 11 15:53:26 EST 2014
Deploying: Tue Mar 11 15:54:31 EST 2014
Deploying: Tue Mar 11 15:55:36 EST 2014
Deploying: Tue Mar 11 15:56:41 EST 2014
Deploying: Tue Mar 11 15:57:46 EST 2014
Deploying: Tue Mar 11 15:58:13 EST 2014
Deploying: Tue Mar 11 15:59:13 EST 2014
Deploying: Tue Mar 11 16:00:13 EST 2014
Deploying: Tue Mar 11 16:01:13 EST 2014
Deploying: Tue Mar 11 16:02:13 EST 2014
Deploying: Tue Mar 11 16:03:13 EST 2014
Deploying: Tue Mar 11 16:04:13 EST 2014
Deploying: Tue Mar 11 16:05:13 EST 2014

(I did have sleep 65 for the first few, which is why the early times are different.)

I restarted jboss and it has deployed another 55+ times.


The problem might have something to do with ehcache, which left a daemon thread "net.sf.ehcache.CacheManager@2525dc83" hanging around in djansen's thread dump.  ehcache-core-2.5.1.jar must have been loaded from the webapp, not the appserver.

ClassLoaderLeakPreventor should have stopped this thread, I think.   But perhaps ServletContextListener.contextDestroyed() is not get called for failed deployments (at least when permgen is exhausted)?

Comment 10 Sean Flanigan 2014-03-12 03:00:21 UTC

I ran this overnight, and managed to get a permgen error:

    while true; do if test ! -f ROOT.war.isdeploying; then echo -n 'Deploying: '; date; touch ROOT.war.dodeploy; sleep 25; curl --retry 10 http://localhost:8080/aoueoaeua >/dev/null; else sleep 1; fi; done


Possibly irrelevant, but I found these in the thread dump:

"log4j mail appender" daemon prio=10 tid=0x00007f297df41800 nid=0x33be in Object.wait() [0x00007f2967b2c000]
   java.lang.Thread.State: WAITING (on object monitor)                                                      
        at java.lang.Object.wait(Native Method)                                                             
        - waiting on <0x00000000bb036d38> (a java.util.TaskQueue)                                           
        at java.lang.Object.wait(Object.java:503)                                                           
        at java.util.TimerThread.mainLoop(Timer.java:526)                                                   
        - locked <0x00000000bb036d38> (a java.util.TaskQueue)                                               
        at java.util.TimerThread.run(Timer.java:505)                                                        

"log4j mail appender" daemon prio=10 tid=0x00007f297c929000 nid=0x1c33 in Object.wait() [0x00007f29c482b000]
   java.lang.Thread.State: WAITING (on object monitor)                                                      
        at java.lang.Object.wait(Native Method)                                                             
        - waiting on <0x00000000b847ad38> (a java.util.TaskQueue)                                           
        at java.lang.Object.wait(Object.java:503)                                                           
        at java.util.TimerThread.mainLoop(Timer.java:526)                                                   
        - locked <0x00000000b847ad38> (a java.util.TaskQueue)                                               
        at java.util.TimerThread.run(Timer.java:505)

Comment 11 Sean Flanigan 2014-03-28 06:18:20 UTC

This problem can be annoying, but it only affects development, so I wouldn't say it is urgent.
 
Note that https://github.com/zanata/zanata-server/pull/388 has been merged.  It does appear to prevent the memory leak, as long as you don't actually submit any HTTP requests to Zanata!  (So the test in https://bugzilla.redhat.com/show_bug.cgi?id=1074365#c9 passes.)

But something that happens when processing the first HTTP request (some lazy-loaded component?) does still leak.  So we're keeping the bug open until we can pass the test in https://bugzilla.redhat.com/show_bug.cgi?id=1074365#c10

Comment 12 Zanata Migrator 2015-07-31 01:25:30 UTC

Migrated; check JIRA for bug status: http://zanata.atlassian.net/browse/ZNTA-496