Bug 1074365
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | App server runs out of memory after deploying zanata.war about 5 times | | |
| Product | [Retired] Zanata | Reporter | Sean Flanigan <sflaniga> |
| Component | Maintainability | Assignee | Sean Flanigan <sflaniga> |
| Status | CLOSED UPSTREAM | QA Contact | Damian Jansen <djansen> |
| Severity | medium | Priority | unspecified |
| Version | development | CC | camunoz, dchen, djansen, zanata-bugs |
| Hardware | Unspecified | OS | Unspecified |
| Doc Type | Bug Fix | Type | Bug |
| Last Closed | 2015-07-31 01:25:30 UTC | | |
Description
Sean Flanigan
2014-03-10 02:39:12 UTC
We should take a look at removing the shutdown hook or the @Destroy invocation, since it seems both are being called and one of them is unnecessary. My vote is for removing the shutdown hook. In the meantime, it can act as a second level of assurance that memory leaks are being prevented.
Shutdown hook removed.
How to:
1) Run JBoss
2) touch $JBOSS_HOME/standalone/deployments/zanata.war.dodeploy
3) Repeat 2) 5 - 10 times
Created attachment 872908: Thread dump
Created attachment 872909: Test run log file
Dodeploy may be a different scenario, but it still causes the same error. Another "how to" for maximum turnaround:
1. Run jboss
2. Run the following:
while true; do if test ! -f ROOT.war.isdeploying; then echo -n 'Deploying: '; date; touch ROOT.war.dodeploy; sleep 5; else sleep 1; fi; done
Deploying: Tue Mar 11 15:42:38 EST 2014
Deploying: Tue Mar 11 15:43:07 EST 2014
Deploying: Tue Mar 11 15:43:34 EST 2014
Deploying: Tue Mar 11 15:43:59 EST 2014
Deploying: Tue Mar 11 15:44:27 EST 2014
Deploying: Tue Mar 11 15:44:56 EST 2014
Deploying: Tue Mar 11 15:45:21 EST 2014
Deploying: Tue Mar 11 15:45:47 EST 2014
Deploying: Tue Mar 11 15:46:19 EST 2014
Deploying: Tue Mar 11 15:46:43 EST 2014
Deploying: Tue Mar 11 15:47:09 EST 2014
Deploying: Tue Mar 11 15:47:36 EST 2014
Deploying: Tue Mar 11 15:48:01 EST 2014
Deploying: Tue Mar 11 15:48:27 EST 2014
Deploying: Tue Mar 11 15:48:56 EST 2014
Deploying: Tue Mar 11 15:49:20 EST 2014
Deploying: Tue Mar 11 15:49:46 EST 2014
Deploying: Tue Mar 11 15:50:15 EST 2014
Deploying: Tue Mar 11 15:50:39 EST 2014
Deploying: Tue Mar 11 15:51:04 EST 2014
Deploying: Tue Mar 11 15:51:34 EST 2014
Deploying: Tue Mar 11 15:51:59 EST 2014
Deploying: Tue Mar 11 15:52:26 EST 2014
Deploying: Tue Mar 11 15:52:54 EST 2014
I did several more (unlogged) successful deployments while getting the above one-liner worked out, plus the ones above, so "it works on my machine". (Incidentally, it takes 25-30 seconds for each cycle.)
Possibilities:
- I may have more heap (but we are probably both using the EAP defaults)
- threading/concurrency (perhaps quick redeploys don't give the leaking thread a chance to execute)
- different version of EAP (I'm using 6.2.0).
This version waits at least 60 seconds each cycle (thus about 30 seconds sitting idle between deploys):
while true; do if test ! -f ROOT.war.isdeploying; then echo -n 'Deploying: '; date; touch ROOT.war.dodeploy; sleep 60; else sleep 1; fi; done
Deploying: Tue Mar 11 15:53:26 EST 2014
Deploying: Tue Mar 11 15:54:31 EST 2014
Deploying: Tue Mar 11 15:55:36 EST 2014
Deploying: Tue Mar 11 15:56:41 EST 2014
Deploying: Tue Mar 11 15:57:46 EST 2014
Deploying: Tue Mar 11 15:58:13 EST 2014
Deploying: Tue Mar 11 15:59:13 EST 2014
Deploying: Tue Mar 11 16:00:13 EST 2014
Deploying: Tue Mar 11 16:01:13 EST 2014
Deploying: Tue Mar 11 16:02:13 EST 2014
Deploying: Tue Mar 11 16:03:13 EST 2014
Deploying: Tue Mar 11 16:04:13 EST 2014
Deploying: Tue Mar 11 16:05:13 EST 2014
(I did have sleep 65 for the first few, which is why the early times are different.)
I restarted jboss and it has deployed another 55+ times.
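The two one-liners above differ only in how long they sit idle after each deploy. A parameterized version might make it easier to compare idle times and to stop after a fixed number of cycles. This is only a sketch: the function name `redeploy_loop` and its argument order are made up for illustration, and it assumes the same `.dodeploy` / `.isdeploying` marker files the one-liners already rely on.

```bash
#!/usr/bin/env bash
# Sketch only: redeploy_loop is a hypothetical helper, not part of Zanata or JBoss.
# Usage: redeploy_loop DEPLOY_DIR WAR_NAME IDLE_SECONDS MAX_CYCLES
redeploy_loop() {
    local dir=$1 war=$2 idle=$3 max=$4
    local cycles=0
    while [ "$cycles" -lt "$max" ]; do
        if [ ! -f "$dir/$war.isdeploying" ]; then
            # Same action as the one-liners: log a timestamp, then request a redeploy.
            printf 'Deploying: %s\n' "$(date)"
            touch "$dir/$war.dodeploy"
            cycles=$((cycles + 1))
            sleep "$idle"
        else
            # A deployment is still in progress; poll again shortly.
            sleep 1
        fi
    done
}
```

For example, `redeploy_loop "$JBOSS_HOME/standalone/deployments" ROOT.war 60 55` would roughly correspond to the 60-second variant run for 55 cycles.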
The problem might have something to do with ehcache, which left a daemon thread "net.sf.ehcache.CacheManager@2525dc83" hanging around in djansen's thread dump. ehcache-core-2.5.1.jar must have been loaded from the webapp, not the appserver.
ClassLoaderLeakPreventor should have stopped this thread, I think. But perhaps ServletContextListener.contextDestroyed() is not getting called for failed deployments (at least when permgen is exhausted)?
I ran this overnight, and managed to get a permgen error:
while true; do if test ! -f ROOT.war.isdeploying; then echo -n 'Deploying: '; date; touch ROOT.war.dodeploy; sleep 25; curl --retry 10 http://localhost:8080/aoueoaeua >/dev/null; else sleep 1; fi; done
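The loop above keeps touching ROOT.war.dodeploy even after a deployment has failed, so the moment of failure has to be dug out of the server log afterwards. A small helper along these lines could let the loop notice the failure itself. This is a sketch under assumptions: the name `deployment_state` is hypothetical, and it assumes the deployment scanner leaves its usual `.isdeploying`, `.failed`, and `.deployed` marker files next to the WAR.

```bash
#!/usr/bin/env bash
# Sketch only: deployment_state is a hypothetical helper.
# Usage: deployment_state DEPLOY_DIR WAR_NAME
# Prints one of: deploying, failed, deployed, unknown
deployment_state() {
    local base=$1/$2
    if [ -f "$base.isdeploying" ]; then
        echo deploying
    elif [ -f "$base.failed" ]; then
        echo failed
    elif [ -f "$base.deployed" ]; then
        echo deployed
    else
        # No marker file found (assumption: scanner has not processed the WAR yet).
        echo unknown
    fi
}
```

Inside the loop, a check such as `[ "$(deployment_state . ROOT.war)" = failed ] && break` would stop the run at the first failed deployment.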
Possibly irrelevant, but I found these in the thread dump:
"log4j mail appender" daemon prio=10 tid=0x00007f297df41800 nid=0x33be in Object.wait() [0x00007f2967b2c000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000bb036d38> (a java.util.TaskQueue)
        at java.lang.Object.wait(Object.java:503)
        at java.util.TimerThread.mainLoop(Timer.java:526)
        - locked <0x00000000bb036d38> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Timer.java:505)
"log4j mail appender" daemon prio=10 tid=0x00007f297c929000 nid=0x1c33 in Object.wait() [0x00007f29c482b000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000b847ad38> (a java.util.TaskQueue)
        at java.lang.Object.wait(Object.java:503)
        at java.util.TimerThread.mainLoop(Timer.java:526)
        - locked <0x00000000b847ad38> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Timer.java:505)
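Duplicate thread names in a dump, like the two "log4j mail appender" entries above, may hint at threads left behind by earlier deployments. A quick tally of thread names can make such duplicates easier to spot. The following is a minimal sketch: `count_thread_names` is a hypothetical name, it assumes each thread header line starts with the quoted thread name (as in the dump above), and it strips a trailing `@hex` suffix so that names like "net.sf.ehcache.CacheManager@2525dc83" group together.

```bash
#!/usr/bin/env bash
# Sketch only: count_thread_names is a hypothetical helper.
# Usage: count_thread_names THREAD_DUMP_FILE
# Prints "COUNT NAME" lines, most frequent name first.
count_thread_names() {
    # Keep only thread header lines, which begin with the quoted thread name.
    grep '^"' "$1" \
        | sed -e 's/^"\([^"]*\)".*/\1/' -e 's/@[0-9a-f]\{1,\}$//' \
        | sort | uniq -c | sort -rn \
        | awk '{ count = $1; sub(/^ *[0-9]+ /, ""); print count, $0 }'
}
```

A name whose count grows by one with each redeploy would be a candidate for the leak; a count that stays constant is probably unrelated.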
This problem can be annoying, but it only affects development, so I wouldn't say it is urgent.
Note that https://github.com/zanata/zanata-server/pull/388 has been merged. It does appear to prevent the memory leak, as long as you don't actually submit any HTTP requests to Zanata! (So the test in https://bugzilla.redhat.com/show_bug.cgi?id=1074365#c9 passes.) But something that happens when processing the first HTTP request (some lazy-loaded component?) does still leak. So we're keeping the bug open until we can pass the test in https://bugzilla.redhat.com/show_bug.cgi?id=1074365#c10
Migrated; check JIRA for bug status: http://zanata.atlassian.net/browse/ZNTA-496