Bug 752110

Summary: RHQ server OOM error
Product: [Other] RHQ Project
Reporter: Alan Santos <asantos>
Component: No Component
Assignee: Nobody <nobody>
Status: NEW ---
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.2
CC: hrupp, kejohnso
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description         Flags
latest server log   none
agent log           none

Description Alan Santos 2011-11-08 15:27:38 UTC
Description of problem:
Left RHQ server 4.2 running overnight. It was unresponsive in the morning, and the logs showed the server had run out of heap space.

Version-Release number of selected component (if applicable):


How reproducible:
Happened twice.

Steps to Reproduce:
1. Started server
2. Imported agent running on same machine
3. Configured Apache, Postgres
4. Created drift template, made a change to monitored directory
5. Went away for a long time.

Comment 1 Alan Santos 2011-11-08 15:30:23 UTC
Created attachment 532312
latest server log

Comment 2 Alan Santos 2011-11-08 15:31:31 UTC
Created attachment 532313
agent log

Comment 3 Alan Santos 2011-11-08 15:32:43 UTC
I have a heap dump if it would be helpful, but it's 1.3 GB.

Comment 4 Charles Crouch 2011-11-09 18:52:34 UTC
I note that you are running on H2. I presume you haven't seen this on PG or 
ORA? From the server logs there appear to be a lot of DB related errors prior 
to the memory issues, e.g. trying to redeliver a message to the JMS dead letter 
queue (dlq) nearly 900,000 times. Also exceptions such as "Caused by: 
java.lang.ClassCastException: org.h2.jdbc.JdbcBlob cannot be cast to 
org.jboss.mq.SpyMessage" that I'm very suspicious of. 

Given the secondary-level support we have for H2, below PG and Oracle, I don't 
see this as a release blocker. Mike confirmed that he frequently runs drift 
overnight using Oracle as the DB and has not observed similar issues. I will 
make this issue block the drift tracking bug, but set it as medium priority. 
Once we're done with major drift related issues for the release, we can see if 
we can reproduce this.

Comment 5 Alan Santos 2011-11-09 19:02:04 UTC
That's fair. I'm only running this build/database on the laptop that's acting like a desktop. I'll swap out the database to PG and see if it recurs.

I guess a question secondary to this BZ is what caused the missing queue. Is it possible to simulate a similar JMS dead letter failure using another database?

Comment 6 Heiko W. Rupp 2011-11-09 20:10:39 UTC
We used to have huge issues when JMS was on HSQLDB (the predecessor of H2). While they say H2 is much, much better than HSQLDB, it may be the same here in the end.