Created attachment 875950 (server.log)

Description of problem:
Statement timeout during the drift data purge job.

Version-Release number of selected component (if applicable):
JON 3.2.0, long-running instance

How reproducible:

Steps to Reproduce:
1. Install JON 3.2.0 and run it for more than 15 days.
2. Have a cron job that logs in and out via the CLI every minute.
3. Look at server.log.

Actual results:
A statement timeout exception occurs during the drift data purge job.

Expected results:
No statement timeout exception during the drift data purge job.

Additional info:
A fragment of server.log is attached. The instance is: http://long-jon32.bc.jonqe.lab.eng.bos.redhat.com:7080/
Are you sure the steps from the Description are sufficient? I believe that to make the JPA drift purge time out, you need to add quite a large number of files. Which resource in Inventory had Drift activated?
I don't see how this can have anything to do with the cron job in question. It has to do with either a slow DB or a large amount of drift data to purge. What may be interesting is that it timed out after 30s; I would have expected a longer statement timeout to be in play. I wonder if it has to do with it being a call through a server plugin? We need to find out whether 30s is what we expect.
We should look at this at least to see if we can understand the timeout.
I had not seen that, Jay. Eagle eye. Looking again at the stack trace, it appears that this comes from the PostgreSQL "statement_timeout" configuration parameter, which has been set to 30 seconds, probably because the RHQ wiki recommends it: https://docs.jboss.org/author/display/RHQ/PostgreSQL#PostgreSQL-postgresql.conf
A similar issue (with baseline insertion) was reported: https://community.jboss.org/thread/146172
Moving to ER04, as this will probably require a fair amount of investigation. Armine, can you answer the questions in Comment 2? Can you reproduce it on 3.3 ER02?
I had no indication of how to reproduce this, so I did some testing. I found a big issue with the JPA drift purge query: in many cases it would not find the rows to purge.

commit 3251090d63f58f0da3d604f30cc8f819533e0fcc
Author: Thomas Segismont <tsegismo>
Date:   Thu Sep 25 14:17:10 2014 +0200

    Changed the query WHERE clause: don't use NOT IN because NOT IN(NULL) will always return false. See http://stackoverflow.com/questions/17150208/sql-query-with-not-in-returning-no-rows
    In other words, if any row in RHQ_DRIFT has a NULL value in the OLD_DRIFT_FILE or NEW_DRIFT_FILE columns, we'll never purge anything.
    Besides, the NOT EXISTS variant has a better execution plan.
    Use PurgeTemplate to delete rows in batches and select the keys of the rows to delete without locking the drift files table.
    Also, some code cleanup.
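To make the NULL problem from the commit message concrete, here is a minimal sketch in Python against an in-memory SQLite database, which follows the same SQL-standard NULL rules for NOT IN as PostgreSQL. The table layout is a simplified, hypothetical stand-in: only RHQ_DRIFT, OLD_DRIFT_FILE and NEW_DRIFT_FILE come from the commit message, and the drift file table name and its hash_id key are assumptions. Strictly speaking, `x NOT IN (..., NULL)` evaluates to NULL rather than false when `x` is not in the list, but a WHERE clause discards NULL exactly like false, so the effect is the same: no file is ever selected for purging.

```python
import sqlite3

# Simplified, hypothetical schema: the real RHQ tables have more columns.
SCHEMA = """
CREATE TABLE rhq_drift_file (hash_id TEXT PRIMARY KEY);
CREATE TABLE rhq_drift (id INTEGER PRIMARY KEY,
                        old_drift_file TEXT, new_drift_file TEXT);
"""

# Old style: if either subquery returns a NULL, that NOT IN can only be
# FALSE or NULL, never TRUE, so the WHERE clause selects no file at all.
NOT_IN = """
SELECT f.hash_id FROM rhq_drift_file f
WHERE f.hash_id NOT IN (SELECT d.old_drift_file FROM rhq_drift d)
  AND f.hash_id NOT IN (SELECT d.new_drift_file FROM rhq_drift d)
"""

# New style: NOT EXISTS only asks whether a matching drift row exists; a NULL
# column simply never matches, so unreferenced files are found.
NOT_EXISTS = """
SELECT f.hash_id FROM rhq_drift_file f
WHERE NOT EXISTS (SELECT 1 FROM rhq_drift d
                  WHERE d.old_drift_file = f.hash_id
                     OR d.new_drift_file = f.hash_id)
"""


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO rhq_drift_file VALUES (?)",
                     [("a",), ("b",), ("orphan",)])
    # The first drift row has no old file (NULL); "orphan" is unreferenced.
    conn.executemany("INSERT INTO rhq_drift (old_drift_file, new_drift_file) "
                     "VALUES (?, ?)", [(None, "a"), ("a", "b")])
    return conn


def purge_candidates(conn, query):
    # Sorted so the result does not depend on the scan order.
    return sorted(row[0] for row in conn.execute(query))


conn = build_demo()
print(purge_candidates(conn, NOT_IN))      # -> []
print(purge_candidates(conn, NOT_EXISTS))  # -> ['orphan']
```

A single drift row with a NULL file reference is enough to make the NOT IN form return nothing, which matches the "we'll never purge anything" symptom described above.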
branch: release/jon3.3.x
link: https://github.com/rhq-project/rhq/commit/f4d09f5c2
time: 2014-09-26 16:51:13 +0200
commit: f4d09f5c229d6d69ad556e4ae7fcba03a24ee513
author: Thomas Segismont - tsegismo
message: Bug 1077744 - statement timeout during drift purge job

Changed the query WHERE clause: don't use NOT IN because NOT IN(NULL) will always return false. See http://stackoverflow.com/questions/17150208/sql-query-with-not-in-returning-no-rows
In other words, if any row in RHQ_DRIFT has a NULL value in the OLD_DRIFT_FILE or NEW_DRIFT_FILE columns, we'll never purge anything.
Besides, the NOT EXISTS variant has a better execution plan.
Use PurgeTemplate to delete rows in batches and select the keys of the rows to delete without locking the drift files table.
Also, some code cleanup.

(cherry picked from commit 3251090d63f58f0da3d604f30cc8f819533e0fcc)
Signed-off-by: Libor Zoubek <lzoubek>
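The batching part of the change can be sketched the same way. The `purge_in_batches` function below is hypothetical and does not reproduce the API of RHQ's PurgeTemplate; it only illustrates the general pattern on the same simplified, assumed schema: read a bounded batch of keys, delete that batch in its own short transaction, and repeat until nothing is left. Compared with one large DELETE, each statement stays small, which should make it less likely to run into a limit such as a 30s statement_timeout.

```python
import sqlite3

# Simplified, hypothetical schema: the real RHQ tables have more columns.
SCHEMA = """
CREATE TABLE rhq_drift_file (hash_id TEXT PRIMARY KEY);
CREATE TABLE rhq_drift (id INTEGER PRIMARY KEY,
                        old_drift_file TEXT, new_drift_file TEXT);
"""

# True when no drift row references the current drift file.
UNREFERENCED = """NOT EXISTS (
    SELECT 1 FROM rhq_drift d
    WHERE d.old_drift_file = rhq_drift_file.hash_id
       OR d.new_drift_file = rhq_drift_file.hash_id)"""


def purge_in_batches(conn, batch_size):
    """Hypothetical helper (not the PurgeTemplate API): delete unreferenced
    drift files batch by batch and return how many rows were deleted."""
    total = 0
    while True:
        # Step 1: read a bounded batch of keys; no write lock is held here.
        keys = [row[0] for row in conn.execute(
            f"SELECT hash_id FROM rhq_drift_file WHERE {UNREFERENCED} LIMIT ?",
            (batch_size,))]
        if not keys:
            return total
        marks = ",".join("?" * len(keys))
        # Step 2: delete this batch in its own short transaction, re-checking
        # the condition in case a drift row started referencing a file meanwhile.
        with conn:
            cur = conn.execute(
                f"DELETE FROM rhq_drift_file WHERE hash_id IN ({marks}) "
                f"AND {UNREFERENCED}", keys)
        total += cur.rowcount


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO rhq_drift_file VALUES (?)",
                     [(f"f{i}",) for i in range(6)])
    # Only f0 is still referenced; its drift row has a NULL old file.
    conn.execute("INSERT INTO rhq_drift (old_drift_file, new_drift_file) "
                 "VALUES (NULL, 'f0')")
    conn.commit()
    return conn


conn = build_demo()
print(purge_in_batches(conn, batch_size=2))  # -> 5
print([r[0] for r in conn.execute("SELECT hash_id FROM rhq_drift_file")])  # -> ['f0']
```

Re-checking the NOT EXISTS condition inside the DELETE is a choice made for this sketch: it avoids removing a file that became referenced between the SELECT and the DELETE, at the cost of evaluating the subquery twice per batch.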
Moving to ON_QA, as it is available for testing with build: https://brewweb.devel.redhat.com/buildinfo?buildID=388959
Verified on Version: 3.3.0.ER04, Build Number: 99d2107:d7c537e