Bug 1077744 - statement timeout during drift purge job
Summary: statement timeout during drift purge job
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Core Server
Version: JON 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ER04
Target Release: JON 3.3.0
Assignee: Thomas Segismont
QA Contact: Filip Brychta
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-03-18 13:58 UTC by Armine Hovsepyan
Modified: 2015-09-03 00:02 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-12-11 14:04:42 UTC
Type: Bug


Attachments
server.log (78.65 KB, text/x-log)
2014-03-18 13:58 UTC, Armine Hovsepyan

Description Armine Hovsepyan 2014-03-18 13:58:03 UTC
Created attachment 875950 [details]
server.log

Description of problem:
Statement timeout during the drift data purge job.

Version-Release number of selected component (if applicable):
jon 3.2.0 long running instance

How reproducible:


Steps to Reproduce:
1. install 3.2.0 and run for >15 days
2. cron job doing "cli log in - log out" runs every 1 min
3. look into server.log

Actual results:
statement timeout exception during the drift data purge job

Expected results:
no statement timeout exception during the drift data purge job


Additional info:
fragment from server.log attached

instance is: http://long-jon32.bc.jonqe.lab.eng.bos.redhat.com:7080/

Comment 2 Thomas Segismont 2014-08-22 15:16:28 UTC
Are you sure the steps from the Description are sufficient? I believe that to make the JPADrift purge time out you need to add quite a number of files. Which resource in Inventory had Drift activated?

Comment 3 Jay Shaughnessy 2014-08-27 21:02:12 UTC
I don't see how this can have anything to do with the cron job in question. It has to do either with a slow DB or with a large amount of drift data to purge.

What may be interesting is that it timed out after 30s. I would have thought a longer statement timeout would be in play. I wonder if it has to do with it being a call through a server plugin? We need to find out if 30s is what we expect.

Comment 4 Jay Shaughnessy 2014-08-27 21:03:36 UTC
We should look at this at least to see if we can understand the timeout.

Comment 5 Thomas Segismont 2014-08-27 21:18:49 UTC
I had not seen that, Jay. Eagle eye.

Looking again at the stack trace, it appears that this comes from the "statement_timeout" PostgreSQL configuration parameter, which has been set to 30 seconds, probably because the RHQ wiki recommends it:
https://docs.jboss.org/author/display/RHQ/PostgreSQL#PostgreSQL-postgresql.conf

There was a similar (baseline insertion) issue reported:
https://community.jboss.org/thread/146172

Comment 6 Thomas Segismont 2014-09-10 16:03:37 UTC
Moving to ER04, as this will probably require quite some investigation.

Armine, can you answer the questions in Comment 2? Can you reproduce on 3.3 ER02?

Comment 7 Thomas Segismont 2014-09-25 13:01:15 UTC
I had no indication of how to reproduce, so I did some testing.

Found a big issue with the JPA drift purge query (it wouldn't find the rows to purge in many cases).



commit 3251090d63f58f0da3d604f30cc8f819533e0fcc
Author: Thomas Segismont <tsegismo@redhat.com>
Date:   Thu Sep 25 14:17:10 2014 +0200

    Changed the query WHERE clause: don't use NOT IN because NOT IN(NULL) will always return false.
    See http://stackoverflow.com/questions/17150208/sql-query-with-not-in-returning-no-rows
    
    In other words, if any row in RHQ_DRIFT as a NULL value in OLD_DRIFT_FILE or NEW_DRIFT_FILE columns, we'll never purge anything.
    
    Besides, the NOT EXISTS variant has a better execution plan.
    
    Use PurgeTemplate to delete rows in batches and select the keys of the rows to delete without locking the drift files table.
    
    Also, some code cleanup.
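
For readers unfamiliar with the NOT IN / NULL pitfall described in the commit message, here is a minimal sketch in Python using an in-memory SQLite database. The table and column names (drift, drift_file, hash_id) and the helper functions are simplified assumptions for illustration; this is not the RHQ schema, not the actual JPA query, and the batch loop is only a generic stand-in for the idea behind PurgeTemplate, not its implementation.

```python
import sqlite3

# Hypothetical, simplified schema: drift rows reference drift files by hash.
NOT_IN_QUERY = """
    SELECT f.hash_id FROM drift_file f
    WHERE f.hash_id NOT IN (SELECT old_drift_file FROM drift)
      AND f.hash_id NOT IN (SELECT new_drift_file FROM drift)
    ORDER BY f.hash_id"""

NOT_EXISTS_QUERY = """
    SELECT f.hash_id FROM drift_file f
    WHERE NOT EXISTS (SELECT 1 FROM drift d WHERE d.old_drift_file = f.hash_id)
      AND NOT EXISTS (SELECT 1 FROM drift d WHERE d.new_drift_file = f.hash_id)
    ORDER BY f.hash_id"""


def setup():
    """Create four drift files; 'a' and 'b' are referenced, 'c' and 'd' are orphans."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE drift_file (hash_id TEXT PRIMARY KEY);
        CREATE TABLE drift (
            id INTEGER PRIMARY KEY,
            old_drift_file TEXT,
            new_drift_file TEXT
        );
        INSERT INTO drift_file VALUES ('a'), ('b'), ('c'), ('d');
        -- One drift row has a NULL reference in old_drift_file.
        INSERT INTO drift (old_drift_file, new_drift_file) VALUES (NULL, 'a');
        INSERT INTO drift (old_drift_file, new_drift_file) VALUES ('a', 'b');
    """)
    return conn


def find_orphans(conn, query):
    """Return the hash ids selected for purge by the given query."""
    return [row[0] for row in conn.execute(query)]


def purge_in_batches(conn, batch_size):
    """Select orphan keys first, then delete them in small committed batches."""
    deleted = 0
    while True:
        keys = [row[0] for row in
                conn.execute(NOT_EXISTS_QUERY + " LIMIT ?", (batch_size,))]
        if not keys:
            break
        conn.executemany("DELETE FROM drift_file WHERE hash_id = ?",
                         [(key,) for key in keys])
        conn.commit()  # each batch is its own short transaction
        deleted += len(keys)
    return deleted


if __name__ == "__main__":
    conn = setup()
    print("NOT IN    :", find_orphans(conn, NOT_IN_QUERY))
    print("NOT EXISTS:", find_orphans(conn, NOT_EXISTS_QUERY))
    print("purged    :", purge_in_batches(conn, batch_size=1))
```

With the single NULL in old_drift_file, the NOT IN variant selects nothing, because comparing against NULL yields "unknown" rather than true, while the NOT EXISTS variant selects the two unreferenced files. Deleting in small batches keeps each statement short, which is presumably what helps stay under a 30-second statement_timeout.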

Comment 8 Libor Zoubek 2014-09-26 14:51:39 UTC
branch:  release/jon3.3.x
link:    https://github.com/rhq-project/rhq/commit/f4d09f5c2
time:    2014-09-26 16:51:13 +0200
commit:  f4d09f5c229d6d69ad556e4ae7fcba03a24ee513
author:  Thomas Segismont - tsegismo@redhat.com
message: Bug 1077744 - statement timeout during drift purge job
         Changed the query WHERE clause: don't use NOT IN because NOT
         IN(NULL) will always return false. See
         http://stackoverflow.com/questions/17150208/sql-query-with-not-in-returning-no-rows
         In other words, if any row in RHQ_DRIFT as a NULL value in
         OLD_DRIFT_FILE or NEW_DRIFT_FILE columns, we'll never purge
         anything.
         Besides, the NOT EXISTS variant has a better execution plan.
         Use PurgeTemplate to delete rows in batches and select the keys
         of the rows to delete without locking the drift files table.
         Also, some code cleanup.
         (cherry picked from commit
         3251090d63f58f0da3d604f30cc8f819533e0fcc) Signed-off-by: Libor
         Zoubek <lzoubek@redhat.com>

Comment 9 Simeon Pinder 2014-10-01 21:33:26 UTC
Moving to ON_QA as available for test with build:
https://brewweb.devel.redhat.com/buildinfo?buildID=388959

Comment 11 Filip Brychta 2014-10-30 08:13:47 UTC
Verified on
Version: 3.3.0.ER04
Build Number: 99d2107:d7c537e

