Bug 1077744

Summary: statement timeout during drift purge job
Product: [JBoss] JBoss Operations Network
Reporter: Armine Hovsepyan <ahovsepy>
Component: Core Server
Assignee: Thomas Segismont <tsegismo>
Status: CLOSED CURRENTRELEASE
QA Contact: Filip Brychta <fbrychta>
Severity: high
Priority: unspecified
Version: JON 3.2
CC: ahovsepy, jshaughn, lzoubek, mfoley, tsegismo
Target Milestone: ER04
Target Release: JON 3.3.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2014-12-11 14:04:42 UTC
Type: Bug
Attachments: server.log (flags: none)

Description Armine Hovsepyan 2014-03-18 13:58:03 UTC
Created attachment 875950: server.log

Description of problem:
Statement timeout during the drift data purge job.

Version-Release number of selected component (if applicable):
JON 3.2.0, long-running instance

How reproducible:


Steps to Reproduce:
1. Install JON 3.2.0 and let it run for more than 15 days.
2. Have a cron job perform a CLI login and logout every minute.
3. Check server.log.

Actual results:
Statement timeout exception during the drift data purge job.

Expected results:
no statement timeout exception during the drift data purge job


Additional info:
fragment from server.log attached

instance is: http://long-jon32.bc.jonqe.lab.eng.bos.redhat.com:7080/

Comment 2 Thomas Segismont 2014-08-22 15:16:28 UTC
Are you sure the steps from the Description are sufficient? I believe that to make the JPADrift purge time out you need to add quite a number of files. Which resource in Inventory had Drift activated?

Comment 3 Jay Shaughnessy 2014-08-27 21:02:12 UTC
I don't see how this can have anything to do with the cron job in question.  It has to do either with a slow DB or a whole bunch of drift purge.

What's interesting maybe is that it timed out after 30s.  I would have thought a longer statement timeout would be in play.  I wonder if it has to do with it being a call through a server plugin?  We need to find out if 30s is what we expect.

Comment 4 Jay Shaughnessy 2014-08-27 21:03:36 UTC
We should look at this at least to see if we can understand the timeout.

Comment 5 Thomas Segismont 2014-08-27 21:18:49 UTC
I had not seen that Jay. Eagle eye.

Looking again at the stack trace, it appears that this comes from the "statement_timeout" PostgreSQL configuration parameter, which has been set to 30 seconds, probably because the RHQ wiki recommends it:
https://docs.jboss.org/author/display/RHQ/PostgreSQL#PostgreSQL-postgresql.conf

There was a similar (baseline insertion) issue reported:
https://community.jboss.org/thread/146172

Comment 6 Thomas Segismont 2014-09-10 16:03:37 UTC
Moving to ER04, as this will probably require quite some investigation.

Armine, can you answer the questions in Comment 2? Can you reproduce on 3.3 ER02?

Comment 7 Thomas Segismont 2014-09-25 13:01:15 UTC
I had no indication of how to reproduce this, so I did some testing.

Found a big issue with the JPA drift purge query (it wouldn't find the rows to purge in many cases).



commit 3251090d63f58f0da3d604f30cc8f819533e0fcc
Author: Thomas Segismont <tsegismo>
Date:   Thu Sep 25 14:17:10 2014 +0200

    Changed the query WHERE clause: don't use NOT IN because NOT IN(NULL) will always return false.
    See http://stackoverflow.com/questions/17150208/sql-query-with-not-in-returning-no-rows
    
    In other words, if any row in RHQ_DRIFT has a NULL value in OLD_DRIFT_FILE or NEW_DRIFT_FILE columns, we'll never purge anything.
    
    Besides, the NOT EXISTS variant has a better execution plan.
    
    Use PurgeTemplate to delete rows in batches and select the keys of the rows to delete without locking the drift files table.
    
    Also, some code cleanup.
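
The NOT IN pitfall described in the commit message can be reproduced outside RHQ. The following is a minimal sketch, not the actual RHQ query: it uses SQLite through Python's standard sqlite3 module instead of PostgreSQL, and the table name rhq_drift_file and the column hash_id are assumptions made for illustration (only RHQ_DRIFT, OLD_DRIFT_FILE and NEW_DRIFT_FILE are named in the commit message). Both engines apply the same three-valued logic to NOT IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical schema; not the real RHQ tables.
    CREATE TABLE rhq_drift_file (hash_id TEXT PRIMARY KEY);
    CREATE TABLE rhq_drift (
        id INTEGER PRIMARY KEY,
        old_drift_file TEXT,
        new_drift_file TEXT
    );
    INSERT INTO rhq_drift_file VALUES ('a'), ('b'), ('orphan');
    -- Row 1 has a NULL in old_drift_file, which is what breaks NOT IN.
    INSERT INTO rhq_drift VALUES (1, NULL, 'a'), (2, 'a', 'b');
""")

# NOT IN: comparing against a list that contains NULL yields NULL, never true,
# so the unreferenced file 'orphan' is not selected.
not_in = conn.execute("""
    SELECT hash_id FROM rhq_drift_file
    WHERE hash_id NOT IN (SELECT old_drift_file FROM rhq_drift)
      AND hash_id NOT IN (SELECT new_drift_file FROM rhq_drift)
""").fetchall()

# NOT EXISTS: only asks whether a matching row exists, so NULLs are harmless.
not_exists = conn.execute("""
    SELECT f.hash_id FROM rhq_drift_file f
    WHERE NOT EXISTS (
        SELECT 1 FROM rhq_drift d
        WHERE d.old_drift_file = f.hash_id OR d.new_drift_file = f.hash_id
    )
""").fetchall()

print(not_in)      # []
print(not_exists)  # [('orphan',)]
```

With the NULL row present, the NOT IN variant finds nothing to purge, while the NOT EXISTS variant finds the single unreferenced file.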

Comment 8 Libor Zoubek 2014-09-26 14:51:39 UTC
branch:  release/jon3.3.x
link:    https://github.com/rhq-project/rhq/commit/f4d09f5c2
time:    2014-09-26 16:51:13 +0200
commit:  f4d09f5c229d6d69ad556e4ae7fcba03a24ee513
author:  Thomas Segismont - tsegismo
message: Bug 1077744 - statement timeout during drift purge job
         Changed the query WHERE clause: don't use NOT IN because NOT
         IN(NULL) will always return false. See
         http://stackoverflow.com/questions/17150208/sql-query-with-not-in-returning-no-rows
         In other words, if any row in RHQ_DRIFT has a NULL value in
         OLD_DRIFT_FILE or NEW_DRIFT_FILE columns, we'll never purge
         anything.
         Besides, the NOT EXISTS variant has a better execution plan.
         Use PurgeTemplate to delete rows in batches and select the keys
         of the rows to delete without locking the drift files table.
         Also, some code cleanup.
         (cherry picked from commit 3251090d63f58f0da3d604f30cc8f819533e0fcc)
         Signed-off-by: Libor Zoubek <lzoubek>
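
The batching idea in the commit message (select the keys first, then delete in small batches) can be sketched as follows. This is not RHQ's PurgeTemplate, only a rough stand-in for the same pattern, written against SQLite with Python's standard sqlite3 module. The function purge_orphaned_drift_files, the table rhq_drift_file, the column hash_id and the batch size are all hypothetical.

```python
import sqlite3

BATCH_SIZE = 2  # hypothetical; kept tiny so the example runs several batches


def purge_orphaned_drift_files(conn, batch_size=BATCH_SIZE):
    """Delete unreferenced drift files in batches; return the number deleted."""
    # Step 1: collect the keys of the rows to purge with a plain SELECT.
    keys = [row[0] for row in conn.execute("""
        SELECT f.hash_id FROM rhq_drift_file f
        WHERE NOT EXISTS (
            SELECT 1 FROM rhq_drift d
            WHERE d.old_drift_file = f.hash_id OR d.new_drift_file = f.hash_id
        )
    """)]
    # Step 2: delete by key, one short transaction per batch.
    deleted = 0
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                f"DELETE FROM rhq_drift_file WHERE hash_id IN ({placeholders})",
                batch,
            )
            deleted += cur.rowcount
    return deleted


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical schema; not the real RHQ tables.
    CREATE TABLE rhq_drift_file (hash_id TEXT PRIMARY KEY);
    CREATE TABLE rhq_drift (
        id INTEGER PRIMARY KEY,
        old_drift_file TEXT,
        new_drift_file TEXT
    );
    INSERT INTO rhq_drift_file VALUES ('a'), ('b'), ('o1'), ('o2'), ('o3');
    INSERT INTO rhq_drift VALUES (1, NULL, 'a'), (2, 'a', 'b');
""")

deleted = purge_orphaned_drift_files(conn)
remaining = [r[0] for r in conn.execute(
    "SELECT hash_id FROM rhq_drift_file ORDER BY hash_id")]
print(deleted, remaining)  # 3 ['a', 'b']
```

Because each DELETE touches only a handful of rows, each statement should stay short, which is the property that matters when a limit such as a 30-second statement_timeout is in force.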

Comment 9 Simeon Pinder 2014-10-01 21:33:26 UTC
Moving to ON_QA as available for test with build:
https://brewweb.devel.redhat.com/buildinfo?buildID=388959

Comment 11 Filip Brychta 2014-10-30 08:13:47 UTC
Verified on
Version: 3.3.0.ER04
Build Number: 99d2107:d7c537e