999165 – Frequent statement timeouts while updating availabilities

Bug 999165 - Frequent statement timeouts while updating availabilities

Summary: Frequent statement timeouts while updating availabilities

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	RHQ Project
Classification:	Other
Component:	Core Server
Sub Component:
Version:	4.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	Jay Shaughnessy
QA Contact:	Mike Foley
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	998945
TreeView+	depends on / blocked

Reported:	2013-08-20 20:38 UTC by John Sanda
Modified:	2015-12-21 13:59 UTC (History)
CC List:	2 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2015-12-21 13:59:34 UTC
Embargoed:

Attachments	(Terms of Use)
The server log (14.65 MB, text/plain) 2013-08-20 20:43 UTC, John Sanda	no flags	Details
postgres log (58.43 KB, text/plain) 2013-08-20 20:44 UTC, John Sanda	no flags	Details
View All

Description John Sanda 2013-08-20 20:38:33 UTC

Description of problem:
In a testing environment, I am seeing very regular, frequent statement timeouts. The exceptions in the server log look like,

5:05:28,558 WARN  [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (http-/0.0.0.0:7080-3) SQL Error: 0, SQLState: 57014
15:05:28,558 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (http-/0.0.0.0:7080-3) ERROR: canceling statement due to statement timeout
15:05:28,590 ERROR [org.jboss.as.ejb3.invocation] (http-/0.0.0.0:7080-3) JBAS014134: EJB Invocation failed on component AvailabilityManagerBean for method public abstract void org.rhq.enterprise.server.measurement.AvailabilityManagerLocal.updateLastAvailabilityReportInNewTransaction(int): javax.ejb.EJBException: javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: could not execute statement
...
Caused by: org.postgresql.util.PSQLException: ERROR: canceling statement due to statement timeout

And in the postgres log I am seeing over and over,

ERROR:  canceling statement due to statement timeout
STATEMENT:  update RHQ_AGENT set LAST_AVAILABILITY_REPORT=$1, BACKFILLED=false where ID=$2

This issue is directly related to bug 998945. The build number is 3c8110a. 


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 John Sanda 2013-08-20 20:43:45 UTC

Created attachment 788652 [details]
The server log

Comment 2 John Sanda 2013-08-20 20:44:20 UTC

Created attachment 788653 [details]
postgres log

Comment 3 Heiko W. Rupp 2013-09-04 12:00:05 UTC

Is that still an issue or just temporary?

Could this be related to the jdbc driver version that we
use as 9.2-1000 seems to have introduced query timeouts:

http://jdbc.postgresql.org/changes.html#version_9.2-1000

I personally did not see those timeouts yet.

Comment 4 John Sanda 2013-09-04 14:28:24 UTC

I think this is related to bug 1000584. I have not see the timeouts in my local environment. I only encountered them in test environments with less resources (CPU, memory, etc.).

Comment 5 Heiko W. Rupp 2013-12-09 09:16:31 UTC

This was filed before our recent mergeAvail changes.
The GWT times BZ may have solved this.

But as the original description talks about backfilling, it may also be something else. 

Let's see if the recent changes helped to solve this issue.

Comment 6 Jay Shaughnessy 2013-12-09 20:09:01 UTC

commit d01bb14a36a1121d377dc2445d1456c611c37007
Author: Jay Shaughnessy <jshaughn>
Date:   Mon Dec 9 14:13:52 2013 -0500

 I did a visual inspection of code looking for things that may
 contribute to the slow updates and/or timeouts reported in these BZs. In
 general the updates for availReports (infrequent but can spike in
 certain scenarios, 1 per avail report) and availPing
 (frequent, 1 per minute per agent) should be very fast and not in
 themselves be an issue. Slowness in these queries is likely caused by the
 Agent table being locked by some other longer running transaction.
 Everything looked pretty clean with one exception. When backfilling an agent
 due to an agent shutdown or suspect job detection, it looked like we
 may have locked th agent table for the duration of the backfilling, which
 can be a fairly large operation for a big agent inventory.  I'm not sure
 this is the problem, but this commit should allevaite the Agent table
 locking while the backfilling is performed.

 Also:
 - move some inline JPA updates on the Agent to be NamedQuery in the Agent
   entity.  No need to build those every time we update an agent ping time.

Comment 7 Jay Shaughnessy 2015-12-21 13:59:34 UTC

Closing, no follow up after fix 2 years ago.

Note You need to log in before you can comment on or make changes to this bug.