Bug 749707

Summary: Quartz lock error if alert sender waits after scheduleing operation
Product: [Other] RHQ Project Reporter: Steven North <swn>
Component: AlertsAssignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED CURRENTRELEASE QA Contact: Mike Foley <mfoley>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.1CC: hrupp, jshaughn
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-05-08 18:33:17 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Example alert pluign that illustrates problem. See build.sh to build stand-alone.
none
This is the soruce files and XML. none

Description Steven North 2011-10-28 01:30:41 UTC
Created attachment 530595 [details]
Example alert pluign that illustrates problem.  See build.sh to build stand-alone.

Description of problem:

AN alert that waits after executing an operation on  a resource causes Quartz error.

Version-Release number of selected component (if applicable):

RHQ 4.1

How reproducible:

Build attached alert plugin, load onto RHQ server.  Create an alert definition on a Linux platform resource conditioned on Free Memory > 0 (should alert right away) and diable after one trigger.  Alert sender will schedule "veiwProcessList" operation on the platform resource then wait for 5 minutes.  Quartz will throw execptions like those shown below.  (I tried just waiting, but the schedule operation is needed to cause the problem.)

  
Actual results:


2011-10-27 19:40:44,747 INFO  [org.rhq.enterprise.server.plugin.pc.alert.AlertSenderPluginManager] Server plugin [alert-problem] is loaded but disabled
2011-10-27 19:40:44,747 INFO  [org.rhq.enterprise.server.plugin.pc.MasterServerPluginContainer] Loaded server plugin [alert-problem]
2011-10-27 19:40:44,764 INFO  [org.rhq.enterprise.server.plugin.pc.alert.AlertSenderPluginManager] Enabling server plugin [alert-problem]
2011-10-27 19:40:45,172 INFO  [com.ocsystems.rtiplugin.MyAlertComponent] PROBLEM: init:
2011-10-27 19:40:45,175 INFO  [com.ocsystems.rtiplugin.MyAlertComponent] PROBLEM: start:
2011-10-27 19:40:45,175 INFO  [org.rhq.enterprise.server.plugin.ServerPluginsBean] Server plugin [alert-problem] has been registered in the database
2011-10-27 19:40:45,176 INFO  [org.rhq.enterprise.server.core.plugin.ServerPluginScanner] Registered server plugin [alert-problem], version 1
2011-10-27 19:41:10,560 INFO  [org.rhq.enterprise.server.alert.engine.jms.AlertConditionConsumerBean] AlertCondition[id=10311] has been removed after it was triggered; this message will be discarded
2011-10-27 19:41:23,874 WARN  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2011-10-27 19:41:26,265 WARN  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2011-10-27 19:41:27,295 WARN  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2011-10-27 19:41:27,439 WARN  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2011-10-27 19:41:27,677 WARN  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2011-10-27 19:41:27,879 WARN  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2011-10-27 19:41:27,943 WARN  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2011-10-27 19:41:28,315 WARN  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2011-10-27 19:41:28,839 WARN  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2011-10-27 19:41:29,110 INFO  [com.ocsystems.rtiplugin.MyAlertSender] PROBLEM: send:
2011-10-27 19:41:29,412 INFO  [com.ocsystems.rtiplugin.MyAlertSender] PROBLEM: wait 300000
2011-10-27 19:41:31,637 WARN  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2011-10-27 19:41:35,108 WARN  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2011-10-27 19:41:57,259 INFO  [org.rhq.enterprise.server.cloud.instance.CacheConsistencyManagerBean] rhel5-64 took [134]ms to reload global cache
2011-10-27 19:41:57,628 INFO  [org.rhq.enterprise.server.cloud.instance.CacheConsistencyManagerBean] rhel5-64 took [348]ms to reload cache for 1 agents
2011-10-27 19:41:59,406 ERROR [org.quartz.core.ErrorLogger] An error occured while releasing trigger 'org.rhq.enterprise.server.scheduler.jobs.SavedSearchResultCountRecalculationJob.org.rhq.enterprise.server.scheduler.jobs.SavedSearchResultCountRecalculationJob'
org.quartz.impl.jdbcjobstore.LockException: Failure obtaining db row lock: ERROR: canceling statement due to statement timeout [See nested exception: org.postgresql.util.PSQLException: ERROR: canceling statement due to statement timeout]
	at org.quartz.impl.jdbcjobstore.StdRowLockSemaphore.executeSQL(StdRowLockSemaphore.java:112)
	at org.quartz.impl.jdbcjobstore.DBSemaphore.obtainLock(DBSemaphore.java:112)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3753)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3722)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.releaseAcquiredTrigger(JobStoreSupport.java:2805)
	at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:305)
Caused by: org.postgresql.util.PSQLException: ERROR: canceling statement due to statement timeout
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273)
	at org.jboss.resource.adapter.jdbc.CachedPreparedStatement.executeQuery(CachedPreparedStatement.java:90)
	at org.jboss.resource.adapter.jdbc.WrappedPreparedStatement.executeQuery(WrappedPreparedStatement.java:342)
	at org.quartz.impl.jdbcjobstore.StdRowLockSemaphore.executeSQL(StdRowLockSemaphore.java:92)
	... 5 more
2011-10-27 19:43:18,549 ERROR [org.quartz.impl.jdbcjobstore.JobStoreCMT] MisfireHandler: Error handling misfires: Failure obtaining db row lock: ERROR: canceling statement due to statement timeout
org.quartz.impl.jdbcjobstore.LockException: Failure obtaining db row lock: ERROR: canceling statement due to statement timeout [See nested exception: org.postgresql.util.PSQLException: ERROR: canceling statement due to statement timeout]
	at org.quartz.impl.jdbcjobstore.StdRowLockSemaphore.executeSQL(StdRowLockSemaphore.java:112)
	at org.quartz.impl.jdbcjobstore.DBSemaphore.obtainLock(DBSemaphore.java:112)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.doRecoverMisfires(JobStoreSupport.java:3124)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport$MisfireHandler.manage(JobStoreSupport.java:3887)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport$MisfireHandler.run(JobStoreSupport.java:3907)
Caused by: org.postgresql.util.PSQLException: ERROR: canceling statement due to statement timeout
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273)
	at org.jboss.resource.adapter.jdbc.CachedPreparedStatement.executeQuery(CachedPreparedStatement.java:90)
	at org.jboss.resource.adapter.jdbc.WrappedPreparedStatement.executeQuery(WrappedPreparedStatement.java:342)
	at org.quartz.impl.jdbcjobstore.StdRowLockSemaphore.executeSQL(StdRowLockSemaphore.java:92)
	... 4 more
2011-10-27 19:44:59,428 ERROR [org.quartz.core.ErrorLogger] An error occured while releasing trigger 'org.rhq.enterprise.server.scheduler.jobs.SavedSearchResultCountRecalculationJob.org.rhq.enterprise.server.scheduler.jobs.SavedSearchResultCountRecalculationJob'
org.quartz.impl.jdbcjobstore.LockException: Failure obtaining db row lock: ERROR: canceling statement due to statement timeout [See nested exception: org.postgresql.util.PSQLException: ERROR: canceling statement due to statement timeout]
	at org.quartz.impl.jdbcjobstore.StdRowLockSemaphore.executeSQL(StdRowLockSemaphore.java:112)
	at org.quartz.impl.jdbcjobstore.DBSemaphore.obtainLock(DBSemaphore.java:112)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3753)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3722)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.releaseAcquiredTrigger(JobStoreSupport.java:2805)
	at org.quartz.core.QuartzSchedulerThread.releaseTriggerRetryLoop(QuartzSchedulerThread.java:534)
	at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:314)
Caused by: org.postgresql.util.PSQLException: ERROR: canceling statement due to statement timeout
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273)
	at org.jboss.resource.adapter.jdbc.CachedPreparedStatement.executeQuery(CachedPreparedStatement.java:90)
	at org.jboss.resource.adapter.jdbc.WrappedPreparedStatement.executeQuery(WrappedPreparedStatement.java:342)
	at org.quartz.impl.jdbcjobstore.StdRowLockSemaphore.executeSQL(StdRowLockSemaphore.java:92)
	... 6 more
2011-10-27 19:46:29,469 INFO  [org.rhq.enterprise.server.util.concurrent.AlertSerializer] tid= 1517270: alertDefinitionId=10301: releasing write lock after being locked for millis=300592
2011-10-27 19:46:29,469 INFO  [org.rhq.enterprise.server.util.concurrent.AlertSerializer] tid= 1517394: alertDefinitionId=10301: acquired write lock in millis=202881
2011-10-27 19:46:29,479 INFO  [org.rhq.enterprise.server.util.concurrent.AlertSerializer] tid= 1517604: alertDefinitionId=10301: acquired write lock in millis=138655
2011-10-27 19:46:29,483 INFO  [org.rhq.enterprise.server.util.concurrent.AlertSerializer] tid= 1517814: alertDefinitionId=10301: acquired write lock in millis=79202
2011-10-27 19:46:29,488 INFO  [org.rhq.enterprise.server.util.concurrent.AlertSerializer] tid= 1518033: alertDefinitionId=10301: acquired write lock in millis=19375
2011-10-27 19:46:35,956 INFO  [STDOUT] (I) Remove RTI metrics snapshot: /var/opt/ocsystems/rti/ee/log/jboss_default/rti-metrics-20111027-193535-11020_rhel5-64.txt

Expected results:


Operation should execute, alert wait, and no error in server log.


Additional info:

Comment 1 Steven North 2011-10-28 01:37:24 UTC
Created attachment 530596 [details]
This is the soruce files and XML.

This is just the source files and XML.

Comment 2 Jay Shaughnessy 2014-05-08 18:33:17 UTC
I believe this was probably fixed when we restructured the transactioning and execution order for alert notifications.