Bug 1126208 - Purge job can hang indefinitely
Summary: Purge job can hang indefinitely
Keywords:
Status: NEW
Alias: None
Product: RHQ Project
Classification: Other
Component: Core Server, Storage Node
Version: 4.12
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHQ 4.13
Assignee: Nobody
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1133605
Reported: 2014-08-03 15:42 UTC by Elias Ross
Modified: 2022-03-31 04:27 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Embargoed:



Description Elias Ross 2014-08-03 15:42:42 UTC
Description of problem:

The purge process can get stuck waiting for a semaphore. If an error occurs during processing, the job will effectively "run" (hang) forever.

(Unfortunately I lost the stack trace.)

04:00:00,008 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-1) Data Purge Job STARTING
04:00:00,014 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-1) Measurement data compression starting at Sun Aug 03 04:00:00 UTC 2014
04:00:00,014 INFO  [org.rhq.server.metrics.aggregation.AggregationManager] (RHQScheduler_Worker-1) Starting aggregation for time slice 2014-08-03T03:00:00.000Z
... then nothing is logged

There must be a code path on which release() is never called, so a permit leaks.
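
For illustration only, here is a minimal sketch of the usual guard against this kind of leak: acquire with a timeout, and release in a finally block so the permit comes back even when the work throws. PermitGuard and its methods are hypothetical names, not RHQ classes, and this is not the actual aggregation code.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not an RHQ class): runs work while holding one permit.
class PermitGuard {
    private final Semaphore permits;

    PermitGuard(int maxPermits) {
        this.permits = new Semaphore(maxPermits);
    }

    // Returns false if no permit became available within the timeout,
    // instead of blocking forever as a plain acquire() would.
    boolean runWithPermit(Runnable task, long timeout, TimeUnit unit) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
        if (!acquired) {
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            // Runs on both normal and exceptional exit, so the permit cannot leak.
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

With this shape a failing task still propagates its exception to the caller, but the permit count is restored, so a later run is not blocked by an earlier failure.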

Version-Release number of selected component (if applicable): 4.12

Comment 1 John Sanda 2014-08-06 01:31:17 UTC
I have done some initial investigation based on some errors provided by Elias. In that case the problem was due to a lack of exception handling in Guava's Futures.transform(ListenableFuture, Function) method. The function call is wrapped in an AsyncFunction, which lacks the exception handling that we have in Futures.transform(ListenableFuture, AsyncFunction). I made some changes [1] to add the necessary exception handling, but there are probably other areas that need to be addressed as well. I think the best thing to do is to set an uncaught exception handler so that we can terminate aggregation when any unexpected error occurs.

[1] https://github.com/jsanda/rhq/commit/b2775e5d0621f45df40737b73a9e88ac594fa287
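
As a rough illustration of the failure mode and the kind of handling described above, the sketch below uses the JDK's CompletableFuture as a stand-in for Guava's ListenableFuture. It is not the code from [1]. AggregationStep, submit and firstFailure are hypothetical names. The point is only that the completion callback runs whether or not the transform function throws, so the permit is always released and the error is recorded rather than lost.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical stand-in for one step of an aggregation pipeline.
class AggregationStep {
    private final Semaphore permits = new Semaphore(1);
    private final AtomicReference<Throwable> failure = new AtomicReference<>();

    <I, O> CompletableFuture<O> submit(CompletableFuture<I> input, Function<I, O> fn) {
        permits.acquireUninterruptibly();
        // If fn throws, the future returned by thenApply completes exceptionally,
        // and whenComplete still runs, so the permit is released on every path.
        return input.thenApply(fn).whenComplete((result, error) -> {
            if (error != null) {
                failure.compareAndSet(null, unwrap(error));
            }
            permits.release();
        });
    }

    // Dependent stages report failures wrapped in CompletionException.
    private static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }

    // First recorded error, or null; a caller could check this to stop aggregation.
    Throwable firstFailure() {
        return failure.get();
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

A caller that sees a non-null firstFailure() could abort the remaining work instead of waiting on a permit that would otherwise never come back.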

