Bug 1084626

Summary: Metrics aggregation deadlocks on single processor machine
Product: [Other] RHQ Project
Reporter: John Sanda <jsanda>
Component: Core Server, Storage Node
Assignee: John Sanda <jsanda>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.10
CC: hrupp
Target Milestone: GA   
Target Release: RHQ 4.11   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2014-07-21 10:13:39 UTC
Type: Bug

Description John Sanda 2014-04-04 21:37:42 UTC
Description of problem:
Metrics aggregation, which runs as part of the DataPurgeJob, deadlocks on a machine with a single processor core. The size of the thread pool used during aggregation is configured as follows:

private int numAggregationWorkers = Math.min(Integer.parseInt(System.getProperty("rhq.metrics.aggregation.workers",
        "4")), Runtime.getRuntime().availableProcessors());

If Runtime.getRuntime().availableProcessors() returns 1, the pool has a single thread and aggregation deadlocks. BatchAggregationScheduler runs as a task in that thread pool. It queries metrics_index and schedules aggregation tasks. Reads are throttled using a Semaphore. The relevant code looks like this:

for (Row row : rows) {
    aggregationState.getPermits().acquire();
    // schedule tasks...
}

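Permits are released by the tasks that get scheduled. With a single thread, those tasks cannot run until BatchAggregationScheduler finishes, because it occupies the only worker thread. Once the permits are exhausted, the scheduler blocks indefinitely in acquire(), so the tasks that would release the permits never run.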
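
The failure mode can be reproduced outside of RHQ with a small standalone sketch. This is not the RHQ code: the class name, method name, and parameters below are hypothetical, and the scheduled "tasks" do nothing except release a permit. It only illustrates a scheduler task that shares a fixed-size pool with the tasks it schedules while being throttled by a Semaphore.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the scheduler/permit interaction described above.
class SchedulerDeadlockSketch {

    /**
     * Runs a scheduler task on a fixed pool and reports whether it finished
     * within the timeout. Returns false when the scheduler is still blocked.
     */
    static boolean schedulerCompletes(int poolSize, int permitCount, int rowCount, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Semaphore permits = new Semaphore(permitCount);
        try {
            // The scheduler itself occupies one pool thread for its whole run.
            Future<?> scheduler = pool.submit(() -> {
                for (int row = 0; row < rowCount; row++) {
                    try {
                        permits.acquire(); // blocks once all permits are taken
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    // The scheduled task is the only thing that releases a permit.
                    pool.execute(() -> permits.release());
                }
            });
            try {
                scheduler.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // scheduler is stuck waiting for a permit
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            pool.shutdownNow(); // interrupts the blocked scheduler, if any
        }
    }

    public static void main(String[] args) {
        System.out.println("1 thread completes:  " + schedulerCompletes(1, 2, 5, 500));
        System.out.println("2 threads complete: " + schedulerCompletes(2, 2, 5, 5000));
    }
}
```

With a pool size of 1 and more rows than permits, the queued tasks never get a thread, so the scheduler never gets past acquire(). With a second thread, the queued tasks run concurrently and release permits, which lets the scheduler finish.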

Comment 1 John Sanda 2014-04-05 01:37:57 UTC
Note that this bug ONLY affects RHQ 4.10.0.

Comment 2 John Sanda 2014-04-30 13:53:56 UTC
Changes have been pushed to master. We now default to 4 threads regardless of the number of processors. If the user overrides the rhq.metrics.aggregation.workers system property with a value of 1, we use 2 threads instead to avoid the deadlock scenario.
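
The rule described in this comment could be sketched roughly as follows. This is not the code from the commit: the class, constants, and method are hypothetical, and clamping every value below 2 (not just the value 1) is an assumption made for the sketch.

```java
// Hypothetical sketch of the worker-count rule; not the code from the commit.
class AggregationWorkersSketch {

    static final String PROPERTY = "rhq.metrics.aggregation.workers";
    static final int DEFAULT_WORKERS = 4; // default, independent of processor count
    static final int MIN_WORKERS = 2;     // one thread for the scheduler, one for tasks

    /** Resolves the pool size from the raw property value (null when unset). */
    static int resolveWorkers(String configured) {
        int workers = (configured == null) ? DEFAULT_WORKERS : Integer.parseInt(configured);
        // Assumption: any value below the minimum is raised to it, not only 1.
        return Math.max(workers, MIN_WORKERS);
    }

    public static void main(String[] args) {
        System.out.println(resolveWorkers(System.getProperty(PROPERTY)));
    }
}
```

The point of the lower bound is that the scheduler and at least one of the tasks it schedules must be able to run at the same time. Runtime.getRuntime().availableProcessors() no longer takes part in the calculation.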

master commit hash: 039044395d

Comment 3 John Sanda 2014-04-30 14:00:57 UTC
I have also updated the RHQ Server Measurement Subsystem resource type in the rhq server plugin. I have added a minimum value constraint for the AggregationWorkers property.

master commit hash: 1beac55fa

Comment 4 Heiko W. Rupp 2014-07-21 10:13:39 UTC
Bulk closing of RHQ 4.11 issues, now that RHQ 4.12 is out.

If you find an issue with those, please open a new BZ, linking to the old one.