Bug 1127921 - Metrics aggregation does not retry after timeout
Summary: Metrics aggregation does not retry after timeout
Keywords:
Status: NEW
Alias: None
Product: RHQ Project
Classification: Other
Component: Storage Node
Version: 4.12
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: RHQ Project Maintainer
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 1133605
 
Reported: 2014-08-07 20:32 UTC by Elias Ross
Modified: 2014-08-25 14:47 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:



Description Elias Ross 2014-08-07 20:32:54 UTC
Description of problem:

If the Cassandra query for past data times out, the aggregation run completes prematurely, although it does seem to aggregate at least 500 or so schedules before finishing.

20:01:00,152 WARN  [org.rhq.server.metrics.aggregation.PastDataAggregator] (RHQScheduler_Worker-5) There was an error querying the cache index: org.rhq.server.metrics.aggregation.CacheIndexQueryException: Failed to load cache index entries prior to current time slice 2014-08-07T19:00:00.000Z
        at org.rhq.server.metrics.aggregation.IndexEntriesLoader.loadPastIndexEntries(IndexEntriesLoader.java:72) [rhq-server-metrics-4.12.0.jar:4.12.0]
        at org.rhq.server.metrics.aggregation.PastDataAggregator.getIndexEntries(PastDataAggregator.java:75) [rhq-server-metrics-4.12.0.jar:4.12.0]
        at org.rhq.server.metrics.aggregation.BaseAggregator.execute(BaseAggregator.java:168) [rhq-server-metrics-4.12.0.jar:4.12.0]
        at org.rhq.server.metrics.aggregation.AggregationManager.run(AggregationManager.java:107) [rhq-server-metrics-4.12.0.jar:4.12.0]
        at org.rhq.server.metrics.MetricsServer.calculateAggregates(MetricsServer.java:641) [rhq-server-metrics-4.12.0.jar:4.12.0]
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)
        at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69) [cassandra-driver-core-1.0.5.jar:]
        at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:269) [cassandra-driver-core-1.0.5.jar:]
        at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:183) [cassandra-driver-core-1.0.5.jar:]
        at org.rhq.server.metrics.StorageResultSetFuture.get(StorageResultSetFuture.java:57) [rhq-server-metrics-4.12.0.jar:4.12.0]
        at org.rhq.server.metrics.aggregation.IndexEntriesLoader.addResultSet(IndexEntriesLoader.java:116) [rhq-server-metrics-4.12.0.jar:4.12.0]
        at org.rhq.server.metrics.aggregation.IndexEntriesLoader.loadPastIndexEntries(IndexEntriesLoader.java:68) [rhq-server-metrics-4.12.0.jar:4.12.0]
        ... 9 more
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)
...
20:01:06,571 INFO  [org.rhq.server.metrics.aggregation.AggregationManager] (RHQScheduler_Worker-5) Finished aggregation of {"raw schedules": 500, "1 hour schedules": 0, "6 hour schedules": 0} in 66542 ms
20:01:06,571 INFO  [org.rhq.server.metrics.MetricsServer] (RHQScheduler_Worker-5) Finished metrics aggregation in 66543 ms

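Expected behavior: At a minimum, the rest of the aggregation should probably continue after the timeout. It might make sense to simply retry the query a few times, although it is also possible that the timeout occurs because the result set is simply too large, in which case retrying alone would not help.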
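
To make the "retry the query a few times" suggestion concrete, here is a minimal sketch of a bounded retry with a doubling delay between attempts. It is not RHQ code: the class RetrySketch, the method callWithRetry, and the exception RetriesExhaustedException are hypothetical names invented for illustration, and the query is passed in as a plain java.util.concurrent.Callable. For simplicity the sketch retries on any exception; a real change would presumably retry only on the driver's ReadTimeoutException.

```java
import java.util.concurrent.Callable;

/** Illustrative only: none of these names exist in RHQ or the Cassandra driver. */
public class RetrySketch {

    /** Thrown once every attempt has failed; the cause is the last failure seen. */
    public static class RetriesExhaustedException extends Exception {
        public RetriesExhaustedException(int attempts, Throwable cause) {
            super("Query failed after " + attempts + " attempts", cause);
        }
    }

    /**
     * Runs the query up to maxAttempts times, sleeping between failed attempts
     * and doubling the delay each time.
     */
    public static <T> T callWithRetry(Callable<T> query, int maxAttempts, long initialDelayMs)
            throws RetriesExhaustedException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMs = initialDelayMs;
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return query.call();
            } catch (InterruptedException e) {
                // Do not retry when the worker thread is being shut down.
                throw e;
            } catch (Exception e) {
                // Assumption: every other failure is treated as retryable.
                lastFailure = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                    delayMs *= 2;
                }
            }
        }
        throw new RetriesExhaustedException(maxAttempts, lastFailure);
    }
}
```

A caller could catch RetriesExhaustedException, log it, and move on to the remaining aggregation work rather than ending the run. If the underlying problem is that the result set is too large, retrying the same query is unlikely to help and the query itself would need to be split up.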

Version-Release number of selected component (if applicable): 4.12

