Bug 1120775
| Summary: | Handle inconsistent reads during data aggregation | ||
|---|---|---|---|
| Product: | [Other] RHQ Project | Reporter: | John Sanda <jsanda> |
| Component: | Core Server, Storage Node | Assignee: | Nobody <nobody> |
| Status: | NEW --- | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.12 | CC: | genman, hrupp |
| Target Milestone: | --- | ||
| Target Release: | RHQ 4.13 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Bug Blocks: | 1133605, 1120441, 1120442 | ||
Description
John Sanda
2014-07-17 16:00:23 UTC
A slight correction to the description. I meant to say that we *can* do quorum reads to ensure consistency; however, even that is not always the case. With our current replication strategy, described at https://docs.jboss.org/author/display/RHQ/Data+Replication+and+Consistency, quorum reads ensure consistency only for a two- or three-node cluster. For four or more nodes, we would need to read at a consistency level (CL) of THREE or ALL. So yes, I have a three-node cluster and the behavior can be inconsistent across hosts.
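The arithmetic behind that statement can be sketched with the usual overlap rule: a read is guaranteed to see the latest acknowledged write when read replicas + write replicas > replication factor. The sketch below is only illustrative. It assumes writes are done at CL ONE, a replication factor of 2 for two- and three-node clusters, and a replication factor of 3 for four or more nodes; none of these values are stated in this report, and the class and method names are hypothetical.

```java
public class ReadConsistencyCheck {

    // Replicas that must answer a QUORUM read: floor(RF / 2) + 1.
    public static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Overlap rule: the read set and the write set must share at least one replica.
    public static boolean readSeesLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int writeReplicas = 1;                     // ASSUMPTION: writes use CL ONE
        int[] assumedReplicationFactors = {2, 3};  // ASSUMPTION: RF 2 for 2-3 nodes, RF 3 for 4+ nodes

        for (int rf : assumedReplicationFactors) {
            int q = quorum(rf);
            System.out.println("RF=" + rf
                + " QUORUM=" + q
                + " quorumReadConsistent=" + readSeesLatestWrite(q, writeReplicas, rf)
                + " readAllConsistent=" + readSeesLatestWrite(rf, writeReplicas, rf));
        }
    }
}
```

Under these assumptions, a quorum read overlaps every write when the replication factor is 2 but not when it is 3, which is why a CL of THREE or ALL would be needed on larger clusters.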
I've also seen this:
23:00:36,561 WARN [org.rhq.server.metrics.StorageSession] (RHQScheduler_Worker-1) Encountered NoHostAvailableException due to following error(s): {/17.176.208.117=Timeout during read, /17.176.208.118=Timeout during read, /17.176.208.119=Timeout during read}
23:00:36,562 INFO [org.rhq.server.metrics.StorageSession] (RHQScheduler_Worker-1) Changing request throughput from 90000.0 request/sec to 90000.0 requests/sec
23:00:36,562 WARN [org.rhq.server.metrics.aggregation.PastDataAggregator] (RHQScheduler_Worker-1) There was an error querying the cache index: org.rhq.server.metrics.aggregation.CacheIndexQueryException: Failed to load cache index entries prior to current time slice 2014-07-17T22:00:00.000Z
at org.rhq.server.metrics.aggregation.IndexEntriesLoader.loadPastIndexEntries(IndexEntriesLoader.java:72) [rhq-server-metrics-4.12.0.jar:4.12.0]
at org.rhq.server.metrics.aggregation.PastDataAggregator.getIndexEntries(PastDataAggregator.java:74) [rhq-server-metrics-4.12.0.jar:4.12.0]
at org.rhq.server.metrics.aggregation.BaseAggregator.execute(BaseAggregator.java:167) [rhq-server-metrics-4.12.0.jar:4.12.0]
at org.rhq.server.metrics.aggregation.AggregationManager.run(AggregationManager.java:101) [rhq-server-metrics-4.12.0.jar:4.12.0]
at org.rhq.server.metrics.MetricsServer.calculateAggregates(MetricsServer.java:619) [rhq-server-metrics-4.12.0.jar:4.12.0]
at org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob.compressMeasurementData(DataPurgeJob.java:114) [rhq-server.jar:4.12.0]
at org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob.executeJobCode(DataPurgeJob.java:92) [rhq-server.jar:4.12.0]
at org.rhq.enterprise.server.scheduler.jobs.AbstractStatefulJob.execute(AbstractStatefulJob.java:48) [rhq-server.jar:4.12.0]
at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [quartz-1.6.5.jar:1.6.5]
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525) [quartz-1.6.5.jar:1.6.5]
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /17.176.208.117 (Timeout during read), /17.176.208.118 (Timeout during read), /17.176.208.119 (Timeout during read))
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:64) [cassandra-driver-core-1.0.5.jar:]
at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:269) [cassandra-driver-core-1.0.5.jar:]
at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:183) [cassandra-driver-core-1.0.5.jar:]
at org.rhq.server.metrics.StorageResultSetFuture.get(StorageResultSetFuture.java:57) [rhq-server-metrics-4.12.0.jar:4.12.0]
at org.rhq.server.metrics.aggregation.IndexEntriesLoader.addResultSet(IndexEntriesLoader.java:116) [rhq-server-metrics-4.12.0.jar:4.12.0]
at org.rhq.server.metrics.aggregation.IndexEntriesLoader.loadPastIndexEntries(IndexEntriesLoader.java:68) [rhq-server-metrics-4.12.0.jar:4.12.0]
... 9 more
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /17.176.208.117 (Timeout during read), /17.176.208.118 (Timeout during read), /17.176.208.119 (Timeout during read))
at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:106) [cassandra-driver-core-1.0.5.jar:]
at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:177) [cassandra-driver-core-1.0.5.jar:]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_40]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_40]
at java.lang.Thread.run(Thread.java:724) [rt.jar:1.7.0_40]