Description of problem:
"SELECT bucket, day, partition, collection_time_slice, start_schedule_id, insert_time_slice, schedule_ids " +
"FROM " + MetricsTable.METRICS_CACHE_INDEX + " " +
"WHERE bucket = ? AND day = ? AND partition = ? AND collection_time_slice < ?");
This query is used to locate metrics needing compression. Unfortunately, it can potentially read a whole day's worth of data, including the schedules (are they needed?), and cause timeouts.
Here are my cfstats, for example:
Column Family: metrics_cache_index
SSTable count: 13
Compacted row minimum size: 771
Compacted row maximum size: 268650950
Compacted row mean size: 139821861
Unfortunately, the rows (compressed) average around 140MB, and there seems to be practically no way to query that much data.
Version-Release number of selected component (if applicable): 4.12
How reproducible: Depends on size of data
Steps to Reproduce:
1. Have a number of Cassandra nodes
2. Insert ~500 metrics per second and have rows grow to ~100MB in size.
3. Take the server offline for a bit
4. Attempt to start the server
Actual results: timeouts, e.g.
21:47:49,557 WARN [org.rhq.enterprise.server.storage.StorageClientManager] (pool-6-thread-1) Storage client subsystem wasn't initialized. The RHQ server will be set to MAINTENANCE mode. Please verify that the storage cluster is operational.: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)
at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69) [cassandra-driver-core-1.0.5.jar:]
at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:269) [cassandra-driver-core-1.0.5.jar:]
at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:183) [cassandra-driver-core-1.0.5.jar:]
at org.rhq.server.metrics.StorageResultSetFuture.get(StorageResultSetFuture.java:57) [rhq-server-metrics-4.12.0.jar:4.12.0]
at org.rhq.server.metrics.MetricsServer.determineMostRecentRawDataSinceLastShutdown(MetricsServer.java:180) [rhq-server-metrics-4.12.0.jar:4.12.0]
at org.rhq.server.metrics.MetricsServer.init(MetricsServer.java:160) [rhq-server-metrics-4.12.0.jar:4.12.0]
at org.rhq.enterprise.server.storage.StorageClientManager.initMetricsServer(StorageClientManager.java:567) [rhq-server.jar:4.12.0]
at org.rhq.enterprise.server.storage.StorageClientManager.init(StorageClientManager.java:186) [rhq-server.jar:4.12.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0]
Expected results: server functions
The metrics_cache_index table does not contain any metric data. It stores schedule ids and two or three different timestamps. It is a replacement for the pre-4.12 metrics_index table. The concern Elias raises about the size of a partition is very valid. It was a concern with the former metrics_index table and even more so with the new metrics_cache_index table.
I think we might need to consider some changes to cap the size of the rows in the index table. Let me explain with an example for aggregating data from the current time slice. This example applies to 4.12 as well as to earlier versions of RHQ. Suppose we have N schedules with data to be aggregated. The index partition (from metrics_cache_index or from metrics_index) will contain N rows. We load all of those rows in a single query. As N gets really big, it can create hot spots on a node and make us more susceptible to read timeouts.
Since schedule ids are monotonically increasing integers, we can easily implement paging to reduce the likelihood of read timeouts. That does not address the issue of partitions being big, i.e., really wide rows. We could break up the single partition into multiple partitions where schedule id offsets are part of the partition key. This means more reads during aggregation but I think it can effectively prevent the problems that Elias is experiencing.
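To make the two ideas above concrete, here is a rough sketch in plain Java. It is not RHQ code and does not touch Cassandra: a sorted in-memory set stands in for one index partition, nextPage() stands in for a query of the form "schedule id > last seen, limited to a page size", and partitionOffset() shows how a schedule id could be mapped to an offset that becomes part of the partition key. The class name, method names, and the PARTITION_SIZE value are all hypothetical and chosen only for illustration; the sketch also assumes schedule ids are positive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Illustrative sketch only; names and constants are assumptions, not RHQ APIs.
class IndexPagingSketch {

    // Hypothetical cap on how many schedule ids share one index partition.
    static final int PARTITION_SIZE = 1000;

    // Maps a schedule id to the offset that would be added to the partition key,
    // e.g. ids 1000..1999 all land in the partition with offset 1000.
    static int partitionOffset(int scheduleId) {
        return (scheduleId / PARTITION_SIZE) * PARTITION_SIZE;
    }

    // Stand-in for "WHERE ... AND schedule_id > lastSeenId LIMIT pageSize":
    // returns at most pageSize ids strictly greater than lastSeenId, in order.
    static List<Integer> nextPage(NavigableSet<Integer> index, int lastSeenId, int pageSize) {
        List<Integer> page = new ArrayList<>();
        for (Integer id : index.tailSet(lastSeenId, false)) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    // Reads the whole index in bounded pages instead of one large query.
    static List<Integer> readAll(NavigableSet<Integer> index, int pageSize) {
        List<Integer> all = new ArrayList<>();
        int lastSeenId = Integer.MIN_VALUE; // assumes real schedule ids are positive
        while (true) {
            List<Integer> page = nextPage(index, lastSeenId, pageSize);
            if (page.isEmpty()) {
                break;
            }
            all.addAll(page);
            lastSeenId = page.get(page.size() - 1);
        }
        return all;
    }

    public static void main(String[] args) {
        NavigableSet<Integer> index = new TreeSet<>();
        for (int i = 1; i <= 25; i++) {
            index.add(i * 100);
        }
        System.out.println("offset for 1234: " + partitionOffset(1234));
        System.out.println("ids read in pages of 10: " + readAll(index, 10).size());
    }
}
```

Paging bounds the size of each read but leaves the partition itself as wide as before; adding the offset to the partition key is what caps the partition size, at the cost of one read per offset during aggregation.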
I am closing this because the determineMostRecentRawDataSinceLastShutdown method has been removed as part of the work for bug 1114202.