Bug 1125470

Summary: Allow setting of socket options for Cassandra, i.e. read timeout
Product: [Other] RHQ Project
Reporter: Elias Ross <genman>
Component: Core Server, Database
Assignee: Nobody <nobody>
Status: NEW
Severity: unspecified
Priority: unspecified
Version: 4.12
CC: hrupp
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix

Description Elias Ross 2014-08-01 00:53:43 UTC
Description of problem:

Several of the large queries can time out on an under-performing cluster. I manually increased the read timeout, to good effect:

diff --git a/modules/common/cassandra-util/src/main/java/org/rhq/cassandra/util/ClusterBuilder.java b/modules/common/cassandra-util/src/main/java/org/rhq/cassandra/util/ClusterBu
index 335213d..47b4ff3 100644
--- a/modules/common/cassandra-util/src/main/java/org/rhq/cassandra/util/ClusterBuilder.java
+++ b/modules/common/cassandra-util/src/main/java/org/rhq/cassandra/util/ClusterBuilder.java
@@ -28,6 +28,7 @@
 import com.datastax.driver.core.Cluster;
 import com.datastax.driver.core.PoolingOptions;
 import com.datastax.driver.core.ProtocolOptions;
+import com.datastax.driver.core.SocketOptions;
 import com.datastax.driver.core.policies.LoadBalancingPolicy;
 import com.datastax.driver.core.policies.RetryPolicy;
 
@@ -46,6 +47,12 @@
 
     private ProtocolOptions.Compression compression;
 
+    public ClusterBuilder() {
+        SocketOptions options = new SocketOptions();
+        options.setReadTimeoutMillis(1000 * 60); // from 12 seconds
+        builder.withSocketOptions(options);
+    }
+
     /**
      * @see Cluster.Builder#addContactPoints(String...)
      */

This should be configurable through a system property, if possible.
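A minimal sketch of how the timeout could be read from a system property. The property name `rhq.cassandra.read-timeout-millis` is hypothetical (RHQ does not define it today); in the patch above, the resulting value would be passed to SocketOptions.setReadTimeoutMillis in the ClusterBuilder constructor.

```java
public class ReadTimeoutConfig {
    // Hypothetical property name; not an existing RHQ setting.
    static final String PROP = "rhq.cassandra.read-timeout-millis";
    // Fallback when the property is absent; the driver's own default is 12 s.
    static final int DEFAULT_MILLIS = 60 * 1000;

    // Integer.getInteger returns the default when the property is unset
    // or does not parse as an integer.
    static int readTimeoutMillis() {
        return Integer.getInteger(PROP, DEFAULT_MILLIS);
    }

    public static void main(String[] args) {
        System.setProperty(PROP, "30000");
        // Prints the configured timeout; ClusterBuilder would feed this
        // value into SocketOptions.setReadTimeoutMillis(...).
        System.out.println(readTimeoutMillis()); // prints 30000
    }
}
```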

17:36:57,316 TRACE [com.datastax.driver.core.Connection] (New I/O worker #56) [/17.176.208.118-1] received: RESULT PREPARED 779e6af0f062c6ebbf421df943f9b816 [bucket(rhq, metrics_cache_index), org.apache.cassandra.db.marshal.UTF8Type][day(rhq, metrics_cache_index), org.apache.cassandra.db.marshal.DateType][partition(rhq, metrics_cache_index), org.apache.cassandra.db.marshal.Int32Type][collection_time_slice(rhq, metrics_cache_index), org.apache.cassandra.db.marshal.DateType]

Stack trace for the query:

        at org.rhq.server.metrics.MetricsServer.determineMostRecentRawDataSinceLastShutdown(MetricsServer.java:197) [rhq-server-metrics-4.12.0.jar:4.12.0]
        at org.rhq.server.metrics.MetricsServer.init(MetricsServer.java:160) [rhq-server-metrics-4.12.0.jar:4.12.0]
        at org.rhq.enterprise.server.storage.StorageClientManager.initMetricsServer(StorageClientManager.java:567) [rhq-server.jar:4.12.0]
        at org.rhq.enterprise.server.storage.StorageClientManager.init(StorageClientManager.java:186) [rhq-server.jar:4.12.0]

This was at startup time.

This is the query that could complete given more time, but often failed to, causing a loop of timeouts. The data appeared to be arriving, but the result set was too large to be returned quickly.

This would be less of an issue once Cassandra 2.0 is released with support for data streaming, I suppose.

Version-Release number of selected component (if applicable): 4.12

How reproducible: Depending on load.

Comment 1 Elias Ross 2014-08-07 02:59:22 UTC
With the removal of determineMostRecentRawDataSinceLastShutdown in the changes John Sanda proposed, this probably isn't needed, as none of the reads seem to time out in this way anymore. I'm leaving this open as a possible configuration feature, but it's unclear whether it is worth exposing.