Bug 1045589

Summary: Add auto-throttling support for storage client subsystem
Product: [Other] RHQ Project
Reporter: John Sanda <jsanda>
Component: Core Server, Storage Node
Assignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: unspecified
Priority: unspecified
Version: 4.9
CC: genman, hrupp
Target Milestone: GA
Target Release: RHQ 4.10
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-04-23 12:30:55 UTC
Bug Blocks: 1098243    

Description John Sanda 2013-12-20 19:21:48 UTC
Description of problem:
The changes for bug 1009945 put a new throttling mechanism in place for the storage client. Previously, only the insertion of raw metrics was throttled since that was the only place where async requests were used. We should implement an auto-throttling strategy that is configurable through the RHQ plugin (see bug 1045584).

If the storage cluster cannot keep up with the requests being submitted by an RHQ server and we consequently start hitting timeout exceptions, then we should increase the throttling to avoid further request timeouts. We might also want a pre-defined alert definition for this.

When the storage cluster grows, it stands to reason that we ought to decrease the throttling to improve throughput. And if the cluster decreases in size, then we ought to increase the throttling. Again, all of this should be configurable in some way since the performance characteristics will vary from environment to environment.
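A minimal sketch of the adjustment policy described above, assuming Guava's RateLimiter (which later comments indicate the storage client uses) and hypothetical names and numbers (AdaptiveThrottle, the starting rate, the 500 permits/sec floor); this is not the actual RHQ implementation:

import com.google.common.util.concurrent.RateLimiter;

// Hypothetical sketch only: tighten the limit when requests time out, relax it when the
// cluster grows, tighten it when the cluster shrinks. Names and numbers are illustrative.
public class AdaptiveThrottle {

    private volatile double permitsPerSecond = 10000;                  // illustrative starting rate
    private final RateLimiter limiter = RateLimiter.create(permitsPerSecond);

    // Called when a storage request times out: slow down, but never below an illustrative floor.
    public synchronized void onRequestTimeout(double decreaseFactor) {
        permitsPerSecond = Math.max(500, permitsPerSecond * decreaseFactor);
        limiter.setRate(permitsPerSecond);
    }

    // Called when the storage cluster changes size: scale the rate with the new capacity.
    public synchronized void onTopologyChange(int oldNodeCount, int newNodeCount, double deltaPerNode) {
        permitsPerSecond = Math.max(500, permitsPerSecond + (newNodeCount - oldNodeCount) * deltaPerNode);
        limiter.setRate(permitsPerSecond);
    }

    // Every storage write acquires a permit, blocking if the current rate is exceeded.
    public void beforeWrite() {
        limiter.acquire();
    }
}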


Comment 1 Elias Ross 2014-01-09 19:47:52 UTC
What I did many years ago for an SMPP server (SMS messages) was create a notion of a window size. This is done for TCP as well. You simply track the number of outstanding requests and then either block or buffer additional requests until the outstanding count drops back under the window size. I used JMS at the time to buffer, but you can always reply to the client to tell it to stop sending more traffic, etc. (Not sure this would really help matters though, unless it was a temporary issue.)
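A minimal sketch of the window-size idea described above, assuming a fixed window and blocking callers; the class and method names are illustrative, not RHQ or driver code:

import java.util.concurrent.Semaphore;

// Illustrative only: cap the number of outstanding async requests and block callers
// until completed requests free up slots in the window.
public class RequestWindow {

    private final Semaphore window;

    public RequestWindow(int windowSize) {
        this.window = new Semaphore(windowSize);
    }

    // Block until there is room in the window, then count the request as outstanding.
    public void beforeSend() throws InterruptedException {
        window.acquire();
    }

    // Call from the async callback, on success or failure, to free a slot.
    public void afterComplete() {
        window.release();
    }
}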

But it seems the driver already has this notion. Wouldn't you simply get an exception immediately if more than, say, 1000 outstanding async requests were made?

+        PoolingOptions poolingOptions = builder.poolingOptions();
+        poolingOptions.setMaxSimultaneousRequestsPerConnectionThreshold(HostDistance.LOCAL,
+            option("rhq.storage.client.local-simultaneous", 1000));
+        poolingOptions.setMaxSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE,
+            option("rhq.storage.client.remote-simultaneous", 1000));

Anyway, this needs more research, I guess.

Comment 2 John Sanda 2014-01-09 20:15:57 UTC
The solution you described is in effect what RateLimiter does. The driver does not have any built-in throttling. Within the constraints of the PoolingOptions used, the driver will fire off requests as fast as it can. This makes it easy for the client to overwhelm Cassandra without any restraint. For RHQ this is a concern particularly when merging measurement reports, where we can quickly wind up with lots of concurrent writes.

Comment 3 Elias Ross 2014-01-09 20:29:34 UTC
If Cassandra is busy, then the number of outstanding requests will increase, and eventually the driver will indicate Cassandra is saturated and prevent any additional requests. I would consider this 'throttling', much like how TCP throttles using its window size. Effectively you are bound by the maximum number of connections times the maximum number of simultaneous requests per connection. When all connections are fully busy, you should see a TimeoutException.

I agree that setting up rate limiting on the server side may help deal with temporary spikes in load, e.g. metrics aggregation jobs that can safely span an hour, but it does not make sense for insertion of raw metrics, IMHO.

The primary issue is that you don't have a good way to identify the correct rate limit. If you guess too low, then you end up with a growing backlog. If you guess too high, then you hit the Cassandra/driver limit anyway.

What you could do is calculate the job duration for aggregation and, if it is less than 15 minutes, add throttling automatically, and vice versa.
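A rough sketch of that heuristic, assuming Guava's RateLimiter; the 15-minute threshold comes from the comment above, while the class name and scaling factors are illustrative assumptions, not RHQ code:

import java.time.Duration;
import com.google.common.util.concurrent.RateLimiter;

// Illustrative heuristic only: tune the aggregation rate limit based on how long
// the last aggregation run took relative to its schedule.
public class AggregationThrottleTuner {

    public void adjust(Duration lastRunDuration, RateLimiter aggregationLimiter) {
        Duration target = Duration.ofMinutes(15);            // assumed aggregation schedule interval
        double rate = aggregationLimiter.getRate();
        if (lastRunDuration.compareTo(target) < 0) {
            aggregationLimiter.setRate(rate * 0.9);          // finished with headroom: throttle harder
        } else {
            aggregationLimiter.setRate(rate * 1.1);          // running long: allow more throughput
        }
    }
}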

Comment 4 Elias Ross 2014-01-10 19:11:31 UTC
Yes, I see now that every server thread raising a TimeoutException at the same time would cause trouble, and the driver can't tell you when this might happen.

Still, I can imagine scenarios where the rate is fine 99% of the time, but then (say you're running 'nodetool repair') the shit hits the fan, so to speak: all the requests back up, clients retry, and this repeats until performance recovers. Fundamentally, I see the rate limiter as sort of a hack on top of bad driver design, so I'm not really in favor of it...

What you kind of want is a way to detect that the driver limit is being reached and then block the calling threads, which would push back pressure onto the client (producer) side.

I do agree that the aggregate jobs probably need to be rate limited in some way, just to prevent unnecessary saturation.

Comment 5 John Sanda 2014-01-10 21:49:50 UTC
@Elias,

Your analogy with TCP window size is a good one, and RateLimiter is what provides that behavior. Without any throttling, we cannot make any guarantees about request rates. Suppose there is a burst of incoming measurement reports and, at the same time, a REST and/or remote client is pushing or pulling a lot of metric data. Or suppose that when those measurement reports come in, the driver is just about at the point of saturation, and then some requests to load metric graphs push us past the tipping point. What would happen? A lot of requests would wind up failing and overall throughput would suffer.

By using RateLimiter, we can make some guarantees around request throughput. And by making it configurable (see bug 1045584), the throughput can be tailored to your environment. Another thing to keep in mind is that for testing purposes I have been working with only a single node. These errors are far less likely to happen in a production deployment with multiple nodes.

Comment 6 Elias Ross 2014-01-15 20:10:09 UTC
There are a couple of issues with RateLimiter. I'd like to make some suggestions.

1) The metrics index update: if you rate limit within an execution thread pool, you can end up with a lot of threads or queued future tasks; neither is great. I've run out of memory in some situations. I think the metrics index update should not require any write permits. There is no feedback mechanism to the client side, and the rate limiting is already happening on the front end anyway.

2) I would suggest you obtain N permits up front for N data points, rather than acquiring one permit per data point inside the loop. The reason is that writes can otherwise stall in the middle:

@@ -351,8 +351,8 @@ public void addNumericData(final Set<MeasurementDataNumeric> dataSet,
             final long startTime = System.currentTimeMillis();
             final AtomicInteger remainingInserts = new AtomicInteger(dataSet.size());
 
+            writePermits.acquire(dataSet.size());
             for (final MeasurementDataNumeric data : dataSet) {
-                writePermits.acquire();

Comment 7 John Sanda 2014-02-10 16:58:08 UTC
Changes have finally landed in master. Here is a brief summary. The throttling, which is done using a RateLimiter, has been pushed down into StorageSession, so the code base is no longer littered with calls to RateLimiter.acquire(). When there is a topology change (e.g., a node is added/removed or goes up/down), throttling will be adjusted by a configurable amount. If there is a client-side request timeout, throttling will be increased by a configurable amount. The properties that control and tune the throttling are all defined in rhq-server.properties. Just a reminder that bug 1045584 has been created so that these settings will be exposed in the resource configuration of the RHQ Server. Here are the newly added properties:

rhq.storage.request.limit
rhq.storage.request.limit.topology-delta
rhq.storage.request.limit.timeout-delta
rhq.storage.request.limit.timeout-dampening
rhq.storage.request.limit.min
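A hedged example of how these settings might look in rhq-server.properties; the values and units below are illustrative assumptions, not the shipped defaults:

# Illustrative values only -- consult rhq-server.properties for the actual defaults and units.
rhq.storage.request.limit=30000
rhq.storage.request.limit.topology-delta=30000
rhq.storage.request.limit.timeout-delta=0.2
rhq.storage.request.limit.timeout-dampening=30000
rhq.storage.request.limit.min=5000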


The properties are documented in rhq-server.properties. I will also update the design doc[1] as well as add a page in the user docs[2].

[1] https://docs.jboss.org/author/display/RHQ/Request+Throttling
[2] https://docs.jboss.org/author/display/RHQ/RHQ+Storage+Cluster+Administration

Comment 8 Heiko W. Rupp 2014-04-23 12:30:55 UTC
Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.