Bug 1045589
| Summary: | Add auto-throttling support for storage client subsystem | | |
|---|---|---|---|
| Product: | [Other] RHQ Project | Reporter: | John Sanda <jsanda> |
| Component: | Core Server, Storage Node | Assignee: | RHQ Project Maintainer <rhq-maint> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.9 | CC: | genman, hrupp |
| Target Milestone: | GA | | |
| Target Release: | RHQ 4.10 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1098243 | Environment: | |
| Last Closed: | 2014-04-23 12:30:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1098243 | | |

Description

John Sanda, 2013-12-20 19:21:48 UTC

What I did many years ago for an SMPP server (SMS messages) was create a notion of a window size. TCP does the same thing. You simply track the number of outstanding requests, and then either block or buffer additional requests until the number outstanding drops back below the window size. I used JMS at the time to buffer, but you can always reply to the client to stop sending more traffic, etc. (Not sure this would really help matters, though, unless it was a temporary issue.)
But it seems the driver already has this notion. Wouldn't you simply get an exception immediately if more than, say, 1000 async outstanding requests were made?
```diff
+        PoolingOptions poolingOptions = builder.poolingOptions();
+        poolingOptions.setMaxSimultaneousRequestsPerConnectionThreshold(HostDistance.LOCAL,
+            option("rhq.storage.client.local-simultaneous", 1000));
+        poolingOptions.setMaxSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE,
+            option("rhq.storage.client.remote-simultaneous", 1000));
```
Anyway, this needs more research, I guess.
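The window-size idea at the start of this comment can be sketched minimally in Java. The sketch assumes async requests are represented as Java 8 CompletableFutures. A Semaphore sized to the window blocks the producer thread once that many requests are outstanding, and each completion, successful or not, frees a slot. RequestWindow is a hypothetical helper, not part of RHQ or the Cassandra driver.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper (not RHQ or driver code): caps the number of
// outstanding async requests at a fixed window size.
final class RequestWindow {
    private final Semaphore slots;

    RequestWindow(int windowSize) {
        this.slots = new Semaphore(windowSize);
    }

    // Blocks the calling (producer) thread while the window is full, then
    // starts the request and frees its slot when the request completes.
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> request)
            throws InterruptedException {
        slots.acquire();
        CompletableFuture<T> future;
        try {
            future = request.get();
        } catch (RuntimeException e) {
            slots.release(); // the request never started, so return its slot
            throw e;
        }
        // Release on success or failure so that the window always drains.
        return future.whenComplete((result, error) -> slots.release());
    }

    // Number of requests that can still be started without blocking.
    int available() {
        return slots.availablePermits();
    }
}
```

Of the two options mentioned, this sketch implements blocking. Buffering (as with JMS) would instead park the request in a queue when no slot is free.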
The solution you described is in effect what RateLimiter does. The driver does not have any built-in throttling. Within the constraints of the PoolingOptions used, the driver will fire off requests as fast as it can. This makes it possible, and easy, for the client to overwhelm Cassandra without any restraint. For RHQ this is a concern particularly when merging measurement reports, where we can quickly wind up with lots of concurrent writes.

If Cassandra is busy, the number of outstanding requests will increase, and eventually the driver will indicate that Cassandra is saturated and prevent any additional requests. I would consider this 'throttling', like how TCP throttles using a window size. Effectively, the number of in-flight requests is bounded by the maximum number of connections times the maximum number of simultaneous requests per connection. When all connections are fully busy, you should see the TimeoutException. I agree that setting up rate limiting on the server side may help deal with temporary spikes in load, e.g., metrics aggregation jobs that can safely span an hour, but IMHO it does not make sense for insertion of raw metrics. The primary issue is that you have no way to identify the correct rate limit. If you guess too low, you end up with a growing backlog. If you guess too high, you hit the Cassandra/driver limit anyway. What you could do is measure the duration of the aggregation job: if it is less than 15 minutes, add throttling automatically, and vice versa.

Yes, I see now that every server thread raising a TimeoutException at the same time will cause trouble, and the driver can't tell you when this might happen. Still, I can imagine scenarios where the rate is fine 99% of the time, but then (say, while you're running 'nodetool repair') shit hits the fan, so to speak: all the requests back up, clients retry, and this repeats until performance improves.
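The aggregation-duration heuristic suggested above could be sketched as a small tuner. After each run, it throttles harder if the job finished in under 15 minutes and loosens the throttle otherwise. AggregationRateTuner, the multiplicative step, and the min/max bounds are all hypothetical; the discussion does not say how much to adjust by.

```java
import java.time.Duration;

// Hypothetical tuner for the aggregation-duration heuristic. The 15-minute
// target comes from the discussion; the multiplicative step and the bounds
// are assumptions.
final class AggregationRateTuner {
    private static final Duration TARGET = Duration.ofMinutes(15);

    private final double minRate;
    private final double maxRate;
    private final double step; // fraction to adjust by per run, e.g. 0.2
    private double rate;       // permits per second for aggregation writes

    AggregationRateTuner(double initialRate, double minRate, double maxRate, double step) {
        if (minRate <= 0 || minRate > maxRate || step <= 0 || step >= 1) {
            throw new IllegalArgumentException("invalid bounds or step");
        }
        this.minRate = minRate;
        this.maxRate = maxRate;
        this.step = step;
        this.rate = Math.min(maxRate, Math.max(minRate, initialRate));
    }

    // Call after each aggregation run; returns the rate to use for the next run.
    double onJobFinished(Duration elapsed) {
        if (elapsed.compareTo(TARGET) < 0) {
            // Finished early: there is slack, so throttle harder and spread the load.
            rate = Math.max(minRate, rate * (1 - step));
        } else {
            // Ran long: loosen the throttle so that a backlog does not build up.
            rate = Math.min(maxRate, rate * (1 + step));
        }
        return rate;
    }
}
```

A fixed additive delta would work just as well as the multiplicative step. The bounds keep a run of unusually fast or slow jobs from driving the rate to an extreme.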
Fundamentally, I see the rate limiter as sort of a hack on top of bad driver design, so I'm not really in favor of it. What you really want is a way to detect that the driver limit is being reached and then block the calling threads, so that the blocking happens on the client (producer) side. I do agree that the aggregation jobs probably need to be rate limited in some way, just to prevent unnecessary saturation.

@Elias, your analogy with TCP window size is a good one, and RateLimiter is what provides that behavior. Without any throttling, we cannot make any guarantees about request rates. Suppose there is a burst of incoming measurement reports while, at the same time, a REST and/or remote client is pushing or pulling a lot of metric data. Or suppose that when those measurement reports come in, the driver is just about saturated, and then some requests to load metric graphs push us past the tipping point. What would happen? A lot of requests would wind up failing, and overall throughput would suffer. By using RateLimiter, we can make some guarantees around request throughput, and by making it configurable (see bug 1045584), the throughput can be tailored to your environment. Another thing to keep in mind is that for testing purposes I have been working with only a single node. These errors are far less likely to happen in a production deployment with multiple nodes.
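The RateLimiter discussed here is presumably Guava's com.google.common.util.concurrent.RateLimiter. To keep the example dependency-free, below is a simplified stand-in that shows the core idea. Permits are issued at a fixed rate, and a caller asking for N permits reserves them all at once, then sleeps until its turn. This is not Guava's implementation; for example, it does not bank idle time to allow bursts. SimpleRateLimiter is a hypothetical name.

```java
import java.util.concurrent.TimeUnit;

// Simplified stand-in for a rate limiter, not Guava's implementation:
// permits are issued at a fixed rate, and idle time is not banked for bursts.
final class SimpleRateLimiter {
    private final long nanosPerPermit;
    private long nextFreeNanos; // earliest time at which the next reservation may start

    SimpleRateLimiter(double permitsPerSecond) {
        if (!(permitsPerSecond > 0)) {
            throw new IllegalArgumentException("rate must be positive");
        }
        this.nanosPerPermit = (long) (TimeUnit.SECONDS.toNanos(1) / permitsPerSecond);
        this.nextFreeNanos = System.nanoTime();
    }

    // Reserves the permits and returns how long the caller must wait, in nanoseconds.
    // The cost of this reservation is paid by whoever reserves next.
    synchronized long reserve(int permits) {
        long now = System.nanoTime();
        // Compare nanoTime values by difference, as System.nanoTime recommends.
        long start = (nextFreeNanos - now > 0) ? nextFreeNanos : now;
        nextFreeNanos = start + permits * nanosPerPermit;
        return start - now;
    }

    // Blocks until the reserved permits may be used; returns the seconds spent waiting.
    double acquire(int permits) throws InterruptedException {
        long waitNanos = reserve(permits);
        TimeUnit.NANOSECONDS.sleep(waitNanos); // does not sleep when waitNanos <= 0
        return waitNanos / 1e9;
    }
}
```

With this shape, the diff proposed further down the thread amounts to a single acquire(dataSet.size()) call before the loop, instead of acquiring one permit per data point inside it.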
There are a couple of issues with RateLimiter. I'd like to make some suggestions.
1) The metrics index update: if you rate limit within an execution thread pool, you can end up with a lot of threads or queued future tasks. Neither one is great; I've run out of memory in some situations. I think the metrics index update should not require any write permits. There is no feedback mechanism to the client side, and the rate limiting is already happening on the front end anyway.
2) I would suggest you obtain N permits at once for N data points, rather than one permit per data point, because otherwise writes can stall in the middle of a batch.
```diff
@@ -351,8 +351,8 @@ public void addNumericData(final Set<MeasurementDataNumeric> dataSet,
             final long startTime = System.currentTimeMillis();
             final AtomicInteger remainingInserts = new AtomicInteger(dataSet.size());
 
+            writePermits.acquire(dataSet.size());
             for (final MeasurementDataNumeric data : dataSet) {
-                writePermits.acquire();
```
Changes have finally landed in master. Here is a brief summary of the changes. The throttling, which is done using a RateLimiter, has been pushed down into StorageSession, so the code base is no longer littered with calls to RateLimiter.acquire(). When there is a topology change (e.g., a node added/removed or a node up/down), throttling will be adjusted by a configurable amount. If there is a client-side request timeout, throttling will be increased by a configurable amount. The properties that control and tune the throttling are all defined in rhq-server.properties. Just a reminder that bug 1045584 has been created so that these settings will be exposed in the resource configuration of the RHQ Server. Here are the newly added properties:

- rhq.storage.request.limit
- rhq.storage.request.limit.topology-delta
- rhq.storage.request.limit.timeout-delta
- rhq.storage.request.limit.timeout-dampening
- rhq.storage.request.limit.min

The properties are documented in rhq-server.properties. I will also update this design doc[1] as well as add a page in the user docs here[2].

[1] https://docs.jboss.org/author/display/RHQ/Request+Throttling
[2] https://docs.jboss.org/author/display/RHQ/RHQ+Storage+Cluster+Administration

Bulk closing of 4.10 issues. If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.
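The adaptive behavior summarized in the landed changes can be sketched roughly in Java. It describes a request limit that moves on topology changes and on client-side timeouts, with a dampening period and a floor. AdaptiveRequestLimit is hypothetical and is not RHQ's StorageSession code. Its fields only mirror the rhq.storage.request.limit* properties. The details are assumptions:

- the units;
- the direction of the topology adjustments;
- the reading of "dampening" as "react to timeouts at most once per period".

```java
// Hypothetical controller loosely modeled on the behavior summarized above; it is
// not RHQ's StorageSession code. Each field mirrors one rhq.storage.request.limit*
// property, but the exact semantics (units, direction of topology adjustments,
// how dampening works) are assumptions.
final class AdaptiveRequestLimit {
    private final double topologyDelta; // ~ rhq.storage.request.limit.topology-delta
    private final double timeoutDelta;  // ~ rhq.storage.request.limit.timeout-delta
    private final long dampeningMillis; // ~ rhq.storage.request.limit.timeout-dampening
    private final double minLimit;      // ~ rhq.storage.request.limit.min
    private double limit;               // ~ rhq.storage.request.limit (requests/sec)
    private long nextTimeoutAdjustMillis = Long.MIN_VALUE;

    AdaptiveRequestLimit(double limit, double topologyDelta, double timeoutDelta,
                         long dampeningMillis, double minLimit) {
        this.minLimit = minLimit;
        this.limit = Math.max(minLimit, limit);
        this.topologyDelta = topologyDelta;
        this.timeoutDelta = timeoutDelta;
        this.dampeningMillis = dampeningMillis;
    }

    // A node was added or came back up: allow more requests.
    synchronized double onNodeUp() {
        limit += topologyDelta;
        return limit;
    }

    // A node was removed or went down: allow fewer requests, never below the floor.
    synchronized double onNodeDown() {
        limit = Math.max(minLimit, limit - topologyDelta);
        return limit;
    }

    // Timeouts tend to arrive in bursts with one cause, so react at most once
    // per dampening period instead of once per timed-out request.
    synchronized double onRequestTimeout(long nowMillis) {
        if (nowMillis >= nextTimeoutAdjustMillis) {
            limit = Math.max(minLimit, limit - timeoutDelta);
            nextTimeoutAdjustMillis = nowMillis + dampeningMillis;
        }
        return limit;
    }

    synchronized double currentLimit() {
        return limit;
    }
}
```

In a real client, each returned value would be pushed into the rate limiter, for example with Guava's RateLimiter.setRate(double).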