We're currently using the REST API, and have settled on a limited call rate that seems to work. However, going forward it would be useful to have some documented rate limits that are supported by Zanata and EngOps that we could work against.
Assigning to Damian for triage.
This needs to be a global thing - if at all possible we'd want to see where the client and web GUI are used most, and by whom, in order to schedule improvements to the high-use areas and for debugging purposes.
I can't find the previous discussion about this, but I did find the leaky bucket implementation: https://github.com/bbeck/token-bucket

    <dependency>
      <groupId>org.isomorphism</groupId>
      <artifactId>token-bucket</artifactId>
      <version>1.1</version>
    </dependency>
Secondary contact: sflaniga
The list of "practical considerations" in the article Patrick found is worth reading: http://amistrongeryet.blogspot.com.au/2011/01/rate-limiting-in-150-lines-or.html

In particular, we should ensure we implement:
- multiple buckets (ie one per API key, as we already planned)
- sparse buckets (or periodic cleanup of full buckets, perhaps hourly)
- dynamic configuration (eg notify existing buckets when the limits are changed)
- monitoring and introspection (or at least some *terse* logging when limits are enforced, or a simple REST service for admin which lets us query the bucket sizes)

Also, I think we should be able to change the bucket size via an admin REST API, in case the web UI is slow to load. (I think we discussed that point somewhere.)
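To make the "multiple buckets" and "sparse buckets" points concrete, here is a rough hand-rolled sketch of a per-API-key bucket registry with hourly-style cleanup. This is illustration only, not actual Zanata code or the bbeck/token-bucket API: all class and method names here are hypothetical, and a real implementation would likely delegate the bucket itself to the library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch: one token bucket per API key, created lazily and discarded again
 * once it refills completely, so idle keys cost nothing (sparse buckets).
 * Names and numbers are placeholders for illustration.
 */
class RateLimiterRegistry {
    /** A minimal token bucket: capacity tokens, refilled at refillPerSecond. */
    static final class Bucket {
        private final long capacity;
        private final double refillPerSecond;
        private double tokens;
        private long lastRefillNanos;

        Bucket(long capacity, double refillPerSecond) {
            this.capacity = capacity;
            this.refillPerSecond = refillPerSecond;
            this.tokens = capacity;
            this.lastRefillNanos = System.nanoTime();
        }

        synchronized boolean tryConsume() {
            refill();
            if (tokens >= 1) {
                tokens -= 1;
                return true;
            }
            return false;
        }

        synchronized boolean isFull() {
            refill();
            return tokens >= capacity;
        }

        private void refill() {
            long now = System.nanoTime();
            tokens = Math.min(capacity,
                    tokens + (now - lastRefillNanos) / 1e9 * refillPerSecond);
            lastRefillNanos = now;
        }
    }

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    /** Consume one token for the given API key; false means "rate limited". */
    boolean tryConsume(String apiKey, long capacity, double refillPerSecond) {
        return buckets
                .computeIfAbsent(apiKey, k -> new Bucket(capacity, refillPerSecond))
                .tryConsume();
    }

    /** Periodic cleanup (eg hourly): drop buckets that have fully refilled. */
    void evictFullBuckets() {
        buckets.values().removeIf(Bucket::isFull);
    }

    int size() {
        return buckets.size();
    }
}
```

Dynamic configuration would then just mean clearing or rebuilding the map entries when an admin changes the limits, and "introspection" could be a trivial admin REST call that dumps `buckets` sizes.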
After a chat with Lee today, we think the performance problems associated with PressGang syncs in the past may have been due to triggering a connection leak in Zanata when under load. The fact that sync was running for about a month before Zanata became unresponsive is strongly reminiscent of https://bugzilla.redhat.com/show_bug.cgi?id=1018633

Until now we were looking at some sort of extra logic to use with the leaky bucket, perhaps based on charging API users a number of tokens calculated from the cost of the *previous* API call (the "post-paid" model, like a phone bill), or perhaps based on charging API users a large number of tokens at the beginning of each request and then rebating a proportion of them after processing, based on the actual processing costs (the "rental deposit" model). (The capacity of the bucket divided by the upfront token cost would determine the maximum number of simultaneous API calls.)

However, given what we now know (or think we do), perhaps the following solution would be a little more straightforward, whilst better addressing our expected problem areas:

* First semaphore (eg 6 permits). Using Semaphore.tryAcquire, if a REST request doesn't obtain a permit immediately, we immediately return a 503. This semaphore prevents any user from tying up more than 6 REST threads. Few legitimate users will use more than 6 simultaneous requests, so 503s should be rare. This may help (a very little) with DoS attacks, at least accidental ones. (For any chance at real DoS protection, you would need to prevent requests from even reaching the app server, but that's out of scope here.)
* Second semaphore (eg 3 permits). Using Semaphore.acquire, if a request doesn't obtain a permit immediately, it will block. This semaphore prevents any user from actively using more than 3 REST threads for processing or database I/O. (Note: if someone does submit 6 simultaneous requests, 3 of them will block, leading to 3 idle threads in addition to the 3 active threads.)
* Token bucket (eg capacity: 100 tokens, refill: 100/sec). Each request simply consumes one token. This bucket will be given a generous capacity by default, but it could be drastically reduced if we ever need to mitigate another concurrency problem like "PressGang sync makes Zanata unresponsive". By reducing the refill rate to 1 or 2 tokens per second, we could force an API user to run as slowly as PressGang sync currently does. (According to Lee, PressGang's API calls take 200-300ms each, plus a 300ms pause between requests.)

(Obviously, all the suggested permit and token capacities are subject to tuning.)

One scenario not addressed by this solution is the problem of expensive API calls, like "export the whole database with TMX", which might take 30 minutes to execute. However, we don't have any evidence that users like to export the whole database twice per hour, or that this can cause Zanata to become unresponsive. (If Zanata can survive 30 minutes of database activity, would pausing for 30 seconds or 30 minutes afterwards really help?) Trying to limit TMX exports is more likely to cause problems for users than it is to prevent them.

So it's probably not worth implementing (and testing!) the "post-paid" or "rental deposit" solutions at this stage, but a couple of semaphores might save our bacon at some point.
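The two-semaphore part of the proposal above could be sketched roughly as follows. This is a hand-rolled illustration, not actual Zanata code: the class name, permit counts and where the token-bucket step would slot in (just before `restCall.run()`) are all placeholders.

```java
import java.util.concurrent.Semaphore;

/**
 * Sketch of the proposed per-API-key limiter: an outer semaphore that
 * rejects excess requests with 503, and an inner semaphore that makes
 * excess requests queue (idle threads) before doing real work.
 */
class RestRateLimiter {
    private final Semaphore maxConcurrent; // tryAcquire fails -> return 503
    private final Semaphore maxActive;     // acquire blocks -> idle thread

    RestRateLimiter(int concurrentPermits, int activePermits) {
        this.maxConcurrent = new Semaphore(concurrentPermits);
        this.maxActive = new Semaphore(activePermits);
    }

    /**
     * Runs the request body under both limits.
     * Returns false (meaning: respond 503) if no concurrent permit is free.
     */
    boolean process(Runnable restCall) throws InterruptedException {
        if (!maxConcurrent.tryAcquire()) {
            return false;             // too many simultaneous requests -> 503
        }
        try {
            maxActive.acquire();      // may block, holding a thread idle
            try {
                // the token bucket consume would go here, before the work
                restCall.run();       // actual processing / database I/O
            } finally {
                maxActive.release();
            }
            return true;
        } finally {
            maxConcurrent.release();
        }
    }
}
```

With (6, 3) as suggested above, a seventh simultaneous request from one key gets an immediate 503, while requests four to six merely queue on `maxActive`.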
Script to simulate skynet load (GET all translations for all locales): https://github.com/zanata/zanata-scripts/blob/master/getTranslationLoadTest.groovy
https://github.com/zanata/zanata-server/pull/390
In the end we decided only to limit concurrent requests (two semaphores), not to limit the rate.
A few questions:
1. What's the default limit?
2. Where can I configure the default limit? Is it in a configuration file, or is it hard-coded?
1. Six concurrent requests per API key, but Zanata will only work on two of them at a time (the others will be queued).
2. The default limits come from ApplicationConfiguration.java, but they can be changed in the database, using the Server Config admin page.
Verified.