Red Hat Bugzilla – Bug 609534
Perf: OOME when server is swamped with Calltime data
Last modified: 2014-05-29 17:10:20 EDT
Created attachment 428005
Patch mentioned in the description
When the server is swamped with call time data (the amount depends on JVM size, database, etc.; in my case the RHQ server has a 384 MB max heap and I am supplying around 800k-1M values per hour),
even a little slowness on the DB, for example, makes the call time data pile up and the server eventually throw an OOME.
A heap dump shows
- a PreparedStatement 27 MB in size
- 3 HTTP threads with 77 MB of data each
The attached (admittedly too simple) patch improves the situation a lot: smaller chunks of data are sent to the database at a time, so the PreparedStatement does not grow that big.
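For illustration, here is a minimal sketch of that chunking idea in plain JDBC. It is not the actual patch: the table/column layout, the CHUNK_SIZE value and the long[] record format are assumptions for the example, not the real RHQ code.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class CallTimeChunkInsert {

    private static final int CHUNK_SIZE = 1000; // illustrative value

    // Insert call time values in fixed-size chunks so a single
    // PreparedStatement batch never holds the whole incoming data set.
    public static void insertChunked(Connection conn, List<long[]> values)
            throws SQLException {
        // Hypothetical table/columns, for illustration only
        String sql = "INSERT INTO rhq_calltime_data_value "
                   + "(key_id, begin_time, end_time, total) VALUES (?, ?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int inBatch = 0;
            for (long[] v : values) {
                ps.setLong(1, v[0]); // key id
                ps.setLong(2, v[1]); // begin time
                ps.setLong(3, v[2]); // end time
                ps.setLong(4, v[3]); // total
                ps.addBatch();
                if (++inBatch == CHUNK_SIZE) {
                    ps.executeBatch(); // flush; keeps the statement's buffer small
                    inBatch = 0;
                }
            }
            if (inBatch > 0) {
                ps.executeBatch(); // flush the remainder
            }
        }
    }
}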
An improved version of the patch (sketched after this list) would
- watch the size of the incoming data and not chop it into too small pieces
- chunk the data that goes into alert processing as well
- null out the already processed data after the previous step to help garbage collection
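A rough sketch of what that improved loop could look like; persistChunk() and processAlerts() are hypothetical stand-ins for the real persistence and alert-condition code, and the chunk-size bounds are made-up numbers.

import java.util.ArrayList;
import java.util.List;

public class ImprovedCallTimeProcessing {

    private static final int MIN_CHUNK = 200;  // don't chop into too small pieces
    private static final int MAX_CHUNK = 5000; // cap the PreparedStatement size

    // Processes the incoming list chunk by chunk; the list must be mutable
    // (e.g. an ArrayList) so processed entries can be nulled out.
    public void process(List<Object[]> incoming) {
        // Scale the chunk size with the incoming volume, clamped to sane bounds.
        int chunkSize = Math.max(MIN_CHUNK,
                Math.min(MAX_CHUNK, incoming.size() / 10));

        for (int start = 0; start < incoming.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, incoming.size());
            List<Object[]> chunk = new ArrayList<Object[]>(incoming.subList(start, end));

            persistChunk(chunk);  // one small PreparedStatement batch per chunk
            processAlerts(chunk); // alert processing also sees only one chunk

            // Null out the already processed entries so the garbage
            // collector can reclaim them before the loop finishes.
            for (int i = start; i < end; i++) {
                incoming.set(i, null);
            }
        }
    }

    private void persistChunk(List<Object[]> chunk) { /* DB insert as above */ }

    private void processAlerts(List<Object[]> chunk) { /* alert condition checks */ }
}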
With the patch I can (with the end_time index present) go down to 4-minute collection intervals, which means ~1.4 million values/hour. So I propose including this in the next release.
Actually, it allows processing ~100k values per minute, every minute, which amounts to 6 million values per hour on a Postgres instance on a single laptop hard disk and an RHQ server VM with 384 MB of RAM.
This is only from a 20-minute interval so far, so no definitive proof, but VM statistics look good.