Bug 609534
Summary: Perf: OOME when server is swamped with Calltime data

| Field | Value | Field | Value |
|---|---|---|---|
| Product | [Other] RHQ Project | Reporter | Heiko W. Rupp <hrupp> |
| Component | Monitoring | Assignee | RHQ Project Maintainer <rhq-maint> |
| Status | CLOSED CURRENTRELEASE | QA Contact | Mike Foley <mfoley> |
| Severity | medium | Priority | low |
| Version | 3.0.0 | CC | ccrouch, jshaughn |
| Hardware | All | OS | All |
| Doc Type | Bug Fix | Last Closed | 2014-05-29 21:10:20 UTC |
| Bug Blocks | 620933 | Attachments | Patch mentioned in the description (428005) |
With the patch (and the end_time index present) I can get down to 4-minute intervals, which means ~1.4 million values/hour, so I propose including this in the next release. It actually allows processing ~100k values per minute, every minute, which amounts to 6 million values per hour on a Postgres instance running on a single laptop hard disk with an RHQ server VM with 384 MB of RAM. This has only run for a 20-minute interval so far, so it is not definitive proof, but the VM statistics look good.
Created attachment 428005 [details] Patch mentioned in the description

When the server is swamped with call time data (the amount depends on JVM size, database, etc.; in my case the RHQ server has a 384 MB max heap and I am supplying around 800k-1M values per hour), a little slowness on the DB side (for example) makes CT data pile up until the server throws an OOME. A heap dump shows:

- a PreparedStatement 27 MB in size
- 3 HTTP threads with 77 MB of data each

The attached (overly simple) patch improves the situation a lot: smaller chunks of data are sent to the database, so the PreparedStatement does not grow that large. An improved version of the patch would:

- watch the size of the incoming data and avoid chopping it into pieces that are too small
- chunk the data that goes into alert processing as well
- null out the already-processed data after the previous step to help garbage collection
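The chunking approach the patch takes can be sketched roughly as follows. This is not RHQ's actual code; `chunk` is a hypothetical helper, and the JDBC batch flush that would consume each chunk is only indicated in comments. The point is that each database round trip only ever sees `chunkSize` values, so no single PreparedStatement accumulates tens of megabytes of bound data.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedInsert {

    // Split the incoming data into sublists of at most chunkSize elements.
    // Each sublist would be bound to one PreparedStatement batch and
    // executed, keeping per-statement memory bounded.
    static <T> List<List<T>> chunk(List<T> data, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < data.size(); i += chunkSize) {
            chunks.add(new ArrayList<>(
                data.subList(i, Math.min(i + chunkSize, data.size()))));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // 10 fake call-time values, flushed in chunks of 4.
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            values.add(i);
        }
        List<List<Integer>> parts = chunk(values, 4);
        for (List<Integer> part : parts) {
            // In the real server: bind part to the PreparedStatement,
            // executeBatch(), then drop the reference so the GC can
            // reclaim the already-processed data.
            System.out.println("flushing chunk of " + part.size());
        }
    }
}
```

A refined version, as the description suggests, would pick `chunkSize` adaptively from the size of the incoming report rather than using a fixed constant, so very small reports are not split unnecessarily.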