Bug 1479339
Summary: | Memory leak in MetricsProcessor Worker | |||
---|---|---|---|---|
Product: | Red Hat CloudForms Management Engine | Reporter: | Archit Sharma <arcsharm> | |
Component: | Performance | Assignee: | Nick LaMuro <nlamuro> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Tasos Papaioannou <tpapaioa> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 5.8.0 | CC: | abellott, bsorota, dajohnso, dmetzger, epacific, fsimonce, hroy, jhardy, mburman, obarenbo, pmcgowan, psuriset, simaishi, tpapaioa, yzamir | |
Target Milestone: | GA | Keywords: | TestOnly | |
Target Release: | cfme-future | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | c&u:worker:perf | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | 1456775 | |||
: | 1479356 (view as bug list) | Environment: | ||
Last Closed: | 2018-07-12 17:44:26 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | CFME Core | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1456775 | |||
Bug Blocks: | 1479356 | |||
Attachments: |
Description
Archit Sharma
2017-08-08 11:50:24 UTC
Created attachment 1310597 [details]
Generic worker leakage w.r.t stable processor queue and powered on/off vms
Created attachment 1310598 [details]
all worker types memory usage comparison
To add to the 'steps to reproduce' in the description: I had increased the memory thresholds / counts for specific worker processes on all appliances, just enough to accommodate that many VMs in a 6-appliance setup. For reference:
----
# DB
- Generic - 2, 500 MB
- Priority - 2, 600 MB
----
# Worker appliances
- Generic - 4, 500 MB
- Priority - 2, 800 MB
- C&U Data Collectors - 6, 600 MB
- C&U Data Processors - 4, 800 MB
- Refresh - 2 GB
----
The refresh worker's (leaked?) memory grew by a few MBs. Its RSS memory growth is included in the attachment https://bugzilla.redhat.com/attachment.cgi?id=1310598
Created attachment 1311340 [details]
PSS & RSS utilization - 4+ day test run
Worker Config:
Single Metrics Processor Worker
1.5 GB Memory Threshold
Provider:
Clusters: 10
Hosts: 50
Datastores: 61
VMs: 1,000
Type: VMware VC 5.5.0
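For anyone wanting to reproduce the PSS/RSS measurements on their own appliance: on Linux both figures can be summed per process from /proc/<pid>/smaps. The snippet below is only a rough sketch of that idea, not the tooling used for the attached graphs; the function names `sum_smaps_kb` and `read_worker_memory` are made up for illustration, and the worker PID has to be looked up separately.

```python
def sum_smaps_kb(smaps_text):
    """Sum the per-mapping Rss and Pss values (in kB) from smaps content."""
    totals = {"Rss": 0, "Pss": 0}
    for line in smaps_text.splitlines():
        # Field lines look like "Pss:                  12 kB".
        key, sep, rest = line.partition(":")
        if sep and key in totals:
            totals[key] += int(rest.split()[0])
    return totals


def read_worker_memory(pid):
    """Read /proc/<pid>/smaps for a running process (Linux only).

    Hypothetical helper: usually needs root or the same user as the worker.
    """
    with open(f"/proc/{pid}/smaps") as handle:
        return sum_smaps_kb(handle.read())


if __name__ == "__main__":
    import os

    # Samples the current process; substitute a worker PID to track a worker.
    print(read_worker_memory(os.getpid()))
```

Sampling this at a fixed interval over several days and plotting the totals gives the kind of trend line needed to tell steady growth apart from normal fluctuation. PSS is generally the more useful number when workers are forked from a common parent, since shared pages are split between processes instead of being counted in full for each.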
Based on some talks with Dennis regarding similar tickets, I think enabling metrics collection is the root cause of some of the "leaks" that we are seeing. Most of my commenting will probably be done on: https://bugzilla.redhat.com/show_bug.cgi?id=1458392

Will update here when I have more to share.

A possible fix has been proposed in this related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1535720

That fix is targeted at the MiqServer, and we have high confidence that it will fix the leak there. Updates will probably happen there more regularly until we determine whether there is a different leak in the MetricsProcessor Worker; there is a high probability this was a leak across all workers.

The fix above has been backported to 5.8: https://bugzilla.redhat.com/show_bug.cgi?id=1536672
As well as for future releases here: https://bugzilla.redhat.com/show_bug.cgi?id=1535720

We are going to do some testing ourselves to see if this fixes the issue with the MetricsProcessor as well, and will update with those results.

Update: We are relatively sure that this leak will be resolved with the patch provided in https://bugzilla.redhat.com/show_bug.cgi?id=1535720 (or the respective backported version), so this might already be fixed.

That said, we are doing some final long-term comparisons in our test environments to confirm that the systems that had the patch applied and displayed no leak will start leaking once the patch is removed. We are confident this patch fixes the leak in MiqServer, but we want to be able to say the same for the other workers, and to rule out another leak at play here.

Next update will be in roughly a week's time.

After testing on a pair of appliances for about a week, we are fairly confident that the patch has a substantial impact on the memory footprint of all the workers, including the MetricsProcessorWorker, as mentioned here.

Please retest with the changes in place, and if the issue persists, feel free to kick the ticket back so we can look into it further.

*** Bug 1511897 has been marked as a duplicate of this bug. ***

Verified.