Bug 1458392 - MetricsCollectorWorker memory exceeded and memory threshold
Summary: MetricsCollectorWorker memory exceeded and memory threshold
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Performance
Version: 5.7.0
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.7.5
Assignee: Nick LaMuro
QA Contact: Tasos Papaioannou
URL:
Whiteboard: perf:c&u
Depends On:
Blocks:
Reported: 2017-06-02 18:08 UTC by Ryan Spagnola
Modified: 2021-06-10 12:24 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-20 16:03:05 UTC
Category: ---
Cloudforms Team: CFME Core
Target Upstream Version:
Embargoed:


Attachments
Infrastructure_Providers_2017_06_13 (1.50 KB, text/plain)
2017-06-13 20:44 UTC, Ryan Spagnola
Chart of Metrics Collector PSS Usage (from log data) (21.96 KB, image/png)
2017-06-22 17:40 UTC, dmetzger

Description Ryan Spagnola 2017-06-02 18:08:30 UTC
Description of problem:
MetricsCollectorWorker is exceeding its memory threshold, even though the threshold is already set to the maximum of 1.5 GB.

Version-Release number of selected component (if applicable):
5.7.2.1

How reproducible:
always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 6 Ryan Spagnola 2017-06-13 20:44:22 UTC
Created attachment 1287449
Infrastructure_Providers_2017_06_13

Comment 13 dmetzger 2017-06-22 17:40:57 UTC
Created attachment 1290791
Chart of Metrics Collector PSS Usage (from log data)

Comment 24 dmetzger 2017-08-07 17:45:17 UTC
*** Bug 1456775 has been marked as a duplicate of this bug. ***

Comment 25 Nick LaMuro 2017-08-11 21:18:23 UTC
Currently, four different proposals for memory reduction have been created:

https://github.com/ManageIQ/manageiq/pull/15757
https://github.com/ManageIQ/manageiq/pull/15791
https://github.com/ManageIQ/more_core_extensions/pull/54
https://github.com/ManageIQ/more_core_extensions/pull/55

All of them slowly chip away at some of the extraneous objects being created by the MetricsCollector.  We are hoping to take some measurements with the above patched in on a test appliance to see whether more needs to be done for the time being.

Note:  While the last two have been merged, they still need to be integrated into ManageIQ, so all of the above are still pending any kind of integration.


-Nick

Comment 28 dmetzger 2017-08-29 12:33:19 UTC
Agreed, the changes provided by Nick thus far do not alleviate the memory leak being experienced by the worker.

Comment 36 Nick LaMuro 2018-01-12 15:57:35 UTC
A "band-aid" fix has been applied, and will most likely be backported:

https://github.com/ManageIQ/manageiq/pull/16807

For this BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1533484


That should mitigate the effects of this BZ.  This BZ can most likely be reduced in severity, but not closed, since solving it still makes sense.


This assumes the theory holds that the MetricsCollectorWorker leak is related to the MiqServer leak, which is currently being investigated more closely.

Comment 38 Nick LaMuro 2018-01-18 16:06:33 UTC
A possible fix has been proposed in this related BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1535720


That fix is targeted at the MiqServer, and we have high confidence that it will fix the leak there.  Updates will probably happen there more regularly until we determine whether there is a different leak in the MetricsCollectorWorker; there is a high probability this was a leak across all workers.

Comment 39 Nick LaMuro 2018-01-19 23:50:20 UTC
The fix above has been backported here:

https://bugzilla.redhat.com/show_bug.cgi?id=1536692

We are going to do some testing ourselves to see if this is fixing the issue with the MetricsCollectorWorker as well, and will update with those results.

Comment 40 Nick LaMuro 2018-02-01 22:46:09 UTC
Update:

We are relatively sure that this leak will be resolved with the patch provided in https://bugzilla.redhat.com/show_bug.cgi?id=1535720 (or the respective backported version), so this might already be fixed.

That said, we are doing some final long-term comparisons in our test environments to confirm that the systems that had the patch applied and displayed no leak will start leaking once the patch is removed.  We are confident this patch fixes the leak in MiqServer, but we want to be confident in saying the same holds for the other workers as well, and that there isn't another leak at play here.

Next update will be roughly in a week's time.
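
As a rough illustration of the patched-versus-unpatched comparison described above, the sketch below estimates a worker's memory growth rate as the least-squares slope of memory samples over time. It is not the tooling used for this BZ. The function names, the 1 MB/hour threshold, and the sample numbers are all hypothetical, chosen only to show the idea.

```python
from typing import Sequence


def growth_rate(hours: Sequence[float], mem_mb: Sequence[float]) -> float:
    """Return the least-squares slope of memory (MB) over time (hours)."""
    if len(hours) != len(mem_mb) or len(hours) < 2:
        raise ValueError("need at least two paired samples")
    n = len(hours)
    mean_t = sum(hours) / n
    mean_m = sum(mem_mb) / n
    # Covariance of (time, memory) and variance of time, both unnormalized;
    # the normalization cancels out in the ratio.
    cov = sum((t - mean_t) * (m - mean_m) for t, m in zip(hours, mem_mb))
    var = sum((t - mean_t) ** 2 for t in hours)
    if var == 0:
        raise ValueError("all samples share the same timestamp")
    return cov / var


def looks_like_leak(hours: Sequence[float], mem_mb: Sequence[float],
                    threshold_mb_per_hour: float = 1.0) -> bool:
    """Flag sustained growth above a (hypothetical) MB/hour threshold."""
    return growth_rate(hours, mem_mb) > threshold_mb_per_hour


# Made-up sample data for illustration only; not measurements from this BZ.
hours = [0, 6, 12, 18, 24]
patched = [400, 402, 399, 401, 400]      # flat: no sustained growth
unpatched = [400, 460, 520, 580, 640]    # steady growth

print("patched leaking?  ", looks_like_leak(hours, patched))
print("unpatched leaking?", looks_like_leak(hours, unpatched))
```

A slope over a long window is less sensitive to short-lived spikes than comparing two individual readings, which is one reason a comparison like this needs to run for days rather than minutes.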

Comment 41 Keenan Brock 2018-02-07 13:45:56 UTC
Ryan,

Would you be able to run this with the latest code to see if your issue is fixed?
When I ran them, this memory leak seemed to be resolved.

Keenan

Comment 47 Satoe Imaishi 2018-02-26 17:26:32 UTC
Marking as TestOnly, as the fix wasn't specific to MetricsCollectorWorker; the generic memory fix/hotfix is tracked in bug #1535720 and its clones for all versions.

