Description of problem:
MetricsCollectorWorker has exceeded its memory threshold, which is already set to the maximum of 1.5 GB.

Version-Release number of selected component (if applicable):
5.7.2.1

How reproducible:
always

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Created attachment 1287449 [details] Infrastructure_Providers_2017_06_13
Created attachment 1290791 [details] Chart of Metrics Collector PSS Usage (from log data)
*** Bug 1456775 has been marked as a duplicate of this bug. ***
Currently, four different proposals for memory reduction have been created:

https://github.com/ManageIQ/manageiq/pull/15757
https://github.com/ManageIQ/manageiq/pull/15791
https://github.com/ManageIQ/more_core_extensions/pull/54
https://github.com/ManageIQ/more_core_extensions/pull/55

All of them slowly chip away at some of the extraneous objects being created by the MetricsCollector. We are hoping to take some measurements with the above patched in on a test appliance to see whether more needs to be done for the time being.

Note: While the last two have been merged, they still need to be integrated into ManageIQ, so all of the above are still pending any kind of integration.

-Nick
Agreed, the changes provided by Nick thus far do not alleviate the memory leak being experienced by the worker.
A "band-aid" fix has been applied, and will most likely be backported: https://github.com/ManageIQ/manageiq/pull/16807 For this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1533484 That should mitigate the effects of this BZ. This can most likely be reduced in severity, but not closed, since solving it still makes sense. This assumes that the theory is correct that the MetricsCollectorWorker leak is related to the MiqServer leak, which is currently being investigated more closely.
A possible fix has been proposed in this related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1535720 That fix targets the MiqServer, and we have high confidence that it will fix the leak there. Updates will probably happen there more regularly until we determine whether there is a different leak in the MetricsCollectorWorker; there is a high probability that this was a single leak affecting all workers.
The fix above has been backported here: https://bugzilla.redhat.com/show_bug.cgi?id=1536692 We are going to do some testing ourselves to see if this is fixing the issue with the MetricsCollectorWorker as well, and will update with those results.
Update: We are relatively sure that this leak will be resolved with the patch provided in https://bugzilla.redhat.com/show_bug.cgi?id=1535720 (or the respective backported version), so this might already be fixed. That said, we are doing some final long-term comparisons in our test environments to confirm that the systems that had the patch applied and displayed no leak will start leaking once the patch is removed. We are confident this patch fixes the leak in MiqServer, but we want to be equally confident that the same holds for the other workers, and that there isn't another leak at play here. The next update will be in roughly a week's time.
Ryan, would you be able to run this with the latest code to see if your issue is fixed? When I ran it, this memory leak seemed to be resolved. Keenan
Marking as TestOnly, as the fix wasn't specific to MetricsCollectorWorker, and the fix/hotfix for the generic memory leak is tracked in bug #1535720 and its clones for all versions.