Bug 1296638 - Metrics Collector Workers memory threshold displayed as 200MiB in the Web UI, however they exit at 500MiB threshold
Status: CLOSED CURRENTRELEASE
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: UI - OPS
Version: 5.5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.8.0
Assigned To: Harpreet Kataria
QA Contact: Pradeep Kumar Surisetty
Whiteboard: c&u:perf
Keywords: TestOnly, ZStream
Depends On:
Blocks: 1411478
Reported: 2016-01-07 12:31 EST by Alex Krzos
Modified: 2017-04-12 05:26 EDT

See Also:
Fixed In Version: 5.8.0.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1411478
Environment:
Last Closed: 2017-04-12 05:26:47 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Alex Krzos 2016-01-07 12:31:46 EST
Description of problem:
The Web UI displays a memory threshold of 200MiB for C&U Data Collectors. However, in all of my memory baseline tests in which a C&U Data Collector exceeds its memory limit, the enforced limit is actually 400MiB (the default for queue_worker_base).

I would recommend a minimum threshold of 400MiB for C&U collectors in VMware/RHEVM environments on 5.5. On 5.4 we seem to get away with a 200-300MiB limit.

Version-Release number of selected component (if applicable):
5.4.4.2
5.5.0.13-2
5.5.2.0


How reproducible:
I can reproduce this with my C&U memory baseline tests with RHEVM providers since those collectors regularly exceed the memory threshold.

Steps to Reproduce:
1.
2.
3.

Actual results:
The Web UI displays 200MiB, while the workers exit at the 400MiB limit.

Expected results:
The Web UI displays the default 400MiB.

Additional info:

It is unclear whether the setting displayed in the Web UI affects anything at all. I have not tested its functionality.


Relevant Log Lines from 5.5.2.0:
[----] I, [2016-01-07T10:10:30.807133 #34133:11af990]  INFO -- : MIQ(MiqQueue.put) Message id: [22667],  id: [], Zone: [default], Role: [], Server: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [MiqEvent.raise_evm_event], Timeout: [600], Priority: [100], State: [ready], Deliver On: [], Data: [], Args: [["MiqServer", 1], "evm_worker_memory_exceeded", {:event_details=>"Worker [ManageIQ::Providers::Redhat::InfraManager::MetricsCollectorWorker] with ID: [18], PID: [48824], GUID: [f156dd68-b54d-11e5-9d80-001a4a223927] process memory usage [443781120] exceeded limit [419430400], requesting worker to exit", :type=>"ManageIQ::Providers::Redhat::InfraManager::MetricsCollectorWorker"}]
...
[----] I, [2016-01-07T10:12:01.508835 #48824:1229998]  INFO -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::MetricsCollectorWorker#log_status) [C&U Metrics Collector for RHEV] Worker ID [18], PID [48824], GUID [f156dd68-b54d-11e5-9d80-001a4a223927], Last Heartbeat [2016-01-07 15:08:21 UTC], Process Info: Memory Usage [593125376], Memory Size [934707200], Memory % [3.56], CPU Time [42567.0], CPU % [0.27], Priority [23]
[----] I, [2016-01-07T10:12:01.509028 #48824:1229998]  INFO -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::MetricsCollectorWorker::Runner) ID [18] PID [48824] GUID [f156dd68-b54d-11e5-9d80-001a4a223927] Exit request received. Worker exiting.
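For reference, the byte counts in these log lines can be converted to MiB to compare them with the value shown in the UI. This is a minimal sketch in plain Ruby; the `to_mib` helper is hypothetical and not part of ManageIQ.

```ruby
# Convert the raw byte counts from the log lines above into MiB.
# `to_mib` is a hypothetical helper for illustration, not a ManageIQ method.
BYTES_PER_MIB = 1024 * 1024

def to_mib(bytes)
  (bytes.to_f / BYTES_PER_MIB).round(2)
end

LIMIT_BYTES = 419_430_400 # "exceeded limit" value from the log
USAGE_BYTES = 443_781_120 # "process memory usage" value from the log

puts "limit: #{to_mib(LIMIT_BYTES)} MiB"
puts "usage: #{to_mib(USAGE_BYTES)} MiB"
puts "UI value (200 MiB) in bytes: #{200 * BYTES_PER_MIB}"
```

The logged limit works out to exactly 400 MiB, which is twice the 200 MiB shown in the UI.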
Comment 2 Keenan Brock 2016-01-08 07:45:01 EST
Setting this value was broken by:

commit 49dc2581fed7acfc50c5a7d4c984289de0031906
Author: Keenan Brock <kbrock@redhat.com>
Date:   Sat Nov 21 14:33:53 2015 -0500

    path_to_my_worker_settings

I'm still tracking down where that variable is read and whether it is (or ever has been) respected.
Comment 3 Keenan Brock 2016-04-21 12:18:14 EDT
Alex,

Are you still seeing behavior like this?
We updated the code that reads/defaults these parameters.
We also rewrote the configuration system.
Comment 4 Alex Krzos 2016-05-10 09:56:31 EDT
(In reply to Keenan Brock from comment #3)
> Alex,
> 
> Are you still seeing behavior like this?
> We updated the code that reads/defaults these parameters.
> We also rewrote the configuration system.

Hi Keenan,

I reviewed 5.6.0.5 (beta2.4) and still see the memory threshold in the UI as 200MiB for C&U collectors, but I have witnessed RSS memory usage greater than 340MiB during tests with a large VMware provider. Additionally, I do not see a :memory_threshold: option configured under ems_metrics_collector_worker - defaults, so I assume it is defaulting to queue_worker_base, which is 400MiB.

Perhaps the patches haven't made their way into 5.6.0.5 yet?
Comment 5 Keenan Brock 2016-10-10 10:51:46 EDT
This is a configuration issue in the core.

Also suggesting moving this to 5.7
Comment 7 Joe Rafaniello 2016-11-22 16:48:58 EST
Dan, I noticed this Bz while looking at another issue and thought it was in the queue area... Can you have someone look at this?

I tried tracking this one down but couldn't understand the code in app/views/ops/_settings_workers_tab.html.haml and
app/controllers/ops_controller/settings/common.rb

I believe the problem is that the UI code is walking the hashes for the existing settings and new settings and assuming a specific structure.

    :queue_worker_base:
      :defaults:
        :cpu_usage_threshold: 100.percent
        :dequeue_method: :drb
        :memory_threshold: 500.megabytes
        :poll_method: :normal
        :queue_timeout: 10.minutes
      :ems_metrics_collector_worker:
        :defaults:
          :count: 2
          :nice_delta: 3
          :poll_method: :escalate

I believe it's trying to look at
[:queue_worker_base][:ems_metrics_collector_worker][:defaults][:memory_threshold], failing to find it, and defaulting back to 200.megabytes.

I don't understand where 200 is coming from though since the fallback seems to be 400 megabytes if it's not found (in common.rb:1024):

      qwb[:ems_metrics_collector_worker] ||= {}
      qwb[:ems_metrics_collector_worker][:defaults] ||= {}
      w = qwb[:ems_metrics_collector_worker][:defaults]
      raw = @edit[:current].get_raw_worker_setting(:MiqEmsMetricsCollectorWorker)
      w[:count] = raw[:defaults][:count] || 2
      w[:memory_threshold] = rails_method_to_human_size(raw[:defaults][:memory_threshold] || 400.megabytes)
      @sb[:ems_metrics_collector_threshold] = []
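The mismatch described in this comment can be sketched with plain hashes. This is an illustration only: it uses integer byte counts instead of ActiveSupport's `500.megabytes`, and `raw_threshold` and `effective_threshold` are hypothetical helpers, not ManageIQ APIs. It assumes the worker-side code merges the base defaults into the worker's own defaults, as the inheritance discussed in this bug suggests.

```ruby
# Sketch of the lookup problem described above, using plain hashes and
# integer byte counts instead of ActiveSupport's `500.megabytes`.
# All helper names here are hypothetical, not ManageIQ APIs.
MEGABYTE = 1024 * 1024

SETTINGS = {
  queue_worker_base: {
    defaults: { memory_threshold: 500 * MEGABYTE, poll_method: :normal },
    ems_metrics_collector_worker: {
      defaults: { count: 2, nice_delta: 3, poll_method: :escalate }
    }
  }
}.freeze

# Raw lookup: reads only the worker's own :defaults section.
def raw_threshold(settings, worker)
  settings.dig(:queue_worker_base, worker, :defaults, :memory_threshold)
end

# Inherited lookup: base defaults merged with the worker's own defaults,
# so worker-level keys win and missing keys fall back to the base.
def effective_threshold(settings, worker)
  base = settings.dig(:queue_worker_base, :defaults) || {}
  own  = settings.dig(:queue_worker_base, worker, :defaults) || {}
  base.merge(own)[:memory_threshold]
end

worker = :ems_metrics_collector_worker
p raw_threshold(SETTINGS, worker)                   # nil: the key is not set on the worker
p effective_threshold(SETTINGS, worker) / MEGABYTE  # inherited value, in MB
```

A UI that reads only the raw value gets nothing back and has to pick some fallback, while the worker enforces the inherited value.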
Comment 8 Joe Rafaniello 2016-11-22 17:09:26 EST
Correction:

Dan, I noticed this Bz while looking at another issue and thought it was in the WRONG queue/assignment... Can you have someone look at this?
Comment 10 Harpreet Kataria 2016-12-05 17:34:33 EST
https://github.com/ManageIQ/manageiq/pull/12999
Comment 11 Joe Rafaniello 2016-12-06 10:00:35 EST
Note: the 400 MB value reported in this BZ was subsequently changed to 500 MB in https://bugzilla.redhat.com/show_bug.cgi?id=1391687 via
https://github.com/ManageIQ/manageiq/pull/12484

Sorry, changing description to reflect that change.

Note that ems_refresh_core_worker, just like the Metrics Collector Workers and many other workers, inherits the 500 MB memory_threshold from queue_worker_base and would probably exhibit problems similar to those reported in this BZ.
Comment 12 CFME Bot 2016-12-06 12:17:00 EST
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/1f2687fc89328eccb37724dbeddf6311e9dffbac

commit 1f2687fc89328eccb37724dbeddf6311e9dffbac
Author:     Harpreet Kataria <hkataria@redhat.com>
AuthorDate: Mon Dec 5 17:02:16 2016 -0500
Commit:     Harpreet Kataria <hkataria@redhat.com>
CommitDate: Mon Dec 5 17:02:16 2016 -0500

    Added a missing default memeory threshold setting
    
    Added a missing default memeory threshold setting for C & U Data Collectors that was causing drop down to the firt item in list as selected value by default.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1296638

 config/settings.yml | 1 +
 1 file changed, 1 insertion(+)
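The commit message above describes a select box falling back to its first item when no default is set. Below is a minimal sketch of that behaviour. The option list and the `displayed_option` helper are assumptions for illustration; only the 200 MB display and the first-item fallback come from this bug.

```ruby
# Sketch of a select box whose current value is missing from its option list.
# The option values and helper name are assumptions for illustration only.
THRESHOLD_OPTIONS_MB = [200, 250, 300, 350, 400, 450, 500].freeze

# Mimics how an HTML <select> renders: if no option matches the current
# value, the first option is shown as selected.
def displayed_option(options, current)
  options.include?(current) ? current : options.first
end

p displayed_option(THRESHOLD_OPTIONS_MB, nil) # setting missing: first item is shown
p displayed_option(THRESHOLD_OPTIONS_MB, 400) # setting present: the real value is shown
```

Under that assumption, adding an explicit `:memory_threshold:` for the worker gives the select box a value it can match, which is consistent with the one-line change to config/settings.yml.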
Comment 14 Archit Sharma 2017-04-11 08:09:49 EDT
While connected to a 1000-VM RHVM environment, this seems to have been fixed in 5.8.0.x.

reference: https://gist.github.com/arcolife/648c83a7f53ee6a706dd8fda278080e1

[----] W, [2017-04-11T07:50:57.579104 #40011:1045140]  WARN -- : MIQ(MiqServer#validate_worker) Worker [ManageIQ::Providers::Redhat::InfraManager::MetricsCollectorWorker] with ID: [42], PID: [21634], GUID: [ef1e1b5a-1eac-11e7-8366-001a4a22391a] process memory usage [420320000] exceeded limit [419430400], requesting worker to exit


This checks out against the UI params:

      :ems_metrics_collector_worker:
        :defaults:
          :count: 2
          :memory_threshold: 400.megabytes
          :nice_delta: 3
          :poll_method: :escalate
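The same check can be scripted. The sketch below pulls the usage and limit out of the log message and compares the limit with the configured threshold. The regex is an assumption based only on the message format quoted in this bug, and `parse_memory_event` is a hypothetical helper.

```ruby
# Extract usage and limit from a validate_worker log message and compare
# the limit with the configured threshold. The regex is an assumption
# based only on the message format quoted in this bug.
LOG_LINE = "process memory usage [420320000] exceeded limit [419430400], " \
           "requesting worker to exit"

PATTERN = /process memory usage \[(\d+)\] exceeded limit \[(\d+)\]/

# Returns a hash with :usage and :limit in bytes, or nil if the line
# does not contain a memory-exceeded message.
def parse_memory_event(line)
  match = PATTERN.match(line)
  return nil unless match

  { usage: match[1].to_i, limit: match[2].to_i }
end

event = parse_memory_event(LOG_LINE)
configured = 400 * 1024 * 1024 # :memory_threshold: 400.megabytes

puts "limit matches configured threshold: #{event[:limit] == configured}"
puts "overshoot in bytes: #{event[:usage] - event[:limit]}"
```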
