Description of problem:
The Web UI displays a memory threshold of 200MiB for C&U Data Collectors, however in all of my memory baseline tests in which a C&U Data Collector exceeds memory it appears that the limit is actually 400MiB (the default for queue_worker_base). I would recommend a minimum threshold of 400MiB for C&U collectors for VMware/RHEVM environments on 5.5. On 5.4 we seem to be able to get away with a 200-300MiB limit.

Version-Release number of selected component (if applicable):
5.4.4.2
5.5.0.13-2
5.5.2.0

How reproducible:
I can reproduce this with my C&U memory baseline tests with RHEVM providers since those collectors regularly exceed the memory threshold.

Steps to Reproduce:
1.
2.
3.

Actual results:
The Web UI to display the default 400MiB

Expected results:

Additional info:
It is unclear whether the setting displayed in the Web UI even affects anything. I have not tested its functionality.

Relevant Log Lines from 5.5.2.0:
[----] I, [2016-01-07T10:10:30.807133 #34133:11af990] INFO -- : MIQ(MiqQueue.put) Message id: [22667], id: [], Zone: [default], Role: [], Server: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [MiqEvent.raise_evm_event], Timeout: [600], Priority: [100], State: [ready], Deliver On: [], Data: [], Args: [["MiqServer", 1], "evm_worker_memory_exceeded", {:event_details=>"Worker [ManageIQ::Providers::Redhat::InfraManager::MetricsCollectorWorker] with ID: [18], PID: [48824], GUID: [f156dd68-b54d-11e5-9d80-001a4a223927] process memory usage [443781120] exceeded limit [419430400], requesting worker to exit", :type=>"ManageIQ::Providers::Redhat::InfraManager::MetricsCollectorWorker"}]
...
[----] I, [2016-01-07T10:12:01.508835 #48824:1229998] INFO -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::MetricsCollectorWorker#log_status) [C&U Metrics Collector for RHEV] Worker ID [18], PID [48824], GUID [f156dd68-b54d-11e5-9d80-001a4a223927], Last Heartbeat [2016-01-07 15:08:21 UTC], Process Info: Memory Usage [593125376], Memory Size [934707200], Memory % [3.56], CPU Time [42567.0], CPU % [0.27], Priority [23]
[----] I, [2016-01-07T10:12:01.509028 #48824:1229998] INFO -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::MetricsCollectorWorker::Runner) ID [18] PID [48824] GUID [f156dd68-b54d-11e5-9d80-001a4a223927] Exit request received. Worker exiting.
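For anyone reading the byte counts in the log lines above, a minimal Ruby sketch of the conversion may help. It pulls the usage and limit out of the "exceeded limit" message and expresses them in MiB. The helper name memory_exceeded_details and the regular expression are illustrative assumptions, not part of ManageIQ.

```ruby
# Hypothetical helper (not ManageIQ code): extract the usage and limit, in
# bytes, from a "memory exceeded" message and convert both to MiB.
BYTES_PER_MIB = 1024 * 1024

def memory_exceeded_details(message)
  match = message.match(/process memory usage \[(\d+)\] exceeded limit \[(\d+)\]/)
  return nil unless match

  usage = match[1].to_i
  limit = match[2].to_i
  {
    usage_mib: (usage.to_f / BYTES_PER_MIB).round(1),
    limit_mib: (limit.to_f / BYTES_PER_MIB).round(1)
  }
end

# Message fragment copied from the 5.5.2.0 log above.
line = "process memory usage [443781120] exceeded limit [419430400], requesting worker to exit"
details = memory_exceeded_details(line)
puts details.inspect
```

The limit of 419430400 bytes comes out to exactly 400 MiB, which matches the queue_worker_base default rather than the 200MiB shown in the Web UI.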
The setting of this functionality was broken by:

commit 49dc2581fed7acfc50c5a7d4c984289de0031906
Author: Keenan Brock <kbrock>
Date: Sat Nov 21 14:33:53 2015 -0500

    path_to_my_worker_settings

I'm still tracking down where that variable is read and whether it is, or ever has been, respected.
Alex, Are you still seeing behavior like this? We updated the code that reads/defaults these parameters. We also rewrote the configuration system.
(In reply to Keenan Brock from comment #3)
> Alex,
>
> Are you still seeing behavior like this?
> We updated the code that reads/defaults these parameters.
> We also rewrote the configuration system.

Hi Keenan,

I reviewed 5.6.0.5 (beta2.4) and still see the memory threshold in the UI as 200MiB for C&U collectors, but I have witnessed RSS memory usage greater than 340MiB during tests with a large VMware provider. Additionally, I do not see a :memory_threshold: option configured under ems_metrics_collector_worker - defaults, so I assume it is defaulting to the queue_worker_base value, which is 400MiB. Perhaps the patches haven't made their way into 5.6.0.5 yet?
This is a configuration issue in the core. Also suggesting moving this to 5.7
Dan, I noticed this Bz while looking at another issue and thought it was in the queue area... Can you have someone look at this?

I tried tracking this one down but couldn't understand the code in app/views/ops/_settings_workers_tab.html.haml and app/controllers/ops_controller/settings/common.rb

I believe the problem is that the UI code is walking the hashes for the existing settings and new settings and assuming a specific structure.

    :queue_worker_base:
      :defaults:
        :cpu_usage_threshold: 100.percent
        :dequeue_method: :drb
        :memory_threshold: 500.megabytes
        :poll_method: :normal
        :queue_timeout: 10.minutes
      :ems_metrics_collector_worker:
        :defaults:
          :count: 2
          :nice_delta: 3
          :poll_method: :escalate

I believe it's trying to look at [:queue_worker_base][:ems_metrics_collector_worker][:defaults][:memory_threshold], failing to find it and defaulting back to 200.megabytes. I don't understand where 200 is coming from though, since the fallback seems to be 400 megabytes if it's not found (in common.rb:1024):

    qwb[:ems_metrics_collector_worker] ||= {}
    qwb[:ems_metrics_collector_worker][:defaults] ||= {}
    w = qwb[:ems_metrics_collector_worker][:defaults]
    raw = @edit[:current].get_raw_worker_setting(:MiqEmsMetricsCollectorWorker)
    w[:count] = raw[:defaults][:count] || 2
    w[:memory_threshold] = rails_method_to_human_size(raw[:defaults][:memory_threshold] || 400.megabytes)
    @sb[:ems_metrics_collector_threshold] = []
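To illustrate the inheritance that the hypothesis above depends on, here is a minimal sketch of a lookup that falls back from a worker's own defaults to its parent's defaults. It is not the ManageIQ implementation: the SETTINGS hash is a cut-down copy of the structure shown above, effective_setting is a hypothetical helper, and plain integers stand in for ActiveSupport's 500.megabytes so the sketch runs on bare Ruby.

```ruby
# Simplified stand-in for the settings shown above (assumption: only the keys
# needed for the illustration are included).
MEGABYTE = 1024 * 1024

SETTINGS = {
  queue_worker_base: {
    defaults: { memory_threshold: 500 * MEGABYTE },
    ems_metrics_collector_worker: {
      defaults: { count: 2, nice_delta: 3 }
    }
  }
}.freeze

# Hypothetical lookup: a value in the worker's own defaults wins; otherwise
# the value is inherited from the parent's defaults.
def effective_setting(settings, parent, worker, key)
  parent_section = settings.fetch(parent)
  worker_defaults = parent_section.dig(worker, :defaults) || {}
  worker_defaults.fetch(key) { parent_section.dig(:defaults, key) }
end

threshold = effective_setting(SETTINGS, :queue_worker_base,
                              :ems_metrics_collector_worker, :memory_threshold)
puts threshold / MEGABYTE
```

A reader that only inspects the worker's own :defaults hash, without this kind of fallback, finds no :memory_threshold at all. That would be consistent with the UI showing a value different from the one the server enforces.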
Correction: Dan, I noticed this Bz while looking at another issue and thought it was in the WRONG queue/assignment... Can you have someone look at this?
https://github.com/ManageIQ/manageiq/pull/12999
Note: the 400 MB value reported in this BZ was subsequently changed to 500 MB in https://bugzilla.redhat.com/show_bug.cgi?id=1391687 via https://github.com/ManageIQ/manageiq/pull/12484

Sorry, changing the description to reflect that change.

Note that ems_refresh_core_worker, like the Metrics Collector Workers and many other workers, inherits the 500 MB memory_threshold from queue_worker_base and would probably exhibit problems similar to those reported in this BZ.
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/1f2687fc89328eccb37724dbeddf6311e9dffbac

commit 1f2687fc89328eccb37724dbeddf6311e9dffbac
Author: Harpreet Kataria <hkataria>
AuthorDate: Mon Dec 5 17:02:16 2016 -0500
Commit: Harpreet Kataria <hkataria>
CommitDate: Mon Dec 5 17:02:16 2016 -0500

    Added a missing default memeory threshold setting

    Added a missing default memeory threshold setting for C & U Data Collectors that was causing drop down to the firt item in list as selected value by default.

    https://bugzilla.redhat.com/show_bug.cgi?id=1296638

 config/settings.yml | 1 +
 1 file changed, 1 insertion(+)
While connected to a 1000 VM RHVM environment, this seems to have been fixed in 580x.

Reference: https://gist.github.com/arcolife/648c83a7f53ee6a706dd8fda278080e1

[----] W, [2017-04-11T07:50:57.579104 #40011:1045140] WARN -- : MIQ(MiqServer#validate_worker) Worker [ManageIQ::Providers::Redhat::InfraManager::MetricsCollectorWorker] with ID: [42], PID: [21634], GUID: [ef1e1b5a-1eac-11e7-8366-001a4a22391a] process memory usage [420320000] exceeded limit [419430400], requesting worker to exit

This checks out against the UI params:

    :ems_metrics_collector_worker:
      :defaults:
        :count: 2
        :memory_threshold: 400.megabytes
        :nice_delta: 3
        :poll_method: :escalate