Description of problem:
Podified CloudForms tries to collect metrics from OCP providers even if they have metrics disabled.

Version-Release number of selected component (if applicable):
5.9.3

How reproducible:
Always

Steps to Reproduce:
1. Add OCP Provider
2. Turn on C&U roles
3. Wait until appliance performs metrics collection

Actual results:
CloudForms tries to collect metrics and writes up to 9 GB of logs in one day.

Expected results:
CloudForms does not collect metrics from container providers that are not configured for it.

Additional info:
[----] I, [2018-07-31T15:42:37.661824 #774:d1d10c] INFO -- : MIQ(ManageIQ::Providers::Openshift::ContainerManager::MetricsCollectorWorker::Runner#get_message_via_drb) Message id: [259], MiqWorker id: [14], Zone: [default], Role: [ems_metrics_collector], Server: [], Ident: [openshift], Target id: [], Instance id: [10], Task id: [], Command: [ManageIQ::Providers::Kubernetes::ContainerManager::ContainerNode.perf_capture_realtime], Timeout: [600], Priority: [100], State: [dequeue], Deliver On: [], Data: [], Args: [2018-07-31 00:00:00 UTC], Dequeued in: [27.914560916] seconds
[----] I, [2018-07-31T15:42:37.662860 #774:d1d10c] INFO -- : MIQ(MiqQueue#deliver) Message id: [259], Delivering...
[----] I, [2018-07-31T15:42:37.843357 #782:d1d10c] INFO -- : MIQ(ManageIQ::Providers::Openshift::ContainerManager::MetricsCollectorWorker::Runner#get_message_via_drb) Message id: [260], MiqWorker id: [15], Zone: [default], Role: [ems_metrics_collector], Server: [], Ident: [openshift], Target id: [], Instance id: [9], Task id: [], Command: [ManageIQ::Providers::Kubernetes::ContainerManager::ContainerNode.perf_capture_realtime], Timeout: [600], Priority: [100], State: [dequeue], Deliver On: [], Data: [], Args: [2018-07-31 00:00:00 UTC], Dequeued in: [28.07026418] seconds
[----] I, [2018-07-31T15:42:37.844282 #782:d1d10c] INFO -- : MIQ(MiqQueue#deliver) Message id: [260], Delivering...
[----] I, [2018-07-31T15:42:37.902635 #774:d1d10c] INFO -- : MIQ(ManageIQ::Providers::Kubernetes::ContainerManager::ContainerNode#just_perf_capture) [realtime] Capture for ManageIQ::Providers::Kubernetes::ContainerManager::ContainerNode name: [node-5.cloudforms.lab.rdu2.cee.redhat.com], id: [10], start_time: [2018-07-31 00:00:00 UTC]...
[----] I, [2018-07-31T15:42:38.191383 #782:d1d10c] INFO -- : MIQ(ManageIQ::Providers::Kubernetes::ContainerManager::ContainerNode#just_perf_capture) [realtime] Capture for ManageIQ::Providers::Kubernetes::ContainerManager::ContainerNode name: [node-4.cloudforms.lab.rdu2.cee.redhat.com], id: [9], start_time: [2018-07-31 00:00:00 UTC]...
[----] I, [2018-07-31T15:42:38.443311 #774:d1d10c] INFO -- : MIQ(ManageIQ::Providers::Kubernetes::ContainerManager::MetricsCapture#perf_collect_metrics) Collecting metrics for ContainerNode(10) [realtime] [2018-07-31 00:00:00 UTC] []
[----] W, [2018-07-31T15:42:38.550293 #774:d1d10c] WARN -- : MIQ(ManageIQ::Providers::Kubernetes::ContainerManager::MetricsCapture#perf_collect_metrics) [ContainerNode(10)] no metrics endpoint found for ContainerNode(10)
[----] I, [2018-07-31T15:42:38.608412 #774:d1d10c] INFO -- : Exception in realtime_block :total_time - Timings: {:capture_state=>0.5117576122283936, :total_time=>0.7051336765289307}
[----] E, [2018-07-31T15:42:38.609509 #774:d1d10c] ERROR -- : MIQ(MiqQueue#deliver) Message id: [259], Error: [no metrics endpoint found for ContainerNode(10)]
[----] E, [2018-07-31T15:42:38.609970 #774:d1d10c] ERROR -- : [ManageIQ::Providers::Kubernetes::ContainerManager::MetricsCapture::TargetValidationWarning]: no metrics endpoint found for ContainerNode(10) Method:[block in method_missing]
[----] E, [2018-07-31T15:42:38.610163 #774:d1d10c] ERROR -- : /opt/rh/cfme-gemset/bundler/gems/cfme-providers-kubernetes-1748f9b993cf/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture.rb:102:in `perf_collect_metrics'
/var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:6:in `perf_collect_metrics'
/var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:193:in `block in just_perf_capture'
/opt/rh/cfme-gemset/bundler/gems/cfme-gems-pending-3457c5b58220/lib/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/opt/rh/cfme-gemset/bundler/gems/cfme-gems-pending-3457c5b58220/lib/gems/pending/util/extensions/miq-benchmark.rb:35:in `realtime_block'
/var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:189:in `just_perf_capture'
/var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:135:in `perf_capture'
/var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:117:in `perf_capture_realtime'
/var/www/miq/vmdb/app/models/miq_queue.rb:449:in `block in dispatch_method'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:91:in `block in timeout'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `block in catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:106:in `timeout'
/var/www/miq/vmdb/app/models/miq_queue.rb:448:in `dispatch_method'
/var/www/miq/vmdb/app/models/miq_queue.rb:425:in `block in deliver'
/var/www/miq/vmdb/app/models/user.rb:261:in `with_user_group'
/var/www/miq/vmdb/app/models/miq_queue.rb:425:in `deliver'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:104:in `deliver_queue_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:134:in `deliver_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:134:in `deliver_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:152:in `block in do_work'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `loop'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `do_work'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:329:in `block in do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:326:in `loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:326:in `do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:153:in `run'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:127:in `start'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:22:in `start_worker'
/var/www/miq/vmdb/app/models/miq_worker.rb:376:in `block in start_runner_via_fork'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
/var/www/miq/vmdb/app/models/miq_worker.rb:374:in `start_runner_via_fork'
/var/www/miq/vmdb/app/models/miq_worker.rb:368:in `start_runner'
/var/www/miq/vmdb/app/models/miq_worker.rb:415:in `start'
/var/www/miq/vmdb/app/models/miq_worker.rb:266:in `start_worker'
/var/www/miq/vmdb/app/models/miq_worker.rb:153:in `block in sync_workers'
/var/www/miq/vmdb/app/models/miq_worker.rb:153:in `times'
/var/www/miq/vmdb/app/models/miq_worker.rb:153:in `sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:53:in `block in sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `each'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:22:in `monitor_workers'
/var/www/miq/vmdb/app/models/miq_server.rb:338:in `block in monitor'
/opt/rh/cfme-gemset/bundler/gems/cfme-gems-pending-3457c5b58220/lib/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/opt/rh/cfme-gemset/bundler/gems/cfme-gems-pending-3457c5b58220/lib/gems/pending/util/extensions/miq-benchmark.rb:28:in `realtime_block'
/var/www/miq/vmdb/app/models/miq_server.rb:338:in `monitor'
/var/www/miq/vmdb/app/models/miq_server.rb:377:in `block (2 levels) in monitor_loop'
/opt/rh/cfme-gemset/bundler/gems/cfme-gems-pending-3457c5b58220/lib/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/opt/rh/cfme-gemset/bundler/gems/cfme-gems-pending-3457c5b58220/lib/gems/pending/util/extensions/miq-benchmark.rb:35:in `realtime_block'
/var/www/miq/vmdb/app/models/miq_server.rb:377:in `block in monitor_loop'
/var/www/miq/vmdb/app/models/miq_server.rb:376:in `loop'
/var/www/miq/vmdb/app/models/miq_server.rb:376:in `monitor_loop'
/var/www/miq/vmdb/app/models/miq_server.rb:239:in `start'
/var/www/miq/vmdb/lib/workers/evm_server.rb:27:in `start'
/var/www/miq/vmdb/lib/workers/evm_server.rb:48:in `start'
/var/www/miq/vmdb/lib/workers/bin/evm_server.rb:4:in `<main>'
[----] I, [2018-07-31T15:42:38.611092 #774:d1d10c] INFO -- : MIQ(MiqQueue#delivered) Message id: [259], State: [error], Delivered in [0.948315115] seconds
[----] I, [2018-07-31T15:42:38.650455 #774:d1d10c] INFO -- : MIQ(ManageIQ::Providers::Openshift::ContainerManager::MetricsCollectorWorker::Runner#get_message_via_drb) Message id: [261], MiqWorker id: [14], Zone: [default], Role: [ems_metrics_collector], Server: [], Ident: [openshift], Target id: [], Instance id: [8], Task id: [], Command: [ManageIQ::Providers::Kubernetes::ContainerManager::ContainerNode.perf_capture_realtime], Timeout: [600], Priority: [100], State: [dequeue], Deliver On: [], Data: [], Args: [2018-07-31 00:00:00 UTC], Dequeued in: [28.859538532] seconds
For what it's worth, I'm also seeing this on an appliance version of CFME: 5.9.0.22.20180221205805_f93a675
Changing component from pod to appliance due to new information from comment 2 and customer case update.
Bronagh, this looks to be related to the provider configuration and the enabling of C&U. Can you take a look? Thank you.
https://github.com/ManageIQ/manageiq/pull/17820
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/d78c0766e438d6d1d6157546fefe54cafc297699

commit d78c0766e438d6d1d6157546fefe54cafc297699
Author:     Adam Grare <agrare>
AuthorDate: Wed Aug 8 10:54:54 2018 -0400
Commit:     Adam Grare <agrare>
CommitDate: Wed Aug 8 10:54:54 2018 -0400

    Don't queue metrics capture if metrics unsupported

    If metrics capture is unsupported by the provider then do not queue
    perf_capture for targets on that EMS.

    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1610449

 app/models/manageiq/providers/container_manager.rb | 5 +
 app/models/metric/targets.rb                       | 2 +
 2 files changed, 7 insertions(+)
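
For readers who want the idea behind the commit message in code form, here is a minimal, self-contained sketch. It is not the code from the commit: `Provider`, `Target`, `supports_metrics?`, `capture_targets` and `queue_captures` are hypothetical stand-ins, and the rule that a provider supports metrics only when a metrics endpoint is configured is an assumption made for illustration.

```ruby
# Hypothetical stand-ins for illustration only; these are NOT ManageIQ classes.
Provider = Struct.new(:name, :metrics_endpoint) do
  # Assumption: metrics capture is supported only when an endpoint is configured.
  def supports_metrics?
    !metrics_endpoint.nil?
  end
end

Target = Struct.new(:name, :provider)

# Keep only targets whose provider supports metrics capture, so that
# nothing is queued for providers with metrics disabled.
def capture_targets(targets)
  targets.select { |target| target.provider.supports_metrics? }
end

# Build queue entries for the filtered targets. The queue is a plain array
# here; a real system would use its own message queue.
def queue_captures(targets, queue = [])
  capture_targets(targets).each do |target|
    queue << { target: target.name, method: "perf_capture_realtime" }
  end
  queue
end
```

With this filter in place, a node on a provider without a metrics endpoint never reaches the queue, so the worker has no message to dequeue and no error or backtrace to log for it.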
Verified in 5.10.0.20. With C&U roles enabled, the logs contain no entries about collecting metrics from an OCP provider that has metrics disabled.
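
One way to repeat this check is to count log lines containing the failure message quoted in the original report. The sketch below is only an illustration: `count_endpoint_failures` is a hypothetical helper, and the log path is whatever file you pass on the command line (nothing about the log location is assumed here).

```ruby
# Message text taken from the log excerpt in the original report.
FAILURE_MESSAGE = "no metrics endpoint found for".freeze

# Hypothetical helper: counts the lines that mention the failure message.
# Accepts any enumerable of strings (an array, or File.foreach(path)).
def count_endpoint_failures(lines)
  lines.count { |line| line.include?(FAILURE_MESSAGE) }
end

# Optional command-line use: ruby check_log.rb <path-to-log>
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  puts count_endpoint_failures(File.foreach(ARGV[0]))
end
```

A count of zero after a full collection interval is consistent with the fix, though it does not by itself prove that no capture was queued.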