Bug 1517064
| Summary: | MetricsCollectorWorker workers are started even if metrics collection is disabled for a container provider | |||
|---|---|---|---|---|
| Product: | Red Hat CloudForms Management Engine | Reporter: | Prasad Mukhedkar <pmukhedk> | |
| Component: | Providers | Assignee: | Yaacov Zamir <yzamir> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Shalom Naim <snaim> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 5.9.0 | CC: | cpelland, gblomqui, jfrey, jhardy, lavenel, lsmola, obarenbo, pmukhedk, yzamir | |
| Target Milestone: | GA | Keywords: | TestOnly | |
| Target Release: | 5.10.0 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | testathon | |||
| Fixed In Version: | 5.10.0.0 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1524628 | Environment: | ||
| Last Closed: | 2018-06-21 20:30:46 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | Container Management | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1524628 | |||
Prasad: this is expected behavior. Whether the worker is started depends on whether the metrics collection role is enabled. It works like that everywhere, so we would need an RFE to change that behaviour. I think it's recommended to have 1 provider per zone for this purpose.

Yaacov: Maybe we should not see this failure though. Should we rather display a warning if the provider has no Hawkular/Prometheus endpoint set but the collector is started? Also, I believe that if C&U is disabled, we should not be queuing any targets for collection? (We should have other BZs for these.)

> Maybe we should not see this failure though. Should we rather display a warning if provider has no Hawkular/Prometheus set but collector is started?

Agree. Should we use this bug to fix this or open a new one, to:
a. make it a Warning instead of an Error
b. make it one line, instead of the multiple errors.

> Also, I believe that if C&U is disabled, we should not be queuing any targets for collection? (we should have another BZs for these)

Also agree :-) Should we use this bug to fix this or open a new one?

P.S. I will start working on a patch to fix the things Ladislav suggested. Please comment here on whether we need to open a new BZ or use this one.

This patch: https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/159
changes the way we collect metrics. It also:
a. issues a Warning once when no metrics endpoint is available.
b. does not try to get metrics if no endpoint is found.

Changing this BZ to on dev since this patch (for a different BZ) will also fix the above issues.

Yaacov, please add this BZ to the PR's description (add to the existing bug).

Merged upstream: https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/159

Moving to post, although this does not fix the original problem, only:
a. issues a Warning once when no metrics endpoint is available.
b. does not try to get metrics if no endpoint is found.

Please re-open if this is not enough and we need to address the underlying problem of actually starting a worker when C&U is on and no metrics endpoint is defined. [This is a core issue, not a container-only problem, and we will need to re-assign this to the core team.]

Note for QE:
When testing: currently we see an Error if C&U is set but no metrics endpoint is defined:
"""MetricsCapture#perf_collect_metrics) Hawkular metrics service unavailable"""
After the fix:
a. We should not see an Error if C&U is on but no metrics endpoint is set.
b. We should see Warnings about it in the log.
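The two behaviors attributed to the patch in the comments (warn once when no metrics endpoint is available, and do not attempt collection without one) can be sketched roughly as below. This is not the code from the pull request: `Provider`, `MetricsCollectorSketch`, `metrics_endpoint`, and the warning text are all hypothetical names chosen for illustration, and the sketch uses only Ruby's standard `Logger`.

```ruby
require "logger"
require "stringio"

# Hypothetical stand-in for a provider record; not the ManageIQ model.
Provider = Struct.new(:name, :metrics_endpoint)

# Hypothetical collector illustrating "warn once, then skip collection".
class MetricsCollectorSketch
  def initialize(logger)
    @logger = logger
    @warned = {} # provider name => true once a warning has been logged
  end

  # Returns :skipped when no endpoint is configured, otherwise :collected.
  def collect(provider)
    if provider.metrics_endpoint.nil?
      unless @warned[provider.name]
        # Assumed message text; the real warning wording is not given in this bug.
        @logger.warn("no metrics endpoint found for #{provider.name}, skipping collection")
        @warned[provider.name] = true
      end
      return :skipped
    end

    # A real collector would contact the metrics endpoint here.
    :collected
  end
end

# Usage: three collection attempts against a provider without an endpoint.
log_io = StringIO.new
collector = MetricsCollectorSketch.new(Logger.new(log_io))
provider = Provider.new("openshift", nil)

results = 3.times.map { collector.collect(provider) }
warning_count = log_io.string.lines.count { |line| line.include?("WARN") }
```

In this sketch every attempt is skipped, but only the first one writes a warning, which matches the single-line warning the comments ask for in place of repeated errors.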
Description of problem:
Metrics collection for the OpenShift provider is set to disabled, but the following workers are still started on the appliances in the zone where the C&U roles are enabled.

[root@cfwork1 vmdb]# rake evm:status | grep openshift
ManageIQ::Providers::Openshift::ContainerManager::MetricsCollectorWorker | started | 153 | 1808 | 12745 | 3 | openshift | 2017-11-24T04:05:00Z | 2017-11-24T05:35:41Z | 214
ManageIQ::Providers::Openshift::ContainerManager::MetricsCollectorWorker | started | 158 | 1816 | 12751 | 3 | openshift | 2017-11-24T04:05:00Z | 2017-11-24T05:35:28Z | 209
[root@cfwork1 vmdb]#

The following error message is reported repeatedly in the evm.log file:

[----] E, [2017-11-24T00:31:43.351235 #1816:b11138] ERROR -- : MIQ(ManageIQ::Providers::Kubernetes::ContainerManager::MetricsCapture#perf_collect_metrics) Hawkular metrics service unavailable: Hawkular::ConnectionException: Failed to open TCP connection to ose.example.com:5000 (No route to host - connect(2) for "ose.example.com" port 5000)
[----] I, [2017-11-24T00:31:43.376426 #1816:b11138] INFO -- : Exception in realtime_block :total_time - Timings: {:capture_state=>0.0041866302490234375, :collect_data=>3.0460519790649414, :total_time=>3.07134747505188}
[----] E, [2017-11-24T00:31:43.376893 #1816:b11138] ERROR -- : MIQ(MiqQueue#deliver) Message id: [12583], Error: [Hawkular::ConnectionException: Failed to open TCP connection to ose.example.com:5000 (No route to host - connect(2) for "ose.example.com" port 5000)]
[----] E, [2017-11-24T00:31:43.377335 #1816:b11138] ERROR -- : [ManageIQ::Providers::Kubernetes::ContainerManager::MetricsCapture::CollectionFailure]: Hawkular::ConnectionException: Failed to open TCP connection to ose.example.com:5000 (No route to host - connect(2) for "ose.example.com" port 5000) Method:[block in method_missing]
[----] E, [2017-11-24T00:31:43.377535 #1816:b11138] ERROR -- :
/opt/rh/cfme-gemset/bundler/gems/manageiq-providers-kubernetes-d2e40c479ac0/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_capture_context.rb:59:in `rescue in fetch_counters_data'
/opt/rh/cfme-gemset/bundler/gems/manageiq-providers-kubernetes-d2e40c479ac0/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_capture_context.rb:50:in `fetch_counters_data'
/opt/rh/cfme-gemset/bundler/gems/manageiq-providers-kubernetes-d2e40c479ac0/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture/capture_context_mixin.rb:58:in `fetch_counters_rate'
/opt/rh/cfme-gemset/bundler/gems/manageiq-providers-kubernetes-d2e40c479ac0/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_capture_context.rb:24:in `collect_container_metrics'
/opt/rh/cfme-gemset/bundler/gems/manageiq-providers-kubernetes-d2e40c479ac0/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture/capture_context_mixin.rb:28:in `collect_metrics'
/opt/rh/cfme-gemset/bundler/gems/manageiq-providers-kubernetes-d2e40c479ac0/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture.rb:76:in `block in perf_collect_metrics'
/opt/rh/cfme-gemset/bundler/gems/manageiq-gems-pending-76700bd7592a/lib/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/opt/rh/cfme-gemset/bundler/gems/manageiq-gems-pending-76700bd7592a/lib/gems/pending/util/extensions/miq-benchmark.rb:28:in `realtime_block'
/opt/rh/cfme-gemset/bundler/gems/manageiq-providers-kubernetes-d2e40c479ac0/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture.rb:74:in `perf_collect_metrics'
/var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:6:in `perf_collect_metrics'
/var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:193:in `block in just_perf_capture'
/opt/rh/cfme-gemset/bundler/gems/manageiq-gems-pending-76700bd7592a/lib/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/opt/rh/cfme-gemset/bundler/gems/manageiq-gems-pending-76700bd7592a/lib/gems/pending/util/extensions/miq-benchmark.rb:35:in `realtime_block'
/var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:189:in `just_perf_capture'
/var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:135:in `perf_capture'
/var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:117:in `perf_capture_realtime'
/var/www/miq/vmdb/app/models/miq_queue.rb:449:in `block in dispatch_method'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:91:in `block in timeout'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `block in catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:106:in `timeout'
/var/www/miq/vmdb/app/models/miq_queue.rb:448:in `dispatch_method'
/var/www/miq/vmdb/app/models/miq_queue.rb:425:in `block in deliver'
/var/www/miq/vmdb/app/models/user.rb:253:in `with_user_group'
/var/www/miq/vmdb/app/models/miq_queue.rb:425:in `deliver'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:104:in `deliver_queue_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:134:in `deliver_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:152:in `block in do_work'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `loop'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `do_work'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:329:in `block in do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:326:in `loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:326:in `do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:153:in `run'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:127:in `start'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:22:in `start_worker'
/var/www/miq/vmdb/app/models/miq_worker.rb:357:in `block in start_runner_via_fork'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
/var/www/miq/vmdb/app/models/miq_worker.rb:355:in `start_runner_via_fork'
/var/www/miq/vmdb/app/models/miq_worker.rb:349:in `start_runner'
/var/www/miq/vmdb/app/models/miq_worker.rb:396:in `start'
/var/www/miq/vmdb/app/models/miq_worker.rb:266:in `start_worker'
/var/www/miq/vmdb/app/models/miq_worker.rb:153:in `block in sync_workers'
/var/www/miq/vmdb/app/models/miq_worker.rb:153:in `times'
/var/www/miq/vmdb/app/models/miq_worker.rb:153:in `sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:53:in `block in sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `each'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `sync_workers'
/var/www/miq/vmdb/app/models/miq_server.rb:141:in `start'
/var/www/miq/vmdb/app/models/miq_server.rb:233:in `start'
/var/www/miq/vmdb/lib/workers/evm_server.rb:27:in `start'
/var/www/miq/vmdb/lib/workers/evm_server.rb:48:in `start'

Version-Release number of selected component (if applicable):
cfme-5.9.0.10-1.el7cf.x86_64

How reproducible:
1. Navigate to Compute → Containers → Providers.
2. Click Configuration (Configuration), then click Add a New Containers Provider (Add Existing Containers Provider).
3. Enter a Name for the provider.
4. From the Type list, select OpenShift Container Platform.
5. Select the appropriate zone.
6. From the Metrics list, select Disabled.
7. Add the provider.

Actual results:
MetricsCollectorWorker workers are started for the container provider.

Expected results:
Ideally, no MetricsCollectorWorker workers should be started when metrics collection is set to disabled for a container provider.

Additional info: