Bug 1753788 - OCP Metrics are not being collected for 1 cluster
Summary: OCP Metrics are not being collected for 1 cluster
Keywords:
Status: CLOSED DUPLICATE of bug 1754071
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.10.8
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.10.13
Assignee: Yaacov Zamir
QA Contact: juwatts
Docs Contact: Red Hat CloudForms Documentation
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-09-19 20:58 UTC by Tuan
Modified: 2023-03-24 15:29 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-22 18:49:35 UTC
Category: Bug
Cloudforms Team: Container Management
Target Upstream Version:
Embargoed:


Description Tuan 2019-09-19 20:58:56 UTC
Description of problem:
One of three clusters is not showing metrics in CloudForms.

We see a "no associated node" error and a namespace error that appear to be related to these BZs:

No associated node error: https://bugzilla.redhat.com/show_bug.cgi?id=1614735
Namespace error: https://bugzilla.redhat.com/show_bug.cgi?id=1722808


[----] E, [2019-09-17T04:05:11.888000 #170:b19108] ERROR -- : MIQ(ManageIQ::Providers::Kubernetes::ContainerManager::MetricsCapture#perf_collect_metrics) Metrics unavailable: [Container(10678)] Hawkular::Exception: {"ErrorMsg": "Could not determine a namespace id for namespace ecpauto-intra-nopii-0910-20-20"}
[----] I, [2019-09-17T04:05:11.917990 #170:b19108]  INFO -- : Exception in realtime_block :total_time - Timings: {:capture_state=>0.029214859008789062, :collect_data=>0.32401561737060547, :total_time=>0.3820459842681885}
[----] E, [2019-09-17T04:05:11.918628 #170:b19108] ERROR -- : MIQ(MiqQueue#deliver) Message id: [6930914], Error: [Hawkular::Exception: {"ErrorMsg": "Could not determine a namespace id for namespace ecpauto-intra-nopii-0910-20-20"}]
[----] E, [2019-09-17T04:05:11.918976 #170:b19108] ERROR -- : [ManageIQ::Providers::Kubernetes::ContainerManager::MetricsCapture::CollectionFailure]: Hawkular::Exception: {"ErrorMsg": "Could not determine a namespace id for namespace ecpauto-intra-nopii-0910-20-20"}  Method:[block in method_missing]
[----] E, [2019-09-17T04:05:11.919092 #170:b19108] ERROR -- : /opt/rh/cfme-gemset/bundler/gems/cfme-providers-kubernetes-1748f9b993cf/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_legacy_capture_context.rb:59:in `rescue in fetch_counters_data'
/opt/rh/cfme-gemset/bundler/gems/cfme-providers-kubernetes-1748f9b993cf/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_legacy_capture_context.rb:49:in `fetch_counters_data'
/opt/rh/cfme-gemset/bundler/gems/cfme-providers-kubernetes-1748f9b993cf/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_legacy_capture_context.rb:91:in `fetch_counters_rate'
/opt/rh/cfme-gemset/bundler/gems/cfme-providers-kubernetes-1748f9b993cf/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture/hawkular_legacy_capture_context.rb:24:in `collect_container_metrics'


We are also seeing a "no associated node" failure:


[----] E, [2019-09-17T03:39:39.877013 #170:b19108] ERROR -- : MIQ(MiqQueue#deliver) Message id: [6921475], Error: [no associated node]
[----] E, [2019-09-17T03:39:39.877288 #170:b19108] ERROR -- : [ManageIQ::Providers::Kubernetes::ContainerManager::MetricsCapture::TargetValidationWarning]: no associated node  Method:[block in method_missing]
[----] E, [2019-09-17T03:39:39.877382 #170:b19108] ERROR -- : /opt/rh/cfme-gemset/bundler/gems/cfme-providers-kubernetes-1748f9b993cf/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture/capture_context_mixin.rb:43:in `validate_target'
/opt/rh/cfme-gemset/bundler/gems/cfme-providers-kubernetes-1748f9b993cf/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture/capture_context_mixin.rb:22:in `initialize'
/opt/rh/cfme-gemset/bundler/gems/cfme-providers-kubernetes-1748f9b993cf/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture.rb:67:in `new'
/opt/rh/cfme-gemset/bundler/gems/cfme-providers-kubernetes-1748f9b993cf/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture.rb:67:in `hawkular_capture_context'
/opt/rh/cfme-gemset/bundler/gems/cfme-providers-kubernetes-1748f9b993cf/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture.rb:86:in `capture_context'
/opt/rh/cfme-gemset/bundler/gems/cfme-providers-kubernetes-1748f9b993cf/app/models/manageiq/providers/kubernetes/container_manager/metrics_capture.rb:100:in `perf_collect_metrics'
/var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:6:in `perf_collect_metrics'
/var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:193:in `block in just_perf_capture'
/opt/rh/cfme-gemset/bundler/gems/cfme-gems-pending-3457c5b58220/lib/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/opt/rh/cfme-gemset/bundler/gems/cfme-gems-pending-3457c5b58220/lib/gems/pending/util/extensions/miq-benchmark.rb:35:in `realtime_block'
/var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:189:in `just_perf_capture'
/var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:135:in `perf_capture'
/var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:117:in `perf_capture_realtime'
/var/www/miq/vmdb/app/models/miq_queue.rb:449:in `block in dispatch_method'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:91:in `block in timeout'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `block in catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:106:in `timeout'
/var/www/miq/vmdb/app/models/miq_queue.rb:448:in `dispatch_method'
/var/www/miq/vmdb/app/models/miq_queue.rb:425:in `block in deliver'
/var/www/miq/vmdb/app/models/user.rb:261:in `with_user_group'
/var/www/miq/vmdb/app/models/miq_queue.rb:425:in `deliver'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:104:in `deliver_queue_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:134:in `deliver_message'


We have also reached out to the shift team to see whether this is an environment issue. Their response:


"Not much we can tell you from the shift side regarding this without knowing what API calls ManageIQ is making."


I've explained that we use complex API calls for metrics collection, so a list of those calls will not be readily available.


Available logs can be found here: http://file.rdu.redhat.com/tuado/logs/amex02457236/

Version-Release number of selected component (if applicable):
5.9

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Yaacov Zamir 2019-09-22 07:02:37 UTC
Heads up:
Between Sep-26 and Oct-22 I will be available only on Oct 2, 3, 6, 7, and 10.
In the meantime (until Sep-26) I will start setting up a dev environment for ManageIQ and look for an OCP cluster with Hawkular/Cassandra metrics.

@Oved, since this is a high-priority bug and I will be mostly unavailable until Oct-22, can you suggest a contact who can help until then, if needed?

Comment 6 Yaacov Zamir 2019-09-22 10:05:29 UTC
Tuan, hi,

Do you still have problems with refresh (https://bugzilla.redhat.com/show_bug.cgi?id=1722808)?

Comment 9 dmetzger 2019-10-22 18:49:35 UTC

*** This bug has been marked as a duplicate of bug 1722808 ***

Comment 10 dmetzger 2019-10-28 18:14:15 UTC

*** This bug has been marked as a duplicate of bug 1754071 ***

