Bug 1820083 - No datapoints found for metal3 metrics in UI
Summary: No datapoints found for metal3 metrics in UI
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Tomas Sedovic
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-02 08:45 UTC by Sasha Smolyak
Modified: 2023-09-18 00:20 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-02-16 14:42:03 UTC
Target Upstream Version:
Embargoed:


Attachments
curl http://localhost:8085/metrics (35.52 KB, text/plain)
2020-04-02 08:45 UTC, Sasha Smolyak


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-api-operator pull 671 | 0 | None | closed | Bug 1820083: configure metal3 metrics collection | 2021-02-08 18:46:22 UTC

Description Sasha Smolyak 2020-04-02 08:45:48 UTC
Created attachment 1675643
curl http://localhost:8085/metrics

Description of problem:
No datapoints found for metal3 metrics in UI

Version-Release number of selected component (if applicable):
Openshift: 4.4.0-0.nightly-2020-04-01-005209

How reproducible:
100%

Steps to Reproduce:
1. Deploy a cluster and observe the metal3 metrics in the CLI: curl http://localhost:8085/metrics
2. A number of metal3 metrics are exposed by the metal3-baremetal-operator container in the metal3 pod:
metal3_reconcile_error_total
metal3_host_error_total
metal3_operation_power_change_total
metal3_credentials_missing_total
metal3_credentials_invalid_total
metal3_credentials_unhandled_error_total
metal3_credentials_updated_total
metal3_credentials_no_management_access_total
metal3_host_config_data_error_total
metal3_operation_register_duration_seconds
metal3_operation_inspect_duration_seconds
metal3_operation_provision_duration_seconds
metal3_operation_deprovision_duration_seconds
metal3_provisioning_state_change_total
metal3_host_registration_required_total
metal3_delete_without_deprovisioning_total
3. Open the UI, go to Metrics, and look for data about any of these metrics

Actual results:
No datapoints found

Expected results:
Datapoints for the metrics

Additional info:
log of cli metrics attached
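
To check which metal3 metrics the endpoint in step 1 actually exposes, the saved curl output can be scanned for metric names. The following is a minimal Python sketch, assuming the output is in the standard Prometheus text exposition format; the function name `metal3_metric_names`, the sample text, and the `host` label in it are hypothetical and are not taken from the attached log.

```python
def metal3_metric_names(exposition_text, prefix="metal3_"):
    """Return the sorted, de-duplicated metric names that start with prefix."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as "# HELP" and "# TYPE".
        if not line or line.startswith("#"):
            continue
        # A sample line is "name{labels} value" or "name value":
        # the name ends at the first "{" or the first space.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


# Illustrative input only; help text, label, and values are made up.
SAMPLE = """\
# HELP metal3_reconcile_error_total (made-up help text)
# TYPE metal3_reconcile_error_total counter
metal3_reconcile_error_total 0
metal3_operation_power_change_total{host="example"} 3
go_goroutines 42
"""

if __name__ == "__main__":
    for metric in metal3_metric_names(SAMPLE):
        print(metric)
```

In practice the text passed in would be the output of the curl command from step 1 rather than `SAMPLE`.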

Comment 1 Jiri Tomasek 2020-05-15 09:40:25 UTC
Can you please confirm that the URL you're getting the metrics from is Prometheus? It looks like the metrics are exposed but not scraped by Prometheus, and therefore they do not appear in the UI. This is very likely not a UI bug but rather a missing Prometheus configuration.

Comment 2 Sasha Smolyak 2020-05-21 09:36:56 UTC
I can confirm that I get those metrics in the CLI; they are collected, just not found by the UI. So it may well be the missing configuration, I'm not arguing about that.
I do see them when working in CLI mode, though.

Comment 8 Honza Pokorny 2020-08-04 16:05:58 UTC
This seems to be caused by a missing BMO ServiceMonitor in machine-api-operator.

Comment 9 Steven Hardy 2020-08-18 16:05:10 UTC
*** Bug 1868411 has been marked as a duplicate of this bug. ***

Comment 16 Sasha Smolyak 2020-11-30 13:08:21 UTC
Cluster version 4.7.0-0.nightly-2020-11-29-133728


Got null results for every metal3 metric I tried in the OpenShift cluster UI (Metrics tab),
in the Prometheus UI (https://prometheus-k8s-openshift-monitoring.apps.ocp-edge-cluster-0.qe.lab.redhat.com/graph),
and through the Prometheus API:
[kni@provisionhost-0-0 ~]$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query?query=%20avg%20by%20(instance)%20(irate(metal3_operation_power_change_total%7Binstance%3D%22master-0-0%22%2Cmode%3D%22idle%22%7D%5B5m%5D))" | jq -r .data.result[0].value[1]
null
[kni@provisionhost-0-0 ~]$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query?query=%20avg%20by%20(instance)%20(irate(metal3_operation_register_duration_seconds%7Binstance%3D%22master-0-0%22%2Cmode%3D%22idle%22%7D%5B5m%5D))" | jq -r .data.result[0].value[1]
null
[kni@provisionhost-0-0 ~]$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query?query=%20avg%20by%20(instance)%20(irate(metal3_host_registration_required_total%7Binstance%3D%22master-0-0%22%2Cmode%3D%22idle%22%7D%5B5m%5D))" | jq -r .data.result[0].value[1]
null
[kni@provisionhost-0-0 ~]$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query?query=%20avg%20by%20(instance)%20(irate(metal3_host_registration_required_total%7Binstance%3D%22master-0-1%22%2Cmode%3D%22idle%22%7D%5B5m%5D))" | jq -r .data.result[0].value[1]
null
[kni@provisionhost-0-0 ~]$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query?query=%20avg%20by%20(instance)%20(irate(metal3_host_registration_required_total%7Binstance%3D%22master-0-2%22%2Cmode%3D%22idle%22%7D%5B5m%5D))" | jq -r .data.result[0].value[1]
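
For reference, jq prints null for .data.result[0].value[1] whenever the result vector is empty, so the outputs above are consistent with Prometheus returning no matching series. A minimal Python sketch of the same check follows, assuming the standard instant-query endpoint /api/v1/query at the localhost:9090 address used above. The helper names `build_query_url` and `first_value` are hypothetical, and the sample responses are illustrative, not captured from this cluster.

```python
from urllib.parse import urlencode


def build_query_url(expr, base="http://localhost:9090"):
    """Build an instant-query URL with the PromQL expression URL-encoded."""
    return f"{base}/api/v1/query?{urlencode({'query': expr})}"


def first_value(response):
    """Return the first sample value as a string, or None if nothing matched.

    `response` is the decoded JSON body of an instant query. An empty
    result list is the case that jq reports as null.
    """
    results = response.get("data", {}).get("result", [])
    if not results:
        return None
    # Each vector entry carries "value": [timestamp, "value-as-string"].
    return results[0]["value"][1]


# Illustrative responses only (assumed shapes, made-up numbers).
EMPTY = {"status": "success", "data": {"resultType": "vector", "result": []}}
FOUND = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {}, "value": [1606741701, "0"]}],
    },
}

if __name__ == "__main__":
    print(build_query_url("metal3_reconcile_error_total"))
    print(first_value(EMPTY))
    print(first_value(FOUND))
```

Querying the bare metric name first, without label matchers, separates "the series does not exist in Prometheus at all" from "the label filter matched nothing".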

Comment 21 Zane Bitter 2021-06-21 17:38:37 UTC
This was fixed in MAO during the 4.7 cycle, but the code was removed again prior to the 4.7 release when we switched to the cluster-baremetal-operator. The fix has never been implemented in the CBO, although there is an open (but outdated) PR for it - https://github.com/openshift/cluster-baremetal-operator/pull/99.

Comment 26 Dmitry Tantsur 2023-02-16 14:42:03 UTC
This bug was opened against a very old version, and the patch has been abandoned since then. I'm closing this bug. If you still experience the issue, please open a bug (in jira) against a new version with updated information.

Comment 27 Red Hat Bugzilla 2023-09-18 00:20:42 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

