Bug 2077291 - Prometheus doesn't display acm_managed_cluster_info after upgrade from 2.4 to 2.5
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Cluster Lifecycle
Version: rhacm-2.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: rhacm-2.5
Assignee: Jian Qiu
QA Contact: Hui Chen
Docs Contact: Christopher Dawson
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-20 22:08 UTC by dhuynh
Modified: 2022-06-09 02:11 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-09 02:10:54 UTC
Target Upstream Version:
Embargoed:
bot-tracker-sync: rhacm-2.5+


Links
System ID Private Priority Status Summary Last Updated
Github stolostron backlog issues 21830 0 None None None 2022-04-21 01:39:19 UTC
Red Hat Product Errata RHSA-2022:4956 0 None None None 2022-06-09 02:11:22 UTC

Description dhuynh 2022-04-20 22:08:17 UTC
Description of the problem:
After upgrading from 2.4 to 2.5 with managed clusters that were installed before the upgrade, the Prometheus query acm_managed_cluster_info does not return any results.

Release version:
2.4.3-DOWNSTREAM-2022-04-13-07-05-00  →  2.5.0-DOWNSTREAM-2022-04-20-06-50-05

OCP version:
4.10.9

Browser Info:
Firefox

Steps to reproduce:
1. Create/import managed clusters on 2.4, upgrade to 2.5
2. Create/import more managed clusters on 2.5
3. Go to Prometheus and execute the query acm_managed_cluster_info
4. Observe that no results are found
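
Step 3 can also be checked without the browser through the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The following is only a minimal sketch: the base URL, the bearer token, and the helper names are hypothetical, and how to reach and authenticate against Prometheus on a given hub is not covered by this report.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def count_series(response_body: str) -> int:
    """Return the number of series in an instant-query JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("query failed: " + str(payload.get("error", "unknown error")))
    return len(payload["data"]["result"])


def query_series_count(base_url: str, promql: str, token: Optional[str] = None) -> int:
    """Run the query against a live Prometheus (needs network access).

    base_url and token are assumptions; supply whatever your environment uses.
    """
    request = urllib.request.Request(build_query_url(base_url, promql))
    if token:
        request.add_header("Authorization", "Bearer " + token)
    with urllib.request.urlopen(request) as response:
        return count_series(response.read().decode("utf-8"))
```

A count of 0 for `acm_managed_cluster_info` corresponds to the "Empty query result" seen in the UI.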

Actual results:
The Prometheus query acm_managed_cluster_info returns "Empty query result".

Expected results:

Additional info:
We have another environment with a fresh install of ACM 2.5.0-DOWNSTREAM-2022-04-20-06-50-05 where the query returns results as expected.

Comment 1 bot-tracker-sync 2022-04-21 05:04:55 UTC
G2Bsync comment 1104649627, haoqing0110, Thu, 21 Apr 2022 02:45:11 UTC

In the env, I can curl the metrics successfully inside the clusterlifecycle-state-metrics pod. 
```
✗ oc get pods -A | grep clusterlifecycle-state-metrics
multicluster-engine                                clusterlifecycle-state-metrics-v2-778b7bd6dd-bzjfm                1/1     Running                  0             10h

✗ oc exec -n multicluster-engine clusterlifecycle-state-metrics-v2-778b7bd6dd-bzjfm -it sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
sh-4.4$ curl http://localhost:8080/metrics
# HELP acm_managed_cluster_info Managed cluster information
# TYPE acm_managed_cluster_info gauge
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="clc-nap-aks-a243-import-before-upgrade-b",vendor="AKS",cloud="Azure",version="v1.21.9",available="Unknown",created_via="Other",core_worker="0",socket_worker="0"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="5e9196b6-d2ac-4602-958a-4afb183e4941",vendor="OpenShift",cloud="Amazon",version="4.9.25",available="True",created_via="Hive",core_worker="12",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="clc-nap-gke-a243-import-before-upgrade",vendor="GKE",cloud="Google",version="v1.21.9-gke.1002",available="True",created_via="Other",core_worker="0",socket_worker="0"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="d27fb041-5a9c-41ff-bdde-dc0175a521c8",vendor="OpenShift",cloud="Amazon",version="4.10.6",available="True",created_via="Other",core_worker="16",socket_worker="4"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="59b619ec-617b-4cc8-8ac3-00ee11082ea5",vendor="OpenShift",cloud="IBM",version="4.9.25",available="True",created_via="Other",core_worker="12",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="dc8626c5-127d-45c7-882d-7c635cd1fee1",vendor="OpenShift",cloud="RHV",version="4.10.3",available="True",created_via="Other",core_worker="12",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="clc-nap-eks-a243-import-before-upgrade",vendor="EKS",cloud="Amazon",version="v1.21.5-eks-bc4871b",available="True",created_via="Other",core_worker="0",socket_worker="0"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="9619487d-8d54-43e3-9e5c-fa2bf35a63d0",vendor="OpenShift",cloud="Google",version="4.10.3",available="True",created_via="Hive",core_worker="12",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="f0c5915d-4c31-4cc1-b999-c56ba6a1578c",vendor="OpenShift",cloud="Azure",version="4.10.3",available="True",created_via="Hive",core_worker="6",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="clc-nap-ocp311-a243-import-before-upgrade",vendor="OpenShift",cloud="Amazon",version="3",available="True",created_via="Other",core_worker="0",socket_worker="0"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="342598e6-c98f-482a-8b85-5b526a11218d",vendor="OpenShift",cloud="Amazon",version="4.7.18",available="True",created_via="Other",core_worker="24",socket_worker="6"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",vendor="OpenShift",cloud="Amazon",version="4.10.9",available="True",created_via="Other",core_worker="24",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="22054b74-8fc2-42ca-a3e9-654a9d6121c8",vendor="OpenShift",cloud="Azure",version="4.10.3",available="True",created_via="Hive",core_worker="6",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="dc25a4ad-9e68-4331-8a28-a154377c60da",vendor="OpenShift",cloud="vSphere",version="4.10.3",available="True",created_via="Hive",core_worker="12",socket_worker="6"} 1
```
However, the metric is not shown in Prometheus. The data upload depends on the ServiceMonitor defined in https://github.com/stolostron/backplane-operator/blob/main/pkg/templates/charts/toggle/cluster-lifecycle/templates/metrics-servicemonitor.yaml.
I checked for the ServiceMonitor resource and it does not exist in the upgraded env.
```
 ✗ oc get servicemonitors.monitoring.coreos.com -A | grep clusterlifecycle
 ✗ oc get servicemonitors.monitoring.coreos.com -A | grep openshift-monitoring
openshift-monitoring                         acm-insights                                    6d10h
openshift-monitoring                         alertmanager-main                               7d3h
openshift-monitoring                         cluster-monitoring-operator                     7d4h
openshift-monitoring                         etcd                                            7d4h
openshift-monitoring                         grafana                                         7d4h
openshift-monitoring                         kube-state-metrics                              7d4h
openshift-monitoring                         kubelet                                         7d4h
openshift-monitoring                         node-exporter                                   7d4h
openshift-monitoring                         observability-observatorium-api                 6d9h
openshift-monitoring                         observability-thanos-compact                    6d9h
openshift-monitoring                         observability-thanos-query                      6d9h
openshift-monitoring                         observability-thanos-query-frontend             6d9h
openshift-monitoring                         observability-thanos-query-frontend-memcached   6d9h
openshift-monitoring                         observability-thanos-receive                    6d9h
openshift-monitoring                         observability-thanos-receive-controller         6d9h
openshift-monitoring                         observability-thanos-rule                       6d9h
openshift-monitoring                         observability-thanos-store-memcached            6d9h
openshift-monitoring                         observability-thanos-store-shard                6d9h
openshift-monitoring                         ocm-grc-688c4-policy-propagator-metrics         6d10h
openshift-monitoring                         openshift-state-metrics                         7d4h
openshift-monitoring                         prometheus-adapter                              7d4h
openshift-monitoring                         prometheus-k8s                                  7d4h
openshift-monitoring                         prometheus-operator                             7d4h
openshift-monitoring                         telemeter-client                                7d4h
openshift-monitoring                         thanos-querier                                  7d4h
openshift-monitoring                         thanos-sidecar                                  7d4h
```
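
The two checks above (the exporter serves the metric, but no matching ServiceMonitor exists) can be combined into one small triage helper. This is a sketch under assumptions, not part of any ACM tooling: it works on text already captured from `curl .../metrics` and `oc get servicemonitors.monitoring.coreos.com -A`, the function names are hypothetical, and `clusterlifecycle` is simply the grep pattern used above, since the actual ServiceMonitor name is not given in this report.

```python
from typing import List, Tuple


def count_metric_samples(exposition_text: str, metric_name: str) -> int:
    """Count sample lines for metric_name in Prometheus text exposition output."""
    count = 0
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample starts with the name followed by '{' (labels) or a space.
        if line.startswith(metric_name + "{") or line.startswith(metric_name + " "):
            count += 1
    return count


def find_servicemonitors(listing: str, name_fragment: str) -> List[Tuple[str, str]]:
    """Return (namespace, name) pairs whose name contains name_fragment.

    Unlike a plain grep, this matches on the NAME column only.
    """
    matches = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] != "NAMESPACE" and name_fragment in fields[1]:
            matches.append((fields[0], fields[1]))
    return matches


def diagnose(exposition_text: str, listing: str,
             metric_name: str = "acm_managed_cluster_info",
             name_fragment: str = "clusterlifecycle") -> str:
    """Report which side of the scrape path looks broken."""
    if count_metric_samples(exposition_text, metric_name) == 0:
        return "exporter: no samples exposed"
    if not find_servicemonitors(listing, name_fragment):
        return "servicemonitor: missing"
    return "ok: exporter and ServiceMonitor both present"
```

With the outputs captured in this comment, `diagnose` would report the ServiceMonitor as missing, which matches the observation that the pod serves the metric while Prometheus shows nothing.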

Comment 2 dhuynh 2022-05-04 03:01:12 UTC
No longer seeing this after upgrade from 2.4.3-DOWNSTREAM-2022-04-13-07-05-00  → 2.5.0-DOWNSTREAM-2022-05-02-16-00-32

Comment 5 errata-xmlrpc 2022-06-09 02:10:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Advanced Cluster Management 2.5 security updates, images, and bug fixes), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4956

