Bug 2077291

Summary: Prometheus doesn't display acm_managed_cluster_info after upgrade from 2.4 to 2.5
Product: Red Hat Advanced Cluster Management for Kubernetes
Reporter: dhuynh
Component: Cluster Lifecycle
Assignee: Jian Qiu <jqiu>
Status: CLOSED ERRATA
QA Contact: Hui Chen <huichen>
Severity: urgent
Docs Contact: Christopher Dawson <cdawson>
Priority: urgent
Version: rhacm-2.5
CC: dhuynh, jagray, yuhe
Target Milestone: ---
Keywords: Regression, TestBlocker
Target Release: rhacm-2.5
Flags: bot-tracker-sync: rhacm-2.5+
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-06-09 02:10:54 UTC
Type: Bug

Description dhuynh 2022-04-20 22:08:17 UTC
Description of the problem:
After upgrading from 2.4 to 2.5 on a hub that already had managed clusters before the upgrade, the Prometheus query acm_managed_cluster_info returns no results.

Release version:
2.4.3-DOWNSTREAM-2022-04-13-07-05-00  →  2.5.0-DOWNSTREAM-2022-04-20-06-50-05

OCP version:
4.10.9

Browser Info:
Firefox

Steps to reproduce:
1. Create/import managed clusters on 2.4, then upgrade to 2.5.
2. Create/import more managed clusters on 2.5.
3. In Prometheus, execute the query acm_managed_cluster_info.
4. Observe that no results are found.
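Step 3 can also be run from the command line. This is a minimal sketch, assuming you are logged in to the hub with `oc` as a user allowed to query cluster monitoring; it sends the query to the `thanos-querier` route that OpenShift exposes in `openshift-monitoring`, using the standard Prometheus HTTP API endpoint `/api/v1/query`. The function names `query_acm_managed_cluster_info` and `is_empty_result` are hypothetical helpers, not part of ACM or `oc`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: reproduce steps 3-4 without the web console.
# Assumes an `oc` login with permission to query cluster monitoring.

# Print the raw Prometheus HTTP API response for acm_managed_cluster_info.
query_acm_managed_cluster_info() {
  local host token
  host=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}') || return 1
  token=$(oc whoami -t) || return 1
  # -k skips TLS verification; only acceptable for a quick check on a test cluster.
  curl -sk -G "https://${host}/api/v1/query" \
    -H "Authorization: Bearer ${token}" \
    --data-urlencode 'query=acm_managed_cluster_info'
}

# Read an API response on stdin; succeed (exit 0) if the result vector is
# empty, i.e. the "Empty query result" case described in this bug.
is_empty_result() {
  grep -qF '"result":[]'
}
```

For example, `query_acm_managed_cluster_info | is_empty_result && echo "acm_managed_cluster_info: empty query result"` reproduces step 4. Matching the compact `"result":[]` string is a shortcut; if `jq` is installed, `jq '.data.result | length'` is more robust.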

Actual results:
The Prometheus query acm_managed_cluster_info returns "Empty query result".

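Expected results:
The query returns acm_managed_cluster_info results for the managed clusters, as it does on a fresh 2.5 install.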

Additional info:
We have another environment with a fresh install of ACM 2.5.0-DOWNSTREAM-2022-04-20-06-50-05 where the query returns results as expected.

Comment 1 bot-tracker-sync 2022-04-21 05:04:55 UTC
G2Bsync 1104649627 comment 
haoqing0110 Thu, 21 Apr 2022 02:45:11 UTC

In the upgraded environment, I can curl the metrics successfully from inside the clusterlifecycle-state-metrics pod:
```
✗ oc get pods -A | grep clusterlifecycle-state-metrics
multicluster-engine                                clusterlifecycle-state-metrics-v2-778b7bd6dd-bzjfm                1/1     Running                  0             10h

✗ oc exec -it -n multicluster-engine clusterlifecycle-state-metrics-v2-778b7bd6dd-bzjfm -- sh
sh-4.4$ curl http://localhost:8080/metrics
# HELP acm_managed_cluster_info Managed cluster information
# TYPE acm_managed_cluster_info gauge
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="clc-nap-aks-a243-import-before-upgrade-b",vendor="AKS",cloud="Azure",version="v1.21.9",available="Unknown",created_via="Other",core_worker="0",socket_worker="0"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="5e9196b6-d2ac-4602-958a-4afb183e4941",vendor="OpenShift",cloud="Amazon",version="4.9.25",available="True",created_via="Hive",core_worker="12",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="clc-nap-gke-a243-import-before-upgrade",vendor="GKE",cloud="Google",version="v1.21.9-gke.1002",available="True",created_via="Other",core_worker="0",socket_worker="0"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="d27fb041-5a9c-41ff-bdde-dc0175a521c8",vendor="OpenShift",cloud="Amazon",version="4.10.6",available="True",created_via="Other",core_worker="16",socket_worker="4"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="59b619ec-617b-4cc8-8ac3-00ee11082ea5",vendor="OpenShift",cloud="IBM",version="4.9.25",available="True",created_via="Other",core_worker="12",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="dc8626c5-127d-45c7-882d-7c635cd1fee1",vendor="OpenShift",cloud="RHV",version="4.10.3",available="True",created_via="Other",core_worker="12",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="clc-nap-eks-a243-import-before-upgrade",vendor="EKS",cloud="Amazon",version="v1.21.5-eks-bc4871b",available="True",created_via="Other",core_worker="0",socket_worker="0"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="9619487d-8d54-43e3-9e5c-fa2bf35a63d0",vendor="OpenShift",cloud="Google",version="4.10.3",available="True",created_via="Hive",core_worker="12",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="f0c5915d-4c31-4cc1-b999-c56ba6a1578c",vendor="OpenShift",cloud="Azure",version="4.10.3",available="True",created_via="Hive",core_worker="6",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="clc-nap-ocp311-a243-import-before-upgrade",vendor="OpenShift",cloud="Amazon",version="3",available="True",created_via="Other",core_worker="0",socket_worker="0"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="342598e6-c98f-482a-8b85-5b526a11218d",vendor="OpenShift",cloud="Amazon",version="4.7.18",available="True",created_via="Other",core_worker="24",socket_worker="6"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",vendor="OpenShift",cloud="Amazon",version="4.10.9",available="True",created_via="Other",core_worker="24",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="22054b74-8fc2-42ca-a3e9-654a9d6121c8",vendor="OpenShift",cloud="Azure",version="4.10.3",available="True",created_via="Hive",core_worker="6",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="dc25a4ad-9e68-4331-8a28-a154377c60da",vendor="OpenShift",cloud="vSphere",version="4.10.3",available="True",created_via="Hive",core_worker="12",socket_worker="6"} 1
```
However, the metric does not show up in Prometheus. Prometheus collects these metrics through the ServiceMonitor defined in https://github.com/stolostron/backplane-operator/blob/main/pkg/templates/charts/toggle/cluster-lifecycle/templates/metrics-servicemonitor.yaml.
Checking for that ServiceMonitor resource, I don't see it in the upgraded environment:
```
 ✗ oc get servicemonitors.monitoring.coreos.com -A | grep clusterlifecycle
 ✗ oc get servicemonitors.monitoring.coreos.com -A | grep openshift-monitoring
openshift-monitoring                         acm-insights                                    6d10h
openshift-monitoring                         alertmanager-main                               7d3h
openshift-monitoring                         cluster-monitoring-operator                     7d4h
openshift-monitoring                         etcd                                            7d4h
openshift-monitoring                         grafana                                         7d4h
openshift-monitoring                         kube-state-metrics                              7d4h
openshift-monitoring                         kubelet                                         7d4h
openshift-monitoring                         node-exporter                                   7d4h
openshift-monitoring                         observability-observatorium-api                 6d9h
openshift-monitoring                         observability-thanos-compact                    6d9h
openshift-monitoring                         observability-thanos-query                      6d9h
openshift-monitoring                         observability-thanos-query-frontend             6d9h
openshift-monitoring                         observability-thanos-query-frontend-memcached   6d9h
openshift-monitoring                         observability-thanos-receive                    6d9h
openshift-monitoring                         observability-thanos-receive-controller         6d9h
openshift-monitoring                         observability-thanos-rule                       6d9h
openshift-monitoring                         observability-thanos-store-memcached            6d9h
openshift-monitoring                         observability-thanos-store-shard                6d9h
openshift-monitoring                         ocm-grc-688c4-policy-propagator-metrics         6d10h
openshift-monitoring                         openshift-state-metrics                         7d4h
openshift-monitoring                         prometheus-adapter                              7d4h
openshift-monitoring                         prometheus-k8s                                  7d4h
openshift-monitoring                         prometheus-operator                             7d4h
openshift-monitoring                         telemeter-client                                7d4h
openshift-monitoring                         thanos-querier                                  7d4h
openshift-monitoring                         thanos-sidecar                                  7d4h
```
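The two checks in this comment (scrape the exporter directly, then look for its ServiceMonitor) can be sketched as a pair of stdin filters, so they can be run against live commands or saved output. Assumptions: the deployment is named `clusterlifecycle-state-metrics-v2` (inferred from the pod name above), the exporter listens on port 8080 and the container ships `curl` (as the session above shows), and the ServiceMonitor name contains `clusterlifecycle`, matching the `grep` used here; the real name comes from the linked template. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the two checks in this comment. Both read stdin,
# so they can be tested offline against saved command output.

# Count acm_managed_cluster_info samples in a /metrics scrape
# (HELP/TYPE lines start with '#' and are not counted).
count_acm_series() {
  # grep -c prints 0 and exits 1 when nothing matches; keep the 0.
  grep -c '^acm_managed_cluster_info{' || true
}

# Succeed if `oc get servicemonitors -A` output lists a ServiceMonitor whose
# name contains "clusterlifecycle" (assumed naming, see lead-in).
has_clusterlifecycle_servicemonitor() {
  grep -q 'clusterlifecycle'
}

# Live usage on the hub (deployment name and port assumed from this comment):
#   oc exec -n multicluster-engine deploy/clusterlifecycle-state-metrics-v2 -- \
#     curl -s http://localhost:8080/metrics | count_acm_series
#   oc get servicemonitors.monitoring.coreos.com -A --no-headers \
#     | has_clusterlifecycle_servicemonitor || echo "ServiceMonitor missing"
```

Applied to the output above, the scrape contains 14 acm_managed_cluster_info series while the ServiceMonitor check fails, which points at the missing ServiceMonitor rather than at the exporter.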

Comment 2 dhuynh 2022-05-04 03:01:12 UTC
No longer seeing this after upgrade from 2.4.3-DOWNSTREAM-2022-04-13-07-05-00  → 2.5.0-DOWNSTREAM-2022-05-02-16-00-32

Comment 5 errata-xmlrpc 2022-06-09 02:10:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Advanced Cluster Management 2.5 security updates, images, and bug fixes), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4956