Description of the problem:
After upgrading from 2.4 to 2.5 with managed clusters that were installed before the upgrade, the Prometheus query acm_managed_cluster_info does not return any results.

Release version: 2.4.3-DOWNSTREAM-2022-04-13-07-05-00 → 2.5.0-DOWNSTREAM-2022-04-20-06-50-05
OCP version: 4.10.9
Browser Info: Firefox

Steps to reproduce:
1. Create/import managed clusters on 2.4, then upgrade to 2.5
2. Create/import more managed clusters on 2.5
3. Go to Prometheus and execute the query acm_managed_cluster_info
4. Observe that no results are found

Actual results:
The Prometheus query acm_managed_cluster_info returns "Empty query result".

Expected results:

Additional info:
We have another environment with a fresh install of ACM 2.5.0-DOWNSTREAM-2022-04-20-06-50-05 where the query returns results as expected.
G2Bsync 1104649627 comment haoqing0110 Thu, 21 Apr 2022 02:45:11 UTC G2Bsync

In the env, I can curl the metrics successfully inside the clusterlifecycle-state-metrics pod.

```
✗ oc get pods -A | grep clusterlifecycle-state-metrics
multicluster-engine   clusterlifecycle-state-metrics-v2-778b7bd6dd-bzjfm   1/1   Running   0   10h
✗ oc exec -n multicluster-engine clusterlifecycle-state-metrics-v2-778b7bd6dd-bzjfm -it sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
sh-4.4$ curl http://localhost:8080/metrics
# HELP acm_managed_cluster_info Managed cluster information
# TYPE acm_managed_cluster_info gauge
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="clc-nap-aks-a243-import-before-upgrade-b",vendor="AKS",cloud="Azure",version="v1.21.9",available="Unknown",created_via="Other",core_worker="0",socket_worker="0"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="5e9196b6-d2ac-4602-958a-4afb183e4941",vendor="OpenShift",cloud="Amazon",version="4.9.25",available="True",created_via="Hive",core_worker="12",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="clc-nap-gke-a243-import-before-upgrade",vendor="GKE",cloud="Google",version="v1.21.9-gke.1002",available="True",created_via="Other",core_worker="0",socket_worker="0"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="d27fb041-5a9c-41ff-bdde-dc0175a521c8",vendor="OpenShift",cloud="Amazon",version="4.10.6",available="True",created_via="Other",core_worker="16",socket_worker="4"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="59b619ec-617b-4cc8-8ac3-00ee11082ea5",vendor="OpenShift",cloud="IBM",version="4.9.25",available="True",created_via="Other",core_worker="12",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="dc8626c5-127d-45c7-882d-7c635cd1fee1",vendor="OpenShift",cloud="RHV",version="4.10.3",available="True",created_via="Other",core_worker="12",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="clc-nap-eks-a243-import-before-upgrade",vendor="EKS",cloud="Amazon",version="v1.21.5-eks-bc4871b",available="True",created_via="Other",core_worker="0",socket_worker="0"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="9619487d-8d54-43e3-9e5c-fa2bf35a63d0",vendor="OpenShift",cloud="Google",version="4.10.3",available="True",created_via="Hive",core_worker="12",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="f0c5915d-4c31-4cc1-b999-c56ba6a1578c",vendor="OpenShift",cloud="Azure",version="4.10.3",available="True",created_via="Hive",core_worker="6",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="clc-nap-ocp311-a243-import-before-upgrade",vendor="OpenShift",cloud="Amazon",version="3",available="True",created_via="Other",core_worker="0",socket_worker="0"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="342598e6-c98f-482a-8b85-5b526a11218d",vendor="OpenShift",cloud="Amazon",version="4.7.18",available="True",created_via="Other",core_worker="24",socket_worker="6"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",vendor="OpenShift",cloud="Amazon",version="4.10.9",available="True",created_via="Other",core_worker="24",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="22054b74-8fc2-42ca-a3e9-654a9d6121c8",vendor="OpenShift",cloud="Azure",version="4.10.3",available="True",created_via="Hive",core_worker="6",socket_worker="3"} 1
acm_managed_cluster_info{hub_cluster_id="41407702-8f44-47b8-a1b8-1088d1a85023",managed_cluster_id="dc25a4ad-9e68-4331-8a28-a154377c60da",vendor="OpenShift",cloud="vSphere",version="4.10.3",available="True",created_via="Hive",core_worker="12",socket_worker="6"} 1
```

However, the metric is not shown in Prometheus. The data upload depends on the ServiceMonitor defined in https://github.com/stolostron/backplane-operator/blob/main/pkg/templates/charts/toggle/cluster-lifecycle/templates/metrics-servicemonitor.yaml. I checked for the ServiceMonitor resource and don't see it in the upgrade env.

```
✗ oc get servicemonitors.monitoring.coreos.com -A | grep clusterlifecycle
✗ oc get servicemonitors.monitoring.coreos.com -A | grep openshift-monitoring
openshift-monitoring   acm-insights   6d10h
openshift-monitoring   alertmanager-main   7d3h
openshift-monitoring   cluster-monitoring-operator   7d4h
openshift-monitoring   etcd   7d4h
openshift-monitoring   grafana   7d4h
openshift-monitoring   kube-state-metrics   7d4h
openshift-monitoring   kubelet   7d4h
openshift-monitoring   node-exporter   7d4h
openshift-monitoring   observability-observatorium-api   6d9h
openshift-monitoring   observability-thanos-compact   6d9h
openshift-monitoring   observability-thanos-query   6d9h
openshift-monitoring   observability-thanos-query-frontend   6d9h
openshift-monitoring   observability-thanos-query-frontend-memcached   6d9h
openshift-monitoring   observability-thanos-receive   6d9h
openshift-monitoring   observability-thanos-receive-controller   6d9h
openshift-monitoring   observability-thanos-rule   6d9h
openshift-monitoring   observability-thanos-store-memcached   6d9h
openshift-monitoring   observability-thanos-store-shard   6d9h
openshift-monitoring   ocm-grc-688c4-policy-propagator-metrics   6d10h
openshift-monitoring   openshift-state-metrics   7d4h
openshift-monitoring   prometheus-adapter   7d4h
openshift-monitoring   prometheus-k8s   7d4h
openshift-monitoring   prometheus-operator   7d4h
openshift-monitoring   telemeter-client   7d4h
openshift-monitoring   thanos-querier   7d4h
openshift-monitoring   thanos-sidecar   7d4h
```
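The two checks in this comment (the exporter pod serves samples, but no clusterlifecycle ServiceMonitor exists to scrape them) could be scripted roughly as follows. This is only a minimal sketch: the helper names `count_metric_samples` and `has_clusterlifecycle_servicemonitor` are hypothetical, and `<pod>` in the usage comments is a placeholder for the actual clusterlifecycle-state-metrics pod name.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of ACM or oc.

# Count the samples of one metric in Prometheus text-format output on stdin.
# "# HELP" and "# TYPE" lines are ignored because they do not start with the metric name.
count_metric_samples() {
  metric="$1"
  # grep -c prints 0 and exits non-zero when nothing matches; keep the exit status clean.
  grep -c -E "^${metric}(\{| )" || true
}

# Succeed if the ServiceMonitor listing on stdin mentions clusterlifecycle.
has_clusterlifecycle_servicemonitor() {
  grep -q 'clusterlifecycle'
}

# Usage against a live hub (commented out; <pod> is a placeholder):
#   oc exec -n multicluster-engine <pod> -- curl -s http://localhost:8080/metrics \
#     | count_metric_samples acm_managed_cluster_info
#   oc get servicemonitors.monitoring.coreos.com -A \
#     | has_clusterlifecycle_servicemonitor || echo "ServiceMonitor missing"
```

In the upgraded environment described above, the first helper would print a non-zero count while the second would fail, which matches the symptom: the metrics are exposed by the pod but never reach Prometheus.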
No longer seeing this after upgrading from 2.4.3-DOWNSTREAM-2022-04-13-07-05-00 → 2.5.0-DOWNSTREAM-2022-05-02-16-00-32.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Advanced Cluster Management 2.5 security updates, images, and bug fixes), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:4956