Description of problem:
MON-2552 PR https://github.com/openshift/cluster-monitoring-operator/pull/1675 is in payload 4.11.0-0.nightly-2022-06-22-190830. TechPreview feature is not enabled, but find "failed to list *v1alpha1.AlertingRule: alertingrules.monitoring.openshift.io is forbidden" in cmo logs

# oc get featuregate cluster -oyaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
  creationTimestamp: "2022-06-23T00:39:48Z"
  generation: 1
  name: cluster
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: a158e319-e88f-457f-b270-5b67f1b8c18c
  resourceVersion: "1321"
  uid: 31692f28-0688-4ba7-be51-0f4b1b83fff6
spec: {}

# oc -n openshift-monitoring get pod
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-main-0                                      6/6     Running   0          12h
alertmanager-main-1                                      6/6     Running   0          12h
cluster-monitoring-operator-b87447c68-b5bd6              2/2     Running   0          12h
kube-state-metrics-5475455998-n65nv                      3/3     Running   0          12h
node-exporter-c8x9p                                      2/2     Running   0          12h
node-exporter-fkbfr                                      2/2     Running   0          12h
node-exporter-gjpkb                                      2/2     Running   0          12h
node-exporter-kwgk6                                      2/2     Running   0          12h
node-exporter-kzt9q                                      2/2     Running   0          12h
node-exporter-vrl7p                                      2/2     Running   0          12h
openshift-state-metrics-8679b4d578-vpd4w                 3/3     Running   0          12h
prometheus-adapter-5f4bcc7778-pxplf                      1/1     Running   0          26m
prometheus-adapter-5f4bcc7778-srxrl                      1/1     Running   0          26m
prometheus-k8s-0                                         6/6     Running   0          12h
prometheus-k8s-1                                         6/6     Running   0          12h
prometheus-operator-658d9c456f-rp948                     2/2     Running   0          12h
prometheus-operator-admission-webhook-74f7bb977f-5vb2b   1/1     Running   0          12h
prometheus-operator-admission-webhook-74f7bb977f-thbrw   1/1     Running   0          12h
telemeter-client-9587b6dc7-k4crh                         3/3     Running   0          12h
thanos-querier-85f555d468-lr6rq                          6/6     Running   0          7h6m
thanos-querier-85f555d468-vs7jq                          6/6     Running   0          7h6m

$ oc -n openshift-monitoring logs -c cluster-monitoring-operator cluster-monitoring-operator-b87447c68-b5bd6 | grep "alertingrules.monitoring.openshift.io is forbidden"
W0623 12:31:12.562641 1 reflector.go:324] github.com/openshift/cluster-monitoring-operator/pkg/alert/rule_controller.go:113: failed to list *v1alpha1.AlertingRule: alertingrules.monitoring.openshift.io is forbidden: User "system:serviceaccount:openshift-monitoring:cluster-monitoring-operator" cannot list resource "alertingrules" in API group "monitoring.openshift.io" in the namespace "openshift-monitoring"
E0623 12:31:12.562668 1 reflector.go:138] github.com/openshift/cluster-monitoring-operator/pkg/alert/rule_controller.go:113: Failed to watch *v1alpha1.AlertingRule: failed to list *v1alpha1.AlertingRule: alertingrules.monitoring.openshift.io is forbidden: User "system:serviceaccount:openshift-monitoring:cluster-monitoring-operator" cannot list resource "alertingrules" in API group "monitoring.openshift.io" in the namespace "openshift-monitoring"
W0623 12:32:11.310486 1 reflector.go:324] github.com/openshift/cluster-monitoring-operator/pkg/alert/rule_controller.go:113: failed to list *v1alpha1.AlertingRule: alertingrules.monitoring.openshift.io is forbidden: User "system:serviceaccount:openshift-monitoring:cluster-monitoring-operator" cannot list resource "alertingrules" in API group "monitoring.openshift.io" in the namespace "openshift-monitoring"
....

# oc -n openshift-monitoring logs -c cluster-monitoring-operator cluster-monitoring-operator-b87447c68-b5bd6 | grep "alertingrules.monitoring.openshift.io is forbidden" | wc -l
1988

$ oc explain AlertingRule
the server doesn't have a resource type "AlertingRule"

$ oc get crd | grep -i monitoring
alertmanagerconfigs.monitoring.coreos.com   2022-06-22T07:46:48Z
alertmanagers.monitoring.coreos.com         2022-06-22T07:46:51Z
podmonitors.monitoring.coreos.com           2022-06-22T07:46:53Z
probes.monitoring.coreos.com                2022-06-22T07:46:55Z
prometheuses.monitoring.coreos.com          2022-06-22T07:46:58Z
prometheusrules.monitoring.coreos.com       2022-06-22T07:47:00Z
servicemonitors.monitoring.coreos.com       2022-06-22T07:47:02Z
thanosrulers.monitoring.coreos.com          2022-06-22T07:47:04Z

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-22-190830   True        False         12h     Cluster version is 4.11.0-0.nightly-2022-06-22-190830

How reproducible:
always

Steps to Reproduce:
1. see the description
2.
3.

Actual results:
"failed to list *v1alpha1.AlertingRule: alertingrules.monitoring.openshift.io is forbidden" in cmo logs

Expected results:

Additional info:
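The grep | wc -l count above only covers one resource. A minimal sketch, assuming the RBAC denial text always has the shape seen in the log lines quoted in this report, that tallies denials per verb, resource and API group. The function name count_forbidden and the regular expression are illustrative and are not part of CMO or oc:

```python
import re
from collections import Counter

# Assumption: denials look like the lines quoted in this report, i.e.
#   cannot list resource "alertingrules" in API group "monitoring.openshift.io"
FORBIDDEN_RE = re.compile(
    r'cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]+)"'
)


def count_forbidden(log_lines):
    """Count RBAC denials per (verb, resource, API group).

    Each log line is counted at most once, even when the reflector
    repeats the denial inside a "Failed to watch" wrapper message.
    """
    counts = Counter()
    for line in log_lines:
        match = FORBIDDEN_RE.search(line)
        if match:
            key = (match["verb"], match["resource"], match["group"])
            counts[key] += 1
    return counts
```

One way to use it would be to save the operator logs to a file and pass the open file object to count_forbidden; the resulting Counter then shows whether alertrelabelconfigs (or any other resource) is denied as well as alertingrules.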
The fix introduces a regression: platform alerts are no longer labeled with openshift_io_alert_source="platform". It is being reverted in https://github.com/openshift/cluster-monitoring-operator/pull/1706.
The previous PR was a revert, so the issue is still present.
tested with 4.12.0-0.nightly-2022-07-05-225149, TechPreview feature is not enabled, no errors for alertingrules/alertrelabelconfigs

# oc get featuregate cluster -oyaml
...
spec: {}

# oc -n openshift-monitoring logs -c cluster-monitoring-operator cluster-monitoring-operator-7b6dc644c5-p45jk | grep "alertingrules.monitoring.openshift.io is forbidden"
no result

# oc -n openshift-monitoring logs -c cluster-monitoring-operator cluster-monitoring-operator-7b6dc644c5-p45jk | grep "alertrelabelconfigs.monitoring.openshift.io is forbidden"
no result

and no regression issue like comment 5

# token=`oc create token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v2/alerts' | jq
...
    "labels": {
      "alertname": "Watchdog",
      "namespace": "openshift-monitoring",
      "openshift_io_alert_source": "platform",
      "prometheus": "openshift-monitoring/k8s",
      "severity": "none"
    }
  },
...
    "labels": {
      "alertname": "AlertmanagerReceiversNotConfigured",
      "namespace": "openshift-monitoring",
      "openshift_io_alert_source": "platform",
      "prometheus": "openshift-monitoring/k8s",
      "severity": "warning"
    }
  }
]
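The label check above was done by reading the jq output by eye. A rough sketch of the same check in code, assuming the payload is the JSON array returned by /api/v2/alerts and that every alert in it is expected to be a platform alert (this would not hold on a cluster that also fires user-workload alerts). The helper name alerts_missing_source is hypothetical:

```python
import json


def alerts_missing_source(alerts_json, expected="platform"):
    """Return sorted alert names whose openshift_io_alert_source
    label is absent or differs from `expected`.

    `alerts_json` is the raw JSON text of an Alertmanager
    /api/v2/alerts response (a list of alert objects).
    """
    missing = []
    for alert in json.loads(alerts_json):
        labels = alert.get("labels", {})
        if labels.get("openshift_io_alert_source") != expected:
            # Alerts normally carry an alertname; fall back defensively.
            missing.append(labels.get("alertname", "<unnamed>"))
    return sorted(missing)
```

An empty list would correspond to the "no regression" result reported here; any names returned would be alerts that lost the label.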
*** Bug 2103033 has been marked as a duplicate of this bug. ***
Removing the requires_doc_text because the bug fix has been backported to 4.11.z already (see bug 2103127).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399