Bug 2100860
| Summary: | Users can't silence alerts from the dev console when dedicated UWM Alertmanager is deployed | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Simon Pasquier <spasquie> |
| Component: | Monitoring | Assignee: | Simon Pasquier <spasquie> |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | medium | Docs Contact: | Brian Burt <bburt> |
| Priority: | high | | |
| Version: | 4.11 | CC: | anpicker, bburt, hasun |
| Target Milestone: | --- | | |
| Target Release: | 4.12.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-01-17 19:50:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Simon Pasquier
2022-06-24 13:42:50 UTC
Tested with the following PRs (openshift/console-operator#658 is merged; the fix is already in the launched cluster-bot cluster):
openshift/cluster-monitoring-operator#1690
openshift/console#11712

Following the steps in Comment 0, the user project alert could be silenced from the developer console, but the alert status is still Firing, not Silenced, in the administrator console (see the attached picture). Maybe this should be fixed on the console side (the console API does not consider the dedicated Alertmanager; I would file another bug after the PRs in this bug are merged to the payload). For this bug, I think the issue is fixed and we could close it when the fixes are merged to the payload.

# oc -n openshift-config-managed get cm monitoring-shared-config -oyaml
apiVersion: v1
data:
  alertmanagerPublicURL: https://alertmanager-main-openshift-monitoring.apps.ci-ln-xzifyhk-76ef8.origin-ci-int-aws.dev.rhcloud.com
  alertmanagerTenancyHost: alertmanager-user-workload.openshift-user-workload-monitoring.svc:9092
  alertmanagerUserWorkloadHost: alertmanager-user-workload.openshift-user-workload-monitoring.svc:9094
  prometheusPublicURL: https://prometheus-k8s-openshift-monitoring.apps.ci-ln-xzifyhk-76ef8.origin-ci-int-aws.dev.rhcloud.com
  thanosPublicURL: https://thanos-querier-openshift-monitoring.apps.ci-ln-xzifyhk-76ef8.origin-ci-int-aws.dev.rhcloud.com
kind: ConfigMap
metadata:
  creationTimestamp: "2022-09-19T03:20:49Z"
  name: monitoring-shared-config
  namespace: openshift-config-managed
  resourceVersion: "41651"
  uid: c7e44118-230e-486e-82c8-570c05db3d79

# oc -n openshift-user-workload-monitoring get pod
NAME                                  READY   STATUS    RESTARTS      AGE
alertmanager-user-workload-0          6/6     Running   1 (25m ago)   25m
alertmanager-user-workload-1          6/6     Running   1 (25m ago)   25m
prometheus-operator-68458fb66-7pdtm   2/2     Running   0             26m
prometheus-user-workload-0            6/6     Running   0             26m
prometheus-user-workload-1            6/6     Running   0             26m
thanos-ruler-user-workload-0          3/3     Running   0             25m
thanos-ruler-user-workload-1          3/3     Running   0             25m

The user project alert TestAlert is in the dedicated Alertmanager, and the alert is silenced (state is suppressed):

# token=`oc create token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-user-workload.openshift-user-workload-monitoring.svc:9095/api/v2/alerts' | jq
[
  {
    "annotations": {
      "message": "This is an alert meant to ensure that the entire alerting pipeline is functional."
    },
    "endsAt": "2022-09-19T04:19:27.799Z",
    "fingerprint": "348490d73f8513a0",
    "receivers": [
      {
        "name": "Default"
      }
    ],
    "startsAt": "2022-09-19T03:52:42.799Z",
    "status": {
      "inhibitedBy": [],
      "silencedBy": [
        "2cb3e008-c2c6-4ca2-9c3a-8b1e5a53e984"
      ],
      "state": "suppressed"
    },
    "updatedAt": "2022-09-19T04:15:27.825Z",
    "generatorURL": "https://thanos-querier-openshift-monitoring.apps.ci-ln-xzifyhk-76ef8.origin-ci-int-aws.dev.rhcloud.com/api/graph?g0.expr=vector%281%29&g0.tab=1",
    "labels": {
      "alertname": "TestAlert",
      "namespace": "ns1",
      "severity": "none"
    }
  }
]

The user project alert TestAlert is not found in the platform Alertmanager; this is expected:

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v2/alerts' | jq '.[] | {alertname: .labels.alertname, state: .status.state}'
{
  "alertname": "Watchdog",
  "state": "active"
}
{
  "alertname": "AlertmanagerReceiversNotConfigured",
  "state": "active"
}

From thanos-querier, the user project alert TestAlert is still firing; this is also expected even though TestAlert is silenced:

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS' | jq '.data.result[].metric | {alertname, alertstate}'
{
  "alertname": "AlertmanagerReceiversNotConfigured",
  "alertstate": "firing"
}
{
  "alertname": "TestAlert",
  "alertstate": "firing"
}
{
  "alertname": "Watchdog",
  "alertstate": "firing"
}

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules' | jq '.data.groups[].rules[] | select(.state=="firing")'
...
{
  "state": "firing",
  "name": "TestAlert",
  "query": "vector(1)",
  "duration": 0,
  "labels": {
    "namespace": "ns1",
    "severity": "none"
  },
  "annotations": {
    "message": "This is an alert meant to ensure that the entire alerting pipeline is functional."
  },
  "alerts": [
    {
      "labels": {
        "alertname": "TestAlert",
        "namespace": "ns1",
        "severity": "none"
      },
      "annotations": {
        "message": "This is an alert meant to ensure that the entire alerting pipeline is functional."
      },
      "state": "firing",
      "activeAt": "2022-09-19T03:52:57.799592483Z",
      "value": "1e+00",
      "partialResponseStrategy": "ABORT"
    }
  ],
  "health": "ok",
  "evaluationTime": 0.000916317,
  "lastEvaluation": "2022-09-19T06:01:27.806644449Z",
  "type": "alerting"
}

> user project alert could be silenced from developer console, but the user project alert status is still Firing, not Silenced in administrator console, see the attached picture, maybe this should be fixed in console side(the console API does not consider the dedicated Alertmanager, would file another bug after PRs in this bug merged to payload), for this bug, I think the issue is fixed and we could close it when the fixes merged to payload

Thanks for pointing out the edge case. I agree with you that it needs to be a new bug.
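
To make the edge case concrete: the rules API keeps reporting TestAlert as firing, and only the Alertmanager that received the alert knows it is suppressed. A view that consults just the platform Alertmanager therefore never sees the silence. The following is a minimal Python sketch of that reasoning, not the console's actual implementation. The function names (`label_key`, `display_states`) and the matching-by-full-label-set rule are assumptions made for illustration, and the Watchdog labels are trimmed to `alertname` only.

```python
from typing import Dict, Iterable, Mapping


def label_key(labels: Mapping[str, str]) -> frozenset:
    """Hashable identity for an alert: its full label set (an assumption of this sketch)."""
    return frozenset(labels.items())


def display_states(rule_alerts: Iterable[dict], alertmanager_alerts: Iterable[dict]) -> Dict[str, str]:
    """Map alertname -> state to show, overriding 'firing' with 'silenced'
    when an Alertmanager reports the same label set as suppressed by a silence."""
    silenced = {
        label_key(a["labels"])
        for a in alertmanager_alerts
        if a["status"]["state"] == "suppressed" and a["status"]["silencedBy"]
    }
    states = {}
    for alert in rule_alerts:
        is_silenced = label_key(alert["labels"]) in silenced
        states[alert["labels"]["alertname"]] = "silenced" if is_silenced else alert["state"]
    return states


# Alerts as reported by the rules API (trimmed from the output in Comment 6).
rule_alerts = [
    {"labels": {"alertname": "TestAlert", "namespace": "ns1", "severity": "none"}, "state": "firing"},
    {"labels": {"alertname": "Watchdog"}, "state": "firing"},  # labels trimmed for illustration
]

# Platform Alertmanager: does not know about TestAlert at all.
platform_am = [
    {"labels": {"alertname": "Watchdog"},
     "status": {"state": "active", "silencedBy": [], "inhibitedBy": []}},
]

# Dedicated user-workload Alertmanager: TestAlert is suppressed by a silence.
user_workload_am = [
    {"labels": {"alertname": "TestAlert", "namespace": "ns1", "severity": "none"},
     "status": {"state": "suppressed",
                "silencedBy": ["2cb3e008-c2c6-4ca2-9c3a-8b1e5a53e984"],
                "inhibitedBy": []}},
]

platform_only = display_states(rule_alerts, platform_am)
both = display_states(rule_alerts, platform_am + user_workload_am)
print(platform_only)
print(both)
```

With only the platform Alertmanager consulted, TestAlert stays "firing", which matches what the administrator console showed. Once the user-workload Alertmanager's alerts are included, the same alert resolves to "silenced". The result is keyed by alert name for brevity, so two alerts sharing a name but differing in other labels would overwrite each other in this sketch.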
Tested with 4.12.0-0.nightly-2022-09-26-111919, same result as Comment 6: the user project alert could be silenced from the developer console, but its status is still Firing, not Silenced, in the administrator console. This is tracked in https://issues.redhat.com/browse/OCPBUGS-1738.

This bug doesn't need to be mentioned in the release notes because it's already been fixed in 4.11.z.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399