Bug 2100860 - Users can't silence alerts from the dev console when dedicated UWM Alertmanager is deployed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.12.0
Assignee: Simon Pasquier
QA Contact: Junqi Zhao
Docs Contact: Brian Burt
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-06-24 13:42 UTC by Simon Pasquier
Modified: 2023-01-17 19:50 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-17 19:50:46 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 1690 | 0 | None | open | Bug 2100860: Pass user-defined Alertmanager service in shared configmap | 2022-07-05 08:46:30 UTC
Github | openshift console-operator pull 658 | 0 | None | open | Bug 2100860: Retrieve user-defined Alertmanager services from shared configmap | 2022-07-05 08:46:32 UTC
Github | openshift console pull 11712 | 0 | None | open | Bug 2100860: Use Alertmanager services for user-defined alerts from config | 2022-07-05 08:46:33 UTC
Red Hat Product Errata | RHSA-2022:7399 | 0 | None | None | None | 2023-01-17 19:50:59 UTC

Description Simon Pasquier 2022-06-24 13:42:50 UTC
Description of problem:


Version-Release number of selected component (if applicable):
4.11

How reproducible:
Always

Steps to Reproduce:
1. Enable UWM + dedicated UWM Alertmanager
2. Deploy an application + service monitor + alerting rule which fires always
3. Go to the OCP dev console and silence the alert.
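
For step 2, an alerting rule that always fires can be built on the expression `vector(1)`, which is the query the TestAlert rule in Comment 6 uses. The following is a minimal sketch that generates such a PrometheusRule manifest as JSON. The object name `always-firing`, the group name `example`, and the helper name are placeholders chosen for illustration, not values from this bug.

```python
import json


def always_firing_rule(namespace: str, alert_name: str = "TestAlert") -> dict:
    """Build a PrometheusRule manifest whose single alert is active on every evaluation."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        # "always-firing" is a placeholder object name (assumption).
        "metadata": {"name": "always-firing", "namespace": namespace},
        "spec": {
            "groups": [
                {
                    "name": "example",  # placeholder group name (assumption)
                    "rules": [
                        {
                            "alert": alert_name,
                            # vector(1) always returns one sample, so the alert never resolves.
                            "expr": "vector(1)",
                            "labels": {"severity": "none"},
                        }
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    # Print the manifest so it can be piped to `oc apply -f -`.
    print(json.dumps(always_firing_rule("ns1"), indent=2))
```

The output can be applied in the user project namespace with `oc apply -f -`, which accepts JSON as well as YAML.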

Actual results:
Nothing happens

Expected results:
The alert notification is muted.


Additional info:

Comment 6 Junqi Zhao 2022-09-19 07:37:45 UTC
Tested with the following PRs (openshift/console-operator#658 is already merged, so its fix is present in the cluster launched by cluster-bot):
openshift/cluster-monitoring-operator#1690
openshift/console#11712

Following the steps in Comment 0, the user project alert can be silenced from the developer console. However, in the administrator console the alert status is still Firing, not Silenced (see the attached picture). This probably needs a fix on the console side, because the console API does not take the dedicated Alertmanager into account; I will file another bug once the PRs for this bug are merged into the payload. For this bug, I think the issue is fixed and we can close it when the fixes are merged into the payload.
# oc -n openshift-config-managed get cm monitoring-shared-config -oyaml
apiVersion: v1
data:
  alertmanagerPublicURL: https://alertmanager-main-openshift-monitoring.apps.ci-ln-xzifyhk-76ef8.origin-ci-int-aws.dev.rhcloud.com
  alertmanagerTenancyHost: alertmanager-user-workload.openshift-user-workload-monitoring.svc:9092
  alertmanagerUserWorkloadHost: alertmanager-user-workload.openshift-user-workload-monitoring.svc:9094
  prometheusPublicURL: https://prometheus-k8s-openshift-monitoring.apps.ci-ln-xzifyhk-76ef8.origin-ci-int-aws.dev.rhcloud.com
  thanosPublicURL: https://thanos-querier-openshift-monitoring.apps.ci-ln-xzifyhk-76ef8.origin-ci-int-aws.dev.rhcloud.com
kind: ConfigMap
metadata:
  creationTimestamp: "2022-09-19T03:20:49Z"
  name: monitoring-shared-config
  namespace: openshift-config-managed
  resourceVersion: "41651"
  uid: c7e44118-230e-486e-82c8-570c05db3d79
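
The `alertmanagerUserWorkloadHost` key above is what the fix adds to the shared configmap. As a rough sketch of how a consumer could pick the Alertmanager to use for user-defined alerts, the function below prefers that key and otherwise falls back to the platform Alertmanager. The fallback behavior and the function name are assumptions made for illustration; this is not the console's actual code.

```python
from typing import Mapping

# Assumed fallback: the platform Alertmanager service queried later in this comment.
PLATFORM_ALERTMANAGER_HOST = "alertmanager-main.openshift-monitoring.svc:9094"


def user_workload_alertmanager_host(data: Mapping[str, str]) -> str:
    """Return the host to use for silencing user-defined alerts.

    `data` is the `data` section of the monitoring-shared-config configmap.
    """
    host = data.get("alertmanagerUserWorkloadHost", "").strip()
    # An absent or empty key is taken to mean no dedicated UWM Alertmanager is deployed.
    return host or PLATFORM_ALERTMANAGER_HOST
```

With the configmap shown above, this returns `alertmanager-user-workload.openshift-user-workload-monitoring.svc:9094`.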

# oc -n openshift-user-workload-monitoring get pod
NAME                                  READY   STATUS    RESTARTS      AGE
alertmanager-user-workload-0          6/6     Running   1 (25m ago)   25m
alertmanager-user-workload-1          6/6     Running   1 (25m ago)   25m
prometheus-operator-68458fb66-7pdtm   2/2     Running   0             26m
prometheus-user-workload-0            6/6     Running   0             26m
prometheus-user-workload-1            6/6     Running   0             26m
thanos-ruler-user-workload-0          3/3     Running   0             25m
thanos-ruler-user-workload-1          3/3     Running   0             25m

The user project alert TestAlert is present in the dedicated Alertmanager and is silenced (state is "suppressed"):
# token=`oc create token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-user-workload.openshift-user-workload-monitoring.svc:9095/api/v2/alerts' | jq
[
  {
    "annotations": {
      "message": "This is an alert meant to ensure that the entire alerting pipeline is functional."
    },
    "endsAt": "2022-09-19T04:19:27.799Z",
    "fingerprint": "348490d73f8513a0",
    "receivers": [
      {
        "name": "Default"
      }
    ],
    "startsAt": "2022-09-19T03:52:42.799Z",
    "status": {
      "inhibitedBy": [],
      "silencedBy": [
        "2cb3e008-c2c6-4ca2-9c3a-8b1e5a53e984"
      ],
      "state": "suppressed"
    },
    "updatedAt": "2022-09-19T04:15:27.825Z",
    "generatorURL": "https://thanos-querier-openshift-monitoring.apps.ci-ln-xzifyhk-76ef8.origin-ci-int-aws.dev.rhcloud.com/api/graph?g0.expr=vector%281%29&g0.tab=1",
    "labels": {
      "alertname": "TestAlert",
      "namespace": "ns1",
      "severity": "none"
    }
  }
]
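
The check made by eye above can be scripted. The sketch below takes the parsed JSON array returned by `/api/v2/alerts` and lists which alerts are silenced, together with the silence IDs. It requires both `state == "suppressed"` and a non-empty `silencedBy`, because an inhibited alert is also reported as suppressed. The function name is hypothetical.

```python
from typing import Any, Dict, List


def silenced_alerts(alerts: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map alertname to the IDs of the silences that mute it."""
    result: Dict[str, List[str]] = {}
    for alert in alerts:
        status = alert.get("status", {})
        silenced_by = status.get("silencedBy") or []
        # "suppressed" covers both silenced and inhibited alerts,
        # so silencedBy must be non-empty as well.
        if status.get("state") == "suppressed" and silenced_by:
            name = alert.get("labels", {}).get("alertname", "")
            result.setdefault(name, []).extend(silenced_by)
    return result
```

Fed the response above (for example via `json.loads`), it returns a single entry for TestAlert with the silence ID `2cb3e008-c2c6-4ca2-9c3a-8b1e5a53e984`.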

The user project alert TestAlert is not found in the platform Alertmanager, which is expected:
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v2/alerts' | jq '.[] | {alertname: .labels.alertname, state: .status.state}'
{
  "alertname": "Watchdog",
  "state": "active"
}
{
  "alertname": "AlertmanagerReceiversNotConfigured",
  "state": "active"
}


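From thanos-querier, the user project alert TestAlert is still firing. This is also expected even though TestAlert is silenced, because the ALERTS metric and the rules API reflect rule evaluation only and do not account for Alertmanager silences: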
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS' | jq '.data.result[].metric | {alertname, alertstate}'
{
  "alertname": "AlertmanagerReceiversNotConfigured",
  "alertstate": "firing"
}
{
  "alertname": "TestAlert",
  "alertstate": "firing"
}
{
  "alertname": "Watchdog",
  "alertstate": "firing"
}


# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules' | jq '.data.groups[].rules[] | select(.state=="firing")'
...
{
  "state": "firing",
  "name": "TestAlert",
  "query": "vector(1)",
  "duration": 0,
  "labels": {
    "namespace": "ns1",
    "severity": "none"
  },
  "annotations": {
    "message": "This is an alert meant to ensure that the entire alerting pipeline is functional."
  },
  "alerts": [
    {
      "labels": {
        "alertname": "TestAlert",
        "namespace": "ns1",
        "severity": "none"
      },
      "annotations": {
        "message": "This is an alert meant to ensure that the entire alerting pipeline is functional."
      },
      "state": "firing",
      "activeAt": "2022-09-19T03:52:57.799592483Z",
      "value": "1e+00",
      "partialResponseStrategy": "ABORT"
    }
  ],
  "health": "ok",
  "evaluationTime": 0.000916317,
  "lastEvaluation": "2022-09-19T06:01:27.806644449Z",
  "type": "alerting"
}
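
Because the rules API only ever reports `firing`, a UI has to combine it with Alertmanager data to show a Silenced state. The sketch below illustrates that merge under a simplifying assumption: a rule alert and an Alertmanager alert are treated as the same alert when their label sets are identical, as they are in the outputs above. The function name and matching strategy are illustrative only and are not taken from the console code.

```python
from typing import Any, Dict, List


def display_state(rule_alert: Dict[str, Any], am_alerts: List[Dict[str, Any]]) -> str:
    """Return the state to display for one alert taken from the rules API.

    `am_alerts` is the parsed /api/v2/alerts response of the Alertmanager
    that was consulted.
    """
    state = rule_alert.get("state", "unknown")
    if state != "firing":
        return state
    for am_alert in am_alerts:
        # Assumption: identical label sets identify the same alert.
        if am_alert.get("labels") == rule_alert.get("labels"):
            if am_alert.get("status", {}).get("silencedBy"):
                return "silenced"
    return "firing"
```

With the responses in this comment, passing the dedicated Alertmanager's alerts yields `silenced` for TestAlert, while passing only the platform Alertmanager's alerts yields `firing`, which is the behavior seen in the administrator console.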

Comment 10 Simon Pasquier 2022-09-23 14:16:32 UTC
> the user project alert can be silenced from the developer console. However, in the administrator console the alert status is still Firing, not Silenced (see the attached picture). This probably needs a fix on the console side, because the console API does not take the dedicated Alertmanager into account; I will file another bug once the PRs for this bug are merged into the payload. For this bug, I think the issue is fixed and we can close it when the fixes are merged into the payload.

Thanks for pointing out the edge case. I agree with you that it needs to be a new bug.

Comment 13 Junqi Zhao 2022-09-27 07:00:37 UTC
Tested with 4.12.0-0.nightly-2022-09-26-111919; same result as Comment 6: the user project alert can be silenced from the developer console, but its status is still Firing, not Silenced, in the administrator console. This is tracked in https://issues.redhat.com/browse/OCPBUGS-1738.

Comment 15 Simon Pasquier 2022-12-14 09:07:19 UTC
This bug doesn't need to be mentioned in the release notes because it's already been fixed in 4.11.z.

Comment 17 errata-xmlrpc 2023-01-17 19:50:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399
