Bug 2100860

Summary: Users can't silence alerts from the dev console when dedicated UWM Alertmanager is deployed
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.11
Status: CLOSED ERRATA
Severity: medium
Priority: high
Target Release: 4.12.0
Hardware: Unspecified
OS: Unspecified
Reporter: Simon Pasquier <spasquie>
Assignee: Simon Pasquier <spasquie>
QA Contact: Junqi Zhao <juzhao>
Docs Contact: Brian Burt <bburt>
CC: anpicker, bburt, hasun
Doc Type: No Doc Update
Type: Bug
Last Closed: 2023-01-17 19:50:46 UTC

Description Simon Pasquier 2022-06-24 13:42:50 UTC
Description of problem:


Version-Release number of selected component (if applicable):
4.11

How reproducible:
Always

Steps to Reproduce:
1. Enable user workload monitoring (UWM) and the dedicated UWM Alertmanager.
2. Deploy an application, a service monitor, and an alerting rule that always fires.
3. Go to the OCP developer console and silence the alert.

Actual results:
Nothing happens

Expected results:
The alert notification is muted.
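
For reference, silencing an alert ultimately means creating a silence object in the Alertmanager that holds the alert. The report does not show the request the console sends, so the following is only a minimal sketch of a silence payload in the shape accepted by the Alertmanager v2 API (POST /api/v2/silences). The function name `build_silence` and the `created_by` and `comment` values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_silence(labels, created_by, comment, hours=2, now=None):
    """Build a silence payload in the Alertmanager v2 API shape.

    Illustrative helper only (hypothetical name); it is not taken from the
    console or cluster-monitoring-operator code.
    """
    now = now or datetime.now(timezone.utc)
    return {
        # One exact-match matcher per alert label.
        "matchers": [
            {"name": name, "value": value, "isRegex": False, "isEqual": True}
            for name, value in sorted(labels.items())
        ],
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(hours=hours)).isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


if __name__ == "__main__":
    import json

    payload = build_silence(
        {"alertname": "TestAlert", "namespace": "ns1"},
        created_by="developer",  # assumed user name
        comment="silenced while testing",  # assumed comment
    )
    print(json.dumps(payload, indent=2))
```

The point of the bug is where such a payload is sent: with a dedicated UWM Alertmanager deployed, the silence has to reach that instance rather than the platform Alertmanager.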


Additional info:

Comment 6 Junqi Zhao 2022-09-19 07:37:45 UTC
Tested with the following PRs (openshift/console-operator#658 is already merged, so that fix is already in the launched cluster-bot cluster):
openshift/cluster-monitoring-operator#1690
openshift/console#11712

Following the steps in Comment 0, the user project alert can be silenced from the developer console. However, in the administrator console the user project alert status is still Firing, not Silenced (see the attached picture). This should probably be fixed on the console side, since the console API does not take the dedicated Alertmanager into account; I will file another bug after the PRs for this bug are merged into the payload. For this bug, I think the issue is fixed and we can close it once the fixes are merged into the payload.
# oc -n openshift-config-managed get cm monitoring-shared-config -oyaml
apiVersion: v1
data:
  alertmanagerPublicURL: https://alertmanager-main-openshift-monitoring.apps.ci-ln-xzifyhk-76ef8.origin-ci-int-aws.dev.rhcloud.com
  alertmanagerTenancyHost: alertmanager-user-workload.openshift-user-workload-monitoring.svc:9092
  alertmanagerUserWorkloadHost: alertmanager-user-workload.openshift-user-workload-monitoring.svc:9094
  prometheusPublicURL: https://prometheus-k8s-openshift-monitoring.apps.ci-ln-xzifyhk-76ef8.origin-ci-int-aws.dev.rhcloud.com
  thanosPublicURL: https://thanos-querier-openshift-monitoring.apps.ci-ln-xzifyhk-76ef8.origin-ci-int-aws.dev.rhcloud.com
kind: ConfigMap
metadata:
  creationTimestamp: "2022-09-19T03:20:49Z"
  name: monitoring-shared-config
  namespace: openshift-config-managed
  resourceVersion: "41651"
  uid: c7e44118-230e-486e-82c8-570c05db3d79

# oc -n openshift-user-workload-monitoring get pod
NAME                                  READY   STATUS    RESTARTS      AGE
alertmanager-user-workload-0          6/6     Running   1 (25m ago)   25m
alertmanager-user-workload-1          6/6     Running   1 (25m ago)   25m
prometheus-operator-68458fb66-7pdtm   2/2     Running   0             26m
prometheus-user-workload-0            6/6     Running   0             26m
prometheus-user-workload-1            6/6     Running   0             26m
thanos-ruler-user-workload-0          3/3     Running   0             25m
thanos-ruler-user-workload-1          3/3     Running   0             25m

The user project alert TestAlert is present in the dedicated Alertmanager, and it is silenced (state is suppressed):
# token=`oc create token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-user-workload.openshift-user-workload-monitoring.svc:9095/api/v2/alerts' | jq
[
  {
    "annotations": {
      "message": "This is an alert meant to ensure that the entire alerting pipeline is functional."
    },
    "endsAt": "2022-09-19T04:19:27.799Z",
    "fingerprint": "348490d73f8513a0",
    "receivers": [
      {
        "name": "Default"
      }
    ],
    "startsAt": "2022-09-19T03:52:42.799Z",
    "status": {
      "inhibitedBy": [],
      "silencedBy": [
        "2cb3e008-c2c6-4ca2-9c3a-8b1e5a53e984"
      ],
      "state": "suppressed"
    },
    "updatedAt": "2022-09-19T04:15:27.825Z",
    "generatorURL": "https://thanos-querier-openshift-monitoring.apps.ci-ln-xzifyhk-76ef8.origin-ci-int-aws.dev.rhcloud.com/api/graph?g0.expr=vector%281%29&g0.tab=1",
    "labels": {
      "alertname": "TestAlert",
      "namespace": "ns1",
      "severity": "none"
    }
  }
]
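
The fields that matter in this response are `labels.alertname`, `status.state`, and `status.silencedBy`. As a rough sketch (not part of the original verification), the same check could be scripted by reducing the response to those fields. The function name `summarize_alerts` is hypothetical, and `SAMPLE` is a trimmed copy of the output above.

```python
import json


def summarize_alerts(payload):
    """Reduce an Alertmanager /api/v2/alerts response to name, state, and silence IDs."""
    summary = []
    for alert in json.loads(payload):
        status = alert.get("status", {})
        summary.append({
            "alertname": alert.get("labels", {}).get("alertname"),
            "state": status.get("state"),
            "silencedBy": status.get("silencedBy", []),
        })
    return summary


# Trimmed copy of the response shown in this comment.
SAMPLE = """
[
  {
    "labels": {"alertname": "TestAlert", "namespace": "ns1", "severity": "none"},
    "status": {
      "inhibitedBy": [],
      "silencedBy": ["2cb3e008-c2c6-4ca2-9c3a-8b1e5a53e984"],
      "state": "suppressed"
    }
  }
]
"""

if __name__ == "__main__":
    for row in summarize_alerts(SAMPLE):
        print(row)
```

A non-empty `silencedBy` list together with the `suppressed` state is what confirms that the silence created from the developer console reached the dedicated Alertmanager.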

The user project alert TestAlert is not found in the platform Alertmanager, which is expected:
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v2/alerts' | jq '.[] | {alertname: .labels.alertname, state: .status.state}'
{
  "alertname": "Watchdog",
  "state": "active"
}
{
  "alertname": "AlertmanagerReceiversNotConfigured",
  "state": "active"
}


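From thanos-querier, the user project alert TestAlert is still firing. This is also expected even though TestAlert is silenced, because the ALERTS metric and the rules API report the rule evaluation state, which a silence in Alertmanager does not change: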
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS' | jq '.data.result[].metric | {alertname, alertstate}'
{
  "alertname": "AlertmanagerReceiversNotConfigured",
  "alertstate": "firing"
}
{
  "alertname": "TestAlert",
  "alertstate": "firing"
}
{
  "alertname": "Watchdog",
  "alertstate": "firing"
}


# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules' | jq '.data.groups[].rules[] | select(.state=="firing")'
...
{
  "state": "firing",
  "name": "TestAlert",
  "query": "vector(1)",
  "duration": 0,
  "labels": {
    "namespace": "ns1",
    "severity": "none"
  },
  "annotations": {
    "message": "This is an alert meant to ensure that the entire alerting pipeline is functional."
  },
  "alerts": [
    {
      "labels": {
        "alertname": "TestAlert",
        "namespace": "ns1",
        "severity": "none"
      },
      "annotations": {
        "message": "This is an alert meant to ensure that the entire alerting pipeline is functional."
      },
      "state": "firing",
      "activeAt": "2022-09-19T03:52:57.799592483Z",
      "value": "1e+00",
      "partialResponseStrategy": "ABORT"
    }
  ],
  "health": "ok",
  "evaluationTime": 0.000916317,
  "lastEvaluation": "2022-09-19T06:01:27.806644449Z",
  "type": "alerting"
}
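
The outputs in this comment show two sources of truth: the rules API reports TestAlert as firing, while the dedicated Alertmanager reports it as suppressed with a silence ID. A UI that wants to show Silenced has to combine both. The following is only a sketch of that combination under stated assumptions; `display_state` is a hypothetical function and is not the console's actual logic.

```python
def display_state(rule_state, alertmanager_status=None):
    """Combine a rule's evaluation state with the Alertmanager status of its alert.

    rule_state: state from the rules API, e.g. "firing" or "pending".
    alertmanager_status: the "status" object from /api/v2/alerts for the same
    alert, or None when the alert was not found in the Alertmanager queried.
    """
    # Only firing alerts can be shown as silenced.
    if rule_state != "firing":
        return rule_state
    status = alertmanager_status or {}
    # "suppressed" also covers inhibited alerts, so require a silence ID.
    if status.get("state") == "suppressed" and status.get("silencedBy"):
        return "silenced"
    return "firing"


if __name__ == "__main__":
    # Status as reported by the dedicated UWM Alertmanager in this comment.
    uwm_status = {
        "inhibitedBy": [],
        "silencedBy": ["2cb3e008-c2c6-4ca2-9c3a-8b1e5a53e984"],
        "state": "suppressed",
    }
    print(display_state("firing", uwm_status))
    # Querying only the platform Alertmanager finds no TestAlert at all.
    print(display_state("firing", None))
```

In this sketch, looking up the status only in the platform Alertmanager, where TestAlert does not exist, leaves the alert displayed as firing, which matches the behavior seen in the administrator console.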

Comment 10 Simon Pasquier 2022-09-23 14:16:32 UTC
>  user project alert could be silenced from developer console, but the user project alert status is still Firing, not Silenced in administrator console, see the attached picture, maybe this should be fixed in console side(the console API does not consider the dedicate alertmanger, would file another bug after PRs in this bug merged to payload), for this bug, I think the issue is fixed and we could close it when the fixes merged to payload

Thanks for pointing out the edge case. I agree with you that it needs to be a new bug.

Comment 13 Junqi Zhao 2022-09-27 07:00:37 UTC
Tested with 4.12.0-0.nightly-2022-09-26-111919. The result is the same as in Comment 6: the user project alert can be silenced from the developer console, but its status is still Firing, not Silenced, in the administrator console. This is tracked in https://issues.redhat.com/browse/OCPBUGS-1738.

Comment 15 Simon Pasquier 2022-12-14 09:07:19 UTC
This bug doesn't need to be mentioned in the release notes because it's already been fixed in 4.11.z.

Comment 17 errata-xmlrpc 2023-01-17 19:50:46 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399