Bug 1734270 - Alerts are marked as "Not grouped" in Alertmanager web console
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.3.0
Assignee: Andrew Pickering
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-07-30 06:18 UTC by Junqi Zhao
Modified: 2019-10-08 09:18 UTC (History)
8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-08 09:18:49 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Alerts are marked as "Not grouped" (117.82 KB, image/png)
2019-07-30 06:18 UTC, Junqi Zhao
41 alertmanager console (36.58 KB, image/png)
2019-10-08 03:17 UTC, Junqi Zhao

Description Junqi Zhao 2019-07-30 06:18:57 UTC
Created attachment 1594510 [details]
Alerts are marked as "Not grouped"

Description of problem:
The Watchdog alert belongs to the general.rules group, but in the Alertmanager web console it appears under "Not grouped"; the same happens for other alerts. See the attached picture.

$ oc -n openshift-monitoring logs -c alertmanager alertmanager-main-0
level=info ts=2019-07-29T06:03:41.221Z caller=main.go:197 msg="Starting Alertmanager" version="(version=0.18.0, branch=rhaos-4.2-rhel-7, revision=38c47ef01d3959ca8c72e6c6fc892e3e8fcf8957)"
level=info ts=2019-07-29T06:03:41.221Z caller=main.go:198 build_context="(go=go1.12.6, user=root@8c461faab7bf, date=20190728-18:37:04)"
level=warn ts=2019-07-29T06:03:42.164Z caller=cluster.go:228 component=cluster msg="failed to join cluster" err="3 errors occurred:\n\t* Failed to resolve alertmanager-main-0.alertmanager-operated.openshift-monitoring.svc:9094: lookup alertmanager-main-0.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: no such host\n\t* Failed to resolve alertmanager-main-1.alertmanager-operated.openshift-monitoring.svc:9094: lookup alertmanager-main-1.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: no such host\n\t* Failed to resolve alertmanager-main-2.alertmanager-operated.openshift-monitoring.svc:9094: lookup alertmanager-main-2.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: no such host\n\n"
level=info ts=2019-07-29T06:03:42.164Z caller=cluster.go:230 component=cluster msg="will retry joining cluster every 10s"
level=warn ts=2019-07-29T06:03:42.165Z caller=main.go:287 msg="unable to join gossip mesh" err="3 errors occurred:\n\t* Failed to resolve alertmanager-main-0.alertmanager-operated.openshift-monitoring.svc:9094: lookup alertmanager-main-0.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: no such host\n\t* Failed to resolve alertmanager-main-1.alertmanager-operated.openshift-monitoring.svc:9094: lookup alertmanager-main-1.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: no such host\n\t* Failed to resolve alertmanager-main-2.alertmanager-operated.openshift-monitoring.svc:9094: lookup alertmanager-main-2.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: no such host\n\n"
level=info ts=2019-07-29T06:03:42.165Z caller=cluster.go:623 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2019-07-29T06:03:42.197Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=info ts=2019-07-29T06:03:42.198Z caller=coordinator.go:131 component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=info ts=2019-07-29T06:03:42.202Z caller=main.go:429 msg=Listening address=127.0.0.1:9093
level=info ts=2019-07-29T06:03:44.165Z caller=cluster.go:648 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000656524s
level=info ts=2019-07-29T06:03:52.166Z caller=cluster.go:640 component=cluster msg="gossip settled; proceeding" elapsed=10.00149921s


Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-07-28-222114

How reproducible:
Always

Steps to Reproduce:
1. Check alerts in Alertmanager web console

Actual results:
Alerts are marked as "Not grouped" in Alertmanager web console

Expected results:
Alerts appear under their rule groups in the Alertmanager web console.

Additional info:

Comment 1 Andrew Pickering 2019-10-08 02:35:49 UTC
I think this is just because the rules are grouped by `job` by default and Watchdog does not have a `job` label.

@Junqi Could you confirm?
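For reference, the grouping behaviour described above comes from the `route.group_by` setting in alertmanager.yaml. A minimal sketch of such a route (an illustration only, not the exact configuration shipped with OpenShift):

```yaml
route:
  # Alerts are bucketed by the value of their `job` label.
  # An alert that carries no `job` label (such as Watchdog) falls
  # outside every group, and the web console renders it as
  # "Not grouped".
  group_by: ['job']
  receiver: default
receivers:
- name: default
```

If grouping by `job` is not desired, `group_by` can list other labels (e.g. `['alertname']`), which would place Watchdog into a named group.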

Comment 2 Andrew Pickering 2019-10-08 02:38:29 UTC
Also, I see that the Watchdog alert is duplicated in the screenshot. That issue has now been fixed upstream (see https://github.com/prometheus/alertmanager/issues/1875).

Comment 4 Junqi Zhao 2019-10-08 03:17:04 UTC
Created attachment 1623385 [details]
41 alertmanager console

