Bug 1976765
Summary: | AlertmanagerMembersInconsistent fires too quickly, causing serial-test noise | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Filip Petkovski <fpetkovs> |
Component: | Monitoring | Assignee: | Jayapriya Pai <janantha> |
Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
Severity: | medium | Priority: | medium |
Version: | 4.8 | Target Release: | 4.8.z |
Hardware: | Unspecified | OS: | Unspecified |
Clone Of: | 1936919 | Bug Depends On: | 1936919 |
Last Closed: | 2021-08-16 18:32:12 UTC | | |
CC: | alegrand, amcdermo, anpicker, aos-bugs, ccoleman, dgrisonn, erooth, juzhao, kakkoyun, lcosic, pkrupa, pnair, rgudimet, spasquie, wking | | |
Environment: | [Late] Alerts shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Suite:openshift/conformance/parallel] | | |
Comment 1
W. Trevor King
2021-07-14 16:37:07 UTC
checked with CI jobs, no firing AlertmanagerMembersInconsistent alert
https://search.ci.openshift.org/?search=AlertmanagerMembersInconsistent&maxAge=48h&context=1&type=bug%2Bjunit&name=periodic-ci-openshift-release-master-ci-4.8.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

```yaml
- alert: AlertmanagerMembersInconsistent
  annotations:
    description: Alertmanager {{ $labels.namespace }}/{{ $labels.pod}} has only found {{ $value }} members of the {{$labels.job}} cluster.
    summary: A member of an Alertmanager cluster has not found all other cluster members.
  expr: |
    # Without max_over_time, failed scrapes could create false negatives, see
    # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.
      max_over_time(alertmanager_cluster_members{job="alertmanager-main",namespace="openshift-monitoring"}[5m])
    < on (namespace,service) group_left
      count by (namespace,service) (max_over_time(alertmanager_cluster_members{job="alertmanager-main",namespace="openshift-monitoring"}[5m]))
  for: 15m
  labels:
    severity: critical
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.5 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3121
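To make the AlertmanagerMembersInconsistent rule quoted in the comment easier to reason about, the following Python sketch mimics, outside Prometheus, what its `expr` and `for: 15m` clause do. It is an illustration under stated assumptions, not Prometheus code: the helper names `inconsistent_members` and `first_firing_minute`, the pod names, the sample values, and the one-evaluation-per-minute step are all hypothetical, and it assumes every pod shares a single `(namespace, service)` pair.

```python
from __future__ import annotations

# Hypothetical sketch of how AlertmanagerMembersInconsistent is evaluated.
# Assumptions: one (namespace, service) pair, one rule evaluation per minute,
# made-up pod names and sample values. This is not Prometheus source code.


def max_over_time(samples: list[float]) -> float:
    """Like PromQL max_over_time: the largest sample in the range window."""
    return max(samples)


def inconsistent_members(window: dict[str, list[float]]) -> dict[str, float]:
    """Return {pod: max observed members} for pods below the expected size.

    `window` maps each pod to its alertmanager_cluster_members samples from
    the last 5 minutes. A pod with no samples (every scrape failed) yields no
    series, so it is neither compared nor counted.
    """
    # Left-hand side: max_over_time(alertmanager_cluster_members[5m]) per pod.
    observed = {pod: max_over_time(s) for pod, s in window.items() if s}
    # Right-hand side: count by (namespace, service) of those same series.
    expected = len(observed)
    # `<` acts as a filter: keep only pods that see fewer members than expected.
    return {pod: v for pod, v in observed.items() if v < expected}


def first_firing_minute(condition: list[bool], for_minutes: int = 15) -> int | None:
    """Return the evaluation index at which the alert fires, or None.

    condition[i] is True when the expression returned the pod's series at
    evaluation i. The alert is pending from the first True and fires once it
    has been active for at least `for_minutes`; any False resets it.
    """
    active_since = None
    for minute, is_true in enumerate(condition):
        if not is_true:
            active_since = None  # series vanished: pending alert is dropped
            continue
        if active_since is None:
            active_since = minute  # alert becomes pending
        if minute - active_since >= for_minutes:
            return minute  # pending long enough: alert fires
    return None


if __name__ == "__main__":
    window = {
        "alertmanager-main-0": [3, 3, 3],
        "alertmanager-main-1": [3, 2, 3],  # dip hidden by max_over_time
        "alertmanager-main-2": [1, 1, 2],  # never saw all 3 members
    }
    print(inconsistent_members(window))  # {'alertmanager-main-2': 2}
    print(first_firing_minute([True] * 14))  # None
    print(first_firing_minute([True] * 16))  # 15
```

In this sketch, taking the maximum over the 5-minute window means a dip that recovers inside the window (alertmanager-main-1) is not flagged. The `for` clause then requires the shortfall to be present at every evaluation for 15 minutes, so a brief membership gap resets the pending state instead of firing.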