Bug 1846397 - [4.4 upgrade][alert] AlertmanagerConfigInconsistent
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Simon Pasquier
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1850466
Reported: 2020-06-11 13:42 UTC by Hongkai Liu
Modified: 2020-10-27 16:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The AlertmanagerConfigInconsistent alert could fire during an upgrade because some of the Alertmanager pods were temporarily not running due to a rolling update of the statefulset. The alert resolved itself once all Alertmanager pods had been updated.
Consequence: The firing alert generated noise that was confusing for the cluster admins, especially because there wasn't any inconsistency in the configuration.
Fix: The AlertmanagerConfigInconsistent alert has been fixed to not consider the number of running Alertmanager pods.
Result: The AlertmanagerConfigInconsistent alert doesn't fire anymore during upgrades when some of the Alertmanager pods are in a transient not-running state.
Clone Of:
Clones: 1850466
Environment:
Last Closed: 2020-10-27 16:06:37 UTC
Target Upstream Version:
Embargoed:
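
The Doc Text above says the alert stopped depending on the number of running Alertmanager pods. The following is a minimal Python sketch of that difference, not the actual alerting rule: the function names, the hash values, and the exact shape of the pre-fix check are assumptions made for illustration. Each list holds the configuration hash reported by each pod that is currently running.

```python
from collections import Counter


def old_style_fires(config_hashes, expected_replicas):
    # Assumed pre-fix logic: every hash must be reported by exactly
    # expected_replicas pods, so a pod that is merely restarting
    # trips the check even when the configuration is identical.
    counts = Counter(config_hashes)
    return any(n != expected_replicas for n in counts.values())


def new_style_fires(config_hashes):
    # Assumed post-fix logic: only the number of distinct hashes
    # among the running pods matters.
    return len(set(config_hashes)) > 1


# Rolling update: 3 replicas expected, 2 running, same configuration.
during_upgrade = ["abc", "abc"]
# Genuine inconsistency: 3 running, one pod has a different configuration.
out_of_sync = ["abc", "abc", "def"]

print(old_style_fires(during_upgrade, 3), new_style_fires(during_upgrade))
print(old_style_fires(out_of_sync, 3), new_style_fires(out_of_sync))
```

Under these assumptions, the replica-aware check fires in both scenarios, while the hash-only check fires only when the running pods actually disagree.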




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | coreos kube-prometheus pull 576 | 0 | None | closed | Fix AlertmanagerConfigInconsistent alert | 2020-10-15 16:46:30 UTC
Github | openshift cluster-monitoring-operator pull 820 | 0 | None | closed | Bug 1846397: Fix AlertmanagerConfigInconsistent alert | 2020-10-15 16:46:41 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:07:15 UTC

Internal Links: 1824981

Description Hongkai Liu 2020-06-11 13:42:45 UTC
This alert fired once while upgrading the CI build cluster. The upgrade completed successfully.

We have seen this alert occasionally, but this is the first time it has fired during an upgrade.

From 4.4.7
To 4.4.8


https://coreos.slack.com/archives/CHY2E1BL4/p1591863399365200

[FIRING:1] AlertmanagerConfigInconsistent (69850230704789 openshift-monitoring/k8s alertmanager-main critical)
The configuration of the instances of the Alertmanager cluster alertmanager-main are out of sync.


I will update the link to the must-gather later.

Please do not fire the alert if the upgrade is considered successful.

Comment 7 W. Trevor King 2020-06-17 02:06:23 UTC
What action is a user supposed to take when they see this alert?  Is there something we should be capturing in the alert message about the diff that will help the admin resolve this when it needs resolving and will help us get a handle on the false-positives when we don't need admin-intervention?

Comment 13 Junqi Zhao 2020-07-02 06:38:03 UTC
Upgraded from 4.5.0-rc.4 to 4.6.0-0.nightly-2020-07-01-082733. The AlertmanagerConfigInconsistent alert did not fire during the upgrade or after the cluster was upgraded successfully.

Comment 15 errata-xmlrpc 2020-10-27 16:06:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

