Bug 1846397

Summary: [4.4 upgrade][alert] AlertmanagerConfigInconsistent
Product: OpenShift Container Platform
Reporter: Hongkai Liu <hongkliu>
Component: Monitoring
Assignee: Simon Pasquier <spasquie>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.4
CC: alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania, wking
Target Milestone: ---
Keywords: Upgrades
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: the AlertmanagerConfigInconsistent alert could fire during an upgrade because some of the Alertmanager pods were temporarily not running due to a rolling update of the statefulset. The alert resolved itself once all Alertmanager pods had been updated.
Consequence: the firing alert generated noise that was confusing for the cluster admins, especially because there wasn't any inconsistency in the configuration.
Fix: the AlertmanagerConfigInconsistent alert has been fixed to not consider the number of running Alertmanager pods.
Result: the AlertmanagerConfigInconsistent alert doesn't fire anymore during upgrades when some of the Alertmanager pods are in a not-running transient state.
Story Points: ---
Clone Of:
: 1850466
Environment:
Last Closed: 2020-10-27 16:06:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1850466
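
The difference described in the Doc Text can be illustrated with a simplified model. This is only a sketch, not the actual alerting rule (which is not shown in this report): it assumes the pre-fix rule compared the number of pods reporting each configuration hash against the expected replica count, and that the fixed rule only checks whether the running pods agree with each other. The function names, pod names, and hash values are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional

# Pod name -> configuration hash reported by that pod, or None while the pod
# is not running (for example, part-way through a rolling update).
PodHashes = Dict[str, Optional[str]]


def fires_before_fix(pods: PodHashes, expected_replicas: int) -> bool:
    """Assumed pre-fix behaviour: each reported hash must be shared by
    exactly `expected_replicas` running pods, otherwise the alert fires."""
    reported = Counter(h for h in pods.values() if h is not None)
    return any(count != expected_replicas for count in reported.values())


def fires_after_fix(pods: PodHashes) -> bool:
    """Post-fix behaviour as described in the Doc Text: only disagreement
    between running pods matters, not how many pods are running."""
    reported = {h for h in pods.values() if h is not None}
    return len(reported) > 1


# One pod is restarting during the upgrade; the other two agree.
mid_upgrade = {
    "alertmanager-main-0": "hash-a",
    "alertmanager-main-1": None,
    "alertmanager-main-2": "hash-a",
}
print(fires_before_fix(mid_upgrade, expected_replicas=3))  # True (false positive)
print(fires_after_fix(mid_upgrade))                        # False
```

Under this model a genuine mismatch (two running pods reporting different hashes) still fires with either version; only the transient "pod not running" case changes.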

Description Hongkai Liu 2020-06-11 13:42:45 UTC
This alert fired once while upgrading the CI build cluster. The upgrade completed successfully.

We have seen this alert occasionally, but this is the first time it has fired during an upgrade.

From 4.4.7
To 4.4.8


https://coreos.slack.com/archives/CHY2E1BL4/p1591863399365200

[FIRING:1] AlertmanagerConfigInconsistent (69850230704789 openshift-monitoring/k8s alertmanager-main critical)
The configuration of the instances of the Alertmanager cluster alertmanager-main are out of sync.


Will update link to must-gather later.

Please do not fire the alert if the upgrade is considered successful.

Comment 7 W. Trevor King 2020-06-17 02:06:23 UTC
What action is a user supposed to take when they see this alert?  Is there something we should be capturing in the alert message about the diff that will help the admin resolve this when it needs resolving and will help us get a handle on the false-positives when we don't need admin-intervention?

Comment 13 Junqi Zhao 2020-07-02 06:38:03 UTC
Upgraded from 4.5.0-rc.4 to 4.6.0-0.nightly-2020-07-01-082733. No AlertmanagerConfigInconsistent alert fired during the upgrade or after the cluster was upgraded successfully.

Comment 15 errata-xmlrpc 2020-10-27 16:06:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196