Bug 1851281 - [4.4 upgrade][alert]Deployment openshift-machine-config-operator/etcd-quorum-guard has not matched the expected number of replicas for longer than 15 minutes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd Operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Sam Batschelet
QA Contact: ge liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-26 02:46 UTC by Hongkai Liu
Modified: 2020-10-27 16:10 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:09:46 UTC
Target Upstream Version:




Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:10:13 UTC

Internal Links: 1824981

Description Hongkai Liu 2020-06-26 02:46:39 UTC
Description of problem:
The alert was fired when upgrading build02 from 4.4.4 to 4.4.10.

[FIRING:1] KubeDeploymentReplicasMismatch kube-state-metrics (etcd-quorum-guard https-main 10.131.2.16:8443 openshift-machine-config-operator kube-state-metrics-7749fc9947-xx4qj openshift-monitoring/k8s kube-state-metrics critical)
Deployment openshift-machine-config-operator/etcd-quorum-guard has not matched the expected number of replicas for longer than 15 minutes.
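
For readers unfamiliar with this alert, its message suggests a condition of the form "desired replicas differ from available replicas, continuously, for 15 minutes". The following is a minimal Python sketch of that logic only, not the actual alerting rule shipped with the cluster. The `ReplicaSample` type, the `mismatch_fires` function, and the sample data are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class ReplicaSample:
    """One observation of a deployment (hypothetical type for this sketch)."""
    at: datetime
    desired: int
    available: int


def mismatch_fires(samples: Iterable[ReplicaSample],
                   hold: timedelta = timedelta(minutes=15)) -> bool:
    """Return True if desired != available held continuously for at least `hold`."""
    mismatch_since: Optional[datetime] = None
    for s in sorted(samples, key=lambda s: s.at):
        if s.desired != s.available:
            # Remember when the current run of mismatches started.
            if mismatch_since is None:
                mismatch_since = s.at
            if s.at - mismatch_since >= hold:
                return True
        else:
            # Any matching sample resets the timer.
            mismatch_since = None
    return False


# Assumed sample data: a 3-replica deployment observed every 5 minutes.
t0 = datetime(2020, 6, 26, 2, 0)
stuck = [ReplicaSample(t0 + timedelta(minutes=m), 3, 2) for m in (0, 5, 10, 15, 20)]
recovered = [
    ReplicaSample(t0 + timedelta(minutes=m), 3, 3 if m == 10 else 2)
    for m in (0, 5, 10, 15, 20)
]
```

Under this model, a deployment that briefly loses a replica while a node reboots during an upgrade would not fire as long as it recovers within the hold window. A replica that stays unavailable past the window would.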

https://coreos.slack.com/archives/CHY2E1BL4/p1593117580479500

Must-Gather 
http://file.rdu.redhat.com/~hongkliu/dptp2020/bz1825000/


How reproducible: once

Eventually upgrade was completed successfully.

Is this expected during the upgrade?
What should the cluster admin do after seeing this alert?

Comment 2 W. Trevor King 2020-06-26 03:37:34 UTC
etcd-quorum-guard was born in the MCO repo because there was not yet an etcd operator.  Now that we have an etcd operator, these components are maintained by that team, and will be moving over to the etcd operator repo at some point in the future.

Comment 5 Dan Mace 2020-08-18 14:43:58 UTC
Please re-test using the fixes applied to https://bugzilla.redhat.com/show_bug.cgi?id=1829923 and let's see if they helped. Otherwise we need a way to reproduce this or measure whether it is actually happening with any significant frequency.

Comment 7 ge liu 2020-09-17 03:11:10 UTC
I can't hit this issue during upgrade. @hongkliu, have you hit it again? Please let me know or reopen the bug. Thanks.

Comment 8 Hongkai Liu 2020-09-28 14:14:56 UTC
As far as I know, CI clusters hit this only once.

Comment 10 errata-xmlrpc 2020-10-27 16:09:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

