Bug 1840150 - Report event in case of excessive leader changes that include disk metrics
Summary: Report event in case of excessive leader changes that include disk metrics
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd Operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.4.z
Assignee: Sam Batschelet
QA Contact: ge liu
URL:
Whiteboard:
Depends On: 1827585
Blocks:
Reported: 2020-05-26 13:27 UTC by Suresh Kolichala
Modified: 2020-09-15 17:32 UTC (History)
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1827585
Environment:
Last Closed: 2020-09-15 17:32:44 UTC
Target Upstream Version:


Links
System                  ID                                        Private  Priority  Status  Summary                                      Last Updated
Github                  openshift cluster-etcd-operator pull 369  0        None      closed  Bug 1840150: operator: add fsync controller  2020-09-21 19:13:54 UTC
Red Hat Product Errata  RHBA-2020:3605                            0        None      None    None                                         2020-09-15 17:32:57 UTC

Comment 11 ge liu 2020-09-09 08:56:44 UTC
Verified with 4.4.0-0.nightly-2020-09-08-111845

Forced etcd leader changes by introducing a typo in /etc/kubernetes/manifests/etcd-pod.yaml to bring the etcd pods down, then checked the events. The warning message fired:

3m47s       Warning   EtcdLeaderChangeMetrics                 deployment/etcd-operator                         Detected 2.5 leader changes in last 5 minutes on "AWS" disk metrics are: etcd-ip-10-0-148-126.us-east-2.compute.internal=0.001993,etcd-ip-10-0-185-97.us-east-2.compute.internal=0.003454999999999999,etcd-ip-10-0-197-59.us-east-2.compute.internal=0.004200000000000011
2m48s       Warning   EtcdLeaderChangeMetrics                 deployment/etcd-operator                         Detected 2.5 leader changes in last 5 minutes on "AWS" disk metrics are: etcd-ip-10-0-148-126.us-east-2.compute.internal=0.001992999999999999,etcd-ip-10-0-185-97.us-east-2.compute.internal=0.003455000000000009,etcd-ip-10-0-197-59.us-east-2.compute.internal=0.00419999999999997
2m36s       Warning   ClusterMemberControllerUpdatingStatus   deployment/etcd-operator                         rpc error: code = Unknown desc = OK: HTTP status code 200; transport: missing content-type field
89s         Warning   UnhealthyEtcdMember                     deployment/etcd-operator                         unhealthy members: ip-10-0-197-59.us-east-2.compute.internal,ip-10-0-185-97.us-east-2.compute.internal
89s         Warning   EtcdLeaderChangeMetrics                 deployment/etcd-operator                         Detected 5 leader changes in last 5 minutes on "AWS" disk metrics are: etcd-ip-10-0-148-126.us-east-2.compute.internal=0.001992999999999999,etcd-ip-10-0-185-97.us-east-2.compute.internal=0.003454999999999981,etcd-ip-10-0-197-59.us-east-2.compute.internal=0.004199999999999969
48s         Warning   EtcdLeaderChangeMetrics                 deployment/etcd-operator                         Detected 6.25 leader changes in last 5 minutes on "AWS" disk metrics are: etcd-ip-10-0-148-126.us-east-2.compute.internal=0.0019929999999999995,etcd-ip-10-0-185-97.us-east-2.compute.internal=0.003454999999999998,etcd-ip-10-0-197-59.us-east-2.compute.internal=0.004199999999999979
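
For anyone who wants to post-process these warnings, the following is a minimal sketch of how an EtcdLeaderChangeMetrics message could be split into its parts. The regular expression is inferred only from the sample events above, not from the operator source, and the names `parse_leader_change_event`, `LeaderChangeEvent`, and `SAMPLE` are hypothetical. The bug does not state the unit of the per-member disk values, so the sketch keeps them as plain floats.

```python
import re
from typing import Dict, NamedTuple

# Pattern inferred from the sample event text in this comment (an assumption,
# not the operator's own format definition).
EVENT_RE = re.compile(
    r'Detected (?P<changes>[0-9.]+) leader changes in last (?P<minutes>\d+) minutes '
    r'on "(?P<platform>[^"]*)" disk metrics are: (?P<metrics>.*)$'
)


class LeaderChangeEvent(NamedTuple):
    changes: float                 # reported leader-change count (may be fractional)
    window_minutes: int            # reporting window from the message
    platform: str                  # quoted platform name, e.g. "AWS"
    disk_metrics: Dict[str, float]  # member name -> reported disk value


def parse_leader_change_event(message: str) -> LeaderChangeEvent:
    """Parse one event message; raise ValueError if it does not match."""
    match = EVENT_RE.search(message.strip())
    if match is None:
        raise ValueError("not an EtcdLeaderChangeMetrics message")
    disk_metrics: Dict[str, float] = {}
    for pair in match.group("metrics").split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate an empty metrics list or a trailing comma
        member, _, value = pair.rpartition("=")
        disk_metrics[member] = float(value)
    return LeaderChangeEvent(
        changes=float(match.group("changes")),
        window_minutes=int(match.group("minutes")),
        platform=match.group("platform"),
        disk_metrics=disk_metrics,
    )


# Message text copied from the first event shown above.
SAMPLE = (
    'Detected 2.5 leader changes in last 5 minutes on "AWS" disk metrics are: '
    "etcd-ip-10-0-148-126.us-east-2.compute.internal=0.001993,"
    "etcd-ip-10-0-185-97.us-east-2.compute.internal=0.003454999999999999,"
    "etcd-ip-10-0-197-59.us-east-2.compute.internal=0.004200000000000011"
)

if __name__ == "__main__":
    event = parse_leader_change_event(SAMPLE)
    # Member with the largest reported disk value in this event.
    slowest = max(event.disk_metrics, key=event.disk_metrics.get)
    print(event.changes, event.platform, slowest)
```

Splitting each pair on the last `=` keeps member names intact even if they were ever to contain that character, and unmatched messages raise an error rather than returning partial data.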

Comment 13 errata-xmlrpc 2020-09-15 17:32:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.4.21 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3605

