Bug 2006975 - clusteroperator/etcd status condition should not change reasons frequently due to EtcdEndpointsDegraded
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Haseeb Tariq
QA Contact: ge liu
URL:
Whiteboard:
Depends On:
Blocks: 2009016
Reported: 2021-09-22 18:38 UTC by Haseeb Tariq
Modified: 2022-03-10 16:13 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2009016
Environment:
Last Closed: 2022-03-10 16:12:52 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-etcd-operator pull 660 | 0 | None | open | Bug 2006975: Suppress noisy logs and improve client errors | 2021-09-29 08:30:48 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:13:13 UTC

Description Haseeb Tariq 2021-09-22 18:38:06 UTC
Description of problem:
Seeing the following test failure in recent CI runs:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/664/pull-ci-openshift-cluster-etcd-operator-master-e2e-agnostic/1440433281509101568

```
: [sig-arch] events should not repeat pathologically 

event happened 21 times, something is wrong: ns/openshift-etcd-operator namespace/openshift-etcd-operator - reason/OperatorStatusChanged Status for clusteroperator/etcd changed: Degraded message changed from "NodeControllerDegraded: All master nodes are ready\nEtcdEndpointsDegraded: rpc error: code = Canceled desc = grpc: the client connection is closing\nEtcdMembersDegraded: No unhealthy members found" to "NodeControllerDegraded: All master nodes are ready\nEtcdMembersDegraded: No unhealthy members found"
event happened 21 times, something is wrong: ns/openshift-etcd-operator namespace/openshift-etcd-operator - reason/OperatorStatusChanged Status for clusteroperator/etcd changed: Degraded message changed from "NodeControllerDegraded: All master nodes are ready\nEtcdMembersDegraded: No unhealthy members found" to "NodeControllerDegraded: All master nodes are ready\nEtcdEndpointsDegraded: rpc error: code = Canceled desc = grpc: the client connection is closing\nEtcdMembersDegraded: No unhealthy members found"
```
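
To make the flap concrete, here is a minimal Python sketch that compares the two Degraded messages from the events above and reports which reason appears or disappears. The helper names `split_reasons` and `diff_reasons` are made up for this illustration and are not part of the operator or the test suite. The sketch also assumes the `\n` shown in the event text stands for a real newline in the condition message.

```python
from __future__ import annotations


def split_reasons(message: str) -> dict[str, str]:
    """Map each 'Reason: detail' line of a condition message to its detail."""
    reasons = {}
    for line in message.split("\n"):
        # Split on the first ': ' only, since the detail may contain more colons.
        name, sep, detail = line.partition(": ")
        if sep:
            reasons[name] = detail
    return reasons


def diff_reasons(old: str, new: str) -> tuple[set[str], set[str]]:
    """Return (added, removed) reason names when going from old to new."""
    old_names, new_names = set(split_reasons(old)), set(split_reasons(new))
    return new_names - old_names, old_names - new_names


# The two messages quoted in the events (hypothetical variable names).
with_endpoint_error = (
    "NodeControllerDegraded: All master nodes are ready\n"
    "EtcdEndpointsDegraded: rpc error: code = Canceled desc = "
    "grpc: the client connection is closing\n"
    "EtcdMembersDegraded: No unhealthy members found"
)
without_endpoint_error = (
    "NodeControllerDegraded: All master nodes are ready\n"
    "EtcdMembersDegraded: No unhealthy members found"
)

added, removed = diff_reasons(with_endpoint_error, without_endpoint_error)
print("added:", added, "removed:", removed)
```

In both directions the only difference is the `EtcdEndpointsDegraded` line; the other two reasons are unchanged.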

The Degraded condition message flaps because the following reason is repeatedly added and removed:
```
EtcdEndpointsDegraded: rpc error: code = Canceled desc = grpc: the client connection is closing
```
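
One way to keep such a message stable would be to drop known-transient client errors before the message is assembled. The Python sketch below only illustrates that idea. It is not the operator's code (the operator is written in Go) and is not necessarily how this gets fixed. `TRANSIENT_MARKERS`, `is_transient` and `degraded_message` are hypothetical names.

```python
from __future__ import annotations

# Assumption: error texts containing one of these substrings are treated as
# transient and should not influence the reported condition message.
TRANSIENT_MARKERS = ("grpc: the client connection is closing",)


def is_transient(error_text: str) -> bool:
    """Return True if the error text matches a known-transient marker."""
    return any(marker in error_text for marker in TRANSIENT_MARKERS)


def degraded_message(reasons: dict[str, str]) -> str:
    """Join 'Reason: detail' lines, skipping reasons with transient errors."""
    return "\n".join(
        f"{name}: {detail}"
        for name, detail in reasons.items()
        if not is_transient(detail)
    )


steady = {
    "NodeControllerDegraded": "All master nodes are ready",
    "EtcdMembersDegraded": "No unhealthy members found",
}
noisy = {
    "NodeControllerDegraded": "All master nodes are ready",
    "EtcdEndpointsDegraded": (
        "rpc error: code = Canceled desc = grpc: the client connection is closing"
    ),
    "EtcdMembersDegraded": "No unhealthy members found",
}

# Both inputs yield the same message, so no status change would be emitted.
print(degraded_message(noisy) == degraded_message(steady))
```

The trade-off is that any filter of this kind can also hide a genuine problem, so the marker list would have to stay narrow.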

It still needs to be determined whether this is expected behavior (e.g., during upgrade jobs) or whether there is an issue with how the clusteroperator/etcd status condition is updated.

Version-Release number of selected component (if applicable):
Seen on 4.10 and/or CI runs on master.


Steps to Reproduce:
As seen in CI runs:
https://search.ci.openshift.org/?search=EtcdEndpointsDegraded%3A+rpc+error%3A+code+%3D+Canceled+desc+%3D+grpc%3A+the+client+connection+is+closing&maxAge=48h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job&wrap=on

Comment 2 Haseeb Tariq 2021-09-22 19:50:47 UTC
Adding to the known event exceptions list for now: https://github.com/openshift/origin/pull/26475
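
For illustration, an exception list of this kind can be sketched as a set of regular expressions that are checked before an event is reported as pathological. This Python sketch is not the code from the linked origin pull request. `KNOWN_EXCEPTIONS`, `THRESHOLD` and `pathological_events` are hypothetical, and the threshold of 20 is only inferred from the failure above firing at 21 occurrences.

```python
from __future__ import annotations

import re

# Assumption: an event is flagged once it repeats more than this many times.
THRESHOLD = 20

# Hypothetical exception list: events matching any pattern are tolerated.
KNOWN_EXCEPTIONS = [
    re.compile(
        r"reason/OperatorStatusChanged .*"
        + re.escape(
            "EtcdEndpointsDegraded: rpc error: code = Canceled desc = "
            "grpc: the client connection is closing"
        )
    ),
]


def pathological_events(counts: dict[str, int]) -> list[str]:
    """Return failure lines for events repeated too often and not excepted."""
    failures = []
    for event, count in counts.items():
        if count <= THRESHOLD:
            continue  # repeated, but not often enough to matter
        if any(pattern.search(event) for pattern in KNOWN_EXCEPTIONS):
            continue  # known noisy event, tolerated for now
        failures.append(
            f"event happened {count} times, something is wrong: {event}"
        )
    return failures
```

An exception entry like this only silences the test; the flapping itself still has to be addressed in the operator.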

Comment 5 ge liu 2021-10-08 09:23:47 UTC
According to the CI logs, this issue still exists in 4.9, so a backport to 4.9 is needed after this.

Comment 6 Haseeb Tariq 2021-10-08 18:27:17 UTC
@geliu Thanks for verifying.
The 4.9 backport is ready and waiting on staff-eng-approved labels:
https://bugzilla.redhat.com/show_bug.cgi?id=2009016
https://github.com/openshift/cluster-etcd-operator/pull/679

Comment 9 errata-xmlrpc 2022-03-10 16:12:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

