+++ This bug was initially created as a clone of Bug #2006975 +++ Description of problem: Seeing the following test failure in recent CI runs https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/664/pull-ci-openshift-cluster-etcd-operator-master-e2e-agnostic/1440433281509101568 ``` : [sig-arch] events should not repeat pathologically event happened 21 times, something is wrong: ns/openshift-etcd-operator namespace/openshift-etcd-operator - reason/OperatorStatusChanged Status for clusteroperator/etcd changed: Degraded message changed from "NodeControllerDegraded: All master nodes are ready\nEtcdEndpointsDegraded: rpc error: code = Canceled desc = grpc: the client connection is closing\nEtcdMembersDegraded: No unhealthy members found" to "NodeControllerDegraded: All master nodes are ready\nEtcdMembersDegraded: No unhealthy members found" event happened 21 times, something is wrong: ns/openshift-etcd-operator namespace/openshift-etcd-operator - reason/OperatorStatusChanged Status for clusteroperator/etcd changed: Degraded message changed from "NodeControllerDegraded: All master nodes are ready\nEtcdMembersDegraded: No unhealthy members found" to "NodeControllerDegraded: All master nodes are ready\nEtcdEndpointsDegraded: rpc error: code = Canceled desc = grpc: the client connection is closing\nEtcdMembersDegraded: No unhealthy members found" ``` The status condition flaps on the status message due to the addition/removal of the following reason: ``` EtcdEndpointsDegraded: rpc error: code = Canceled desc = grpc: the client connection is closing ``` Needs to be determined if this is expected behavior (e.g during upgrade jobs) or if there is an issue with how the clusteroperator/etcd status condition is updated. Version-Release number of selected component (if applicable): Seen on 4.10 and/or CI runs on master. Steps to Reproduce: As seen in CI runs: https://search.ci.openshift.org/?search=EtcdEndpointsDegraded%3A+rpc+error%3A+code+%3D+Canceled+desc+%3D+grpc%3A+the+client+connection+is+closing&maxAge=48h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job&wrap=on --- Additional comment from Haseeb Tariq on 2021-09-22 18:50:43 UTC --- Bumping to a high since this has been failing across multiple release jobs https://search.ci.openshift.org/?search=EtcdEndpointsDegraded%3A+rpc+error%3A+code+%3D+Canceled+desc+%3D+grpc%3A+the+client+connection+is+closing&maxAge=336h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-canary/1436114643499094016 --- Additional comment from Haseeb Tariq on 2021-09-22 19:50:47 UTC --- Adding to the known event exceptions list for now: https://github.com/openshift/origin/pull/26475
Verified with 4.9.0-0.nightly-2021-10-14-182021
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.9.4 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3935