Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1733305

Summary: Status for clusteroperator/kube-apiserver changed: Degraded during upgrade
Product: OpenShift Container Platform
Reporter: Petr Muller <pmuller>
Component: Etcd
Assignee: Sam Batschelet <sbatsche>
Status: CLOSED WONTFIX
QA Contact: ge liu <geliu>
Severity: high
Docs Contact:
Priority: low
Version: 4.1.0
CC: aos-bugs, bparees, ccoleman, mfojtik, nagrawal
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-05-20 10:51:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Petr Muller 2019-07-25 16:18:06 UTC
Description of problem:

Seeing a lot of error messages in the release-openshift-origin-installer-e2e-aws-upgrade-4.1 job:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.1/206

Jul 25 14:26:44.336 I ns/openshift-kube-apiserver-operator deployment/kube-apiserver-operator Status for clusteroperator/kube-apiserver changed: Degraded message changed from "" to "StaticPodsDegraded: nodes/ip-10-0-143-150.ec2.internal pods/kube-apiserver-ip-10-0-143-150.ec2.internal container=\"kube-apiserver-7\" is not ready\nStaticPodsDegraded: nodes/ip-10-0-143-150.ec2.internal pods/kube-apiserver-ip-10-0-143-150.ec2.internal container=\"kube-apiserver-7\" is terminated: \"Error\" - \"esetting endpoints for master service \\\"kubernetes\\\" to [10.0.158.95 10.0.163.120]

log.go:172] suppressing panic for copyResponse error in test; copy error: context canceled
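
The Degraded condition reported in the event above can also be read directly from the kube-apiserver ClusterOperator resource. A minimal sketch using client-go's dynamic client; the kubeconfig location and the bare-bones error handling are assumptions for illustration, not part of this report:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes a kubeconfig at the default location (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// ClusterOperator is a cluster-scoped OpenShift config.openshift.io resource.
	gvr := schema.GroupVersionResource{
		Group: "config.openshift.io", Version: "v1", Resource: "clusteroperators",
	}
	co, err := client.Resource(gvr).Get(context.TODO(), "kube-apiserver", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	conds, _, err := unstructured.NestedSlice(co.Object, "status", "conditions")
	if err != nil {
		panic(err)
	}
	for _, c := range conds {
		cond, ok := c.(map[string]interface{})
		if !ok {
			continue
		}
		// Print the Degraded condition that the operator event above reports on.
		if cond["type"] == "Degraded" {
			fmt.Printf("Degraded=%v reason=%v\n%v\n", cond["status"], cond["reason"], cond["message"])
		}
	}
}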

Comment 1 Michal Fojtik 2019-07-26 09:01:11 UTC
That test failed because of:

Jul 25 14:27:12.603: INFO: cluster upgrade is failing: Cluster operator machine-config is still updating
Jul 25 14:34:02.602: INFO: cluster upgrade is failing: Could not update deployment "openshift-machine-config-operator/etcd-quorum-guard" (315 of 350)
Jul 25 14:34:12.605: INFO: cluster upgrade is failing: Could not update deployment "openshift-machine-config-operator/etcd-quorum-guard" (315 of 350)
Jul 25 14:34:22.601: INFO: cluster upgrade is failing: Could not update deployment "openshift-machine-config-operator/etcd-quorum-guard" (315 of 350)

The upgrade got stuck at this point and timed out. The Degraded transition is expected; it is not a bug. The fact that the upgrade got stuck updating etcd-quorum-guard is.
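
For context, one way to see why a rollout like this is still considered "not updated" is to compare the deployment's desired replica count with what its status reports. A minimal client-go sketch; the kubeconfig location is an assumption for illustration, not from this report:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes a kubeconfig at the default location (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	d, err := clientset.AppsV1().Deployments("openshift-machine-config-operator").
		Get(context.TODO(), "etcd-quorum-guard", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	desired := int32(0)
	if d.Spec.Replicas != nil {
		desired = *d.Spec.Replicas
	}
	// A rollout is stuck while updated/available replicas lag behind the desired count.
	fmt.Printf("desired=%d updated=%d available=%d unavailable=%d\n",
		desired, d.Status.UpdatedReplicas,
		d.Status.AvailableReplicas, d.Status.UnavailableReplicas)
}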

Sam, is there a known bug about this?

Comment 4 Sam Batschelet 2019-08-21 02:14:41 UTC
This is actually not a duplicate of bug 1742744 [1]. I am going to reopen this for further review.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1742744#c4

Comment 8 Ben Parees 2020-04-29 20:58:54 UTC
If "resetting endpoints for master service" is the signal on this bug, it is showing up quite a bit in recent searches:

https://search-clayton-ci-search.apps.build01.ci.devcluster.openshift.com/?search=resetting+endpoints+for+master+service&maxAge=336h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job


2.4% of all recent job runs show it.
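
The same query can be scripted against the search endpoint above. A minimal sketch reusing the parameters from the link; the endpoint URL and parameters are taken verbatim from it, while the response handling (printing a short preview of the body) is an assumption:

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func main() {
	// Same query parameters as the CI search link above.
	q := url.Values{
		"search":     {"resetting endpoints for master service"},
		"maxAge":     {"336h"},
		"context":    {"1"},
		"type":       {"bug+junit"},
		"maxMatches": {"5"},
		"maxBytes":   {"20971520"},
		"groupBy":    {"job"},
	}
	u := "https://search-clayton-ci-search.apps.build01.ci.devcluster.openshift.com/?" + q.Encode()
	resp, err := http.Get(u)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Status)
	// Print only the first few hundred bytes as a preview.
	fmt.Println(string(body[:min(len(body), 400)]))
}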