Description of problem: Seeing a lot of error messages in the release-openshift-origin-installer-e2e-aws-upgrade-4.1 job: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.1/206

Jul 25 14:26:44.336 I ns/openshift-kube-apiserver-operator deployment/kube-apiserver-operator Status for clusteroperator/kube-apiserver changed: Degraded message changed from "" to "StaticPodsDegraded: nodes/ip-10-0-143-150.ec2.internal pods/kube-apiserver-ip-10-0-143-150.ec2.internal container=\"kube-apiserver-7\" is not ready\nStaticPodsDegraded: nodes/ip-10-0-143-150.ec2.internal pods/kube-apiserver-ip-10-0-143-150.ec2.internal container=\"kube-apiserver-7\" is terminated: \"Error\" - \"esetting endpoints for master service \\\"kubernetes\\\" to [10.0.158.95 10.0.163.120] log.go:172] suppressing panic for copyResponse error in test; copy error: context canceled
That test failed because of:

Jul 25 14:27:12.603: INFO: cluster upgrade is failing: Cluster operator machine-config is still updating
Jul 25 14:34:02.602: INFO: cluster upgrade is failing: Could not update deployment "openshift-machine-config-operator/etcd-quorum-guard" (315 of 350)
Jul 25 14:34:12.605: INFO: cluster upgrade is failing: Could not update deployment "openshift-machine-config-operator/etcd-quorum-guard" (315 of 350)
Jul 25 14:34:22.601: INFO: cluster upgrade is failing: Could not update deployment "openshift-machine-config-operator/etcd-quorum-guard" (315 of 350)

The upgrade got stuck at this point and timed out. The Degraded transition is expected; it is not a bug. The fact that the upgrade got stuck on updating etcd-quorum-guard is. Sam, is there a known bug about this?
This is actually not a duplicate of 1742744 [1]. I am going to reopen this for further review. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1742744#c4
Saw an occurrence today: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/7308
If "resetting endpoints for master service" is the signal on this bug, it is showing up quite a bit in recent searches: https://search-clayton-ci-search.apps.build01.ci.devcluster.openshift.com/?search=resetting+endpoints+for+master+service&maxAge=336h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job 2.4% of all recent job runs show it.