1732992 – Race in APIServer Registration on Upgrade

Keywords:
Status:	CLOSED DUPLICATE of bug 1733015
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	kube-apiserver
Sub Component:
Version:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.2.0
Assignee:	Stefan Schimanski
QA Contact:	Xingxing Xia
Docs Contact:
URL:
Whiteboard:	buildcop
Depends On:
Blocks:

Reported:	2019-07-24 22:08 UTC by Steve Kuznetsov
Modified:	2019-07-31 06:19 UTC
CC List:	4 users
Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-07-31 06:19:42 UTC
Target Upstream Version:
Embargoed:

Description Steve Kuznetsov 2019-07-24 22:08:19 UTC

Symptoms are errors on registered resource types. Logs:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-serial-4.2/2410#0:build-log.txt%3A27
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_installer/2059/pull-ci-openshift-installer-master-e2e-aws-upgrade/1321

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/2059/pull-ci-openshift-installer-master-e2e-aws-upgrade/1321/artifacts/e2e-aws-upgrade/pods/openshift-cluster-version_cluster-version-operator-5b89cf4958-989l2_cluster-version-operator.log

Context:
https://coreos.slack.com/archives/CEKNRGF25/p1563917488076900
https://coreos.slack.com/archives/CEKNRGF25/p1564005673122400

Comment 1 Abhinav Dahiya 2019-07-24 22:36:20 UTC

Another one, from an OpenStack install:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_installer/2088/pull-ci-openshift-installer-master-e2e-openstack/1200

`Could not update authentication \"cluster\" (12 of 429): the object is invalid, possibly due to local cluster configuration\n* Could not update oauthclient \"console\" (255 of 429): the server does not recognize this resource, check extension API servers`
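
For triage, the combined message above can be split into one entry per failed manifest. The following is a minimal sketch, assuming the format seen in this single message (escaped quotes and a literal `\n* ` separator between entries); the pattern and the `parse_failures` helper are hypothetical and not a documented CVO format.

```python
import re

# The message quoted above, kept as a raw string so the log's escaped
# quotes and literal "\n" stay exactly as they appear in the log.
RAW = r'Could not update authentication \"cluster\" (12 of 429): the object is invalid, possibly due to local cluster configuration\n* Could not update oauthclient \"console\" (255 of 429): the server does not recognize this resource, check extension API servers'

# Assumption: this pattern is inferred from the one message above only.
ENTRY = re.compile(
    r'Could not update (?P<kind>\S+) "(?P<name>[^"]+)" '
    r'\((?P<index>\d+) of (?P<total>\d+)\): (?P<reason>.+)'
)


def parse_failures(raw):
    """Split a combined message into one dict per failed manifest."""
    text = raw.replace('\\"', '"')      # undo the escaped quotes
    failures = []
    for part in text.split('\\n* '):    # entries are joined by a literal "\n* "
        match = ENTRY.fullmatch(part.strip())
        if match:
            entry = match.groupdict()
            entry["index"] = int(entry["index"])
            entry["total"] = int(entry["total"])
            failures.append(entry)
    return failures


if __name__ == "__main__":
    for failure in parse_failures(RAW):
        print(failure["kind"], failure["name"], failure["index"], failure["reason"])
```

Splitting the entries makes it easier to see that the two failures have different reasons (an invalid object versus an unrecognized resource), which matches the later observation that several issues are mixed together here.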

Comment 2 Michal Fojtik 2019-07-25 08:54:37 UTC

I think there are multiple issues being mixed together in this BZ:

1) https://github.com/openshift/cluster-config-operator/pull/77 should fix the Proxy CR. The CR was created before its CRD, which obviously did not work well.

2) The ServiceMonitor issue is more complex and belongs to the monitoring team. I spoke to @sur and he agreed that they should move the CRD creation out of the prometheus operator. We saw that error not because the API server has a registration bug or anything like that, but simply because the prometheus operator failed to run (it requires worker nodes). As a result, the ServiceMonitor CRD was not installed, and creating that resource failed because the API server had no idea what the resource is.

@sur, can you please link the Jira issue from your board here?
@stts, I think we should move this to QA when Standa's PR is merged.
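
Both points in this comment come down to ordering: a custom resource is submitted before its CRD exists, so the API server rejects it as an unknown resource. The toy model below is only a sketch of that failure mode and of the "CRDs first" ordering. `FakeAPIServer`, `UnknownResourceError`, `apply_all`, and the manifest dictionaries are hypothetical names made up for illustration; none of this is real Kubernetes or CVO code.

```python
class UnknownResourceError(Exception):
    """Raised when an object's kind has no registered definition."""


class FakeAPIServer:
    """Hypothetical in-memory stand-in for an API server; not a real client."""

    def __init__(self):
        self.kinds = set()     # kinds whose CRD has been registered
        self.objects = set()   # (kind, name) pairs created so far

    def apply(self, manifest):
        if manifest["type"] == "crd":
            self.kinds.add(manifest["kind"])
            return
        if manifest["kind"] not in self.kinds:
            raise UnknownResourceError(
                "the server does not recognize resource %r" % manifest["kind"]
            )
        self.objects.add((manifest["kind"], manifest["name"]))


def apply_all(server, manifests, crds_first=False):
    """Apply manifests in the given order, or with all CRDs moved to the front."""
    if crds_first:
        # sorted() is stable, so relative order within each group is kept.
        manifests = sorted(manifests, key=lambda m: m["type"] != "crd")
    for manifest in manifests:
        server.apply(manifest)


# Assumed manifest order in which the Proxy CR precedes its CRD.
MANIFESTS = [
    {"type": "cr", "kind": "Proxy", "name": "cluster"},
    {"type": "crd", "kind": "Proxy"},
]

if __name__ == "__main__":
    try:
        apply_all(FakeAPIServer(), MANIFESTS)
    except UnknownResourceError as err:
        print("unordered apply failed:", err)

    server = FakeAPIServer()
    apply_all(server, MANIFESTS, crds_first=True)
    print("ordered apply created:", sorted(server.objects))
```

The ServiceMonitor case differs only in who registers the CRD: in this model it corresponds to the `crd` entry never being applied at all, so the custom resource fails no matter how the remaining manifests are ordered.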

Comment 4 Michal Fojtik 2019-07-29 10:28:33 UTC

The rest of this BZ is about the proxy being created before the proxy CRD is available, which might lead to the CVO failing the upgrade.
This was fixed here: https://github.com/openshift/cluster-config-operator/pull/77

No backport required, moving to QA to confirm they don't see issues with the Proxy and upgrade anymore.

Comment 5 Xingxing Xia 2019-07-31 06:19:42 UTC

The actual issue that the above PR fixed is verified in bug 1733015. The remaining issue is tracked in the bug for https://jira.coreos.com/browse/MON-732. Thus this bug is verified and can be closed.

*** This bug has been marked as a duplicate of bug 1733015 ***