Bug 1754651

Summary:	[vsphere] [upi] [ci] Error from server (NotFound): configs.imageregistry.operator.openshift.io "cluster" not found (repeat)
Product:	OpenShift Container Platform	Reporter:	Joseph Callen <jcallen>
Component:	kube-scheduler	Assignee:	Maciej Szulik <maszulik>
Status:	CLOSED ERRATA	QA Contact:	RamaKasturi <knarra>
Severity:	high	Docs Contact:
Priority:	high
Version:	4.2.0	CC:	aos-bugs, calfonso, jokerman, mfojtik, sponnaga, yinzhou
Target Milestone:	---
Target Release:	4.4.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-05-04 11:13:57 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Joseph Callen 2019-09-23 19:55:29 UTC

Description of problem:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-vsphere-upi-serial-4.2/180

Apply complete! Resources: 0 added, 2 changed, 2 destroyed.
Approving pending CSRs
Completing UPI setup
level=info msg="Waiting up to 30m0s for the cluster at https://api.ci-op-jd0lgrjq-14e37.origin-ci-int-aws.dev.rhcloud.com:6443 to initialize..."
Error from server (NotFound): configs.imageregistry.operator.openshift.io "cluster" not found
Error from server (NotFound): configs.imageregistry.operator.openshift.io "cluster" not found
Error from server (NotFound): configs.imageregistry.operator.openshift.io "cluster" not found
...
level=fatal msg="failed to initialize the cluster: Multiple errors are preventing progress:\n* Could not update oauthclient \"console\" (266 of 432): the server does not recognize this resource, check extension API servers\n* Could not update role \"openshift-console-operator/prometheus-k8s\" (404 of 432): resource may have been deleted\n* Could not update rolebinding \"openshift/cluster-samples-operator-openshift-edit\" (217 of 432): resource may have been deleted\n* Could not update servicemonitor \"openshift-apiserver-operator/openshift-apiserver-operator\" (427 of 432): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-authentication-operator/authentication-operator\" (389 of 432): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-cluster-version/cluster-version-operator\" (8 of 432): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-controller-manager-operator/openshift-controller-manager-operator\" (431 of 432): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-image-registry/image-registry\" (395 of 432): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-apiserver-operator/kube-apiserver-operator\" (409 of 432): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-controller-manager-operator/kube-controller-manager-operator\" (413 of 432): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-scheduler-operator/kube-scheduler-operator\" (417 of 432): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor 
\"openshift-machine-api/cluster-autoscaler-operator\" (153 of 432): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/machine-api-operator\" (419 of 432): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-operator-lifecycle-manager/olm-operator\" (421 of 432): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator\" (398 of 432): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator\" (401 of 432): the server does not recognize this resource, check extension API servers"
2019/09/23 18:23:19 Container setup in pod e2e-vsphere-upi-serial failed, exit code 1, reason Error
Another process exited
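
The fatal message above packs many separate failures into one escaped string. A minimal sketch for tallying them by reason follows, assuming the message has been copied verbatim from the log (with literal `\n` and `\"` sequences). The function name `summarize` and the regular expression are illustrative, not part of the installer.

```python
import re
from collections import Counter

# Matches one item, e.g.:
#   Could not update role "ns/name" (404 of 432): resource may have been deleted
ITEM = re.compile(r'Could not update (\S+)\s+"([^"]+)" \((\d+) of (\d+)\): (.+)')


def summarize(msg):
    """Count failure reasons in a 'Multiple errors are preventing progress' message."""
    # Undo the escaping used in the level=fatal log line.
    text = msg.replace('\\n', '\n').replace('\\"', '"')
    reasons = Counter()
    # Items are separated by "\n* "; the first chunk is the header text.
    for item in text.split('\n* ')[1:]:
        match = ITEM.match(item.strip())
        if match:
            # The last item carries the closing quote of msg="..."; drop it.
            reasons[match.group(5).rstrip('"')] += 1
    return reasons
```

Run against the message in this report, such a tally should show that nearly all items share the reason "the server does not recognize this resource, check extension API servers", which suggests a single underlying API availability problem rather than many independent ones.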




Comment 2 Joseph Callen 2019-09-25 13:50:23 UTC

This error only happens on vSphere.

Caused by:
https://github.com/openshift/release/blob/9dfbc0af7fb257fd7e5c372d0a882fac2c24f719/ci-operator/templates/openshift/installer/cluster-launch-installer-upi-e2e.yaml#L762-L776

This should _not_ be causing an issue, though, since by that point the cluster should be up and responding to the get and patch.
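
For readers without access to the linked template, the get-then-patch pattern can be sketched roughly as below. This is not the template's actual code: the function name `wait_and_patch`, the attempt count, the delay, and the `emptyDir` storage patch are assumptions chosen for illustration. Only the `oc get` and `oc patch` commands and the resource name come from this report.

```python
import json
import subprocess
import time

RESOURCE = "configs.imageregistry.operator.openshift.io"
NAME = "cluster"


def wait_and_patch(run=subprocess.run, sleep=time.sleep, attempts=120, delay=15.0):
    """Wait for the registry config to exist, then patch it. Returns True on success."""
    for _ in range(attempts):
        got = run(["oc", "get", RESOURCE, NAME], capture_output=True, text=True)
        if got.returncode == 0:
            break
        # A non-zero exit (e.g. NotFound) is expected until the object is created.
        sleep(delay)
    else:
        # The object never appeared within the allotted attempts.
        return False

    # Assumed patch body: switch registry storage to emptyDir.
    patch = json.dumps({"spec": {"storage": {"emptyDir": {}}}})
    patched = run(
        ["oc", "patch", RESOURCE, NAME, "--type", "merge", "--patch", patch],
        capture_output=True,
        text=True,
    )
    return patched.returncode == 0
```

With a loop of this shape, each failed `oc get` would print one "NotFound" line, which matches the repeated errors in the description. The loop only terminates once the image registry operator has created the `cluster` object.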

https://ci-search-ci-search-next.svc.ci.openshift.org/?search=configs.imageregistry.operator.openshift.io+%22cluster%22+not+found&maxAge=336h&context=2&type=all

Comment 15 Michal Fojtik 2019-11-06 20:11:23 UTC

Moving to scheduler team because of:

b414613f32a1cef2d67cf2fda85e1f68}] ContainerStatuses:[{Name:scheduler State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:255,Signal:0,Reason:Error,Message:formers/factory.go:133: Failed to list *v1.ReplicationController: Get https://localhost:6443/api


Please do not add more failures to this BZ; create new bugs instead. There are a lot of red herrings here.

Comment 16 Maciej Szulik 2020-01-31 11:30:23 UTC

It looks like this was fixed during 4.4 development; moving to QA.

Comment 21 errata-xmlrpc 2020-05-04 11:13:57 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581