Bug 1892536 - 4.7.0-0.ci payload not accepted - release-openshift-origin-installer-e2e-aws-upgrade fails waiting on openshift-controller-manager
Keywords:
Status: CLOSED DUPLICATE of bug 1882750
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Ryan Phillips
QA Contact: Sunil Choudhary
URL:
Whiteboard: devex
Depends On:
Blocks:
Reported: 2020-10-29 05:04 UTC by jamo luhrsen
Modified: 2020-11-06 19:59 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-06 19:59:01 UTC
Target Upstream Version:
Embargoed:



Description jamo luhrsen 2020-10-29 05:04:06 UTC
Description of problem:

The 4.7.0-0.ci payload has not been accepted recently, and the current
failure is:

level=fatal msg="failed to initialize the cluster: Cluster operator openshift-controller-manager is still updating"

the job is here:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1321100919940583424


Version-Release number of selected component (if applicable):

4.6.1 appears to be the version that fails, although this job upgrades
to 4.7.0 as far as I know.

How reproducible:

Unknown. According to [0], the five previous 4.7.0-0.ci payloads were all accepted.

[0] https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/#4.7.0-0.ci


Additional info:

I am unsure how to debug the root cause, but while digging through some pod logs
I noticed the following in the OpenShift install log [1]:

time="2020-10-27T15:16:52Z" level=debug msg="Still waiting for the cluster to initialize: Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 2"
time="2020-10-27T15:17:12Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.6.1: 100% complete, waiting on openshift-controller-manager"
time="2020-10-27T15:21:42Z" level=debug msg="Still waiting for the cluster to initialize: Cluster operator openshift-controller-manager is still updating"
time="2020-10-27T15:48:48Z" level=info msg="Cluster operator insights Disabled is False with AsExpected: "
time="2020-10-27T15:48:48Z" level=info msg="Cluster operator openshift-controller-manager Progressing is True with : "

I could not find anything I knew to be relevant to only 2 of 3 kube-apiservers being available, but perhaps
that is a place to dig in the job artifacts?


[1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1321100919940583424/artifacts/e2e-aws-upgrade/installer/.openshift_install.log
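For anyone triaging similar payload rejections, a minimal sketch (assuming only the installer's `time="..." level=... msg="..."` line format shown above; the function name and structure are illustrative, not part of any tooling) that pulls the "Still waiting" status messages out of a downloaded .openshift_install.log, so the last reported blocker is easy to spot:

```python
import re

# Matches installer log lines of the form:
#   time="2020-10-27T15:21:42Z" level=debug msg="Still waiting ..."
LOG_LINE = re.compile(
    r'time="(?P<ts>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>.*)"$'
)

def waiting_messages(lines):
    """Return (timestamp, message) pairs for 'Still waiting' entries."""
    out = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group("msg").startswith("Still waiting"):
            out.append((m.group("ts"), m.group("msg")))
    return out

if __name__ == "__main__":
    # Usage sketch: run against a locally saved copy of the install log.
    with open(".openshift_install.log") as f:
        for ts, msg in waiting_messages(f):
            print(ts, msg)
```

The last tuple printed shows which cluster operator the installer was still blocked on when it gave up, e.g. "Cluster operator openshift-controller-manager is still updating" in this run.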

Comment 4 Ryan Phillips 2020-11-06 19:59:01 UTC
I'm going to mark this bug as a duplicate of 1882750. We reverted the CRI-O config change for now, and I need to follow up with David on whether the UID for static pods was put in.

*** This bug has been marked as a duplicate of bug 1882750 ***

