Bug 1729356 - [ci] [aws] The authentication-operator in updating state causes installer to timeout in some CI jobs
Keywords:
Status: CLOSED DUPLICATE of bug 1743353
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.2.0
Assignee: Dan Mace
QA Contact: Hongan Li
URL:
Whiteboard: buildcop
Duplicates: 1729355
Depends On:
Blocks:
Reported: 2019-07-12 04:02 UTC by Nick Hale
Modified: 2019-08-29 20:49 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-29 20:49:53 UTC
Target Upstream Version:



Description Nick Hale 2019-07-12 04:02:30 UTC
Description of problem:

The authentication-operator is one of the operators stuck in an updating state, which causes the installer to time out in a number of CI jobs:

Installing from release registry.svc.ci.openshift.org/ci-op-38g9c92q/release@sha256:e6d6fa46a8805ee52eaa0e36ec1b9e2a296df6f31725175d74194d5078d0c7d8
level=warning msg="Found override for ReleaseImage. Please be warned, this is not advised"
level=info msg="Consuming \"Install Config\" from target directory"
level=info msg="Creating infrastructure resources..."
level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.ci-op-38g9c92q-1d3f3.origin-ci-int-aws.dev.rhcloud.com:6443..."
level=info msg="API v1.14.0+696110f up"
level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
level=info msg="Destroying the bootstrap resources..."
level=info msg="Waiting up to 30m0s for the cluster at https://api.ci-op-38g9c92q-1d3f3.origin-ci-int-aws.dev.rhcloud.com:6443 to initialize..."
level=fatal msg="failed to initialize the cluster: Some cluster operators are still updating: authentication, console"
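The fatal line above names the operators that had not finished rolling out. When triaging many job logs, a small helper can pull those names out so that failures can be grouped by operator. The following is a minimal sketch, assuming the message format shown in this report; the function name `operators_still_updating` is hypothetical and not part of the installer.

```python
import re

# Matches the operator list in the installer's fatal message, as quoted
# in this report. The message format is assumed from that one sample.
_STILL_UPDATING = re.compile(
    r'Some cluster operators are still updating: ([^"]+)'
)


def operators_still_updating(log_text):
    """Return operator names listed in level=fatal 'still updating' lines."""
    names = []
    for line in log_text.splitlines():
        if "level=fatal" not in line:
            continue
        match = _STILL_UPDATING.search(line)
        if match is None:
            continue
        # The operators are a comma-separated list up to the closing quote.
        for name in match.group(1).split(","):
            name = name.strip()
            if name:
                names.append(name)
    return names
```

Running this over each failing job's build log and counting the returned names would show whether authentication appears alone or always alongside other operators such as console.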

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/1965/pull-ci-openshift-installer-master-e2e-aws/6307/

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/23360/pull-ci-openshift-origin-master-e2e-aws/10939/

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/1971/pull-ci-openshift-installer-master-e2e-aws/6344/?log#log

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/1968/pull-ci-openshift-installer-master-e2e-aws/6312/

This may be caused by another component, so a duplicate BZ may already exist.

A BZ already exists for console-operator: https://bugzilla.redhat.com/show_bug.cgi?id=1729355

Comment 1 Samuel Padgett 2019-07-13 11:48:48 UTC
*** Bug 1729355 has been marked as a duplicate of this bug. ***

Comment 2 Standa Laznicka 2019-07-26 11:02:18 UTC
The authn-operator logs show that GET requests on route + /healthz consistently return EOF. Moving to Routing.
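To reproduce the symptom outside the operator, one could issue the same kind of GET against the route's /healthz path and check whether the connection is closed before any response arrives. The following is a minimal sketch, not the operator's actual check; the function name `probe_healthz` and its return strings are hypothetical, and the default TLS context verifies certificates, so a route served with an untrusted certificate would be reported as an error rather than as EOF.

```python
import http.client
import ssl


def probe_healthz(host, port=443, use_tls=True, timeout=5.0):
    """GET /healthz and classify the outcome as 'http <code>', 'eof', or 'error: ...'."""
    if use_tls:
        conn = http.client.HTTPSConnection(
            host, port, timeout=timeout, context=ssl.create_default_context()
        )
    else:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/healthz")
        response = conn.getresponse()
        return f"http {response.status}"
    except (http.client.RemoteDisconnected, ssl.SSLEOFError, ConnectionResetError):
        # The peer closed the connection before sending a response,
        # which is the behavior a client would report as EOF.
        return "eof"
    except (OSError, http.client.HTTPException) as exc:
        # Anything else: DNS failure, refused connection, timeout, bad certificate.
        return f"error: {exc}"
    finally:
        conn.close()
```

Calling `probe_healthz` with the route's hostname repeatedly during cluster initialization would show whether the EOF is constant or intermittent, which helps separate a router problem from a slow rollout.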

Comment 3 Hongkai Liu 2019-07-26 15:17:07 UTC
Saw a similar error
level=fatal msg="failed to initialize the cluster: Some cluster operators are still updating: authentication, console, monitoring"
in
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.2/257

Maybe monitoring got involved too?

Comment 4 Dan Mace 2019-08-14 20:42:28 UTC
I don't see any indication of flakiness on master[1].

Can you provide a CI search query which would indicate there's a significant PR flake?

I don't currently see this as a release blocker, but am open to changing my mind if there's some justification. If you have one, please let us know and we'll re-assess.

[1] https://testgrid.k8s.io/redhat-openshift-release-blocking#redhat-release-openshift-origin-installer-e2e-aws-serial-4.2

Comment 5 Dan Mace 2019-08-29 20:49:53 UTC
Trying to consolidate the auth-related flakes. Closing as a dupe.

*** This bug has been marked as a duplicate of bug 1743353 ***

