Description of problem:

Around 2019-04-12T13:27Z today, CI e2e-aws success dropped to near 0% with errors like:

$ curl -s https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-config-operator/34/pull-ci-openshift-cluster-config-operator-master-e2e-aws/148/ | grep 'Cluster operator console'
Apr 11 12:46:40.467 E clusterversion/version changed Failing to True: ClusterOperatorNotAvailable: Cluster operator console has not yet reported success
Apr 11 12:46:40.467 E clusterversion/version changed Failing to True: ClusterOperatorNotAvailable: Cluster operator console has not yet reported success
$ curl -s https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-config-operator/34/pull-ci-openshift-cluster-config-operator-master-e2e-aws/154/ | grep 'Cluster operator console'
Apr 11 14:19:02.495 E clusterversion/version changed Failing to True: ClusterOperatorNotAvailable: Cluster operator console has not yet reported success
Apr 11 14:19:02.495 E clusterversion/version changed Failing to True: ClusterOperatorNotAvailable: Cluster operator console has not yet reported success
$ curl -s https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-config-operator/34/pull-ci-openshift-cluster-config-operator-master-e2e-aws/159/ | grep 'Cluster operator console'
Apr 11 15:43:07.639 E clusterversion/version changed Failing to True: ClusterOperatorNotAvailable: Cluster operator console has not yet reported success
Apr 11 15:43:07.639 E clusterversion/version changed Failing to True: ClusterOperatorNotAvailable: Cluster operator console has not yet reported success
$ curl -s https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-config-operator/34/pull-ci-openshift-cluster-config-operator-master-e2e-aws/164/ | grep 'Cluster operator console'
Apr 11 17:48:05.132 E clusterversion/version changed Failing to True: ClusterOperatorNotAvailable: Cluster operator console has not yet reported success
Apr 11 17:55:20.136 E clusterversion/version changed Failing to True: ClusterOperatorNotAvailable: Cluster operator console has not yet reported success
Apr 11 17:48:05.132 E clusterversion/version changed Failing to True: ClusterOperatorNotAvailable: Cluster operator console has not yet reported success
Apr 11 17:55:20.136 E clusterversion/version changed Failing to True: ClusterOperatorNotAvailable: Cluster operator console has not yet reported success
$ curl -s https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-config-operator/34/pull-ci-openshift-cluster-config-operator-master-e2e-aws/167/ | grep 'Cluster operator console'
level=fatal msg="failed to initialize the cluster: Cluster operator console has not yet reported success: timed out waiting for the condition"

Comparing random jobs across the transition:

$ diff -u <(curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_console/1422/pull-ci-openshift-console-master-e2e-aws-console/695/artifacts/release-latest/release-payload-latest/image-references | sed 's|ci-[^/]*/stable|.../stable|') <(curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/6694/artifacts/release-images-latest/release-images-latest | sed 's|ci-[^/]*/stable|.../stable|') | grep -B3 io.openshift.build.source-location
- "io.openshift.build.commit.id": "debd02db8f6c49aa0436d359310e42a32319c2e8",
+ "io.openshift.build.commit.id": "eed9a57bae31b26f4ed6dd323cc061173d8094ce",
  "io.openshift.build.commit.ref": "master",
  "io.openshift.build.source-location": "https://github.com/openshift/cluster-config-operator"
--
- "io.openshift.build.commit.id": "df320f64d59b867d6fd0bdd77b5026d3c53083c8",
+ "io.openshift.build.commit.id": "46e1c20984d134cd04fcb046bc67ed0091edd56c",
  "io.openshift.build.commit.ref": "master",
  "io.openshift.build.source-location": "https://github.com/openshift/cluster-kube-controller-manager-operator"

Looking at those changes turned up the suspect [1], which has been partially rolled back in [2]. Hopefully that fixes CI.

[1]: https://github.com/openshift/cluster-config-operator/pull/34
[2]: https://github.com/openshift/cluster-config-operator/pull/43
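
The payload comparison above depends on grep -B3 happening to catch the commit.id line. A rough alternative, offered only as a sketch, is to reduce each image-references file to "repository commit" pairs before diffing. The helper name commit_by_repo and the OLD_URL/NEW_URL variables are made up for illustration. The sketch assumes each annotation sits on its own line and that commit.id comes before source-location within an image entry, as in the output above.

```shell
# commit_by_repo (hypothetical helper): read an image-references file on stdin
# and print one "<source-location> <commit-id>" line per repository.
# Assumption: commit.id precedes source-location within each image entry.
commit_by_repo() {
  awk -F'"' '
    /"io.openshift.build.commit.id"/       { commit = $4 }
    /"io.openshift.build.source-location"/ { print $4, commit; commit = "" }
  ' | sort -u
}

# Usage in bash (OLD_URL and NEW_URL are placeholders for the two
# image-references URLs being compared):
#   diff -u <(curl -s "$OLD_URL" | commit_by_repo) <(curl -s "$NEW_URL" | commit_by_repo)
```

With that, each changed repository should show up as a single -/+ pair carrying both the repository URL and the commit.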
$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-kube-scheduler-operator/97/pull-ci-openshift-cluster-kube-scheduler-operator-master-e2e-aws-operator/275/artifacts/e2e-aws-operator/pods/openshift-console_console-785b77b769-6hqmt_console_previous.log.gz | gunzip | head -n1
2019/04/12 16:39:55 auth: error contacting auth provider (retrying in 10s): discovery through endpoint https://172.30.0.1:443/.well-known/oauth-authorization-server failed: 404 Not Found
$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-kube-scheduler-operator/97/pull-ci-openshift-cluster-kube-scheduler-operator-master-e2e-aws-operator/275/artifacts/e2e-aws-operator/pods/openshift-console_console-785b77b769-6hqmt_console.log.gz | gunzip | tail -n1
2019/04/12 16:50:12 auth: error contacting auth provider (retrying in 10s): discovery through endpoint https://172.30.0.1:443/.well-known/oauth-authorization-server failed: 404 Not Found

^ where the subject's 404 messages came from.
Created attachment 1554866
Failures related to this issue

Looks fixed to me, with the big blue dots being [1,2] (with random y values).

[1]: https://github.com/openshift/cluster-config-operator/pull/34#event-2272384982
[2]: https://github.com/openshift/cluster-config-operator/pull/43#event-2273345198