Description of problem:

https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/96
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/95

Aug 25 22:33:30.858: INFO: Running 'oc --namespace=e2e-test-new-app-bl59h --config=/tmp/configfile963740453 get dc/a234567890123456789012345678901234567890123456789012345678 -o yaml'
Aug 25 22:33:31.221: INFO: Error running &{/usr/bin/oc [oc --namespace=e2e-test-new-app-bl59h --config=/tmp/configfile963740453 get dc/a234567890123456789012345678901234567890123456789012345678 -o yaml] [] Error from server (NotFound): deploymentconfigs.apps.openshift.io "a234567890123456789012345678901234567890123456789012345678" not found Error from server (NotFound): deploymentconfigs.apps.openshift.io "a234567890123456789012345678901234567890123456789012345678" not found [] <nil> 0xc0040d0720 exit status 1 <nil> <nil> true [0xc0018c0020 0xc0018c01a8 0xc0018c01a8] [0xc0018c0020 0xc0018c01a8] [0xc0018c0030 0xc0018c0158] [0x95d670 0x95d7a0] 0xc001fcfb00 <nil>}: Error from server (NotFound): deploymentconfigs.apps.openshift.io "a234567890123456789012345678901234567890123456789012345678" not found
Aug 25 22:33:31.222: INFO: Error getting Deployment Config a234567890123456789012345678901234567890123456789012345678: exit status 1
Aug 25 22:33:31.222: INFO: Running 'oc --namespace=e2e-test-new-app-bl59h --config=/tmp/configfile963740453 get dc/a2345678901234567890123456789012345678901234567890123456789 -o yaml'
Aug 25 22:33:31.574: INFO: Error running &{/usr/bin/oc [oc --namespace=e2e-test-new-app-bl59h --config=/tmp/configfile963740453 get dc/a2345678901234567890123456789012345678901234567890123456789 -o yaml] [] Error from server (NotFound): deploymentconfigs.apps.openshift.io "a2345678901234567890123456789012345678901234567890123456789" not found Error from server (NotFound): deploymentconfigs.apps.openshift.io "a2345678901234567890123456789012345678901234567890123456789" not found [] <nil> 0xc0040d0cc0 exit status 1 <nil> <nil> true [0xc0018c0240 0xc0018c06b8 0xc0018c06b8] [0xc0018c0240 0xc0018c06b8] [0xc0018c02e8 0xc0018c0648] [0x95d670 0x95d7a0] 0xc001a803c0 <nil>}: Error from server (NotFound): deploymentconfigs.apps.openshift.io "a2345678901234567890123456789012345678901234567890123456789" not found
Aug 25 22:33:31.574: INFO: Error getting Deployment Config a2345678901234567890123456789012345678901234567890123456789: exit status 1
[AfterEach] [Feature:Builds][Conformance] oc new-app
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/client.go:101
STEP: Collecting events from namespace "e2e-test-new-app-bl59h".
STEP: Found 0 events.
Aug 25 22:33:31.657: INFO: POD NODE PHASE GRACE CONDITIONS
Aug 25 22:33:31.657: INFO:
Aug 25 22:33:31.774: INFO: skipping dumping cluster info - cluster too large
Aug 25 22:33:31.839: INFO: Deleted {user.openshift.io/v1, Resource=users e2e-test-new-app-bl59h-user}, err: <nil>
Aug 25 22:33:31.895: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients e2e-client-e2e-test-new-app-bl59h}, err: <nil>
Aug 25 22:33:31.958: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens qv5dEL_oQSusudXp_d3OUQAAAAAAAAAA}, err: <nil>
[AfterEach] [Feature:Builds][Conformance] oc new-app
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:150
Aug 25 22:33:31.958: INFO: Waiting up to 3m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-test-new-app-bl59h" for this suite.
Aug 25 22:33:38.130: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
Aug 25 22:33:42.148: INFO: namespace e2e-test-new-app-bl59h deletion completed in 10.146393344s
Aug 25 22:33:42.153: INFO: Running AfterSuite actions on all nodes
Aug 25 22:33:42.153: INFO: Running AfterSuite actions on node 1
fail [github.com/openshift/origin/test/extended/builds/new_app.go:33]: Unexpected error:
    <*errors.errorString | 0xc00267d920>: {
        s: "Failed to import expected imagestreams",
    }
    Failed to import expected imagestreams
occurred

failed: (2m53s) 2019-08-25T22:33:42 "[Feature:Builds][Conformance] oc new-app should succeed with a --name of 58 characters [Suite:openshift/conformance/parallel/minimal]"

How reproducible: Sometimes
Looks like the samples operator imagestreams are not importing. If imagestreams do not import then the "not authorized" failures are expected. Moving to Samples for now, though this may be due to an intermittent flake.
Bottom line: flake related, as Adam notes.

Details / tl;dr:

On the https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/95 run, I do see a TBR flake with the fuse imagestream:

Get https://registry.redhat.io/v2/fuse7/fuse-console/manifests/1.0: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/articles/3399531

At the moment we are not cross-referencing our analysis of failed imports with the list the extended tests compare against, namely {"ruby", "nodejs", "perl", "php", "python", "mysql", "postgresql", "mongodb", "jenkins"}. We could ignore EAP-related flakes when running our extended tests, but that will require a tweak to the messaging done in the samples operator, followed by an update to the extended tests in openshift/origin (a sketch of that cross-referencing follows this comment). I'll leave this bug open to track that improvement.

With https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/96, the failure is not related to samples or to "Failed to import expected imagestreams" (that string does not appear anywhere in https://storage.googleapis.com/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/96/build-log.txt) ... from the full build log:

Aug 26 00:00:24.029: INFO: At 2019-08-25 23:59:52 +0000 UTC - event for a234567890123456789012345678901234567890123456789012345678-1: {build-controller } BuildStarted: Build e2e-test-new-app-t5wwf/a234567890123456789012345678901234567890123456789012345678-1 is now running
Aug 26 00:00:24.029: INFO: At 2019-08-26 00:00:17 +0000 UTC - event for a234567890123456789012345678901234567890123456789012345678-1: {build-controller } BuildFailed: Build e2e-test-new-app-t5wwf/a234567890123456789012345678901234567890123456789012345678-1 failed
Aug 26 00:00:24.081: INFO: POD NODE PHASE GRACE CONDITIONS
Aug 26 00:00:24.081: INFO: a234567890123456789012345678901234567890123456789012345678-1-build ci-op-j9spzp8n-282fe-zj4pp-worker-centralus1-8xpvx Failed [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2019-08-25 23:59:50 +0000 UTC } {Ready False 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:00:16 +0000 UTC ContainersNotReady containers with unready status: [sti-build]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:00:16 +0000 UTC ContainersNotReady containers with unready status: [sti-build]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2019-08-25 23:59:28 +0000 UTC }]
Aug 26 00:00:24.082: INFO:
Aug 26 00:00:24.163: INFO: skipping dumping cluster info - cluster too large
Aug 26 00:00:24.228: INFO: Deleted {user.openshift.io/v1, Resource=users e2e-test-new-app-t5wwf-user}, err: <nil>
Aug 26 00:00:24.295: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients e2e-client-e2e-test-new-app-t5wwf}, err: <nil>
Aug 26 00:00:24.358: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens 6QrNSoBTQaW3e_gOYGA15gAAAAAAAAAA}, err: <nil>
[AfterEach] [Feature:Builds][Conformance] oc new-app
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:150
Aug 26 00:00:24.358: INFO: Waiting up to 3m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-test-new-app-t5wwf" for this suite.
Aug 26 00:00:32.757: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
Aug 26 00:00:36.718: INFO: namespace e2e-test-new-app-t5wwf deletion completed in 12.311802456s
Aug 26 00:00:36.722: INFO: Running AfterSuite actions on all nodes
Aug 26 00:00:36.722: INFO: Running AfterSuite actions on node 1
fail [github.com/openshift/origin/test/extended/builds/new_app.go:56]: Unexpected error:
    <*errors.errorString | 0xc0003335c0>: {
        s: "The build \"a234567890123456789012345678901234567890123456789012345678-1\" status is \"Failed\"",
    }
    The build "a234567890123456789012345678901234567890123456789012345678-1" status is "Failed"
occurred
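[Editor's illustration] A minimal Go sketch of the cross-referencing idea from the previous comment. The required-imagestream list is the one the extended tests check; the function name and the way failed imports are gathered are hypothetical, not the samples operator's or origin's actual code.

package main

import "fmt"

// requiredImageStreams mirrors the list the extended tests compare against.
var requiredImageStreams = map[string]bool{
	"ruby": true, "nodejs": true, "perl": true, "php": true, "python": true,
	"mysql": true, "postgresql": true, "mongodb": true, "jenkins": true,
}

// relevantImportFailures is a hypothetical helper: given the imagestream names
// reported as failing to import, it keeps only those the extended tests depend
// on, so EAP/fuse-style TBR flakes would no longer fail the suite.
func relevantImportFailures(failed []string) []string {
	var relevant []string
	for _, name := range failed {
		if requiredImageStreams[name] {
			relevant = append(relevant, name)
		}
	}
	return relevant
}

func main() {
	// Hypothetical failed-import names: the fuse flake is ignored, ruby is not.
	fmt.Println(relevantImportFailures([]string{"fuse-console", "ruby"}))
}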
@Gabe looks like the latter flake [1] is our known issue around imagestreams where we suspect some caching doesn't pick up that the imagestream is local to the internal registry. [1] https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/96
The related e2e test passed. Also, when there are imagestream import failures, the affected imagestreams are listed in the openshift-samples clusteroperator. Tested with payload 4.2.0-0.nightly-2019-09-02-172410.
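[Editor's illustration] One way to inspect those clusteroperator conditions programmatically; a sketch only, assuming a recent openshift/client-go where Get takes a context. The kubeconfig path is a placeholder, and dumping every condition (rather than a specific one) is an illustrative choice.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"

	configclient "github.com/openshift/client-go/config/clientset/versioned"
)

func main() {
	// Build a client from a local kubeconfig (path is a placeholder).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client := configclient.NewForConfigOrDie(cfg)

	// The samples operator reports failed imagestream imports in the
	// condition messages of its clusteroperator.
	co, err := client.ConfigV1().ClusterOperators().Get(context.TODO(), "openshift-samples", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	for _, cond := range co.Status.Conditions {
		fmt.Printf("%s=%s: %s\n", cond.Type, cond.Status, cond.Message)
	}
}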
Another theory on this: if the openshift controller manager (OCM) restarts *before* the openshift apiserver (OAS), it could re-cache imagestreams that still do not have the internal registry represented (the internal registry is only represented once the OAS has picked up the internal registry hostname and can add it as part of the decoration on a GET of an imagestream). Right now I don't think anything tries to correlate those restarts; the OCM restart has to be smarter. We had previously pursued creating a "canary" imagestream that we could periodically GET to see whether it had the internal registry hostname yet (which would mean the OAS had picked up the hostname and restarted). Something along those lines may need to be revisited to ensure the OCM always restarts after the OAS has picked up an internal registry hostname change. The OCM operator itself could probe an imagestream periodically to determine this (a sketch of such a probe follows).
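[Editor's illustration] A minimal Go sketch of that canary probe, assuming a recent openshift/client-go where Get takes a context. The imagestream name and namespace ("cli" in "openshift"), the polling parameters, and the registry hostname are illustrative assumptions, not what the OCM operator actually does today.

package main

import (
	"context"
	"fmt"
	"strings"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"

	imageclient "github.com/openshift/client-go/image/clientset/versioned"
)

// waitForInternalRegistryHostname polls a "canary" imagestream until its
// status.dockerImageRepository reflects the internal registry hostname,
// i.e. until the OAS decoration includes the registry.
func waitForInternalRegistryHostname(client imageclient.Interface, namespace, name, registryHost string) error {
	for i := 0; i < 60; i++ {
		is, err := client.ImageV1().ImageStreams(namespace).Get(context.TODO(), name, metav1.GetOptions{})
		if err == nil && strings.HasPrefix(is.Status.DockerImageRepository, registryHost) {
			return nil
		}
		time.Sleep(5 * time.Second)
	}
	return fmt.Errorf("imagestream %s/%s never showed registry host %s", namespace, name, registryHost)
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := imageclient.NewForConfigOrDie(cfg)
	// "image-registry.openshift-image-registry.svc:5000" is the usual internal
	// registry service hostname; treat it as an assumption here.
	if err := waitForInternalRegistryHostname(client, "openshift", "cli",
		"image-registry.openshift-image-registry.svc:5000"); err != nil {
		panic(err)
	}
	fmt.Println("OAS is decorating imagestreams with the internal registry hostname")
}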
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922