Bug 1745418
| Summary: | [Feature:Builds][Conformance] oc new-app should succeed with a --name of 58 characters [Suite:openshift/conformance/parallel/minimal] | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Qin Ping <piqin> |
| Component: | Samples | Assignee: | Gabe Montero <gmontero> |
| Status: | CLOSED ERRATA | QA Contact: | XiuJuan Wang <xiuwang> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.2.0 | CC: | adam.kaplan, aos-bugs, bparees, wzheng |
| Target Milestone: | --- | | |
| Target Release: | 4.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-10-16 06:37:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Qin Ping
2019-08-26 06:12:58 UTC
Looks like the samples operator imagestreams are not importing. If imagestreams do not import, then the "not authorized" failures are expected. Moving to Samples for now, though this may be due to an intermittent flake.

Bottom line, flake related, as Adam notes. Details / tl;dr:

On the https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/95 run, I do see a TBR flake with the fuse imagestream:

    Get https://registry.redhat.io/v2/fuse7/fuse-console/manifests/1.0: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/articles/3399531

At the moment we are not cross-referencing our analysis of failed imports against the list of imagestreams the extended tests check, namely {"ruby", "nodejs", "perl", "php", "python", "mysql", "postgresql", "mongodb", "jenkins"}. We could ignore EAP-related flakes when running our extended tests, but that will require a tweak to the messaging done in the samples operator, followed by an update to the extended tests in openshift/origin. I'll leave this bug open to track that improvement (a rough sketch of the filtering follows).
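To make the cross-referencing idea concrete, here is a minimal Go sketch. It is illustrative only: the function names are made up, the input list of failed imports would in practice have to be parsed from the samples operator's condition messages, and the TBR-gated imagestream names used in `main` are placeholders, not verified names. It only demonstrates filtering the reported import failures down to the imagestreams the extended tests actually depend on, so that TBR-only failures (fuse, EAP, etc.) could be treated as an ignorable flake:

```go
package main

import "fmt"

// expectedImageStreams mirrors the list the extended tests in
// openshift/origin compare against, per the discussion above.
var expectedImageStreams = []string{
	"ruby", "nodejs", "perl", "php", "python",
	"mysql", "postgresql", "mongodb", "jenkins",
}

// relevantImportFailures returns only those failed imports that the
// extended tests actually depend on. If the result is empty, the
// failure is limited to TBR-gated streams and could be ignored.
func relevantImportFailures(failed []string) []string {
	expected := make(map[string]bool, len(expectedImageStreams))
	for _, name := range expectedImageStreams {
		expected[name] = true
	}
	var relevant []string
	for _, name := range failed {
		if expected[name] {
			relevant = append(relevant, name)
		}
	}
	return relevant
}

func main() {
	// Hypothetical input: in a real implementation this would come from
	// the samples operator's reported import failures.
	failed := []string{"fuse7-console", "jboss-eap72-openshift"}
	if len(relevantImportFailures(failed)) == 0 {
		fmt.Println("only TBR-gated imagestreams failed to import; treat as flake")
	} else {
		fmt.Println("an imagestream the extended tests rely on failed to import")
	}
}
```

For this to work, the samples operator's messaging only needs to expose the failed imagestream names in a form the extended tests can parse, which is the tweak described above.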
With https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/96 the failure is not related to samples or to "Failed to import expected imagestreams" (that string does not appear in https://storage.googleapis.com/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/96/build-log.txt). From the full build log:

    Aug 26 00:00:24.029: INFO: At 2019-08-25 23:59:52 +0000 UTC - event for a234567890123456789012345678901234567890123456789012345678-1: {build-controller } BuildStarted: Build e2e-test-new-app-t5wwf/a234567890123456789012345678901234567890123456789012345678-1 is now running
    Aug 26 00:00:24.029: INFO: At 2019-08-26 00:00:17 +0000 UTC - event for a234567890123456789012345678901234567890123456789012345678-1: {build-controller } BuildFailed: Build e2e-test-new-app-t5wwf/a234567890123456789012345678901234567890123456789012345678-1 failed
    Aug 26 00:00:24.081: INFO: POD NODE PHASE GRACE CONDITIONS
    Aug 26 00:00:24.081: INFO: a234567890123456789012345678901234567890123456789012345678-1-build ci-op-j9spzp8n-282fe-zj4pp-worker-centralus1-8xpvx Failed [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2019-08-25 23:59:50 +0000 UTC } {Ready False 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:00:16 +0000 UTC ContainersNotReady containers with unready status: [sti-build]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:00:16 +0000 UTC ContainersNotReady containers with unready status: [sti-build]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2019-08-25 23:59:28 +0000 UTC }]
    Aug 26 00:00:24.082: INFO:
    Aug 26 00:00:24.163: INFO: skipping dumping cluster info - cluster too large
    Aug 26 00:00:24.228: INFO: Deleted {user.openshift.io/v1, Resource=users e2e-test-new-app-t5wwf-user}, err: <nil>
    Aug 26 00:00:24.295: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients e2e-client-e2e-test-new-app-t5wwf}, err: <nil>
    Aug 26 00:00:24.358: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens 6QrNSoBTQaW3e_gOYGA15gAAAAAAAAAA}, err: <nil>
    [AfterEach] [Feature:Builds][Conformance] oc new-app
    /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:150
    Aug 26 00:00:24.358: INFO: Waiting up to 3m0s for all (but 100) nodes to be ready
    STEP: Destroying namespace "e2e-test-new-app-t5wwf" for this suite.
    Aug 26 00:00:32.757: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
    Aug 26 00:00:36.718: INFO: namespace e2e-test-new-app-t5wwf deletion completed in 12.311802456s
    Aug 26 00:00:36.722: INFO: Running AfterSuite actions on all nodes
    Aug 26 00:00:36.722: INFO: Running AfterSuite actions on node 1
    fail [github.com/openshift/origin/test/extended/builds/new_app.go:56]: Unexpected error:
        <*errors.errorString | 0xc0003335c0>: {
            s: "The build \"a234567890123456789012345678901234567890123456789012345678-1\" status is \"Failed\"",
        }
        The build "a234567890123456789012345678901234567890123456789012345678-1" status is "Failed"
    occurred

@Gabe looks like the latter flake [1] is our known issue around imagestreams, where we suspect some caching doesn't pick up that the imagestream is local to the internal registry.

[1] https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/96

The related e2e passed. Also, when there are imagestream import failures, the affected imagestreams are listed in the openshift-samples clusteroperator. Tested with payload 4.2.0-0.nightly-2019-09-02-172410.

Another theory on this: if the openshift controller manager (OCM) restarts *before* the openshift api server (OAS), it could re-cache imagestreams that still don't have the internal registry represented (the internal registry is only represented once the OAS has picked up the internal registry hostname, so that it can add it as part of the decoration on a GET of an imagestream). Right now I don't think anything tries to correlate those restarts; the OCM restart has to be smarter. We had previously pursued creating a "canary" imagestream that we could periodically GET to see whether it had the internal registry hostname yet (which would mean the OAS had picked up the hostname and restarted). Something along those lines may need to be revisited to ensure the OCM always restarts after the OAS has picked up an internal registry hostname change. The OCM operator itself could probe an imagestream periodically to determine this (a sketch of such a probe follows the closing note below).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
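Regarding the canary-imagestream theory above, here is a minimal sketch of such a probe. It assumes the default internal registry hostname (`image-registry.openshift-image-registry.svc:5000`) and uses the `ruby` imagestream in the `openshift` namespace as the canary; both are assumptions for illustration. An actual fix in the OCM operator would use client-go informers rather than shelling out to `oc`, so this only illustrates the check itself:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// internalRegistryHost is the hostname the OAS is expected to decorate
// imagestreams with once it has picked up the registry configuration.
// (Assumed default; the real value comes from the cluster's image config.)
const internalRegistryHost = "image-registry.openshift-image-registry.svc:5000"

// canaryHasRegistryHost probes a single "canary" imagestream and reports
// whether its status already references the internal registry, i.e.
// whether the OAS has picked up the internal registry hostname.
func canaryHasRegistryHost(namespace, name string) (bool, error) {
	out, err := exec.Command("oc", "get", "imagestream", name,
		"-n", namespace,
		"-o", "jsonpath={.status.dockerImageRepository}").Output()
	if err != nil {
		return false, err
	}
	return strings.HasPrefix(strings.TrimSpace(string(out)), internalRegistryHost), nil
}

func main() {
	// Poll until the canary imagestream is decorated with the internal
	// registry hostname; only then would it be safe for the OCM to
	// (re)build its imagestream cache.
	for i := 0; i < 30; i++ {
		ok, err := canaryHasRegistryHost("openshift", "ruby")
		if err == nil && ok {
			fmt.Println("OAS is serving imagestreams with the internal registry hostname")
			return
		}
		time.Sleep(10 * time.Second)
	}
	fmt.Println("timed out waiting for the internal registry hostname on the canary imagestream")
}
```

Only once a probe like this succeeds would restarting (or re-caching in) the OCM be safe, which addresses the restart-ordering problem described in the theory above.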