Bug 1630537
| Summary: | e2e flake: `docker login` to internal registry fails with "500 Internal Server Error" | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Miciah Dashiel Butler Masters <mmasters> |
| Component: | Image Registry | Assignee: | Oleg Bulatov <obulatov> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Dongbo Yan <dyan> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 3.11.0 | CC: | agladkov, aos-bugs, bparees, mfojtik, mifiedle, mmasters, obulatov |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-09-19 16:51:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Miciah Dashiel Butler Masters
2018-09-18 21:31:54 UTC
Need links to the failing jobs please, or ideally the registry logs from the failing job. Not sure whether it's helpful to link to more than one, but here is a failure for #21012, one for #21017, and all seven (so far) for #21021: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/21012/test_pull_request_origin_end_to_end/17185/ https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/21017/test_pull_request_origin_end_to_end/17206/ https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/21021/test_pull_request_origin_end_to_end/17210/ https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/21021/test_pull_request_origin_end_to_end/17188/ https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/21021/test_pull_request_origin_end_to_end/17192/ https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/21021/test_pull_request_origin_end_to_end/17196/ https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/21021/test_pull_request_origin_end_to_end/17200/ https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/21021/test_pull_request_origin_end_to_end/17205/ https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/21021/test_pull_request_origin_end_to_end/17210/ Make that "all six"—one link is a duplicate. investigating here: https://github.com/openshift/origin/pull/21029 I find it suspicious this is only failing against 3.11 though. well i got logs from the registry showing the login failure + 500 error but they don't tell me much, hopefully Alexey and Oleg can take a look in the morning. Alexey/Oleg: the test/command that's failing is this: https://github.com/openshift/origin/blob/467b4688976cf0bba62e0de95a844634b09ce9c1/test/end-to-end/core.sh#L158 the full run can be seen here: https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_end_to_end/17225/consoleText and here's the output from my debug pr (used in that run above): time="2018-09-19T02:35:11.220450386Z" level=warning msg="error authorizing context: authorization header required" go.version=go1.9.2 http.request.host="172.30.1.1:5000" http.request.id=40b56dd3-12da-4f54-a3c0-393dd029d914 http.request.method=GET http.request.remoteaddr="172.18.5.59:39556" http.request.uri=/v2/ http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.3.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))" time="2018-09-19T02:35:11.220506952Z" level=info msg=response go.version=go1.9.2 http.request.host="172.30.1.1:5000" http.request.id=d108a7b5-54a0-4569-9ff0-b21508429d48 http.request.method=GET http.request.remoteaddr="172.18.5.59:39556" http.request.uri=/v2/ http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.3.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))" http.response.contenttype="application/json; charset=utf-8" http.response.duration=1.437971ms http.response.status=401 http.response.written=87 time="2018-09-19T02:35:11.230629023Z" level=info msg="response completed" go.version=go1.9.2 http.request.host="172.30.1.1:5000" http.request.id=a7bb3224-1839-4881-b1af-3f8dbabcf182 http.request.method=GET http.request.remoteaddr="172.18.5.59:39558" http.request.uri="/openshift/token?account=e2e-user&client_id=docker&offline_token=true" http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.3.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))" http.response.contenttype=application/json http.response.duration=9.532129ms http.response.status=200 http.response.written=117 time="2018-09-19T02:35:11.230684615Z" level=info msg=response go.version=go1.9.2 http.request.host="172.30.1.1:5000" http.request.id=5d72c01d-2768-4f25-a1e3-b73d74429728 http.request.method=GET http.request.remoteaddr="172.18.5.59:39558" http.request.uri="/openshift/token?account=e2e-user&client_id=docker&offline_token=true" http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.3.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))" http.response.contenttype=application/json http.response.duration=9.639135ms http.response.status=200 http.response.written=117 time="2018-09-19T02:35:11.234897295Z" level=info msg="authorized request" go.version=go1.9.2 http.request.host="172.30.1.1:5000" http.request.id=3ec7fc50-574d-45ba-b53c-1fb52a4ad69b http.request.method=GET http.request.remoteaddr="172.18.5.59:39562" http.request.uri=/v2/ http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.3.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))" openshift.auth.user=e2e-user openshift.auth.userid=9f8573e3-bbb4-11e8-9486-0e7a4c6bcb56 time="2018-09-19T02:35:11.234988432Z" level=error msg="error parsing reference from context: invalid reference format" go.version=go1.9.2 http.request.host="172.30.1.1:5000" http.request.id=3ec7fc50-574d-45ba-b53c-1fb52a4ad69b http.request.method=GET http.request.remoteaddr="172.18.5.59:39562" http.request.uri=/v2/ http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.3.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))" openshift.auth.user=e2e-user openshift.auth.userid=9f8573e3-bbb4-11e8-9486-0e7a4c6bcb56 time="2018-09-19T02:35:11.235094084Z" level=info msg=response go.version=go1.9.2 http.request.host="172.30.1.1:5000" http.request.id=26351f31-9bf4-41ea-8817-2b27cdbf7c6e http.request.method=GET http.request.remoteaddr="172.18.5.59:39562" http.request.uri=/v2/ http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.3.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))" http.response.contenttype="application/json; charset=utf-8" http.response.duration=3.636817ms http.response.status=500 http.response.written=91 > time="2018-09-19T02:35:11.234988432Z" level=error msg="error parsing reference from context: invalid reference format" go.version=go1.9.2 http.request.host="172.30.1.1:5000" http.request.id=3ec7fc50-574d-45ba-b53c-1fb52a4ad69b http.request.method=GET http.request.remoteaddr="172.18.5.59:39562" http.request.uri=/v2/ http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.3.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))" openshift.auth.user=e2e-user openshift.auth.userid=9f8573e3-bbb4-11e8-9486-0e7a4c6bcb56 It seems that docker/distribution decided that /v2 requires a name and does not got it: https://github.com/openshift/image-registry/blob/release-3.11/vendor/github.com/docker/distribution/registry/handlers/app.go#L685 I do not understand why this happened yet. In the jenkins log I see: + cd origin-dockerregistry-3.11.0 + /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w . + exit 0 Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.DC5vG3 + umask 022 + cd /tmp/openshift/build-rpms/rpm/BUILD + cd origin-dockerregistry-3.11.0 + BUILD_PLATFORM=linux/amd64 + OS_ONLY_BUILD_PLATFORMS=linux/amd64 + OS_GIT_COMMIT=3052f28 + OS_GIT_TREE_STATE=dirty + OS_GIT_VERSION=v3.11.0-alpha.0+3052f28-96-dirty + OS_GIT_MAJOR=3 + OS_GIT_MINOR=11+ + OS_GIT_PATCH=0 + make build-cross https://github.com/openshift/image-registry/commit/3052f28 This is your PR, Ben. The parent of this commant is docker/distribution rebase. Perhaps we have regression after rebase. That PR should not be in a v3.11 image-registry image, so something is building+testing the wrong thing. (though it sounds like it is broken for 4.0 then, which is odd since we're not seeing that failure in the origin master PR testing). After repository split CI does not run end-to-end tests for image-registry repository. So we do not have tests where `docker login` is used. It is very easy to pass tests that do not exist. it looks like 3.11 and 3.10 e2es were being run against origin master, I am attempting to fix that here: https://github.com/openshift/release/pull/1564 So that is why 3.11 and 3.10 were using the 4.0 image-registry image. The 4.0 image-registry image is indeed broken for docker login, Oleg's PR addresses that. The reason PRs against origin master were not failing is because the e2e test was effectively disabled in origin master by making it "exit 0" w/o running any tests, which is why we did not catch this regression when it was introduced. We are pursuing re-enabling the tests that were disabled in origin master. ok i've confirmed the e2e is passing again in https://github.com/openshift/origin/pull/21029 The various changes needed to get it working are noted above, plus https://github.com/openshift/aos-cd-jobs/pull/1590 |