Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1630537

Summary: e2e flake: `docker login` to internal registry fails with "500 Internal Server Error"
Product: OpenShift Container Platform Reporter: Miciah Dashiel Butler Masters <mmasters>
Component: Image RegistryAssignee: Oleg Bulatov <obulatov>
Status: CLOSED CURRENTRELEASE QA Contact: Dongbo Yan <dyan>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.11.0CC: agladkov, aos-bugs, bparees, mfojtik, mifiedle, mmasters, obulatov
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-09-19 16:51:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Miciah Dashiel Butler Masters 2018-09-18 21:31:54 UTC
Description of problem:

`docker login` in `test/end-to-end/core.sh` fails:

=== BEGIN TEST CASE ===
test/end-to-end/core.sh:158: executing 'docker login -u e2e-user -p [REDACTED] 172.30.1.1:5000' expecting success
FAILURE after 0.072s: test/end-to-end/core.sh:158: executing 'docker login -u e2e-user -p [REDACTED] 172.30.1.1:5000' expecting success: the command returned the wrong error code
There was no output from the command.
Standard error from the command:
Error response from daemon: login attempt to http://172.30.1.1:5000/v2/ failed with status: 500 Internal Server Error
=== END TEST CASE ===

Version-Release number of selected component (if applicable):

release-3.11 branch.


How reproducible:

I have seen it fail 6 out of 6 times on <https://github.com/openshift/origin/pull/21021>.

I found a couple of other PRs that are encountering the same failure:

https://github.com/openshift/origin/pull/21017
https://github.com/openshift/origin/pull/21012

Comment 1 Ben Parees 2018-09-18 21:34:45 UTC
Need links to the failing jobs please, or ideally the registry logs from the failing job.

Comment 3 Miciah Dashiel Butler Masters 2018-09-18 21:47:27 UTC
Make that "all six"—one link is a duplicate.

Comment 4 Ben Parees 2018-09-18 22:10:21 UTC
investigating here:
https://github.com/openshift/origin/pull/21029

I find it suspicious this is only failing against 3.11 though.

Comment 5 Ben Parees 2018-09-19 02:52:04 UTC
well i got logs from the registry showing the login failure + 500 error but they don't tell me much, hopefully Alexey and Oleg can take a look in the morning.

Alexey/Oleg:  the test/command that's failing is this:
https://github.com/openshift/origin/blob/467b4688976cf0bba62e0de95a844634b09ce9c1/test/end-to-end/core.sh#L158


the full run can be seen here:
https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_end_to_end/17225/consoleText

and here's the output from my debug pr (used in that run above):

time="2018-09-19T02:35:11.220450386Z" level=warning msg="error authorizing context: authorization header required" go.version=go1.9.2 http.request.host="172.30.1.1:5000" http.request.id=40b56dd3-12da-4f54-a3c0-393dd029d914 http.request.method=GET http.request.remoteaddr="172.18.5.59:39556" http.request.uri=/v2/ http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.3.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))"
time="2018-09-19T02:35:11.220506952Z" level=info msg=response go.version=go1.9.2 http.request.host="172.30.1.1:5000" http.request.id=d108a7b5-54a0-4569-9ff0-b21508429d48 http.request.method=GET http.request.remoteaddr="172.18.5.59:39556" http.request.uri=/v2/ http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.3.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))" http.response.contenttype="application/json; charset=utf-8" http.response.duration=1.437971ms http.response.status=401 http.response.written=87
time="2018-09-19T02:35:11.230629023Z" level=info msg="response completed" go.version=go1.9.2 http.request.host="172.30.1.1:5000" http.request.id=a7bb3224-1839-4881-b1af-3f8dbabcf182 http.request.method=GET http.request.remoteaddr="172.18.5.59:39558" http.request.uri="/openshift/token?account=e2e-user&client_id=docker&offline_token=true" http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.3.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))" http.response.contenttype=application/json http.response.duration=9.532129ms http.response.status=200 http.response.written=117
time="2018-09-19T02:35:11.230684615Z" level=info msg=response go.version=go1.9.2 http.request.host="172.30.1.1:5000" http.request.id=5d72c01d-2768-4f25-a1e3-b73d74429728 http.request.method=GET http.request.remoteaddr="172.18.5.59:39558" http.request.uri="/openshift/token?account=e2e-user&client_id=docker&offline_token=true" http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.3.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))" http.response.contenttype=application/json http.response.duration=9.639135ms http.response.status=200 http.response.written=117
time="2018-09-19T02:35:11.234897295Z" level=info msg="authorized request" go.version=go1.9.2 http.request.host="172.30.1.1:5000" http.request.id=3ec7fc50-574d-45ba-b53c-1fb52a4ad69b http.request.method=GET http.request.remoteaddr="172.18.5.59:39562" http.request.uri=/v2/ http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.3.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))" openshift.auth.user=e2e-user openshift.auth.userid=9f8573e3-bbb4-11e8-9486-0e7a4c6bcb56
time="2018-09-19T02:35:11.234988432Z" level=error msg="error parsing reference from context: invalid reference format" go.version=go1.9.2 http.request.host="172.30.1.1:5000" http.request.id=3ec7fc50-574d-45ba-b53c-1fb52a4ad69b http.request.method=GET http.request.remoteaddr="172.18.5.59:39562" http.request.uri=/v2/ http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.3.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))" openshift.auth.user=e2e-user openshift.auth.userid=9f8573e3-bbb4-11e8-9486-0e7a4c6bcb56
time="2018-09-19T02:35:11.235094084Z" level=info msg=response go.version=go1.9.2 http.request.host="172.30.1.1:5000" http.request.id=26351f31-9bf4-41ea-8817-2b27cdbf7c6e http.request.method=GET http.request.remoteaddr="172.18.5.59:39562" http.request.uri=/v2/ http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.3.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))" http.response.contenttype="application/json; charset=utf-8" http.response.duration=3.636817ms http.response.status=500 http.response.written=91

Comment 7 Alexey Gladkov 2018-09-19 08:38:55 UTC
> time="2018-09-19T02:35:11.234988432Z" level=error msg="error parsing reference from context: invalid reference format" go.version=go1.9.2 http.request.host="172.30.1.1:5000" http.request.id=3ec7fc50-574d-45ba-b53c-1fb52a4ad69b http.request.method=GET http.request.remoteaddr="172.18.5.59:39562" http.request.uri=/v2/ http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.3.3.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))" openshift.auth.user=e2e-user openshift.auth.userid=9f8573e3-bbb4-11e8-9486-0e7a4c6bcb56

It seems that docker/distribution decided that /v2 requires a name and does not got it:

https://github.com/openshift/image-registry/blob/release-3.11/vendor/github.com/docker/distribution/registry/handlers/app.go#L685

I do not understand why this happened yet.

Comment 8 Alexey Gladkov 2018-09-19 08:48:43 UTC
In the jenkins log I see:

+ cd origin-dockerregistry-3.11.0
+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.DC5vG3
+ umask 022
+ cd /tmp/openshift/build-rpms/rpm/BUILD
+ cd origin-dockerregistry-3.11.0
+ BUILD_PLATFORM=linux/amd64
+ OS_ONLY_BUILD_PLATFORMS=linux/amd64
+ OS_GIT_COMMIT=3052f28
+ OS_GIT_TREE_STATE=dirty
+ OS_GIT_VERSION=v3.11.0-alpha.0+3052f28-96-dirty
+ OS_GIT_MAJOR=3
+ OS_GIT_MINOR=11+
+ OS_GIT_PATCH=0
+ make build-cross

https://github.com/openshift/image-registry/commit/3052f28 

This is your PR, Ben. The parent of this commant is docker/distribution rebase. Perhaps we have regression after rebase.

Comment 9 Ben Parees 2018-09-19 13:37:29 UTC
That PR should not be in a v3.11 image-registry image, so something is building+testing the wrong thing.

(though it sounds like it is broken for 4.0 then, which is odd since we're not seeing that failure in the origin master PR testing).

Comment 10 Alexey Gladkov 2018-09-19 13:49:42 UTC
After repository split CI does not run end-to-end tests for image-registry repository. So we do not have tests where `docker login` is used. It is very easy to pass tests that do not exist.

Comment 12 Ben Parees 2018-09-19 15:02:09 UTC
it looks like 3.11 and 3.10 e2es were being run against origin master, I am attempting to fix that here:
https://github.com/openshift/release/pull/1564

So that is why 3.11 and 3.10 were using the 4.0 image-registry image.

The 4.0 image-registry image is indeed broken for docker login, Oleg's PR addresses that.

The reason PRs against origin master were not failing is because the e2e test was effectively disabled in origin master by making it "exit 0" w/o running any tests, which is why we did not catch this regression when it was introduced.

We are pursuing re-enabling the tests that were disabled in origin master.

Comment 13 Ben Parees 2018-09-19 16:51:19 UTC
ok i've confirmed the e2e is passing again in https://github.com/openshift/origin/pull/21029

The various changes needed to get it working are noted above, plus https://github.com/openshift/aos-cd-jobs/pull/1590