| Summary: | Builds failing with "Failed to push image: unauthorized" when running simultaneous builds | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Mike Fiedler <mifiedle> | ||||||||
| Component: | Image Registry | Assignee: | Oleg Bulatov <obulatov> | ||||||||
| Status: | CLOSED WORKSFORME | QA Contact: | Mike Fiedler <mifiedle> | ||||||||
| Severity: | low | Docs Contact: | |||||||||
| Priority: | unspecified | ||||||||||
| Version: | 3.4.0 | CC: | aos-bugs, haowang, mfojtik, mifiedle, obulatov, tstclair, xtian | ||||||||
| Target Milestone: | --- | ||||||||||
| Target Release: | --- | ||||||||||
| Hardware: | x86_64 | ||||||||||
| OS: | Linux | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2018-01-11 13:33:18 UTC | Type: | Bug | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Attachments: |
|
||||||||||
|
Description
Mike Fiedler
2016-12-12 21:11:06 UTC
Created attachment 1230948 [details]
Failed build log
Created attachment 1230949 [details]
Application template
(In reply to Mike Fiedler from comment #2) > Created attachment 1230949 [details] > Application template Mike, could you please provide the registry pod log ? @Wang it's already in the attachments [1]. [1] https://bugzilla.redhat.com/attachment.cgi?id=1230947 Can we also get a full build log (BUILD_LOGLEVEL=5) ? I am reproducing later this morning. Will set all builds to loglevel 5. This seems to be the culprit:
level=error msg="OpenShift client error: the server has asked for the client to provide credentials (post localSubjectAccessReviews)" go.version=go1.7.3 http.request.host="172.27.9.119:5000" http.request.id=c79a91bf-d23f-4729-bd9f-1dbc2a93561e http.request.method=PUT http.request. remoteaddr="172.20.6.1:54238" http.request.uri="/v2/svt925/cakephp5/blobs/uploads/6a93eab3-a771-4d5f-9261-6a253d9f71d7?_state=N3qXqw1Bkk16hGTLpcrQtvkBSQL3vyRSB2IW10KSAVV7Ik5hbWUiOiJzdnQ5MjUvY2FrZXBocDUiLCJVVUlEIjoiNmE5M2VhYjMtYTc3MS00ZDVmLTkyNjEtNmEyNTNkOWY3MWQ3IiwiT2Zmc2V0Ijo2ODY0NzU0MCwiU3RhcnRlZEF0IjoiMjAxNi0xMi0xMlQyMDo0MDozM1oifQ%3D%3D&digest=sha256%3Aaad5d4c7bba900248b9ece846ee73cab334f5f2687b0773de9b1f2b634bf0190"
The token is refused (if provided at all) even though the previous request
PATCH /v2/svt925/cakephp5/blobs/uploads/6a93eab3-a771-4d5f-9261-6a253d9f71d7?_state=zabmoUWgHohTaxVZqxJenpQP4M3QAjRfABxMMq-7Zwl7Ik5hbWUiOiJzdnQ5MjUvY2FrZXBocDUiLCJVVUlEIjoiNmE5M2VhYjMtYTc3MS00ZDVmLTkyNjEtNmEyNTNkOWY3MWQ3IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDE2LTEyLTEyVDIwOjQwOjMzLjA3ODA5OTAyOVoifQ%3D%3D HTTP/1.1" 202
succeeded. No idea why.
I'd very like to see master log from that time.
I'll try to reproduce locally and see what I can get.
(In reply to Michal Minar from comment #7) > This seems to be the culprit: > > level=error msg="OpenShift client error: the server has asked for the > client to provide credentials (post localSubjectAccessReviews)" > go.version=go1.7.3 http.request.host="172.27.9.119:5000" > http.request.id=c79a91bf-d23f-4729-bd9f-1dbc2a93561e http.request.method=PUT > http.request. remoteaddr="172.20.6.1:54238" > http.request.uri="/v2/svt925/cakephp5/blobs/uploads/6a93eab3-a771-4d5f-9261- > 6a253d9f71d7?_state=N3qXqw1Bkk16hGTLpcrQtvkBSQL3vyRSB2IW10KSAVV7Ik5hbWUiOiJzd > nQ5MjUvY2FrZXBocDUiLCJVVUlEIjoiNmE5M2VhYjMtYTc3MS00ZDVmLTkyNjEtNmEyNTNkOWY3MW > Q3IiwiT2Zmc2V0Ijo2ODY0NzU0MCwiU3RhcnRlZEF0IjoiMjAxNi0xMi0xMlQyMDo0MDozM1oifQ% > 3D%3D&digest=sha256%3Aaad5d4c7bba900248b9ece846ee73cab334f5f2687b0773de9b1f2b > 634bf0190" > > The token is refused (if provided at all) even though the previous request > > PATCH > /v2/svt925/cakephp5/blobs/uploads/6a93eab3-a771-4d5f-9261- > 6a253d9f71d7?_state=zabmoUWgHohTaxVZqxJenpQP4M3QAjRfABxMMq- > 7Zwl7Ik5hbWUiOiJzdnQ5MjUvY2FrZXBocDUiLCJVVUlEIjoiNmE5M2VhYjMtYTc3MS00ZDVmLTky > NjEtNmEyNTNkOWY3MWQ3IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDE2LTEyLTEyVDIwOjQwO > jMzLjA3ODA5OTAyOVoifQ%3D%3D HTTP/1.1" 202 > > succeeded. No idea why. > > I'd very like to see master log from that time. > > I'll try to reproduce locally and see what I can get. Maybe it is because the authentication endpoint is stressed and timeouts for some requests. Do we retry the requests that timeout? As David suggested, we should look at the remote authenticator to see if we are timeouting there. Do we have any logs in registry that might reveal the server timeouted? I don't think this is a registry issue and may not be an issue at all. Given comment 7 and comment 8 above, I found a correlation between the errors and some etcd timeouts (I was using a single etcd). I've moved etcd to a 3 node cluster and the same level of stress (and even higher) is not giving these pervasive errors. I have a handful of builds which have failed with 500 errors in the build logs, but like what was seen before. This needs some additional investigation as we need to get to 10K++ builds over 1000 namespaces, but currently this reported error is not seen. I propose closing this bug unless Michal F and Michal M have seen anything that they think needs to be investigated on the registry side. Thanks Mike, moving this off the blocker list. I'd very like to see a different message (e.g. "timeout communicating with master API"). Unfortunately, registry relies on master API and its client library which needs to generate proper errors. I'm switching this to Auth component and hope you don't mind. is this a timeout error encountered when the registry is performing the subjectaccessreview check? if so, the registry can return a different error in that case I think the registry still has the information it needs to return a better error. The API doesn't return unauthorized errors in timeout cases $ ag c2a698bb-3558-4198-a2bb-5edab2658cd8 ./registry1.log 75105:time="2016-12-12T20:40:30.275116212Z" level=error msg="OpenShift access denied: User \"system:serviceaccount:svt925:builder\" cannot get imagestreams/layers in project \"svt581\"" go.version=go1.7.3 http.request.host="172.27.9.119:5000" http.request.id=c2a698bb-3558-4198-a2bb-5edab2658cd8 http.request.method=POST http.request.remoteaddr="172.20.6.1:53966" http.request.uri="/v2/svt925/cakephp5/blobs/uploads/?from=svt581%2Fcakephp0&mount=sha256%3A2cd00c647ebf035b4e71a116ddcce480cfd6c8406b0b39ebdaad50b8edcc557f" http.request.useragent="docker/1.12.3 go/go1.6.2 git-commit/8b91553-redhat kernel/3.10.0-514.el7.x86_64 os/linux arch/amd64 UpstreamClient(go-dockerclient)" instance.id=ede6f3ba-4dd2-444d-a68a-ba0665ff93a6 vars.name="svt925/cakephp5" The error came from the master API side. Is it still reproducible? It also might be a cross-mount problem. It's fixed by:
> commit 860fa7f54f78603891afc4d5918ca615015c44d9
> Author: Jordan Liggitt <jliggitt>
> Date: Tue Jun 28 01:01:53 2016 -0400
>
> Delegate enforcement of cross-repo mount auth errors to blobstore
and a bunch of additional fixes by Michal Minar.
Re-testing on 3.9 today. Verified issue does not occur in 3.9.0-0.16.0. Do you want to answer this as WORKSFORME? UPSTREAM? Or I can mark it VERIFIED if you want to put it ON_QA |