Bug 1335279
Summary: | 504 gateway timeouts on 'docker push' to registry.dev-preview-int | ||
---|---|---|---|
Product: | OpenShift Online | Reporter: | Steve Speicher <sspeiche> |
Component: | Image Registry | Assignee: | Michal Minar <miminar> |
Status: | CLOSED WORKSFORME | QA Contact: | Bing Li <bingli> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 3.x | CC: | abhgupta, aos-bugs, bingli, dakini, erich, mfojtik, miminar, pep, rjstoneus, twaugh, yinzhou |
Target Milestone: | --- | Keywords: | UpcomingRelease |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-03-28 13:06:06 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1303130 |
Description

Steve Speicher, 2016-05-11 18:35:24 UTC
Is this still happening? I'm curious whether the ELB timeouts I set last week are helping. Here is the release ticket for reference: https://github.com/openshift/online/issues/131#issuecomment-217921293

---

This bug was reported after you made those changes, and I also just reproduced it right now (as I type). According to the release ticket above, this PR may solve the issue: https://github.com/kubernetes/kubernetes/pull/24142

---

The mentioned fix won't make it into Origin until after the next rebase, not the current one mfojtik is doing.

---

I haven't hit this issue in the last week or two, so I'm not sure what has changed. We are using OSE version v3.2.1.1-1 in INT now. Maybe that version is new enough to contain the fix?

---

I was referring to registry.preview.openshift.com.

---

Moving this bug to ON_QA for QE verification.

---

I can still reproduce this issue on INT:

```
[root@dev-preview-int-master-741a5 ~]# openshift version
openshift v3.2.1.10-1-g668ed0a
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

[root@dhcp-136-93 ~]# docker tag openshift/jenkins-1-centos7 registry.dev-preview-int.openshift.com/zhouy/jenkins-1-centos7
[root@dhcp-136-93 ~]# docker push registry.dev-preview-int.openshift.com/zhouy/jenkins-1-centos7
The push refers to a repository [registry.dev-preview-int.openshift.com/zhouy/jenkins-1-centos7] (len: 1)
c014669e27a0: Pushed
b9cd42fb8607: Pushed
c58f3637b60b: Pushing 1.024 kB
Received unexpected HTTP status: 504 Gateway Time-out
```

---

Not yet; I'll be looking into this this week. Steve, Stefanie, can we get the registry log out of registry.dev-preview-int.openshift.com for the time corresponding to the failed push?

---

*** Bug 1333978 has been marked as a duplicate of this bug. ***

---

The registry logs are purged each time we do an upgrade, so we'll have to gather logs the next time the issue is reproduced. Let's try that after INT has been upgraded in the next day or so.

---

I'm still working on the upgrade, so it's not quite ready for testing yet, and the old logs have already been purged.
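The push above fails with a transient 504, and later comments note that images do go through after a few retries. A minimal retry wrapper could automate that; this is a hypothetical helper sketch (not part of Docker or OpenShift), with an arbitrary attempt count and pause:

```shell
#!/bin/sh
# retry: run a command up to N times, pausing between failed attempts.
# Hypothetical helper; attempt count and sleep interval are arbitrary choices.
retry() {
  attempts=$1; shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0                      # command succeeded
    fi
    if [ "$i" -lt "$attempts" ]; then
      echo "attempt $i/$attempts failed; retrying..." >&2
      sleep 1                       # brief pause before the next attempt
    fi
    i=$((i + 1))
  done
  return 1                          # all attempts failed
}

# Example use against the repository from the reproduction above:
# retry 5 docker push registry.dev-preview-int.openshift.com/zhouy/jenkins-1-centos7
```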
I ran into this problem today. Setting this seemed to help somewhat:

```
docker@default:~$ cat /var/lib/boot2docker/profile
EXTRA_ARGS='
--label provider=virtualbox
--max-concurrent-uploads=1
'
...
```

That is, I added --max-concurrent-uploads=1 (the default is 5). This did not fix all of the problems, but I was at least able to get an image uploaded after a few retries. I would like to try this again in INT after the upgrade is done.

---

@dakini Any chance we could get a hold of the registry logs from the time of the gateway timeout occurrence?

---

(In reply to Michal Minar from comment #16)
> @dakini Any chance we could get a hold of the registry logs from the time of
> the gateway timeout occurrence?

We probably can get those logs, yes. But I haven't heard of any occurrence of a gateway timeout since the latest build. We're running 3.4.0.18 in dev-preview-int currently, and we plan to upgrade to 3.4.0.19 later this evening when we get the new build. If someone can ping me when they hit the issue again, I'll retrieve the logs for it.

---

Cool, I'll give it a shot.

---

@abhgupta It seems hard for me to reproduce. This morning I tried to push 30 images in parallel on two hosts; their average size was 250 MB. I wasn't able to hit this problem: all pushes were successful. Maybe that was due to a low load on the cluster. Nevertheless, given the moderately low occurrence rate and the simple workaround of just retrying, I wouldn't consider it a blocker. I'll try again in the afternoon, expecting a higher load and a higher probability of hitting this.

---

We haven't hit this issue in our tests against Online 3.4 these days. Will try more. Thanks!

---

I'm not hitting this anymore. I'm just going to close it and reopen if it happens again.
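The boot2docker profile above passes the flag to the daemon through EXTRA_ARGS. On hosts that configure the daemon via /etc/docker/daemon.json instead, the same limit can be expressed there; this is a sketch assuming a Docker version new enough to support the max-concurrent-uploads option, not a verified fix for this bug:

```json
{
  "max-concurrent-uploads": 1
}
```

The Docker daemon must be restarted after editing the file for the setting to take effect.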