Description of problem:
After 4 tries, I still receive 504 Gateway Time-out errors on image pushes (see log below).

Version-Release number of selected component (if applicable):
OpenShift Master: v3.2.0.40
Kubernetes Master: v1.2.0-36-g4a3f9c5
registry.dev-preview-int.openshift.com

Actual results:
504

Expected results:
Push successful

Additional info:
$ docker pull openshift/jenkins-1-centos7
Using default tag: latest
latest: Pulling from openshift/jenkins-1-centos7
a3ed95caeb02: Pull complete
5989106db7fb: Pull complete
9e821b096409: Pull complete
8b7780ce69e9: Pull complete
36b03e5ef2b5: Pull complete
b858f60ccfab: Pull complete
079bd8173a11: Pull complete
Digest: sha256:2d3529c9a7afa766aa5523d38c65453312414b615ee0f4e24e21a725267abfff
Status: Downloaded newer image for openshift/jenkins-1-centos7:latest

$ docker tag openshift/jenkins-1-centos7 registry.dev-preview-int.openshift.com/sspeiche-jenkins/jenkins-1-centos7

$ docker push registry.dev-preview-int.openshift.com/sspeiche-jenkins/jenkins-1-centos7
The push refers to a repository [registry.dev-preview-int.openshift.com/sspeiche-jenkins/jenkins-1-centos7]
5f70bf18a086: Pushed
d9e9d04904b3: Pushed
3f422360fd61: Pushed
9bd1c68510c8: Pushed
214c631467fc: Pushed
cc436dfbd7d3: Pushed
6a6c96337be1: Pushed
Received unexpected HTTP status: 504 Gateway Time-out

$ docker push registry.dev-preview-int.openshift.com/sspeiche-jenkins/jenkins-1-centos7
The push refers to a repository [registry.dev-preview-int.openshift.com/sspeiche-jenkins/jenkins-1-centos7]
5f70bf18a086: Layer already exists
d9e9d04904b3: Layer already exists
3f422360fd61: Layer already exists
9bd1c68510c8: Layer already exists
214c631467fc: Layer already exists
cc436dfbd7d3: Layer already exists
6a6c96337be1: Layer already exists
Received unexpected HTTP status: 504 Gateway Time-out

$ docker push registry.dev-preview-int.openshift.com/sspeiche-jenkins/jenkins-1-centos7
The push refers to a repository [registry.dev-preview-int.openshift.com/sspeiche-jenkins/jenkins-1-centos7]
5f70bf18a086: Layer already exists
d9e9d04904b3: Layer already exists
3f422360fd61: Layer already exists
9bd1c68510c8: Layer already exists
214c631467fc: Layer already exists
cc436dfbd7d3: Layer already exists
6a6c96337be1: Layer already exists
Received unexpected HTTP status: 504 Gateway Time-out

$ docker push registry.dev-preview-int.openshift.com/sspeiche-jenkins/jenkins-1-centos7
The push refers to a repository [registry.dev-preview-int.openshift.com/sspeiche-jenkins/jenkins-1-centos7]
5f70bf18a086: Layer already exists
d9e9d04904b3: Layer already exists
3f422360fd61: Layer already exists
9bd1c68510c8: Layer already exists
214c631467fc: Layer already exists
cc436dfbd7d3: Layer already exists
6a6c96337be1: Layer already exists
Received unexpected HTTP status: 504 Gateway Time-out
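For completeness, the login step before the push is not shown in the log above. A minimal sketch of the full sequence, assuming the integrated registry accepts an OpenShift session token (the project and image names are taken from the log; the login step itself is an assumption, not something captured above):

# Assumed prerequisite: authenticate to the integrated registry with an OpenShift token
docker login -u "$(oc whoami)" -p "$(oc whoami -t)" registry.dev-preview-int.openshift.com

# Then pull, tag, and push as in the log above
docker pull openshift/jenkins-1-centos7
docker tag openshift/jenkins-1-centos7 registry.dev-preview-int.openshift.com/sspeiche-jenkins/jenkins-1-centos7
docker push registry.dev-preview-int.openshift.com/sspeiche-jenkins/jenkins-1-centos7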
Is this still happening? I'm curious whether the ELB timeouts I set last week are helping. Here is the release ticket for reference: https://github.com/openshift/online/issues/131#issuecomment-217921293
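For context, raising the idle timeout on a classic ELB is typically done along these lines. This is only an illustrative sketch: the load balancer name and the 600-second value are assumptions, not the actual settings that were changed last week.

aws elb modify-load-balancer-attributes \
  --load-balancer-name docker-registry-elb \
  --load-balancer-attributes '{"ConnectionSettings":{"IdleTimeout":600}}'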
This bug was reported after you made these changes. I also just reproduced it right now (as I type).
According to the release ticket above, this PR may solve the issue. https://github.com/kubernetes/kubernetes/pull/24142
The mentioned fix won't make it into origin until after the next rebase, not the current one mfojtik is doing.
I haven't hit this issue in the last week or two. So I'm not sure what has changed.
We are using ose version v3.2.1.1-1 in INT now. Maybe that version is new enough to contain the fix?
I was referring to registry.preview.openshift.com
Moving this bug to ON_QA for QE verification.
I can still reproduce this issue on INT:

[root@dev-preview-int-master-741a5 ~]# openshift version
openshift v3.2.1.10-1-g668ed0a
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

[root@dhcp-136-93 ~]# docker tag openshift/jenkins-1-centos7 registry.dev-preview-int.openshift.com/zhouy/jenkins-1-centos7
[root@dhcp-136-93 ~]# docker push registry.dev-preview-int.openshift.com/zhouy/jenkins-1-centos7
The push refers to a repository [registry.dev-preview-int.openshift.com/zhouy/jenkins-1-centos7] (len: 1)
c014669e27a0: Pushed
b9cd42fb8607: Pushed
c58f3637b60b: Pushing 1.024 kB
Received unexpected HTTP status: 504 Gateway Time-out
Not yet; I'll be looking into this this week. Steve, Stefanie, can we get the registry log out of registry.dev-preview-int.openshift.com for the time corresponding to the failed push?
*** Bug 1333978 has been marked as a duplicate of this bug. ***
The registry logs are purged each time we do an upgrade, so we'll have to gather logs next time the issue is reproduced. Let's try that after INT has been upgraded in the next day or so. I'm still working on the upgrade, so it's not quite ready for testing yet, and the old logs have already been purged.
I ran into this problem today. Setting this seemed to help somewhat:

docker@default:~$ cat /var/lib/boot2docker/profile
EXTRA_ARGS='
--label provider=virtualbox
--max-concurrent-uploads=1
'
...

That is, I added --max-concurrent-uploads=1 (the default is 5). This did not fix every push, but I was at least able to get an image uploaded after a few retries.
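On hosts that aren't boot2docker, the same knob can be set through the daemon configuration. A sketch, assuming a Docker version recent enough (1.12+) to support this key in /etc/docker/daemon.json:

# /etc/docker/daemon.json
{
  "max-concurrent-uploads": 1
}

# restart the daemon for the change to take effect
sudo systemctl restart docker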
Would like to try this again in INT after the upgrade is done.
@dakini Any chance we could get a hold of the registry logs from the time of gateway timeout occurrence?
(In reply to Michal Minar from comment #16)
> @dakini Any chance we could get a hold of the registry logs from the time of
> gateway timeout occurrence?

We probably can get those logs, yes. But I haven't heard of any occurrence of a gateway timeout since the latest build. We're running 3.4.0.18 in dev-preview-int currently, and it's planned to upgrade to 3.4.0.19 later this evening when we get the new build. If someone can ping me when they hit the issue again, I'll retrieve the logs for it.
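For whoever gathers them, a sketch of one way to pull the registry logs with oc, assuming the registry runs as the usual docker-registry deployment config in the default project (these names are the common defaults, not confirmed for this cluster):

oc logs dc/docker-registry -n default > registry.log

# or per pod, if the registry has multiple replicas
oc get pods -n default -l deploymentconfig=docker-registry
oc logs <registry-pod-name> -n default > registry-pod.log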
Cool, I'll give it a shot.
@abhgupta It seems hard to reproduce for me. This morning I tried to push 30 images in parallel from two hosts; their average size was 250MB. I wasn't able to hit this problem, and all pushes were successful. Maybe that was due to a low load on the cluster. Nevertheless, given the moderately low occurrence and the simple workaround of just retrying the push, I wouldn't consider this a blocker. I'll try again this afternoon, expecting higher load and a higher probability of hitting this; see the sketch below for how the concurrent pushes can be driven.
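A simplified sketch of one way to drive concurrent pushes when trying to reproduce. These are not the exact commands used above: it reuses a single image under many names, unlike the 30 distinct ~250MB images described, and <project> is a placeholder for a real project name.

for i in $(seq 1 30); do
  docker tag openshift/jenkins-1-centos7 registry.dev-preview-int.openshift.com/<project>/stress-$i
  docker push registry.dev-preview-int.openshift.com/<project>/stress-$i &
done
wait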
We haven't hit this issue in our testing against Online 3.4 these days. Will try more. Thanks!
I'm not hitting this anymore. I'm just going to close it and reopen if it happens again.