Description of problem:
I am running concurrent builds in a scale environment and saw this bug with 250 concurrent builds. Build log of a stuck build:

Cloning "https://github.com/redhat-performance/cakephp-ex.git" ...
        Commit: 0014ddebb91bc7dff3a1dabfbd7b51da762a6677 (made changes to enable database example)
        Author: ofthecure <robdean.smith>
        Date:   Mon Apr 25 14:33:06 2016 -0400
DEPRECATED: Use .s2i/bin instead of .sti/bin
---> Installing application source...

Pushing image 10.202.162.54:5000/proj565/cakephp-mysql-example:latest ...
Pushed 3/5 layers, 61% complete
Pushed 4/5 layers, 82% complete

Version-Release number of selected component (if applicable):
# openshift version
openshift v3.6.74
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

# docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-14.el7.x86_64
 Go version:      go1.7.4
 Git commit:      3a094bd/1.12.6
 Built:           Thu Mar 16 14:27:53 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-14.el7.x86_64
 Go version:      go1.7.4
 Git commit:      3a094bd/1.12.6
 Built:           Thu Mar 16 14:27:53 2017
 OS/Arch:         linux/amd64

How reproducible:
Start concurrent builds of the cakephp app; the problem occurs with 250 concurrent builds.

Actual results:
Builds are stuck in the Running state.

Expected results:
Builds should finish successfully.

Additional info:

--- Comment #12 from Jim Minter <jminter> ---
Different bug. Looking at the environment in question, all the stuck builds are stuck on the final image push. In the sample in c10, s2i is pushing to the Docker daemon and is waiting for the Docker daemon to report completion.

I think this is most likely an OpenShift registry bug or a Docker daemon bug; I'm not sure which at this point. Please open a new bz, and I suggest capturing:
- registry pod goroutines (SIGABRT)
- registry pod log
- docker daemon goroutines on a node hosting a failed build (SIGABRT)
- docker daemon log on the same node

I am going to provide a link for all these logs.
https://github.com/docker/distribution/pull/2299
(In reply to Oleg Bulatov from comment #6)
> https://github.com/docker/distribution/pull/2299

Oleg, can we pick up this fix for the registry to close this bug?
Yes, we can. I expected it to be merged upstream a little faster, but they didn't care.
https://github.com/openshift/origin/pull/14581
The image (devenv-rhel7_6350) for the PR in comment 9 is not ready in AWS yet; we will test it once it is ready.
Ge Liu, I will test it in the scale environment and will assign it to myself.
Reran the test with 50 concurrent builds; all builds succeeded.
(In reply to Hongkai Liu from comment #13)
> Rerun the test with 50 concurrent builds, all builds succeeded.

Verified on 3.6.133.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716