Created attachment 1272172 [details]
docker container logs

Description of problem:
After running concurrent builds, some builds got stuck in Running status for a long time.

NAMESPACE   NAME                       TYPE     FROM          STATUS                        STARTED             DURATION
proj11      cakephp-mysql-example-12   Source   Git@0014dde   Failed (GenericBuildFailed)   43 minutes ago      34m24s
proj2       cakephp-mysql-example-19   Source   Git@0014dde   Running                       About an hour ago
proj33      cakephp-mysql-example-14   Source   Git@0014dde   Running                       37 minutes ago
proj48      cakephp-mysql-example-13   Source   Git@0014dde   Running                       About an hour ago

I am also attaching the docker logs captured after stopping the docker container.

Version-Release number of selected component (if applicable):
openshift v3.6.27
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Client:
 Version: 1.12.6
 API version: 1.24
 Package version: docker-common-1.12.6-16.el7.x86_64
 Go version: go1.7.4
 Git commit: 3a094bd/1.12.6
 Built: Tue Mar 21 13:30:59 2017
 OS/Arch: linux/amd64

Server:
 Version: 1.12.6
 API version: 1.24
 Package version: docker-common-1.12.6-16.el7.x86_64
 Go version: go1.7.4
 Git commit: 3a094bd/1.12.6
 Built: Tue Mar 21 13:30:59 2017
 OS/Arch: linux/amd64

Steps to Reproduce:
1. Create 20 cakephp projects.
2. Start concurrent builds in those projects.
3. After some time, builds are stuck in Running state.

Actual results:
Builds are stuck.

Expected results:
Builds should finish.

Additional info:
Please see the attached docker container logs, captured when the container was stopped.
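For anyone trying to reproduce this, the concurrent-build step could be driven by something like the sketch below. It is only an illustration, not the script used for this report: the build config name cakephp-mysql-example is inferred from the build names in the listing, and the project names passed in are whatever your environment uses.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def start_build_cmd(project, buildconfig="cakephp-mysql-example"):
    """Return the argv that starts one build in one project.

    The build config name is an assumption inferred from the build names
    in the report (cakephp-mysql-example-12 and so on).
    """
    return ["oc", "start-build", buildconfig, "-n", project]


def start_concurrent_builds(projects, run=subprocess.call, workers=None):
    """Start one build per project in parallel.

    `run` receives an argv list and returns an exit code; it defaults to
    subprocess.call and can be swapped for a stub when testing.
    Returns a dict mapping project name to exit code.
    """
    projects = list(projects)
    if not projects:
        return {}
    with ThreadPoolExecutor(max_workers=workers or len(projects)) as pool:
        codes = list(pool.map(lambda p: run(start_build_cmd(p)), projects))
    return dict(zip(projects, codes))
```

Calling start_concurrent_builds(["proj1", "proj2"]) on a host with a logged-in oc client would start one build in each project at roughly the same time; repeat the call to run several cycles.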
vendor/github.com/openshift/source-to-image/pkg/build/strategies/sti/sti.go:688: I can see that builder.docker.RunContainer(opts) has returned an error; the hang is happening while we're waiting for the container to close its stderr/stdout. Also of note: the source upload ("starting the source uploading ...") has not completed.

Vikaas, please could you set BUILD_LOGLEVEL on the builds so we can see the s2i logging? The docker state (daemon logs, docker ps -a, and container logs for the stuck build containers) would also be useful.

Or, if you have a running environment that I can log into which is currently exhibiting this issue, please ping me on IRC (NB: I'm on GMT+1).
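A rough sketch of the data collection requested above, written as a helper that only builds the command lines so they can be reviewed before running. The log level of 5, the systemd unit name docker, and the function name are assumptions, not something specified in this comment.

```python
def diagnostic_commands(namespace, buildconfig, container_ids, loglevel=5):
    """Return argv lists for the requested diagnostics.

    Assumptions: loglevel=5 is an arbitrary verbose setting, and the
    docker daemon runs as the systemd unit "docker", so journalctl can
    read its log.
    """
    cmds = [
        # Raise s2i logging for subsequent builds of this build config.
        ["oc", "set", "env", f"bc/{buildconfig}",
         f"BUILD_LOGLEVEL={loglevel}", "-n", namespace],
        # Docker daemon logs on the node.
        ["journalctl", "-u", "docker", "--no-pager"],
        # All containers, including exited ones.
        ["docker", "ps", "-a"],
    ]
    # Container logs for each stuck build container.
    cmds += [["docker", "logs", cid] for cid in container_ids]
    return cmds
```

The oc command is run from a client logged in to the cluster; the journalctl and docker commands are run on the node hosting the stuck build container.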
https://github.com/openshift/origin/pull/13817
Hi Jim,

A build is stuck in Running state again while verifying this issue. I think the problem is something else; please let me know if I need to create another bug. I am attaching information for the build which is stuck again.

root@ip-172-31-4-211: ~ # oc logs -n proj18 cakephp-mysql-example-119-build --follow
Cloning "https://github.com/redhat-performance/cakephp-ex.git" ...
Commit: 0014ddebb91bc7dff3a1dabfbd7b51da762a6677 (made changes to enable database example)
Author: ofthecure <robdean.smith>
Date: Mon Apr 25 14:33:06 2016 -0400
DEPRECATED: Use .s2i/bin instead of .sti/bin
---> Installing application source...
Pushing image 172.24.132.26:5000/proj18/cakephp-mysql-example:latest ...
error: Unable to update build status: Get https://172.24.0.1:443/oapi/v1/namespaces/proj18/builds/cakephp-mysql-example-119: dial tcp 172.24.0.1:443: getsockopt: connection refused
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: Unable to update build status: Get https://172.24.0.1:443/oapi/v1/namespaces/proj18/builds/cakephp-mysql-example-119: dial tcp 172.24.0.1:443: getsockopt: connection refused
error: build error: Failed to push image: unauthorized: authentication required

root@ip-172-31-4-211: ~ # oc get builds -n proj18 | grep -v Complete
NAME                        TYPE     FROM          STATUS    STARTED       DURATION
cakephp-mysql-example-119   Source   Git@0014dde   Running   2 hours ago
cakephp-mysql-example-120   Source   Git           New
cakephp-mysql-example-121   Source   Git           New

The logs show the build has failed, but the list shows it is stuck in Running state. Attaching the JSON for the build and pod.
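To spot builds in this condition without reading the table by eye, one option is to filter the JSON form of the build list. The sketch below is a hypothetical helper, not part of oc; the one-hour threshold is an arbitrary assumption, and it relies on the status.phase and status.startTimestamp fields of the build objects.

```python
import json
from datetime import datetime, timedelta, timezone


def stuck_builds(builds_json, now, max_age=timedelta(hours=1)):
    """Return (namespace, name) for builds that have been Running too long.

    builds_json is the text of a build list in JSON form; now is a
    timezone-aware datetime. The max_age default is an assumption.
    """
    stuck = []
    for item in json.loads(builds_json).get("items", []):
        status = item.get("status", {})
        if status.get("phase") != "Running":
            continue
        started = status.get("startTimestamp")
        if not started:
            continue
        # Timestamps are expected in the form 2017-05-10T12:00:00Z.
        started_at = datetime.strptime(
            started, "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if now - started_at > max_age:
            meta = item["metadata"]
            stuck.append((meta["namespace"], meta["name"]))
    return stuck
```

The input would come from something like oc get builds --all-namespaces -o json, with datetime.now(timezone.utc) passed as now.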
root@ip-172-31-4-211: ~ # openshift version
openshift v3.6.74
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

root@ip-172-31-4-211: ~ # docker version
Client:
 Version: 1.12.6
 API version: 1.24
 Package version: docker-common-1.12.6-16.el7.x86_64
 Go version: go1.7.4
 Git commit: 3a094bd/1.12.6
 Built: Tue Mar 21 13:30:59 2017
 OS/Arch: linux/amd64

Server:
 Version: 1.12.6
 API version: 1.24
 Package version: docker-common-1.12.6-16.el7.x86_64
 Go version: go1.7.4
 Git commit: 3a094bd/1.12.6
 Built: Tue Mar 21 13:30:59 2017
 OS/Arch: linux/amd64
Created attachment 1277984 [details] pod json
Created attachment 1277985 [details] build json
Please ignore comments #3, #4, and #5. I created another bug for that since it's a different problem: https://bugzilla.redhat.com/show_bug.cgi?id=1450466
I think this should be in ON_QA. Looks like the PR merged to master over a month ago.
Sorry; perhaps it's because I forgot to set the target release? Setting and moving to ON_QA.
Verified on the following version; builds are still getting stuck.

openshift v3.6.74
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

Jim, please let me know whether this is the same issue or whether I need to create a new one.
That's not good. Vikas, do you have an environment exhibiting this issue that I can take a look at?
Different bug. Looking at the environment in question, all the stuck builds are stuck on the final image push. In the sample in c10, s2i is pushing to the Docker daemon and is waiting for the Docker daemon to report completion. I think this is most likely to be an OpenShift registry bug or a Docker daemon bug - I'm not sure which at this point.

Please open a new bz, and I suggest capturing:
- registry pod goroutines (SIGABRT)
- registry pod log
- docker daemon goroutines on a node hosting a failed build (SIGABRT)
- docker daemon log on the same node
Verified in the following version:

openshift v3.6.79
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

Completed 100 cycles of 30 concurrent builds. No build was stuck in Running state. Created another bug for the problem mentioned in Comment #12.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188