Description of problem:

The build is stuck in the Running state, but the pod status shows Error.

root@ip-172-31-4-211: ~ # oc logs -n proj18 cakephp-mysql-example-119-build --follow
Cloning "https://github.com/redhat-performance/cakephp-ex.git" ...
    Commit: 0014ddebb91bc7dff3a1dabfbd7b51da762a6677 (made changes to enable database example)
    Author: ofthecure <robdean.smith>
    Date:   Mon Apr 25 14:33:06 2016 -0400
DEPRECATED: Use .s2i/bin instead of .sti/bin
---> Installing application source...
Pushing image 172.24.132.26:5000/proj18/cakephp-mysql-example:latest ...
error: Unable to update build status: Get https://172.24.0.1:443/oapi/v1/namespaces/proj18/builds/cakephp-mysql-example-119: dial tcp 172.24.0.1:443: getsockopt: connection refused
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: Unable to update build status: Get https://172.24.0.1:443/oapi/v1/namespaces/proj18/builds/cakephp-mysql-example-119: dial tcp 172.24.0.1:443: getsockopt: connection refused
error: build error: Failed to push image: unauthorized: authentication required

root@ip-172-31-4-211: ~ # oc get builds -n proj18 | grep -v Complete
NAME                        TYPE     FROM          STATUS    STARTED       DURATION
cakephp-mysql-example-119   Source   Git@0014dde   Running   2 hours ago
cakephp-mysql-example-120   Source   Git           New
cakephp-mysql-example-121   Source   Git           New

The logs show that the build failed, but the build list shows it stuck in the Running state. Attaching JSON for the build and the pod.
root@ip-172-31-4-211: ~ # oc get pods -n proj18 | grep -v Complete
NAME                              READY   STATUS   RESTARTS   AGE
cakephp-mysql-example-119-build   0/1     Error    0          1d

Version-Release number of selected component (if applicable):

root@ip-172-31-4-211: ~ # openshift version
openshift v3.6.74
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

root@ip-172-31-4-211: ~ # docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-16.el7.x86_64
 Go version:      go1.7.4
 Git commit:      3a094bd/1.12.6
 Built:           Tue Mar 21 13:30:59 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-16.el7.x86_64
 Go version:      go1.7.4
 Git commit:      3a094bd/1.12.6
 Built:           Tue Mar 21 13:30:59 2017
 OS/Arch:         linux/amd64

Steps to Reproduce:
1. Run the concurrent build test. The failure appeared after around 3000 successful builds.
2. The test runs 30 concurrent builds on 2 m4.xlarge nodes.

Actual results:
The build is stuck in Running.

Expected results:
The build status should show Failed, consistent with the pod status.

Additional info:
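For anyone trying to spot the same mismatch on a busy cluster, the check above (build still Running, build pod already failed) could be scripted roughly as follows. This is only a sketch under assumptions: the function name `find_stuck_builds` is hypothetical, the input is assumed to be the `items` lists parsed from `oc get builds -o json` and `oc get pods -o json`, the build pod is assumed to be named `<build-name>-build` as in the output above, and the `Error` shown in the STATUS column is assumed to correspond to a pod `status.phase` of `Failed`.

```python
def find_stuck_builds(builds, pods):
    """Return names of builds still reported as Running whose build pod has failed.

    Assumptions (not confirmed by this report):
      - builds and pods are the "items" lists from `oc get ... -o json`
      - the build pod is named "<build-name>-build"
      - a pod shown as Error by `oc get pods` has status.phase == "Failed"
    """
    # Map each pod name to its reported phase for quick lookup.
    pod_phase = {
        pod["metadata"]["name"]: pod.get("status", {}).get("phase")
        for pod in pods
    }

    stuck = []
    for build in builds:
        name = build["metadata"]["name"]
        # Only builds that still claim to be running are of interest.
        if build.get("status", {}).get("phase") != "Running":
            continue
        # Flag the build if its pod has already terminated with a failure.
        if pod_phase.get(name + "-build") == "Failed":
            stuck.append(name)
    return stuck


if __name__ == "__main__":
    # Hand-written sample shaped like the listings in this report.
    sample_builds = [
        {"metadata": {"name": "cakephp-mysql-example-119"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "cakephp-mysql-example-120"}, "status": {"phase": "New"}},
        {"metadata": {"name": "cakephp-mysql-example-121"}, "status": {"phase": "New"}},
    ]
    sample_pods = [
        {"metadata": {"name": "cakephp-mysql-example-119-build"}, "status": {"phase": "Failed"}},
    ]
    print(find_stuck_builds(sample_builds, sample_pods))
    # prints ['cakephp-mysql-example-119']
```

The sample data is made up to mirror the listings in the description; it is not taken from the attached JSON.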
Created attachment 1278277
pod json
Created attachment 1278278
build json
We're going to need master logs at log level 5 to see what happened within the pod controller for this build.
Marking this for the upcoming release, as Cesar's PR that reworks all of this logic is going to land at the start of the next sprint.
Relevant PR: https://github.com/openshift/origin/pull/14289
Reran the test with 50 concurrent builds; all builds succeeded.
(In reply to Hongkai Liu from comment #7)
> Reran the test with 50 concurrent builds; all builds succeeded.

Verified on 3.6.133