Description of problem: After running application reliability tests for 2 days I see a build is stuck in Running state. When I see the pod logs it shows error. Version-Release number of selected component (if applicable): openshift v3.3.0.24-dirty kubernetes v1.3.0+507d3a7 etcd 2.3.0+git Steps to Reproduce: 1. Create few projects 2. keep building apps 3. Notice build is stuck in Running state Actual results: Build is stuck in Running state Expected results: Build state should be Error Additional info: Please see the status of build # 25 in following info root@ip-172-31-31-121: ~ # oc get pods NAME READY STATUS RESTARTS AGE cakephp-mysql-example-16-build 0/1 Completed 0 13h cakephp-mysql-example-17-build 0/1 Completed 0 12h cakephp-mysql-example-18-build 0/1 Completed 0 12h cakephp-mysql-example-19-8fgab 1/1 Running 0 11h cakephp-mysql-example-19-build 0/1 Completed 0 11h cakephp-mysql-example-20-deploy 0/1 Error 0 9h cakephp-mysql-example-23-build 0/1 Completed 0 9h cakephp-mysql-example-24-build 0/1 Error 0 8h cakephp-mysql-example-25-build 1/1 Running 0 8h mysql-1-nswnt 1/1 Running 0 23h root@ip-172-31-31-121: ~ # oc logs cakephp-mysql-example-25-build Cloning "https://github.com/openshift/cakephp-ex.git" ... WARNING: timed out waiting for git server, will wait 1m4s Commit: 701d706b7f2b50ee972d0bf76990042f6c0cda5c (Merge pull request #42 from bparees/recreate) Author: Ben Parees <bparees.github.com> Date: Mon Aug 22 14:44:49 2016 -0400 ERROR: Error copying file "/tmp/s2i-build084323527/upload/src/.git/objects/pack/pack-bf3ad2d17ca9b13710fd1bfa7c7f1868972c33ca.pack" to tar: io: read/write on closed pipe ERROR: Error writing tar: io: read/write on closed pipe root@ip-172-31-31-121: ~ # oc describe build cakephp-mysql-example-25 Name: cakephp-mysql-example-25 Namespace: cakephp-mysql-example-user-2-22-2-55 Created: 8 hours ago Labels: app=cakephp-mysql-example buildconfig=cakephp-mysql-example openshift.io/build-config.name=cakephp-mysql-example openshift.io/build.start-policy=Serial template=cakephp-mysql-example Annotations: openshift.io/build-config.name=cakephp-mysql-example openshift.io/build.number=25 openshift.io/build.pod-name=cakephp-mysql-example-25-build Status: Running Started: Thu, 25 Aug 2016 05:49:56 EDT Duration: running for 8h39m30s Build Config: cakephp-mysql-example Build Pod: cakephp-mysql-example-25-build Strategy: Source URL: https://github.com/openshift/cakephp-ex.git Commit: 701d706 (Merge pull request #42 from bparees/recreate) Author/Committer: Ben Parees / GitHub From Image: DockerImage registry.access.redhat.com/rhscl/php-56-rhel7:latest Output to: ImageStreamTag cakephp-mysql-example:latest Push Secret: builder-dockercfg-wqv6n Build trigger cause: Manually triggered No events. root@ip-172-31-31-121: ~ # oc describe pod cakephp-mysql-example-25-build Name: cakephp-mysql-example-25-build Namespace: cakephp-mysql-example-user-2-22-2-55 Security Policy: privileged Node: ip-172-31-19-54.us-west-2.compute.internal/172.31.19.54 Start Time: Thu, 25 Aug 2016 05:47:47 -0400 Labels: openshift.io/build.name=cakephp-mysql-example-25 Status: Running IP: 172.20.2.2 Controllers: <none> Containers: sti-build: Container ID: docker://9547c6180ec5767c5de148d938ccd040864f241ebcb32444c2ac5cdd86c2e719 Image: registry.ops.openshift.com/openshift3/ose-sti-builder:v3.3.0.24 Image ID: docker://sha256:41ca0e208e82c1fc19e9645d258ef3f995d554407d22767507c50678cec345dc Port: Args: --loglevel=0 State: Running Started: Thu, 25 Aug 2016 05:49:20 -0400 Ready: True Restart Count: 0 Volume Mounts: /var/run/docker.sock from docker-socket (rw) /var/run/secrets/kubernetes.io/serviceaccount from builder-token-dmspg (ro) /var/run/secrets/openshift.io/push from builder-dockercfg-wqv6n-push (ro) Environment Variables: BUILD: {"kind":"Build","apiVersion":"v1","metadata":{"name":"cakephp-mysql-example-25","namespace":"cakephp-mysql-example-user-2-22-2-55","selfLink":"/oapi/v1/namespaces/cakephp-mysql-example-user-2-22-2-55/builds/cakephp-mysql-example-25","uid ":"e662d3af-6aa8-11e6-97a1-02411b9952db","resourceVersion":"130925","creationTimestamp":"2016-08-25T09:47:13Z","labels":{"app":"cakephp-mysql-example","buildconfig":"cakephp-mysql-example","openshift.io/build-config.name":"cakephp-mysql-example","opensh ift.io/build.start-policy":"Serial","template":"cakephp-mysql-example"},"annotations":{"openshift.io/build-config.name":"cakephp-mysql-example","openshift.io/build.number":"25"}},"spec":{"serviceAccount":"builder","source":{"type":"Git","git":{"uri":"ht tps://github.com/openshift/cakephp-ex.git"}},"strategy":{"type":"Source","sourceStrategy":{"from":{"kind":"DockerImage","name":"registry.access.redhat.com/rhscl/php-56-rhel7:latest"},"env":[{"name":"COMPOSER_MIRROR"}]}},"output":{"to":{"kind":"DockerIma ge","name":"172.26.148.88:5000/cakephp-mysql-example-user-2-22-2-55/cakephp-mysql-example:latest"},"pushSecret":{"name":"builder-dockercfg-wqv6n"}},"resources":{},"postCommit":{},"triggeredBy":[{"message":"Manually triggered"}]},"status":{"phase":"New", "outputDockerImageReference":"172.26.148.88:5000/cakephp-mysql-example-user-2-22-2-55/cakephp-mysql-example:latest","config":{"kind":"BuildConfig","namespace":"cakephp-mysql-example-user-2-22-2-55","name":"cakephp-mysql-example"}}} SOURCE_REPOSITORY: https://github.com/openshift/cakephp-ex.git SOURCE_URI: https://github.com/openshift/cakephp-ex.git ORIGIN_VERSION: v3.3.0.24-dirty ALLOWED_UIDS: 1- DROP_CAPS: KILL,MKNOD,SETGID,SETUID,SYS_CHROOT PUSH_DOCKERCFG_PATH: /var/run/secrets/openshift.io/push Conditions: Type Status Initialized True Ready True PodScheduled True Volumes: docker-socket: Type: HostPath (bare host directory volume) Path: /var/run/docker.sock builder-dockercfg-wqv6n-push: Type: Secret (a volume populated by a Secret) SecretName: builder-dockercfg-wqv6n builder-token-dmspg: Type: Secret (a volume populated by a Secret) SecretName: builder-token-dmspg QoS Tier: BestEffort
Goroutine dump: http://pastebin.test.redhat.com/406235
root@ip-172-31-31-121: ~ # oc get builds NAME TYPE FROM STATUS STARTED DURATION cakephp-mysql-example-16 Source Git@701d706 Complete 13 hours ago 25s cakephp-mysql-example-17 Source Git@701d706 Complete 12 hours ago 27s cakephp-mysql-example-18 Source Git@701d706 Complete 12 hours ago 24s cakephp-mysql-example-19 Source Git@701d706 Complete 11 hours ago 25s cakephp-mysql-example-23 Source Git@701d706 Complete 9 hours ago 23s cakephp-mysql-example-24 Source Git@701d706 Failed 8 hours ago 4m32s cakephp-mysql-example-25 Source Git@701d706 Running 8 hours ago 8h18m2s cakephp-mysql-example-26 Source Git New cakephp-mysql-example-27 Source Git New cakephp-mysql-example-28 Source Git New cakephp-mysql-example-29 Source Git New cakephp-mysql-example-30 Source Git New cakephp-mysql-example-31 Source Git New cakephp-mysql-example-32 Source Git New
Multiple problems. Goroutine 1 is the center of attention. - if you look at its stack trace, it calls panic; according to the golang source, this is because "sync: WaitGroup misuse: Add called concurrently with Wait" ... in any even, the wg.Done() call never returns, hence the hang. - Goroutine 1 is down this path because the RunContainer timed out; is that because of a problem in the build container, or did we not wait long enough ? ... btw, with the fsouza client version, you can see the attach/hijack thread still active, alone with the container io processing, where we have another thread blocked waiting for more IO from the container ... is the container still up? I'm not super familiar with golang waitgroups, but in github.com/openshift/source-to-image/pkg/build/strategies/sti/sti.go, I see only 1 call to Add, and multiple threads that can call Done, so it seems conceivable that the count could go negative. Minimally, we can't blindly call wg.Done() on the timeout error. I'll have to think about what the tweak would be. Also, per git blame Rodolfo originally added the waitgroup here, and there was a good amount of effort / review around it ... perhaps we can pull him in as well.
Commit pushed to master at https://github.com/openshift/source-to-image https://github.com/openshift/source-to-image/commit/3c8691b36481a6cc6d48d299b31998751341dc02 increase default session timeout (related to Bug 1370265)
This addresses the panic: https://github.com/openshift/source-to-image/pull/581
The PR https://github.com/openshift/origin/pull/10776 is the origin side update for the source-to-image PR noted n comment 6
OK, the OSE pull has merged. Moving to modified.
Verified in openshift openshift v3.3.0.33 kubernetes v1.3.0+52492b4 etcd 2.3.0+git step: 1. Create an application $ oc new-app cakephp-mysql-example 2. Edit the runPolicy field to Parallel in buildConfig $ oc edit bc -o json 3. Trigger multi builds $ oc start-build cakephp-example Actual results: All builds are complete, no build stuck in running state
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1988