Bug 1442875 - Build stuck in Running status
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assignee: Jim Minter
QA Contact: Vikas Laad
URL:
Whiteboard:
Depends On:
Blocks: 1436391 1437121
Reported: 2017-04-17 20:13 UTC by Vikas Laad
Modified: 2017-11-28 21:53 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Source-to-image was not closing stdin/out/err pipes correctly in some error cases, causing a hang to occur. This was causing some OpenShift Builds to hang in Running status as a knock-on effect.
Clone Of:
Environment:
Last Closed: 2017-11-28 21:53:29 UTC
Target Upstream Version:
Embargoed:


Attachments
docker container logs (20.49 KB, text/plain)
2017-04-17 20:13 UTC, Vikas Laad
pod json (7.55 KB, text/plain)
2017-05-11 17:29 UTC, Vikas Laad
build json (2.96 KB, text/plain)
2017-05-11 17:29 UTC, Vikas Laad


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:3188 0 normal SHIPPED_LIVE Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update 2017-11-29 02:34:54 UTC

Description Vikas Laad 2017-04-17 20:13:09 UTC
Created attachment 1272172 [details]
docker container logs

Description of problem:
After running concurrent builds, some builds got stuck in Running status for a long time.

NAMESPACE   NAME                       TYPE      FROM          STATUS                        STARTED             DURATION
proj11      cakephp-mysql-example-12   Source    Git@0014dde   Failed (GenericBuildFailed)   43 minutes ago      34m24s
proj2       cakephp-mysql-example-19   Source    Git@0014dde   Running                       About an hour ago   
proj33      cakephp-mysql-example-14   Source    Git@0014dde   Running                       37 minutes ago      
proj48      cakephp-mysql-example-13   Source    Git@0014dde   Running                       About an hour ago   
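
A listing like the one above can be scanned mechanically for builds that are still Running. The following is a minimal Go sketch, not part of OpenShift: the function name runningBuilds is hypothetical, and it assumes the columns are separated by two or more spaces, as in the output shown here. Running alone does not prove a build is stuck; the STARTED column still has to be checked by hand.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// columnGap matches the run of two or more spaces that separates columns.
// Single spaces inside a cell ("About an hour ago") are left alone.
var columnGap = regexp.MustCompile(`\s{2,}`)

// runningBuilds returns "namespace/name" (or just "name" when the listing
// has no NAMESPACE column) for every row whose STATUS is exactly Running.
func runningBuilds(listing string) []string {
	var out []string
	nsIdx, nameIdx, statusIdx := -1, -1, -1
	for _, line := range strings.Split(listing, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		fields := columnGap.Split(line, -1)
		if statusIdx < 0 {
			// The first non-empty line is treated as the header row.
			for i, f := range fields {
				switch f {
				case "NAMESPACE":
					nsIdx = i
				case "NAME":
					nameIdx = i
				case "STATUS":
					statusIdx = i
				}
			}
			continue
		}
		if nameIdx < 0 || len(fields) <= statusIdx || fields[statusIdx] != "Running" {
			continue
		}
		name := fields[nameIdx]
		if nsIdx >= 0 {
			name = fields[nsIdx] + "/" + name
		}
		out = append(out, name)
	}
	return out
}

func main() {
	// Sample rows copied from the listing in this report.
	const sample = `NAMESPACE   NAME                       TYPE      FROM          STATUS                        STARTED             DURATION
proj11      cakephp-mysql-example-12   Source    Git@0014dde   Failed (GenericBuildFailed)   43 minutes ago      34m24s
proj2       cakephp-mysql-example-19   Source    Git@0014dde   Running                       About an hour ago
proj33      cakephp-mysql-example-14   Source    Git@0014dde   Running                       37 minutes ago
proj48      cakephp-mysql-example-13   Source    Git@0014dde   Running                       About an hour ago`
	for _, b := range runningBuilds(sample) {
		fmt.Println(b)
	}
}
```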

I stopped the docker container and am attaching its docker logs as well.

Version-Release number of selected component (if applicable):
openshift v3.6.27
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-16.el7.x86_64
 Go version:      go1.7.4
 Git commit:      3a094bd/1.12.6
 Built:           Tue Mar 21 13:30:59 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-16.el7.x86_64
 Go version:      go1.7.4
 Git commit:      3a094bd/1.12.6
 Built:           Tue Mar 21 13:30:59 2017
 OS/Arch:         linux/amd64

Steps to Reproduce:
1. create 20 cakephp projects
2. start concurrent builds in those projects
3. After some time builds are stuck in Running state
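
Step 2 above could be driven by a small program along these lines. This is a sketch under stated assumptions, not the script used in the original test: it assumes the projects are named proj1 to proj20, that each already contains a build config called cakephp-mysql-example (the name is inferred from the build names in the listing), and that `oc` is on the PATH and logged in. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// startBuildArgs returns the oc arguments that start one build of
// buildConfig in the given project.
func startBuildArgs(project, buildConfig string) []string {
	return []string{"start-build", buildConfig, "-n", project}
}

// projectNames returns proj1..projN. The naming scheme is an assumption.
func projectNames(n int) []string {
	names := make([]string, 0, n)
	for i := 1; i <= n; i++ {
		names = append(names, fmt.Sprintf("proj%d", i))
	}
	return names
}

// startAll starts one build per project concurrently and returns one error
// slot per project, in the same order as projects. The run function is
// injected so the logic can be exercised without a cluster.
func startAll(projects []string, buildConfig string, run func(args []string) error) []error {
	errs := make([]error, len(projects))
	var wg sync.WaitGroup
	for i, p := range projects {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			errs[i] = run(startBuildArgs(p, buildConfig))
		}(i, p)
	}
	wg.Wait()
	return errs
}

func main() {
	// runOC shells out to the real oc client.
	runOC := func(args []string) error {
		return exec.Command("oc", args...).Run()
	}
	projects := projectNames(20)
	for i, err := range startAll(projects, "cakephp-mysql-example", runOC) {
		if err != nil {
			fmt.Printf("%s: %v\n", projects[i], err)
		}
	}
}
```

Repeating the call to startAll in a loop would approximate repeated cycles of concurrent builds.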

Actual results:
Builds are stuck

Expected results:
Build should finish

Additional info:
Please see attached docker container logs when container was stopped.

Comment 1 Jim Minter 2017-04-18 09:09:56 UTC
vendor/github.com/openshift/source-to-image/pkg/build/strategies/sti/sti.go:688: I can see that builder.docker.RunContainer(opts) has returned an error; the hang is happening while we're waiting for the container to close its stderr/stdout.  Also of note: the source upload ("starting the source uploading ...") has not completed.

Vikas, please could you set BUILD_LOGLEVEL on the builds so we can see the s2i logging?  The docker state (daemon logs, docker ps -a, and container logs for the stuck build containers) would also be useful.

Or, if you have a running environment that I can log into which is currently exhibiting this issue, please ping me on IRC (NB: I'm on GMT+1).
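
The kind of hang described in this comment can be illustrated in isolation. The sketch below is not the s2i code; runContainer and build are hypothetical stand-ins. It shows a goroutine copying a container's output from a pipe, and a caller that waits for that copy to finish. If the run fails and the write end of the pipe is never closed, the copy never sees EOF and the wait never returns.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"time"
)

// demoTimeout bounds the wait so the demo itself cannot hang.
const demoTimeout = 200 * time.Millisecond

// runContainer is a hypothetical stand-in for a container run that fails
// before writing to, or closing, its output stream.
func runContainer(stdout io.Writer) error {
	return errors.New("container failed to start")
}

// build runs the fake container and waits for the output copier to finish.
// It reports whether the copier finished before the timeout, together with
// the error from the run.
func build(closeOnError bool, timeout time.Duration) (bool, error) {
	r, w := io.Pipe()
	copied := make(chan struct{})
	go func() {
		defer close(copied)
		// Returns only once the write end is closed (EOF) or the read
		// end is closed underneath it.
		io.Copy(io.Discard, r)
	}()

	err := runContainer(w)
	if err == nil || closeOnError {
		// Closing the writer on the error path is what unblocks the copier.
		w.Close()
	}

	select {
	case <-copied:
		return true, err
	case <-time.After(timeout):
		r.Close() // demo-only cleanup so the blocked goroutine can exit
		return false, err
	}
}

func main() {
	for _, closeOnError := range []bool{false, true} {
		finished, err := build(closeOnError, demoTimeout)
		fmt.Printf("closeOnError=%v finished=%v err=%v\n", closeOnError, finished, err)
	}
}
```

Without the timeout in the select, the closeOnError=false case would wait forever, which is the shape of a build staying in Running status.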

Comment 2 Jim Minter 2017-04-27 11:41:42 UTC
https://github.com/openshift/origin/pull/13817

Comment 3 Vikas Laad 2017-05-11 17:28:55 UTC
Hi Jim,

While verifying this issue, a build got stuck in Running state again. I think the problem is something else; please let me know if I need to create another bug. I am attaching information for the stuck build.

root@ip-172-31-4-211: ~ # oc logs -n proj18 cakephp-mysql-example-119-build --follow
Cloning "https://github.com/redhat-performance/cakephp-ex.git" ...
        Commit: 0014ddebb91bc7dff3a1dabfbd7b51da762a6677 (made changes to enable database example)
        Author: ofthecure <robdean.smith>
        Date:   Mon Apr 25 14:33:06 2016 -0400
DEPRECATED: Use .s2i/bin instead of .sti/bin
---> Installing application source...
Pushing image 172.24.132.26:5000/proj18/cakephp-mysql-example:latest ...
error: Unable to update build status: Get https://172.24.0.1:443/oapi/v1/namespaces/proj18/builds/cakephp-mysql-example-119: dial tcp 172.24.0.1:443: getsockopt: connection refused
Registry server Address: 
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: Unable to update build status: Get https://172.24.0.1:443/oapi/v1/namespaces/proj18/builds/cakephp-mysql-example-119: dial tcp 172.24.0.1:443: getsockopt: connection refused
error: build error: Failed to push image: unauthorized: authentication required


root@ip-172-31-4-211: ~ # oc get builds -n proj18 | grep -v Complete
NAME                        TYPE      FROM          STATUS     STARTED        DURATION
cakephp-mysql-example-119   Source    Git@0014dde   Running    2 hours ago    
cakephp-mysql-example-120   Source    Git           New                       
cakephp-mysql-example-121   Source    Git           New                       


The logs show the build has failed, but the list shows it stuck in Running state. Attaching JSON for the build and pod.

root@ip-172-31-4-211: ~ # openshift version
openshift v3.6.74
kubernetes v1.6.1+5115d708d7
etcd 3.1.0
root@ip-172-31-4-211: ~ # docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-16.el7.x86_64
 Go version:      go1.7.4
 Git commit:      3a094bd/1.12.6
 Built:           Tue Mar 21 13:30:59 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-16.el7.x86_64
 Go version:      go1.7.4
 Git commit:      3a094bd/1.12.6
 Built:           Tue Mar 21 13:30:59 2017
 OS/Arch:         linux/amd64

Comment 4 Vikas Laad 2017-05-11 17:29:27 UTC
Created attachment 1277984 [details]
pod json

Comment 5 Vikas Laad 2017-05-11 17:29:59 UTC
Created attachment 1277985 [details]
build json

Comment 6 Vikas Laad 2017-05-12 17:07:59 UTC
Please ignore comments #3, #4 and #5. I created another bug for that since it's a different problem: https://bugzilla.redhat.com/show_bug.cgi?id=1450466

Comment 7 Mike Fiedler 2017-05-25 14:33:18 UTC
I think this should be in ON_QA.  Looks like the PR merged to master over a month ago.

Comment 8 Jim Minter 2017-05-25 14:57:48 UTC
Sorry; perhaps it's because I forgot to set the target release?  Setting and moving to ON_QA.

Comment 9 Vikas Laad 2017-05-25 15:54:14 UTC
Checked on the following version; builds are still getting stuck:

openshift v3.6.74
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

Jim, please let me know whether this is the same issue or whether I need to create a new one.

Comment 11 Jim Minter 2017-05-25 16:08:36 UTC
That's not good.  Vikas, do you have an environment exhibiting this issue that I can take a look at?

Comment 12 Jim Minter 2017-05-25 16:49:35 UTC
Different bug.  Looking at the environment in question, all the stuck builds are stuck on the final image push.  In the sample in comment 10, s2i is pushing to the Docker daemon and is waiting for the Docker daemon to report completion.  I think this is most likely to be an OpenShift registry bug or a Docker daemon bug - I'm not sure which at this point.  Please open a new bz, and I suggest capturing:

- registry pod goroutines (SIGABRT)
- registry pod log
- docker daemon goroutines on a node hosting a failed build (SIGABRT)
- docker daemon log on same

Comment 13 Vikas Laad 2017-06-01 19:30:06 UTC
Verified in the following version:

openshift v3.6.79
kubernetes v1.6.1+5115d708d7
etcd 3.1.0


Completed 100 cycles of 30 concurrent builds. No build was stuck in Running state. Created another bug for the problem mentioned in Comment #12.

Comment 18 errata-xmlrpc 2017-11-28 21:53:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

