Bug 1455991 - Builds are stuck in Running at the push stage
Summary: Builds are stuck in Running at the push stage
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Oleg Bulatov
QA Contact: Hongkai Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-05-26 15:13 UTC by Vikas Laad
Modified: 2017-08-16 19:51 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-10 05:25:32 UTC
Target Upstream Version:
Embargoed:

Links
System ID: Red Hat Product Errata RHEA-2017:1716
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Container Platform 3.6 RPM Release Advisory
Last Updated: 2017-08-10 09:02:50 UTC

Description Vikas Laad 2017-05-26 15:13:07 UTC
Description of problem:
I am running concurrent builds in a scale environment and saw this bug with 250 concurrent builds. The log of a stuck build stops partway through the image push:

Cloning "https://github.com/redhat-performance/cakephp-ex.git" ...
        Commit: 0014ddebb91bc7dff3a1dabfbd7b51da762a6677 (made changes to enable database example)
        Author: ofthecure <robdean.smith>
        Date:   Mon Apr 25 14:33:06 2016 -0400
DEPRECATED: Use .s2i/bin instead of .sti/bin
---> Installing application source...
Pushing image 10.202.162.54:5000/proj565/cakephp-mysql-example:latest ...
Pushed 3/5 layers, 61% complete
Pushed 4/5 layers, 82% complete

Version-Release number of selected component (if applicable):
# openshift version
openshift v3.6.74
kubernetes v1.6.1+5115d708d7
etcd 3.1.0
# docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-14.el7.x86_64
 Go version:      go1.7.4
 Git commit:      3a094bd/1.12.6
 Built:           Thu Mar 16 14:27:53 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-14.el7.x86_64
 Go version:      go1.7.4
 Git commit:      3a094bd/1.12.6
 Built:           Thu Mar 16 14:27:53 2017
 OS/Arch:         linux/amd64


How reproducible:
Start concurrent builds of the cakephp app. The problem occurs with 250 concurrent builds.
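
The reproduction step above could be scripted roughly as follows. This is a minimal sketch, not the tooling used for the original test: it assumes one project per build, that the projects are named proj0 through proj249 (the log only shows proj565), and that each project has a build config called cakephp-mysql-example (inferred from the image name in the log). The only CLI call it relies on is `oc start-build <buildconfig> -n <project>`. The demonstration at the bottom is a dry run that joins each command into a string instead of executing it.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def start_build_cmd(project, build_config):
    """Return the argv that starts one build in one project."""
    return ["oc", "start-build", build_config, "-n", project]


def run_oc(cmd):
    """Execute one oc command and return its exit code."""
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def start_concurrent_builds(projects, build_config, runner=run_oc):
    """Start one build per project in parallel; results keep project order."""
    cmds = [start_build_cmd(p, build_config) for p in projects]
    with ThreadPoolExecutor(max_workers=max(1, len(cmds))) as pool:
        return list(pool.map(runner, cmds))


# Dry run: project names and build config name are assumptions, and the
# runner only formats the command line instead of calling oc.
projects = [f"proj{i}" for i in range(250)]
planned = start_concurrent_builds(
    projects, "cakephp-mysql-example", runner=" ".join
)
print(len(planned), "builds planned, first:", planned[0])
```

Passing the default runner instead of the dry-run one would call oc for real, so it needs a logged-in oc client and existing projects.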

Actual results:
Builds are stuck in the Running state.
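
With 250 builds it helps to sort the logs mechanically. The sketch below is an illustration only: it reads the "Pushed N/M layers, P% complete" lines in the format shown in the description and reports the last progress value. The function names are hypothetical, and an incomplete last line does not prove a hang by itself; a build counts as stuck only if that line stays unchanged over time.

```python
import re

# Matches progress lines such as "Pushed 4/5 layers, 82% complete".
PUSH_RE = re.compile(r"Pushed (\d+)/(\d+) layers, (\d+)% complete")


def last_push_progress(log_text):
    """Return (pushed, total, percent) from the last progress line, or None."""
    matches = PUSH_RE.findall(log_text)
    if not matches:
        return None
    pushed, total, percent = (int(v) for v in matches[-1])
    return pushed, total, percent


def push_incomplete(log_text):
    """True if the last reported progress shows fewer layers pushed than expected."""
    progress = last_push_progress(log_text)
    if progress is None:
        return False
    pushed, total, _ = progress
    return pushed < total


sample_log = """---> Installing application source...
Pushing image 10.202.162.54:5000/proj565/cakephp-mysql-example:latest ...
Pushed 3/5 layers, 61% complete
Pushed 4/5 layers, 82% complete
"""
print(last_push_progress(sample_log), push_incomplete(sample_log))
```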

Expected results:
Builds should finish successfully.

Additional info:
--- Comment #12 from Jim Minter <jminter> ---
Different bug.  Looking at the environment in question, all the stuck builds
are stuck on the final image push.  In the sample in c10, s2i is pushing to the
Docker daemon and is waiting for the Docker daemon to report completed.  I
think this is most likely to be an OpenShift registry bug or a Docker daemon
bug - I'm not sure which at this point.  Please open a new bz, and I suggest
capturing:

- registry pod goroutines (SIGABRT)
- registry pod log
- docker daemon goroutines on a node hosting a failed build (SIGABRT)
- docker daemon log on same

I am going to provide a link for all these logs.
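
For reference, the SIGABRT capture requested in the quoted comment works because a Go program that does not install its own handler for that signal exits with a stack dump of its goroutines on stderr. A minimal sketch of sending the signal follows. How to find the PID of the registry process or the Docker daemon is not covered here, the helper name is hypothetical, and the demonstration uses a harmless stand-in child process rather than a Go binary. Note that the target process exits, so the registry pod or daemon will need to restart afterwards.

```python
import os
import signal
import subprocess
import sys


def request_goroutine_dump(pid):
    """Send SIGABRT to pid.

    A Go binary with default signal handling reacts by printing its
    goroutine stacks to stderr and exiting.
    """
    os.kill(pid, signal.SIGABRT)


# Stand-in target (an assumption for the demo): a sleeping Python child.
# It prints no goroutine dump, but it shows the signal being delivered.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
request_goroutine_dump(child.pid)
exit_code = child.wait()
print("terminated by SIGABRT:", exit_code == -signal.SIGABRT)
```

The dump is written to the stderr of the target process, so it ends up wherever that stream is collected, which is why the comment asks for the logs alongside the goroutines.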

Comment 6 Oleg Bulatov 2017-06-02 13:43:45 UTC
https://github.com/docker/distribution/pull/2299

Comment 7 Michal Fojtik 2017-06-12 07:38:09 UTC
(In reply to Oleg Bulatov from comment #6)
> https://github.com/docker/distribution/pull/2299

Oleg, can we pick this fix for registry to close this bug?

Comment 8 Oleg Bulatov 2017-06-12 08:13:50 UTC
Yes, we can. I expected it would be merged into upstream a little bit faster, but they didn't care.

Comment 9 Oleg Bulatov 2017-06-12 09:49:04 UTC
https://github.com/openshift/origin/pull/14581

Comment 10 ge liu 2017-06-14 03:11:57 UTC
The image (devenv-rhel7_6350) for the PR in comment 9 is not yet available in AWS; we will test it once it is ready.

Comment 11 Vikas Laad 2017-06-14 12:28:20 UTC
Ge Liu, I will test it in the scale environment and will assign the bug to myself.

Comment 13 Hongkai Liu 2017-07-07 19:14:31 UTC
Reran the test with 50 concurrent builds; all builds succeeded.

Comment 14 Hongkai Liu 2017-07-07 19:29:14 UTC
(In reply to Hongkai Liu from comment #13)
> Reran the test with 50 concurrent builds; all builds succeeded.

Verified on 3.6.133

Comment 16 errata-xmlrpc 2017-08-10 05:25:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716

