Bug 1436841 - Concurrent build registry push hangs - baremetal cluster with CNS Gluster registry storage
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 3.5.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.7.0
Assignee: Michal Minar
QA Contact: Mike Fiedler
URL:
Whiteboard: aos-scalability-35
Depends On:
Blocks: 1465325
Reported: 2017-03-28 19:56 UTC by Mike Fiedler
Modified: 2017-11-28 21:53 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A bug in the regulator of concurrent filesystem access could cause a routine to hang. Consequence: When a registry configured with the filesystem storage driver was under heavy load, some pushes could hang forever. Fix: The regulator has been fixed. Result: Concurrent pushes no longer hang.
Clone Of:
: 1465325 (view as bug list)
Environment:
Last Closed: 2017-11-28 21:53:29 UTC
Target Upstream Version:
Embargoed:


Attachments
registry and build logs (5.16 MB, application/x-gzip)
2017-03-28 19:56 UTC, Mike Fiedler


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:3188 0 normal SHIPPED_LIVE Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update 2017-11-29 02:34:54 UTC

Description Mike Fiedler 2017-03-28 19:56:59 UTC
Created attachment 1267114
registry and build logs

Description of problem:

80-node bare-metal cluster (this is CNCF gear). The registry is configured to use a PVC for registry storage. The PVC is backed by Gluster managed by CNS pods.

Doing a few concurrent builds of the cakephp-mysql quick start works great.  Up to around 50 concurrent builds work fine.  At around 60 or more concurrent builds (less than the number of nodes in the cluster), builds start hanging during the registry push.

The CPU profile for the registry and Gluster nodes shows a CPU/network spike on the first set of pushes and then goes idle. I rsh'ed into the registry pod and checked the registry file system; it was readable and writeable.

The CPU spike is 1.4 of 46 cores on the Gluster nodes and 6 of 36 cores on the registry.

I captured the build logs and registry log.   The registry log is so noisy even on level=warn that I did not really know what to look for.   Start at the bottom as the pod was up for quite a while.   Let me know if you want different loglevel or other info to debug.

I will add comments with links to performance profile data.

Version-Release number of selected component (if applicable): 3.5.0.39


How reproducible: always on this cluster


Steps to Reproduce:

I'll give the steps for this CNCF baremetal cluster, but I doubt it would translate to other configurations.

1.  80 node cluster, 1 registry pod, 3 CNS/gluster and 1 CNS/heketi pod
2.  Registry configured with a PVC backed by CNS Gluster storage
3.  Verify builds are working in general and that images are pushed successfully
4.  Verify up to 50 concurrent builds work successfully and that images are pushed
5.  Around 60-75 concurrent builds, many (not all) builds hang on registry pushes.   See attached logs
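
Editor's sketch of step 5: one way to start many builds at roughly the same time is to issue one `oc start-build` per project from a thread pool. This is not the tooling used in this report. The BuildConfig name `cakephp-mysql-example` and the project names are assumptions for illustration, and the helper names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def start_build_cmd(namespace, build_config="cakephp-mysql-example"):
    # build_config is an assumed name; substitute the BuildConfig that the
    # quickstart template actually created in each project.
    return ["oc", "start-build", build_config, "-n", namespace]


def start_builds(namespaces, runner=subprocess.run):
    """Start one build per namespace, all submitted at roughly the same time."""
    workers = max(1, len(namespaces))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # runner is injectable so the function can be exercised without a cluster.
        return list(pool.map(lambda ns: runner(start_build_cmd(ns)), namespaces))
```

For example, `start_builds([f"proj{i}" for i in range(60)])` would request 60 builds, the level at which hangs began on this cluster. The pushes themselves happen later, at the end of each build.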

Actual results:

Many builds hang during the registry push.  See attached build logs.



Expected results:

Concurrent builds work at a scale of hundreds or thousands of builds.


Additional info:

Comment 7 Mike Fiedler 2017-04-06 20:14:41 UTC
With REGISTRY_STORAGE_FILESYSTEM_MAXTHREADS=500 set on the docker-registry dc I can successfully run 250+ concurrent builds.   It looks like the locking/race condition theory could be correct.
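
Editor's sketch: the docker registry lets a configuration option be overridden by an environment variable named `REGISTRY_` followed by the option's path, with underscores marking nesting levels, so the variable above corresponds to `storage.filesystem.maxthreads`. The Python below only illustrates that naming rule. The function names and the sample configuration values are assumptions, and the real registry's value parsing is more involved than this.

```python
def env_to_config_path(name, prefix="REGISTRY_"):
    """Turn REGISTRY_STORAGE_FILESYSTEM_MAXTHREADS into
    ['storage', 'filesystem', 'maxthreads']."""
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    return name[len(prefix):].lower().split("_")


def apply_override(config, name, value):
    """Set the nested key named by the environment variable, creating
    intermediate sections if they are missing."""
    *parents, leaf = env_to_config_path(name)
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return config


# Illustrative starting configuration (values are assumptions).
config = {"storage": {"filesystem": {"rootdirectory": "/registry", "maxthreads": 100}}}
apply_override(config, "REGISTRY_STORAGE_FILESYSTEM_MAXTHREADS", 500)
```

On the cluster itself, the variable would typically be set on the deployment config with something like `oc set env dc/docker-registry REGISTRY_STORAGE_FILESYSTEM_MAXTHREADS=500`.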

Comment 8 Mike Fiedler 2017-04-10 12:52:02 UTC
Is this env var something we should put in the documentation for OpenShift?

Comment 12 Michal Minar 2017-06-12 11:21:51 UTC
Here's the fix:

https://github.com/docker/distribution/pull/2299
https://github.com/openshift/origin/pull/14581

Kudos to Oleg Bulatov.

Comment 13 Michal Minar 2017-06-14 12:22:33 UTC
And the fix has been merged.

Do we need to back-port it? How far?

Comment 14 Mike Fiedler 2017-06-20 10:16:00 UTC
re: comment 13.  Probably not a question for me.   Support?  @Eric?

Comment 15 Eric Rich 2017-06-20 10:58:13 UTC
(In reply to Mike Fiedler from comment #14)
> re: comment 13.  Probably not a question for me.   Support?  @Eric?

Gluster only ships on 3.4 and 3.5 (for now) and I don't think (need to confirm with gluster team) that they certify using gluster for the registry until 3.5 or 3.6! 

So if we backport this, given that context, 3.5 might be as far as we need to go back?

Comment 16 Mike Fiedler 2017-06-20 12:38:22 UTC
This does have the potential of impacting more than Gluster.  It is a general filesystem driver bug for the registry.  I have been unable to trigger it on other technologies like EC2 and Cinder without resorting to artificially lowering the OOTB registry setting for maxthreads.

My recommendation would be to backport to 3.5 (if straightforward) and wait for customer cases, if any, for other releases.

Comment 17 Eric Rich 2017-06-20 13:03:58 UTC
(In reply to Mike Fiedler from comment #8)
> Is this env var something we should put in the documentation for OpenShift?

I am more concerned about documenting any tunables that we might need to discuss with this?

However: 

(In reply to Mike Fiedler from comment #16)
 
> My recommendation would be to backport to 3.5 (if straightforward) and wait
> for customer cases, if any, for other releases.

I like this approach!

Comment 18 Mike Fiedler 2017-06-20 14:01:46 UTC
>> Is this env var something we should put in the documentation for OpenShift?
>
>I am more concerned about documenting any tunables that we might need to discuss with this?

I am testing now, but this fix should make it unnecessary to expose any registry internals here. I think the default value, which expedites 100 concurrent builds, should be OK. The fix will delay other pushes a bit until threads free up, but I think that's a fair tradeoff. The actual tunables around this are documented in the official docker registry documentation.
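
Editor's sketch of the tradeoff described above, assuming the regulator behaves like a counting gate around blocking filesystem calls: callers beyond the limit wait until a slot frees up, and every exit wakes a waiter so that nobody is left blocked. This is an illustration in Python, not the registry's code (which is written in Go), and the names `Regulator` and `run_pushes` are hypothetical.

```python
import threading
import time


class Regulator:
    """Caps the number of callers inside the guarded section at `limit`."""

    def __init__(self, limit):
        self._cond = threading.Condition()
        self._available = limit

    def enter(self):
        with self._cond:
            # Re-check in a loop: a woken waiter may find the slot already taken.
            while self._available == 0:
                self._cond.wait()
            self._available -= 1

    def exit(self):
        with self._cond:
            self._available += 1
            # Always wake one waiter. Skipping the wakeup in some cases is the
            # kind of mistake that can leave a waiter blocked forever.
            self._cond.notify()


def run_pushes(count, limit):
    """Run `count` simulated pushes through a regulator with `limit` slots."""
    reg = Regulator(limit)
    lock = threading.Lock()
    stats = {"active": 0, "peak": 0, "done": 0}

    def push():
        reg.enter()
        try:
            with lock:
                stats["active"] += 1
                stats["peak"] = max(stats["peak"], stats["active"])
            time.sleep(0.001)  # stand-in for a blocking filesystem operation
            with lock:
                stats["active"] -= 1
                stats["done"] += 1
        finally:
            reg.exit()

    threads = [threading.Thread(target=push) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats
```

Calling `run_pushes(300, 100)` mirrors the shape of the verification run: 300 pushes against a limit of 100. All of them complete, and the peak number inside the guarded section never exceeds the limit.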

Comment 19 Mike Fiedler 2017-06-21 03:42:23 UTC
Verified on 3.6.116.  With the default maxthreads (100), ran 300 concurrent builds with  a filesystem storage driver for the registry.  Ran 15,000 builds like this with no hangs.

One significant difference to note from the original problem reported:  We no longer have access to the baremetal/CNS/Gluster environment.   This verification was performed using PV/PVC backed by AWS EBS volumes instead of PV/PVC backed by CNS.   This could be a factor, so noting it here.

Comment 23 errata-xmlrpc 2017-11-28 21:53:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

