Bug 1436841

Summary: Concurrent build registry push hangs - baremetal cluster with CNS Gluster registry storage
Product: OpenShift Container Platform
Reporter: Mike Fiedler <mifiedle>
Component: Image Registry
Assignee: Michal Minar <miminar>
Status: CLOSED ERRATA
QA Contact: Mike Fiedler <mifiedle>
Severity: high
Priority: unspecified
Version: 3.5.0
CC: aos-bugs, erich, haowang, hchiramm, jeder, mfojtik, mifiedle, miminar, pprakash, vlaad, xtian
Target Release: 3.7.0
Hardware: x86_64
OS: Linux
Whiteboard: aos-scalability-35
Doc Type: Bug Fix
Doc Text:
Cause: A bug in the regulator that limits concurrent filesystem access could cause a goroutine to hang.
Consequence: When the registry was configured with the filesystem storage driver and was under heavy load, some concurrent pushes could hang forever.
Fix: The regulator has been fixed.
Result: Concurrent pushes no longer hang.
Clones: 1465325
Last Closed: 2017-11-28 21:53:29 UTC
Type: Bug
Bug Blocks: 1465325    
Attachments: registry and build logs

Description Mike Fiedler 2017-03-28 19:56:59 UTC
Created attachment 1267114: registry and build logs

Description of problem:

80-node bare-metal cluster (this is CNCF gear). The registry is configured to use a PVC for its storage. The PVC is backed by Gluster managed by CNS pods.

A few concurrent builds of the cakephp-mysql quick start work great, and up to around 50 concurrent builds work fine. At around 60 or more concurrent builds (fewer than the number of nodes in the cluster), builds start hanging during the registry push.

The CPU profiles of the registry and Gluster nodes show a CPU/network spike during the first set of pushes, after which they go idle. I rsh'ed into the registry pod and checked the registry filesystem; it was readable and writable.

The CPU spike is 1.4 of 46 cores on the Gluster nodes and 6 of 36 cores on the registry.

I captured the build logs and the registry log. The registry log is so noisy, even at level=warn, that I did not really know what to look for. Start at the bottom, as the pod had been up for quite a while. Let me know if you want a different log level or other information for debugging.

I will add comments with links to performance profile data.

Version-Release number of selected component (if applicable): 3.5.0.39


How reproducible: always on this cluster


Steps to Reproduce:

I'll give the steps for this CNCF bare-metal cluster, but I doubt they would translate to other configurations.

1.  80-node cluster with 1 registry pod, 3 CNS/Gluster pods, and 1 CNS/heketi pod
2.  Registry configured with a PVC backed by CNS Gluster storage
3.  Verify that builds work in general and that images are pushed successfully
4.  Verify that up to 50 concurrent builds succeed and that their images are pushed
5.  At around 60-75 concurrent builds, many (not all) builds hang on registry pushes. See attached logs.

Actual results:

Many builds hang during the registry push.  See attached build logs.



Expected results:

Concurrent builds work at a scale of hundreds or thousands of concurrent builds.


Additional info:

Comment 7 Mike Fiedler 2017-04-06 20:14:41 UTC
With REGISTRY_STORAGE_FILESYSTEM_MAXTHREADS=500 set on the docker-registry dc, I can successfully run 250+ concurrent builds. It looks like the locking/race-condition theory could be correct.
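The REGISTRY_STORAGE_FILESYSTEM_MAXTHREADS variable overrides the upstream registry's storage.filesystem.maxthreads option (the registry maps REGISTRY_-prefixed variables onto configuration keys, with underscores marking nesting). The Go sketch below shows how such a value is interpreted, assuming the upstream defaults as we understand them: 100 when unset, and never lower than 25. parseMaxThreads is a hypothetical helper written for illustration, not the filesystem driver's actual parsing code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Assumed upstream values for the filesystem driver's maxthreads option.
const (
	defaultMaxThreads uint64 = 100 // used when the option is unset
	minThreads        uint64 = 25  // smaller values are raised to this floor
)

// parseMaxThreads (hypothetical) turns a raw setting into the effective
// limit on concurrent blocking filesystem operations.
func parseMaxThreads(raw string) (uint64, error) {
	if raw == "" {
		return defaultMaxThreads, nil
	}
	n, err := strconv.ParseUint(raw, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid maxthreads %q: %w", raw, err)
	}
	if n < minThreads {
		return minThreads, nil
	}
	return n, nil
}

func main() {
	// Comment 7 set REGISTRY_STORAGE_FILESYSTEM_MAXTHREADS=500 on the dc.
	n, err := parseMaxThreads(os.Getenv("REGISTRY_STORAGE_FILESYSTEM_MAXTHREADS"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("maxthreads:", n)
}
```

Raising the value only gives concurrent filesystem operations more headroom before the limit is reached; it does not change how waiting operations are woken.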

Comment 8 Mike Fiedler 2017-04-10 12:52:02 UTC
Is this env var something we should put in the documentation for OpenShift?

Comment 12 Michal Minar 2017-06-12 11:21:51 UTC
Here's the fix:

https://github.com/docker/distribution/pull/2299
https://github.com/openshift/origin/pull/14581

Kudos to Oleg Bulatov.

Comment 13 Michal Minar 2017-06-14 12:22:33 UTC
And the fix has been merged.

Do we need to back-port it? How far?

Comment 14 Mike Fiedler 2017-06-20 10:16:00 UTC
re: comment 13.  Probably not a question for me.   Support?  @Eric?

Comment 15 Eric Rich 2017-06-20 10:58:13 UTC
(In reply to Mike Fiedler from comment #14)
> re: comment 13.  Probably not a question for me.   Support?  @Eric?

Gluster only ships on 3.4 and 3.5 (for now), and I don't think (I need to confirm with the Gluster team) that they certify using Gluster for the registry before 3.5 or 3.6!

So if we backport this, given that context, 3.5 might be as far back as we need to go?

Comment 16 Mike Fiedler 2017-06-20 12:38:22 UTC
This does have the potential to impact more than Gluster: it is a general bug in the registry's filesystem storage driver. I have been unable to trigger it on other storage technologies, such as EC2 and Cinder, without artificially lowering the out-of-the-box (OOTB) registry setting for maxthreads.

My recommendation would be to backport to 3.5 (if straightforward) and wait for customer cases, if any, for other releases.

Comment 17 Eric Rich 2017-06-20 13:03:58 UTC
(In reply to Mike Fiedler from comment #8)
> Is this env var something we should put in the documentation for OpenShift?

I am more concerned about documenting any tunables that we might need to discuss in connection with this.

However:

(In reply to Mike Fiedler from comment #16)

> My recommendation would be to backport to 3.5 (if straightforward) and wait
> for customer cases, if any, for other releases.

I like this approach!

Comment 18 Mike Fiedler 2017-06-20 14:01:46 UTC
>> Is this env var something we should put in the documentation for OpenShift?
>
>I am more concerned about documenting any tunables that we might need to discuss in connection with this.

I am testing now, but this fix should make it unnecessary to expose any registry internals here. I think the default value, which accommodates 100 concurrent builds, should be OK. The fix will delay other pushes a bit until threads free up, but I think that's a fair tradeoff. The actual tunables around this are documented in the official Docker registry documentation.

Comment 19 Mike Fiedler 2017-06-21 03:42:23 UTC
Verified on 3.6.116. With the default maxthreads (100), I ran 300 concurrent builds using the filesystem storage driver for the registry, 15,000 builds in total, with no hangs.

One significant difference from the originally reported problem: we no longer have access to the bare-metal/CNS/Gluster environment. This verification was performed using PVs/PVCs backed by AWS EBS volumes instead of PVs/PVCs backed by CNS. This could be a factor, so I am noting it here.

Comment 23 errata-xmlrpc 2017-11-28 21:53:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188