Created attachment 1267114
registry and build logs
Description of problem:
80 node bare metal cluster (this is CNCF gear). Registry is configured to use a pvc for registry storage. The pvc is backed by Gluster managed by CNS pods.
Doing a few concurrent builds of the cakephp-mysql quick start works great. Up to around 50 concurrent builds work fine. At around 60 or more concurrent builds (less than the number of nodes in the cluster), builds start hanging during the registry push.
The CPU profile for the registry and gluster nodes shows a CPU/network spike on the first set of pushes and then goes idle. I rsh'ed into the registry pod and checked the registry file system, and it was readable and writeable.
The CPU spike is 1.4/46 cores on the gluster nodes and 6/36 cores on the registry.
I captured the build logs and registry log. The registry log is so noisy even on level=warn that I did not really know what to look for. Start at the bottom as the pod was up for quite a while. Let me know if you want different loglevel or other info to debug.
I will add comments with links to performance profile data.
Version-Release number of selected component (if applicable): 188.8.131.52
How reproducible: always on this cluster
Steps to Reproduce:
I'll give the steps for this CNCF baremetal cluster, but I doubt it would translate to other configurations.
1. 80 node cluster, 1 registry pod, 3 CNS/gluster and 1 CNS/heketi pod
2. Registry configured with a pvc backed by cns gluster storage
3. Verify builds are working in general and that images are pushed successfully
4. Verify up to 50 concurrent builds work successfully and that images are pushed
5. Around 60-75 concurrent builds, many (not all) builds hang on registry pushes. See attached logs
Actual results:
Many builds hang during the registry push. See attached build logs.
Expected results:
Concurrent builds work at a scale of hundreds or thousands of concurrent builds.
With REGISTRY_STORAGE_FILESYSTEM_MAXTHREADS=500 set on the docker-registry dc I can successfully run 250+ concurrent builds. It looks like the locking/race condition theory could be correct.
Is this env var something we should put in the documentation for OpenShift?
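For anyone wondering where that variable name comes from: as far as I understand the upstream docker registry configuration docs, any config option can be overridden with an environment variable named REGISTRY_ plus the uppercased config path, joined by underscores. A minimal sketch of that mapping (the helper name `registry_env_name` is made up for illustration and is not part of any registry tooling):

```python
def registry_env_name(config_path):
    """Map a dotted registry config path to its REGISTRY_* override name.

    Hypothetical helper, only illustrating the naming convention.
    """
    return "REGISTRY_" + "_".join(part.upper() for part in config_path.split("."))


# storage -> filesystem -> maxthreads is the option discussed in this bug.
print(registry_env_name("storage.filesystem.maxthreads"))
```

So the variable set on the dc corresponds to the maxthreads option of the filesystem storage driver.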
Here's the fix:
Kudos to Oleg Bulatov.
And the fix has been merged.
Do we need to back-port it? How far?
re: comment 13. Probably not a question for me. Support? @Eric?
(In reply to Mike Fiedler from comment #14)
> re: comment 13. Probably not a question for me. Support? @Eric?
Gluster only ships on 3.4 and 3.5 (for now) and I don't think (need to confirm with gluster team) that they certify using gluster for the registry until 3.5 or 3.6!
So if we backport this, given that context, 3.5 might be as far back as we need to go?
This does have the potential of impacting more than Gluster. It is a general filesystem driver bug for the registry. I have been unable to trigger it on other technologies like EC2 and Cinder without resorting to artificially lowering the OOTB registry setting for maxthreads.
My recommendation would be to backport to 3.5 (if straightforward) and wait for customer cases, if any, for other releases.
(In reply to Mike Fiedler from comment #8)
> Is this env var something we should put in the documentation for OpenShift?
I am more concerned about documenting any tunables that we might need to discuss with this.
(In reply to Mike Fiedler from comment #16)
> My recommendation would be to backport to 3.5 (if straightforward) and wait
> for customer cases, if any, for other releases.
I like this approach!
>> Is this env var something we should put in the documentation for OpenShift?
>I am more concerned about documenting any tunables that we might need to discuss with this.
I am testing now, but this fix should make it unnecessary to expose any registry internals here. I think the default value, which accommodates 100 concurrent builds, should be OK. The fix will delay other pushes a bit until threads free up, but I think that's a fair tradeoff. The actual tunables around this are documented in the official docker registry documentation.
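To illustrate the tradeoff (pushes beyond the thread limit wait for a free slot instead of hanging), here is a rough sketch of a capped pool of storage operations. This is not the registry's actual code; `MAX_THREADS`, `NUM_PUSHES`, and `push_layer` are made-up names, and the numbers are scaled down so it runs quickly.

```python
import threading
import time

MAX_THREADS = 4   # stand-in for the registry's maxthreads setting (assumption)
NUM_PUSHES = 20   # stand-in for the number of concurrent pushes (assumption)

slots = threading.BoundedSemaphore(MAX_THREADS)
lock = threading.Lock()
in_flight = 0
peak_in_flight = 0
completed = 0


def push_layer():
    """Simulate one storage operation that must hold a slot while it runs."""
    global in_flight, peak_in_flight, completed
    with slots:                      # blocks while all MAX_THREADS slots are busy
        with lock:
            in_flight += 1
            peak_in_flight = max(peak_in_flight, in_flight)
        time.sleep(0.01)             # pretend to write to the filesystem
        with lock:
            in_flight -= 1
            completed += 1
    # leaving the "with slots" block releases the slot and wakes one waiter


threads = [threading.Thread(target=push_layer) for _ in range(NUM_PUSHES)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"completed={completed} peak_in_flight={peak_in_flight}")
```

Every simulated push completes, and the number running at once never exceeds the cap. Extra pushes are only delayed, which is the behavior expected once waiters are reliably woken.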
Verified on 3.6.116. With the default maxthreads (100), ran 300 concurrent builds with a filesystem storage driver for the registry. Ran 15,000 builds like this with no hangs.
One significant difference to note from the original problem reported: We no longer have access to the baremetal/CNS/Gluster environment. This verification was performed using PV/PVC backed by AWS EBS volumes instead of PV/PVC backed by CNS. This could be a factor, so noting it here.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.