Bug 1314381 - Intermittent long push duration on EC2 using S3 storage for the Docker registry
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 3.2.0
Hardware: x86_64 Linux
Priority: high
Severity: high
Assigned To: Maciej Szulik
QA Contact: Mike Fiedler
Duplicates: 1332285
Depends On:
Blocks: OSOPS_V3 1267746 1332285 1382795
Reported: 2016-03-03 08:27 EST by Mike Fiedler
Modified: 2017-03-08 13 EST
CC: 12 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The S3 communication library was not efficient enough to support high data loads. Consequence: Some pushes to the docker registry took a relatively long time. Fix: Both the docker distribution code and the S3 driver were upgraded. Result: Improved stability and performance of docker push operations.
Story Points: ---
Clone Of:
Clones: 1382795
Environment:
Last Closed: 2016-09-27 05:35:53 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Sample config for Docker S3 (353 bytes, application/x-fluid)
2016-03-03 08:27 EST, Mike Fiedler
pprof profile #1 of slow push event (30.15 KB, application/x-gzip)
2016-04-06 15:19 EDT, Mike Fiedler
pprof #2 of slow push event (27.15 KB, application/x-gzip)
2016-04-06 15:19 EDT, Mike Fiedler
registry log (debug) of slow push event (2.19 MB, application/x-gzip)
2016-04-06 15:21 EDT, Mike Fiedler
s3 request log for slow push event on 6 April 2016 (354.17 KB, application/x-gzip)
2016-04-06 19:45 EDT, Mike Fiedler

Description Mike Fiedler 2016-03-03 08:27:03 EST
Created attachment 1132789 [details]
Sample config for Docker S3

Description of problem:

During concurrent build testing to put stress on the registry push process with S3 Docker backend storage, tests are seeing two types of push performance:  a) "normal"/fast pushes to the registry and b) extremely slow pushes.   Operations has reported a similar experience with their online test beds.

Configuration (cluster and S3 storage all on us-west-2):
   - Single master
   - Two router nodes
   - One registry node configured for S3 (attached)
   - 10 worker nodes for builds

Example of the performance issue:
   - 10 projects, each with 1 app defined from the django-ex template
   - Build the 10 apps concurrently

Each build results in a push of ~6MB to the registry. The "normal" pushes average 7-8 seconds/push. However, when the problem occurs (without any configuration change I've been able to identify yet), all builds average 3-4 minutes/push.


Version-Release number of selected component (if applicable):

3.1.1.908

How reproducible:  Difficult.  Intermittent problem


Steps to Reproduce:
1.  Create the configuration described above or a configuration capable of concurrent builds
2.  Configure S3 as the backend storage (sample attached)
3.  Create 10 projects and 10 new applications
4.  Build all applications repeatedly and watch the push times in the build logs

Actual results:

Registry push times vary from 7 seconds to 4 minutes for django-ex. The times do not vary randomly (from what has been seen so far, anyway) but will either all be fast or all slow.

Expected results:

Consistent average push times in line with the quantity of data being transferred.

Additional info:

mmasters and dmace are aware of this.   Origin issue https://github.com/openshift/origin/issues/7740 has been created to add pprof instrumentation to help track this down.
Comment 1 Mike Fiedler 2016-03-03 08:31:48 EST
IMO this is not directly actionable right now. Further investigation is required. Next steps:

1. Run with the new profiling when available.  Get profiling for fast and slow.
2. Continue trying to determine what triggers the slow behavior.  Is it purely an S3 thing?
3. Gather wireshark/tcpdump data for fast and slow behavior.  In both cases many TCP resets are seen during TLS handshake (mainly in the direction of the registry node -> S3) - relevant?
Comment 2 Mike Fiedler 2016-03-15 13:30:36 EDT
I've confirmed that what I was seeing is the same thing ops is seeing with slow pushes in their staging environment.   When the problem triggers (I've been unable to come up with a guaranteed trigger),  the S3 request logs show a pattern of 10 second idle periods between bursts of requests.   We opened a ticket with Amazon and they see no throttling on their side and have asked that we do tracing/logging on the application side to see what responses we are getting from the S3 API.


@pweil @miminar, I think we should consider removing UpcomingRelease. This seems to be a blocker for ops that is under sporadic active investigation.
Comment 3 Mike Fiedler 2016-03-15 13:30:46 EDT
Example of suspicious logs from ops:

00178:[14/Mar/2016:21:02:57 +0000] REST.PUT.OBJECT registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-a9f8b/nodejs-example/_uploads/9cbb4570-dd3f-4719-8501-a8361c626d5c/hashstates/sha256/0 "PUT /osostage-docker/registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-a9f8b/nodejs-example/_uploads/9cbb4570-dd3f-4719-8501-a8361c626d5c/hashstates/sha256/0 HTTP/1.1" 200 - - 157 38 10 "-" "Go 1.1 package http" -

00179:[14/Mar/2016:21:02:57 +0000] REST.PUT.OBJECT registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-db8f4/nodejs-example/_uploads/79f748d9-60e0-47af-81e7-1eba2fe406dd/hashstates/sha256/0 "PUT /osostage-docker/registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-db8f4/nodejs-example/_uploads/79f748d9-60e0-47af-81e7-1eba2fe406dd/hashstates/sha256/0 HTTP/1.1" 200 - - 157 55 32 "-" "Go 1.1 package http" -

00180:[14/Mar/2016:21:03:07 +0000] REST.GET.OBJECT registry/docker/registry/v2/blobs/sha256/77/773b0fe047881bbaa322e7ed7294f4fcd8bb1eb95a6bae9f9e76fe5c01b47410/data "GET /osostage-docker/registry/docker/registry/v2/blobs/sha256/77/773b0fe047881bbaa322e7ed7294f4fcd8bb1eb95a6bae9f9e76fe5c01b47410/data HTTP/1.1" 200 - 438 438 22 21 "-" "Go 1.1 package http" -

00181:[14/Mar/2016:21:03:07 +0000] REST.GET.OBJECT registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-a9f8b/nodejs-example/_uploads/9cbb4570-dd3f-4719-8501-a8361c626d5c/startedat "GET /osostage-docker/registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-a9f8b/nodejs-example/_uploads/9cbb4570-dd3f-4719-8501-a8361c626d5c/startedat HTTP/1.1" 200 - 20 20 18 18 "-" "Go 1.1 package http" -

00182:[14/Mar/2016:21:03:07 +0000] REST.GET.OBJECT registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-db8f4/nodejs-example/_uploads/79f748d9-60e0-47af-81e7-1eba2fe406dd/startedat "GET /osostage-docker/registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-db8f4/nodejs-example/_uploads/79f748d9-60e0-47af-81e7-1eba2fe406dd/startedat HTTP/1.1" 200 - 20 20 22 21 "-" "Go 1.1 package http" -

00183:[14/Mar/2016:21:03:07 +0000] REST.PUT.OBJECT registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-7b5c2/nodejs-example/_uploads/ea3d848b-1506-4ac9-b941-3074719431ca/hashstates/sha256/0 "PUT /osostage-docker/registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-7b5c2/nodejs-example/_uploads/ea3d848b-1506-4ac9-b941-3074719431ca/hashstates/sha256/0 HTTP/1.1" 200 - - 157 102 7 "-" "Go 1.1 package http" -

00184:[14/Mar/2016:21:03:17 +0000] REST.GET.BUCKET - "GET /osostage-docker/?delimiter=&marker=&max-keys=1&prefix=registry%2Fdocker%2Fregistry%2Fv2%2Frepositories%2Fops-monitor-appbuildstage-master-7b5c2%2Fnodejs-example%2F_uploads%2Fea3d848b-1506-4ac9-b941-3074719431ca%2Fdata HTTP/1.1" 200 - 383 - 29 28 "-" "Go 1.1 package http" -

00185:[14/Mar/2016:21:03:17 +0000] REST.GET.BUCKET - "GET /osostage-docker/?delimiter=&marker=&max-keys=1&prefix=registry%2Fdocker%2Fregistry%2Fv2%2Frepositories%2Fops-monitor-appbuildstage-master-a9f8b%2Fnodejs-example%2F_uploads%2F9cbb4570-dd3f-4719-8501-a8361c626d5c%2Fdata HTTP/1.1" 200 - 383 - 26 25 "-" "Go 1.1 package http" -

00186:[14/Mar/2016:21:03:17 +0000] REST.GET.BUCKET - "GET /osostage-docker/?delimiter=&marker=&max-keys=1&prefix=registry%2Fdocker%2Fregistry%2Fv2%2Frepositories%2Fops-monitor-appbuildstage-master-db8f4%2Fnodejs-example%2F_uploads%2F79f748d9-60e0-47af-81e7-1eba2fe406dd%2Fdata HTTP/1.1" 200 - 383 - 23 23 "-" "Go 1.1 package http" -
Comment 4 Mike Fiedler 2016-03-16 10:03:43 EDT
After discussion with the QE lead, I am removing the UpcomingRelease tag from this so that it will show up for triage.  The operations team hit this again on 14 March and this would be a blocker in production.   I will continue to attempt to reproduce in a test environment - I've been unsuccessful recently.   The next step in either test or in ops staging would be to turn on docker registry debug and to run with a version of OpenShift which has pprof enabled for the registry.  Dan Mace and Miciah Masters are familiar with this.
Comment 5 Mike Fiedler 2016-03-16 21:26:30 EDT
One relevant point that came up during the Online/Performance meeting today is that the ops staging env is 3.1.1.6 and the last build where I was able to reproduce the issue was 3.1.1.910 (early 3.2).
Comment 6 Maciej Szulik 2016-03-30 05:36:33 EDT
Waiting for information from Mike about his findings.
Comment 7 Paul Weil 2016-03-30 10:36:11 EDT
<pweil> mffiedler, any news on https://bugzilla.redhat.com/show_bug.cgi?id=1314381 ?
<mffiedler> pwei, unable to reproduce lately.  you can put that on_qa if you want.  i will be doing more testing later this week.  maybe today
<mffiedler> pweil ^
Comment 8 Mike Fiedler 2016-04-06 15:19 EDT
Created attachment 1144329 [details]
pprof  profile #1 of slow push event
Comment 9 Mike Fiedler 2016-04-06 15:19 EDT
Created attachment 1144330 [details]
pprof #2 of slow push event
Comment 10 Mike Fiedler 2016-04-06 15:21 EDT
Created attachment 1144331 [details]
registry log (debug) of slow push event
Comment 11 Mike Fiedler 2016-04-06 15:21:58 EDT
5.12s of 6.60s total (77.58%)
Dropped 388 nodes (cum <= 0.03s)
Showing top 40 nodes out of 307 (cum >= 0.70s)
      flat  flat%   sum%        cum   cum%
     0.90s 13.64% 13.64%      0.90s 13.64%  crypto/cipher.(*gcm).mul
     0.86s 13.03% 26.67%      0.91s 13.79%  syscall.Syscall
     0.41s  6.21% 32.88%      0.41s  6.21%  scanblock
     0.37s  5.61% 38.48%      0.37s  5.61%  github.com/ugorji/go/codec.(*bytesDecReader).readn1
     0.27s  4.09% 42.58%      0.63s  9.55%  github.com/ugorji/go/codec.(*jsonDecDriver).appendStringAsBytes
     0.27s  4.09% 46.67%      0.29s  4.39%  runtime.mallocgc
     0.23s  3.48% 50.15%      0.23s  3.48%  crypto/aes.encryptBlockAsm
     0.14s  2.12% 52.27%      0.14s  2.12%  crypto/cipher.getUint64
     0.14s  2.12% 54.39%      0.14s  2.12%  runtime.MSpan_Sweep
     0.10s  1.52% 55.91%      0.10s  1.52%  crypto/elliptic.p256ReduceDegree
     0.10s  1.52% 57.42%      0.10s  1.52%  runtime.memmove
     0.09s  1.36% 58.79%      0.56s  8.48%  crypto/cipher.(*gcm).counterCrypt
     0.08s  1.21% 60.00%      1.12s 16.97%  crypto/cipher.(*gcm).updateBlocks
     0.07s  1.06% 61.06%      0.13s  1.97%  crypto/cipher.xorWords
     0.06s  0.91% 61.97%      0.32s  4.85%  crypto/aes.(*aesCipher).Encrypt
     0.06s  0.91% 62.88%      0.06s  0.91%  crypto/cipher.fastXORWords
     0.06s  0.91% 63.79%      0.13s  1.97%  encoding/json.nextValue
     0.06s  0.91% 64.70%      0.06s  0.91%  math/big.divWW
     0.06s  0.91% 65.61%      0.06s  0.91%  math/big.mulAddVWW
     0.06s  0.91% 66.52%      0.25s  3.79%  math/big.nat.divLarge
     0.06s  0.91% 67.42%      0.10s  1.52%  runtime.deferreturn
     0.06s  0.91% 68.33%      0.06s  0.91%  runtime.futex
     0.06s  0.91% 69.24%      0.07s  1.06%  runtime.mapiternext
     0.05s  0.76% 70.00%      0.08s  1.21%  encoding/json.checkValid
     0.05s  0.76% 70.76%      0.05s  0.76%  math/big.addMulVVW
     0.05s  0.76% 71.52%      0.05s  0.76%  math/big.subVV
     0.04s  0.61% 72.12%      0.05s  0.76%  encoding/json.(*encodeState).string
     0.04s  0.61% 72.73%      0.04s  0.61%  encoding/json.stateInString
     0.04s  0.61% 73.33%      0.04s  0.61%  regexp.(*machine).add
     0.04s  0.61% 73.94%      0.05s  0.76%  runtime.mapaccess2
     0.04s  0.61% 74.55%      0.04s  0.61%  runtime.memeqbody
     0.03s  0.45% 75.00%      0.26s  3.94%  crypto/aes.encryptBlock
     0.03s  0.45% 75.45%      0.07s  1.06%  fmt.(*pp).doPrintf
     0.03s  0.45% 75.91%      0.06s  0.91%  reflect.(*structType).FieldByName
     0.02s   0.3% 76.21%      1.48s 22.42%  bufio.(*Writer).flush
     0.02s   0.3% 76.52%      1.39s 21.06%  crypto/tls.(*Conn).readRecord
     0.02s   0.3% 76.82%      0.10s  1.52%  k8s.io/kubernetes/pkg/client/cache.storePodsNamespacer.List
     0.02s   0.3% 77.12%      0.07s  1.06%  math/big.basicMul
     0.02s   0.3% 77.42%      0.19s  2.88%  runtime.newobject
     0.01s  0.15% 77.58%      0.70s 10.61%  bufio.(*Reader).ReadSlice
Comment 12 Mike Fiedler 2016-04-06 16:11:49 EDT
pbench results during slow pushes can be found here:  http://perf-infra.ec2.breakage.org/incoming/ip-172-31-43-59/slow_push/

This should be directly comparable to results for a fast push of the same number of concurrent builds (5) from earlier the same day:  http://perf-infra.ec2.breakage.org/incoming/ip-172-31-43-59/5_builds_2_nodes_ssd/

Legend:

ip-172-31-43-59: master
ip-172-31-43-60: registry (and router)
ip-172-31-43-61: builder #1
ip-172-31-43-62: builder #2

S3 logs are showing the 10-second gap behavior described in comment 3.  I will attach when they are complete and show this time frame in full.
Comment 13 Mike Fiedler 2016-04-06 19:45 EDT
Created attachment 1144478 [details]
s3 request log for slow push event on 6 April 2016

Attached s3 log. If you search for "19:00" you can see the timeframe that corresponds to the pprof. Note the ~10 second gaps between batches of entries. The pbench was later; just add +4hrs to the times in the pbench data to find the corresponding S3 log entries, where the 10 second gaps should be happening as well.

The beginning of this log is representative of what the s3 request logs look like for fast/normal pushes.  No 10 second gaps.
Comment 14 Maciej Szulik 2016-04-07 08:34:19 EDT
Here are my initial findings:


docker registry pushes a layer and that completes immediately:

time="2016-04-06T15:00:08.504911485-04:00" msg="s3.PutContent(...715de692-205b-48d5-8cf9-38295ef5bd6f/hashstates/sha256/3040001
time="2016-04-06T15:00:08.505014056-04:00" msg="response completed" 

accompanying S3 log:

10365:[06/Apr/2016:19:00:08 +0000] REST.PUT.OBJECT ...715de692-205b-48d5-8cf9-38295ef5bd6f/hashstates/sha256/3040001


And the next operation is the one that is the root cause:

again registry:

time="2016-04-06T15:00:08.555967319-04:00" msg="authorizing request"
time="2016-04-06T15:00:08.5564915-04:00" msg="(*quotaRestrictedBlobStore).Resume: starting"
time="2016-04-06T15:00:08.556541043-04:00" level=debug msg="(*linkedBlobStore).Resume"
time="2016-04-06T15:00:08.577438326-04:00" level=debug msg="s3.Stat(...3c0b203b-f626-4fa1-af4f-f8a6c154da77/data

--- 10 seconds hang ---

172.20.2.1 - - [06/Apr/2016:15:00:17 -0400] "GET /healthz HTTP/1.1" 200 0 "" "Go 1.1 package http"
172.20.2.1 - - [06/Apr/2016:15:00:17 -0400] "GET /healthz HTTP/1.1" 200 0 "" "Go 1.1 package http"
time="2016-04-06T15:00:18.709968262-04:00" level=debug msg="s3.List(...3c0b203b-f626-4fa1-af4f-f8a6c154da77/hashstates/sha256

accompanying S3 log:

10367:[06/Apr/2016:19:00:18 +0000] REST.COPY.OBJECT_GET registry/docker/registry/v2/repositories/t1/cakephp-example/_uploads/3c0b203b-f626-4fa1-af4f-f8a6c154da77/data - 200 - - 3039950 - - - - -


This means it takes almost 10 seconds for the request to get from the registry to S3, which is directly related to what Andy noticed in bug 1321968.

@Mike can you sync with Andy how to debug the DNS resolution part?
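As a starting point for the DNS angle, resolver latency can be measured from the registry host in isolation from S3 itself. A minimal, hypothetical sketch (the S3 endpoint name is only an example; localhost keeps the sketch offline):

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// lookupLatency resolves host and returns how long resolution took.
func lookupLatency(host string) (time.Duration, error) {
	start := time.Now()
	_, err := net.LookupHost(host)
	return time.Since(start), err
}

func main() {
	// In the real investigation this would be the S3 endpoint,
	// e.g. s3-us-west-2.amazonaws.com; localhost is used here so the
	// sketch runs without network access.
	d, err := lookupLatency("localhost")
	if err != nil {
		panic(err)
	}
	fmt.Println("resolved in", d)
}
```

Running this in a loop during a slow-push window would show whether resolution itself accounts for any part of the 10-second stalls.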
Comment 16 Paul Weil 2016-04-08 09:11:18 EDT
I am re-adding this to upcoming release to remove this from the 3.2 blocker list.  

Justification: while this has a high impact on performance the functionality is still working.  This is not an absolute blocker and doesn't seem to be a regression.  

We will continue to treat this as high priority.
Comment 17 Maciej Szulik 2016-04-08 10:01:40 EDT
Current findings are that the problems occur ONLY in s3.GetContent or s3.Stat, based on the logs we have. Looking into the s3 driver code, we've identified [1] as a possible source of the problem. I'll keep investigating why we would be getting those retries.

[1] https://github.com/openshift/origin/blob/9eaae7c94774c415f227ed5b187a260b1d0b9dbf/Godeps/_workspace/src/github.com/AdRoll/goamz/s3/s3.go#L91-L95
Comment 18 Dan McPherson 2016-05-06 17:08:56 EDT
*** Bug 1332285 has been marked as a duplicate of this bug. ***
Comment 19 Stefanie Forrester 2016-05-23 15:03:27 EDT
On one customer cluster, Ops has found that restarting the docker-registry pods (scale down and then back up) temporarily speeds up docker push times. Frequent pruning helps too, but when pruning is not enough, a restart puts it back into a good state.
Comment 20 Mike Fiedler 2016-05-23 15:41:10 EDT
I have seen docker-registry restarts via scale down/up clear the issue as well.
Comment 21 Troy Dawson 2016-05-26 10:34:54 EDT
As an end user this gets really frustrating. A push of several images that originally took 45 minutes (after a registry trim and restart of the nodes) is now up to 4 hours. It's gradual; it took 2 months to get to this point, but it's at that point.

I know this isn't a blocker; the registries are still usable. But as an end user, it's easier for us to find something else than to continue using a registry with an S3 backend.
Comment 24 Maciej Szulik 2016-06-28 05:16:47 EDT
With the new docker registry [1] comes a different s3 driver; specifically, it uses the AWS driver instead of an external one. According to tests done by Michal, the performance should be greatly improved. Mike, can I ask you to test it again with the tests you have? I remember you were quite good at reproducing this bug in almost no time ;)

[1] https://github.com/openshift/origin/pull/8938
Comment 25 Mike Fiedler 2016-06-28 11:38:17 EDT
This needs to be in a build to go ON_QA if I understand the process.   I think it should go to POST status until it is available in an OpenShift puddle.
Comment 26 Mike Fiedler 2016-06-28 21:29:18 EDT
The correct status is MODIFIED until it hits a puddle.  Learned something new...
Comment 27 Maciej Szulik 2016-06-29 04:35:24 EDT
So did I ;)
Comment 32 Maciej Szulik 2016-07-20 08:30:15 EDT
Based on previous comment moving to QA.
Comment 34 Mike Fiedler 2016-08-22 07:48:23 EDT
Should be resolved by https://bugzilla.redhat.com/show_bug.cgi?id=1368024.

Making vlaad the QA contact - I will help him verify.
Comment 35 Mike Fiedler 2016-09-01 12:45:59 EDT
No slow S3 pushes have been seen during 3.3 SVT concurrent build testing or in repeated build testing specifically intended to hit this S3 slow push issue.   Marking verified.
Comment 37 errata-xmlrpc 2016-09-27 05:35:53 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933
