Created attachment 1132789 [details]
Sample config for Docker S3
Description of problem:
During concurrent build testing to put stress on the registry push process with S3 Docker backend storage, tests are seeing two types of push performance: a) "normal"/fast pushes to the registry and b) extremely slow pushes. Operations has reported a similar experience with their online test beds.
Configuration (cluster and S3 storage all on us-west-2):
- Single master
- Two router nodes
- One registry node configured for S3 (attached)
- 10 worker nodes for builds
Example of the performance issue:
- 10 projects, each with 1 app defined from the django-ex template
- Build the 10 apps concurrently
Each build results in a push of ~6MB to the registry. The "normal" pushes average 7-8 seconds/push. However, when the problem occurs, all builds average 3-4 minutes/push, and I have not yet been able to identify any configuration change that triggers it.
Version-Release number of selected component (if applicable):
How reproducible: Difficult. Intermittent problem
Steps to Reproduce:
1. Create the configuration described above or a configuration capable of concurrent builds
2. Configure S3 as the backend storage (sample attached)
3. Create 10 projects and 10 new applications
4. Build all applications repeatedly and watch the push times in the build logs
Registry push times vary from 7 seconds to 4 minutes for django-ex. The times do not vary randomly (from what has been seen so far, anyway); within a given run they are either all fast or all slow.
Consistent average push times in line with the quantity of data being transferred.
mmasters and dmace are aware of this. Origin issue https://github.com/openshift/origin/issues/7740 has been created to add pprof instrumentation to help track this down.
IMO this is not directly actionable right now; further investigation is required. Next steps:
1. Run with the new profiling when available (a sketch of enabling it follows this list). Get profiles for both fast and slow pushes.
2. Continue trying to determine what triggers the slow behavior. Is it purely an S3 thing?
3. Gather wireshark/tcpdump data for fast and slow behavior. In both cases many TCP resets are seen during TLS handshake (mainly in the direction of the registry node -> S3) - relevant?
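Regarding item 1: a minimal sketch of what the pprof instrumentation could look like, assuming the registry simply exposes Go's standard net/http/pprof handlers on a side port (the actual change in openshift/origin issue 7740 may differ):

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
    )

    func main() {
        // Expose the profiling endpoints on a dedicated localhost-only port so
        // they do not interfere with the registry's normal listener.
        go func() {
            log.Println(http.ListenAndServe("localhost:6060", nil))
        }()

        // ... registry startup would continue here ...
        select {}
    }

With that in place, CPU profiles for a fast and a slow push window can be collected with go tool pprof against http://localhost:6060/debug/pprof/profile (the port is an assumption here).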
I've confirmed that what I was seeing is the same thing ops is seeing with slow pushes in their staging environment. When the problem triggers (I've been unable to come up with a guaranteed trigger), the S3 request logs show a pattern of 10 second idle periods between bursts of requests. We opened a ticket with Amazon and they see no throttling on their side and have asked that we do tracing/logging on the application side to see what responses we are getting from the S3 API.
@pweil @miminar, I think we should consider removing UpcomingRelease. This seems to be a blocker for ops and is under sporadic but active investigation.
Example of suspicious logs from ops:
00178:[14/Mar/2016:21:02:57 +0000] REST.PUT.OBJECT registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-a9f8b/nodejs-example/_uploads/9cbb4570-dd3f-4719-8501-a8361c626d5c/hashstates/sha256/0 "PUT /osostage-docker/registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-a9f8b/nodejs-example/_uploads/9cbb4570-dd3f-4719-8501-a8361c626d5c/hashstates/sha256/0 HTTP/1.1" 200 - - 157 38 10 "-" "Go 1.1 package http" -
00179:[14/Mar/2016:21:02:57 +0000] REST.PUT.OBJECT registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-db8f4/nodejs-example/_uploads/79f748d9-60e0-47af-81e7-1eba2fe406dd/hashstates/sha256/0 "PUT /osostage-docker/registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-db8f4/nodejs-example/_uploads/79f748d9-60e0-47af-81e7-1eba2fe406dd/hashstates/sha256/0 HTTP/1.1" 200 - - 157 55 32 "-" "Go 1.1 package http" -
00180:[14/Mar/2016:21:03:07 +0000] REST.GET.OBJECT registry/docker/registry/v2/blobs/sha256/77/773b0fe047881bbaa322e7ed7294f4fcd8bb1eb95a6bae9f9e76fe5c01b47410/data "GET /osostage-docker/registry/docker/registry/v2/blobs/sha256/77/773b0fe047881bbaa322e7ed7294f4fcd8bb1eb95a6bae9f9e76fe5c01b47410/data HTTP/1.1" 200 - 438 438 22 21 "-" "Go 1.1 package http" -
00181:[14/Mar/2016:21:03:07 +0000] REST.GET.OBJECT registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-a9f8b/nodejs-example/_uploads/9cbb4570-dd3f-4719-8501-a8361c626d5c/startedat "GET /osostage-docker/registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-a9f8b/nodejs-example/_uploads/9cbb4570-dd3f-4719-8501-a8361c626d5c/startedat HTTP/1.1" 200 - 20 20 18 18 "-" "Go 1.1 package http" -
00182:[14/Mar/2016:21:03:07 +0000] REST.GET.OBJECT registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-db8f4/nodejs-example/_uploads/79f748d9-60e0-47af-81e7-1eba2fe406dd/startedat "GET /osostage-docker/registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-db8f4/nodejs-example/_uploads/79f748d9-60e0-47af-81e7-1eba2fe406dd/startedat HTTP/1.1" 200 - 20 20 22 21 "-" "Go 1.1 package http" -
00183:[14/Mar/2016:21:03:07 +0000] REST.PUT.OBJECT registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-7b5c2/nodejs-example/_uploads/ea3d848b-1506-4ac9-b941-3074719431ca/hashstates/sha256/0 "PUT /osostage-docker/registry/docker/registry/v2/repositories/ops-monitor-appbuildstage-master-7b5c2/nodejs-example/_uploads/ea3d848b-1506-4ac9-b941-3074719431ca/hashstates/sha256/0 HTTP/1.1" 200 - - 157 102 7 "-" "Go 1.1 package http" -
00184:[14/Mar/2016:21:03:17 +0000] REST.GET.BUCKET - "GET /osostage-docker/?delimiter=&marker=&max-keys=1&prefix=registry%2Fdocker%2Fregistry%2Fv2%2Frepositories%2Fops-monitor-appbuildstage-master-7b5c2%2Fnodejs-example%2F_uploads%2Fea3d848b-1506-4ac9-b941-3074719431ca%2Fdata HTTP/1.1" 200 - 383 - 29 28 "-" "Go 1.1 package http" -
00185:[14/Mar/2016:21:03:17 +0000] REST.GET.BUCKET - "GET /osostage-docker/?delimiter=&marker=&max-keys=1&prefix=registry%2Fdocker%2Fregistry%2Fv2%2Frepositories%2Fops-monitor-appbuildstage-master-a9f8b%2Fnodejs-example%2F_uploads%2F9cbb4570-dd3f-4719-8501-a8361c626d5c%2Fdata HTTP/1.1" 200 - 383 - 26 25 "-" "Go 1.1 package http" -
00186:[14/Mar/2016:21:03:17 +0000] REST.GET.BUCKET - "GET /osostage-docker/?delimiter=&marker=&max-keys=1&prefix=registry%2Fdocker%2Fregistry%2Fv2%2Frepositories%2Fops-monitor-appbuildstage-master-db8f4%2Fnodejs-example%2F_uploads%2F79f748d9-60e0-47af-81e7-1eba2fe406dd%2Fdata HTTP/1.1" 200 - 383 - 23 23 "-" "Go 1.1 package http" -
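To make these idle periods easier to spot in a large S3 access log, a small helper along these lines can flag them. This is only a sketch; it assumes the standard S3 server access log format with the request timestamp in square brackets, as in the entries above, and uses an arbitrary 5-second threshold:

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "regexp"
        "time"
    )

    // Scans an S3 server access log on stdin and reports gaps of 5 seconds or
    // more between consecutive requests, e.g. the ~10 second pauses seen above.
    func main() {
        tsRe := regexp.MustCompile(`\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})`)
        var prev time.Time
        scanner := bufio.NewScanner(os.Stdin)
        for scanner.Scan() {
            m := tsRe.FindStringSubmatch(scanner.Text())
            if m == nil {
                continue
            }
            ts, err := time.Parse("02/Jan/2006:15:04:05", m[1])
            if err != nil {
                continue
            }
            if !prev.IsZero() && ts.Sub(prev) >= 5*time.Second {
                fmt.Printf("gap of %v before %s\n", ts.Sub(prev), m[1])
            }
            prev = ts
        }
    }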
After discussion with the QE lead, I am removing the UpcomingRelease tag from this so that it will show up for triage. The operations team hit this again on 14 March and this would be a blocker in production. I will continue to attempt to reproduce in a test environment - I've been unsuccessful recently. The next step in either test or in ops staging would be to turn on docker registry debug and to run with a version of OpenShift which has pprof enabled for the registry. Dan Mace and Miciah Masters are familiar with this.
One relevant point that came up during the Online/Performance meeting today is that the ops staging env is 18.104.22.168 and the last build where I was able to reproduce the issue was 22.214.171.1240 (early 3.2)
Waiting for the information from Mike about his findings.
<pweil> mffiedler, any news on https://bugzilla.redhat.com/show_bug.cgi?id=1314381 ?
<mffiedler> pwei, unable to reproduce lately. you can put that on_qa if you want. i will be doing more testing later this week. maybe today
<mffiedler> pweil ^
Created attachment 1144329 [details]
pprof profile #1 of slow push event
Created attachment 1144330 [details]
pprof #2 of slow push event
Created attachment 1144331 [details]
registry log (debug) of slow push event
5.12s of 6.60s total (77.58%)
Dropped 388 nodes (cum <= 0.03s)
Showing top 40 nodes out of 307 (cum >= 0.70s)
flat flat% sum% cum cum%
0.90s 13.64% 13.64% 0.90s 13.64% crypto/cipher.(*gcm).mul
0.86s 13.03% 26.67% 0.91s 13.79% syscall.Syscall
0.41s 6.21% 32.88% 0.41s 6.21% scanblock
0.37s 5.61% 38.48% 0.37s 5.61% github.com/ugorji/go/codec.(*bytesDecReader).readn1
0.27s 4.09% 42.58% 0.63s 9.55% github.com/ugorji/go/codec.(*jsonDecDriver).appendStringAsBytes
0.27s 4.09% 46.67% 0.29s 4.39% runtime.mallocgc
0.23s 3.48% 50.15% 0.23s 3.48% crypto/aes.encryptBlockAsm
0.14s 2.12% 52.27% 0.14s 2.12% crypto/cipher.getUint64
0.14s 2.12% 54.39% 0.14s 2.12% runtime.MSpan_Sweep
0.10s 1.52% 55.91% 0.10s 1.52% crypto/elliptic.p256ReduceDegree
0.10s 1.52% 57.42% 0.10s 1.52% runtime.memmove
0.09s 1.36% 58.79% 0.56s 8.48% crypto/cipher.(*gcm).counterCrypt
0.08s 1.21% 60.00% 1.12s 16.97% crypto/cipher.(*gcm).updateBlocks
0.07s 1.06% 61.06% 0.13s 1.97% crypto/cipher.xorWords
0.06s 0.91% 61.97% 0.32s 4.85% crypto/aes.(*aesCipher).Encrypt
0.06s 0.91% 62.88% 0.06s 0.91% crypto/cipher.fastXORWords
0.06s 0.91% 63.79% 0.13s 1.97% encoding/json.nextValue
0.06s 0.91% 64.70% 0.06s 0.91% math/big.divWW
0.06s 0.91% 65.61% 0.06s 0.91% math/big.mulAddVWW
0.06s 0.91% 66.52% 0.25s 3.79% math/big.nat.divLarge
0.06s 0.91% 67.42% 0.10s 1.52% runtime.deferreturn
0.06s 0.91% 68.33% 0.06s 0.91% runtime.futex
0.06s 0.91% 69.24% 0.07s 1.06% runtime.mapiternext
0.05s 0.76% 70.00% 0.08s 1.21% encoding/json.checkValid
0.05s 0.76% 70.76% 0.05s 0.76% math/big.addMulVVW
0.05s 0.76% 71.52% 0.05s 0.76% math/big.subVV
0.04s 0.61% 72.12% 0.05s 0.76% encoding/json.(*encodeState).string
0.04s 0.61% 72.73% 0.04s 0.61% encoding/json.stateInString
0.04s 0.61% 73.33% 0.04s 0.61% regexp.(*machine).add
0.04s 0.61% 73.94% 0.05s 0.76% runtime.mapaccess2
0.04s 0.61% 74.55% 0.04s 0.61% runtime.memeqbody
0.03s 0.45% 75.00% 0.26s 3.94% crypto/aes.encryptBlock
0.03s 0.45% 75.45% 0.07s 1.06% fmt.(*pp).doPrintf
0.03s 0.45% 75.91% 0.06s 0.91% reflect.(*structType).FieldByName
0.02s 0.3% 76.21% 1.48s 22.42% bufio.(*Writer).flush
0.02s 0.3% 76.52% 1.39s 21.06% crypto/tls.(*Conn).readRecord
0.02s 0.3% 76.82% 0.10s 1.52% k8s.io/kubernetes/pkg/client/cache.storePodsNamespacer.List
0.02s 0.3% 77.12% 0.07s 1.06% math/big.basicMul
0.02s 0.3% 77.42% 0.19s 2.88% runtime.newobject
0.01s 0.15% 77.58% 0.70s 10.61% bufio.(*Reader).ReadSlice
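For reference, a profile like the one above can be pulled from the registry's debug endpoint and analyzed offline with go tool pprof. A minimal sketch of capturing a 30-second CPU profile over HTTP, assuming the pprof handlers are exposed as in the earlier sketch (the localhost:6060 address is an assumption):

    package main

    import (
        "io"
        "log"
        "net/http"
        "os"
    )

    // Downloads a 30-second CPU profile from the registry's pprof endpoint so it
    // can be inspected later with `go tool pprof <registry-binary> cpu.pprof`.
    func main() {
        resp, err := http.Get("http://localhost:6060/debug/pprof/profile?seconds=30")
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        out, err := os.Create("cpu.pprof")
        if err != nil {
            log.Fatal(err)
        }
        defer out.Close()

        if _, err := io.Copy(out, resp.Body); err != nil {
            log.Fatal(err)
        }
    }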
pbench results during slow pushes can be found here: http://perf-infra.ec2.breakage.org/incoming/ip-172-31-43-59/slow_push/
This should be directly comparable to results for a fast push of the same number of concurrent builds (5) from earlier the same day: http://perf-infra.ec2.breakage.org/incoming/ip-172-31-43-59/5_builds_2_nodes_ssd/
ip-172-31-43-60: registry (and router)
ip-172-31-43-61: builder #1
ip-172-31-43-62: builder #2
S3 logs are showing the 10-second gap behavior described in comment 3. I will attach when they are complete and show this time frame in full.
Created attachment 1144478 [details]
s3 request log for slow push event on 6 April 2016
Attached the S3 log. If you search for "19:00" you can see the timeframe that corresponds to the pprof. Note the ~10 second gaps between batches of entries. The pbench run was later; just add 4 hours to the times in the pbench data to find the corresponding S3 log entries. The 10 second gaps should be happening then as well.
The beginning of this log is representative of what the s3 request logs look like for fast/normal pushes. No 10 second gaps.
Here are my initial findings:
The docker registry pushes a layer, and that completes immediately:
time="2016-04-06T15:00:08.505014056-04:00" msg="response completed"
accompanying S3 log:
10365:[06/Apr/2016:19:00:08 +0000] REST.PUT.OBJECT ...715de692-205b-48d5-8cf9-38295ef5bd6f/hashstates/sha256/3040001
And the next operation is the one that is the root cause:
time="2016-04-06T15:00:08.555967319-04:00" msg="authorizing request"
time="2016-04-06T15:00:08.5564915-04:00" msg="(*quotaRestrictedBlobStore).Resume: starting"
time="2016-04-06T15:00:08.556541043-04:00" level=debug msg="(*linkedBlobStore).Resume"
time="2016-04-06T15:00:08.577438326-04:00" level=debug msg="s3.Stat(...3c0b203b-f626-4fa1-af4f-f8a6c154da77/data
--- ~10 second hang ---
172.20.2.1 - - [06/Apr/2016:15:00:17 -0400] "GET /healthz HTTP/1.1" 200 0 "" "Go 1.1 package http"
172.20.2.1 - - [06/Apr/2016:15:00:17 -0400] "GET /healthz HTTP/1.1" 200 0 "" "Go 1.1 package http"
time="2016-04-06T15:00:18.709968262-04:00" level=debug msg="s3.List(...3c0b203b-f626-4fa1-af4f-f8a6c154da77/hashstates/sha256
accompanying S3 log:
10367:[06/Apr/2016:19:00:18 +0000] REST.COPY.OBJECT_GET registry/docker/registry/v2/repositories/t1/cakephp-example/_uploads/3c0b203b-f626-4fa1-af4f-f8a6c154da77/data - 200 - - 3039950 - - - - -
This means it takes almost 10 seconds for the request to get from the registry to S3, which is directly related to what Andy noticed in bug 1321968.
@Mike can you sync with Andy how to debug the DNS resolution part?
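On the DNS angle, one quick check would be to time repeated lookups of the S3 endpoint from the registry node and see whether any of them approach the 10-second mark. A rough sketch only; the s3-us-west-2.amazonaws.com hostname is assumed from the us-west-2 setup, and the real environment may resolve a different endpoint:

    package main

    import (
        "fmt"
        "net"
        "time"
    )

    // Repeatedly resolves the S3 endpoint and prints how long each lookup takes,
    // to check whether slow DNS could account for the ~10 second stalls.
    func main() {
        host := "s3-us-west-2.amazonaws.com" // assumed endpoint; adjust for the real environment
        for i := 0; i < 20; i++ {
            start := time.Now()
            addrs, err := net.LookupHost(host)
            fmt.Printf("lookup %2d: %v err=%v addrs=%d\n", i, time.Since(start), err, len(addrs))
            time.Sleep(1 * time.Second)
        }
    }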
I am re-adding this to upcoming release to remove this from the 3.2 blocker list.
Justification: while this has a high impact on performance the functionality is still working. This is not an absolute blocker and doesn't seem to be a regression.
We will continue to treat this as high priority.
Current findings are that, from the logs we have, the problems are caused ONLY in s3.GetContent or s3.Stat. Looking into the s3 driver code, we've identified retries as a possible source of the problem. I'll keep investigating why we would be getting those retries.
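To illustrate why retries are a plausible explanation for the ~10 second gaps: a request that fails or times out a few times under exponential backoff accumulates several seconds of idle time before the next request ever reaches S3. This is only a hand-written illustration of the pattern, not the actual driver code:

    package main

    import (
        "errors"
        "fmt"
        "time"
    )

    // Illustration only: a request that fails a few times under exponential
    // backoff (1s, 2s, 4s, ...) easily accumulates close to 10 seconds of idle
    // time before the next S3 request appears in the access log.
    func withRetries(attempts int, do func() error) error {
        backoff := time.Second
        var err error
        for i := 0; i < attempts; i++ {
            if err = do(); err == nil {
                return nil
            }
            time.Sleep(backoff)
            backoff *= 2
        }
        return err
    }

    func main() {
        start := time.Now()
        fails := 0
        _ = withRetries(5, func() error {
            if fails < 3 { // pretend the first three attempts fail
                fails++
                return errors.New("transient error")
            }
            return nil
        })
        // With three failures the sleeps alone are 1s + 2s + 4s = 7s of dead time.
        fmt.Println("elapsed:", time.Since(start).Round(time.Second))
    }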
*** Bug 1332285 has been marked as a duplicate of this bug. ***
On one customer cluster, Ops has found that restarting the docker-registry pods (scale down and then back up) temporarily speeds up docker push times. Frequent pruning helps too, but when pruning is not enough, a restart puts it back into a good state.
I have seen docker-registry restarts via scale down/up clear the issue as well.
As an end user this gets really frustrating. A push of several images that originally took 45 minutes (after a registry trim and a restart of the nodes) is now up to 4 hours.
It's gradual; it took 2 months to get to this point, but it's at that point now.
I know this isn't a blocker; the registries are still usable. But as an end user, it's easier for us to find something else than to continue using a registry with an S3 backend.
The new docker registry comes with a different S3 driver; specifically, it uses the AWS driver instead of the external one. According to tests done by Michal, performance should be greatly improved. Mike, can I ask you to test it again with the tests you have? I remember you were quite good at reproducing this bug in almost no time ;)
This needs to be in a build to go ON_QA if I understand the process. I think it should go to POST status until it is available in an OpenShift puddle.
The correct status is MODIFIED until it hits a puddle. Learned something new...
So did I ;)
Based on the previous comment, moving to QA.
Should be resolved by https://bugzilla.redhat.com/show_bug.cgi?id=1368024.
Making vlaad the QA contact - I will help him verify.
No slow S3 pushes have been seen during 3.3 SVT concurrent build testing or in repeated build testing specifically intended to hit this S3 slow push issue. Marking verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.