Bug 1358536

Summary: [extras-rhel-7.2.7] All layers unexpectedly pushed/longer push times when running multiple builds on a node
Product: Red Hat Enterprise Linux 7 Reporter: Mike Fiedler <mifiedle>
Component: docker    Assignee: Lokesh Mandvekar <lsm5>
Status: CLOSED ERRATA QA Contact: atomic-bugs <atomic-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.2CC: agladkov, aos-bugs, cewong, jokerman, lsm5, lsu, mfojtik, mifiedle, mmccomas, vlaad, wmeng, xtian
Target Milestone: rc    Keywords: Extras, TestBlocker, ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: docker-1.10.3-46.el7.11.x86_64 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1368024 (view as bug list) Environment:
Last Closed: 2016-09-15 08:30:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1368024    
Bug Blocks:    
Attachments:
    cakephp-example build log at log level 5 (flags: none)
    Registry log (flags: none)

Description Mike Fiedler 2016-07-20 23:28:34 UTC
Description of problem:

Not sure whether it's the registry or the build causing this.   When builds are run individually, they push only the updated layers; they do not push all layers.   Using cakephp as an example, the push in the build log normally looks like this:

2016-07-20T22:22:08.673631596Z Pushing image 172.25.20.45:5000/u16/cakephp-example:latest ...
2016-07-20T22:22:09.161466330Z Pushed 0/4 layers, 1% complete
2016-07-20T22:22:12.810609054Z Pushed 1/4 layers, 25% complete
2016-07-20T22:22:13.654967047Z Push successful

A 5-second push.

When the exact same build is run concurrently with 2 other builds, some builds will randomly have short push times (above) while others will push all layers.  The same build as above did this on the next build after 3 consecutive short pushes:

2016-07-20T22:23:16.332945754Z Pushing image 172.25.20.45:5000/u16/cakephp-example:latest ...
2016-07-20T22:23:16.924879475Z Pushed 0/4 layers, 3% complete
2016-07-20T22:23:21.812414323Z Pushed 1/4 layers, 37% complete
2016-07-20T22:23:33.359228731Z Pushed 2/4 layers, 74% complete
2016-07-20T22:23:49.704269168Z Pushed 3/4 layers, 100% complete
2016-07-20T22:23:52.086482322Z Pushed 4/4 layers, 100% complete
2016-07-20T22:23:53.141130722Z Push successful

A 37-second push.   On the next build it was back to 5 seconds.

The more concurrent builds run in the cluster, the more often the long pushes occur.    The cluster is on AWS with S3 storage for the registry: 1 master, 1 registry/router, and 2 application nodes.   Everything is m4.xlarge.

The longer pushes always show in the log with all layers pushed instead of the expected push of only changed layers.

Running 1 or 2 builds concurrently (1/node), builds have short pushes.
Running 3 builds concurrently, 1 build usually shows the longer push time.
Running 4 builds concurrently (2/node), all builds usually show the longer push time, with an occasional build showing short times.

Version-Release number of selected component (if applicable): 3.3.0.7


How reproducible: Always when running in the above configuration


Steps to Reproduce:
1.  Install a cluster with master, registry/router, 2 application nodes.   Use selectors to force builds to the 2 application nodes.   Configure S3 for backend registry storage
2.  Create 4 projects and oc new-app --template=cakephp-example in all 4
3.  (Warmup) Run each build a few times to make sure it has pushed all layers to the registry from both of the application nodes 
4. Run one build and check the log.  It should only push 2 layers and the push should take 4-6 seconds
5. Run two builds simultaneously.   The pushes should still be 4-6 seconds
6. Run three builds simultaneously.  Likely 1 push will be 30-40 seconds and be a push of 4 layers.   If it is not, repeat running 3 builds a few times
7. Run 4 or more builds simultaneously.  Usually the builds will be long and the logs will show all layers pushed

Actual results:

See above, builds with layers already pushed to the registry still try to push all layers.

Expected results:

Consistent behavior where only modified layers are pushed.

Additional info:


Let me know what additional info is required.  I'd be happy to demo this if needed.

Comment 1 Michal Fojtik 2016-07-21 15:14:11 UTC
Cesar, can you check if this is a builder issue? If not, please let me know and I will add Michal Minar to investigate a possible registry problem.

Comment 2 Cesar Wong 2016-07-21 15:55:33 UTC
Michal, it is very unlikely that the build logic itself is affecting how images are pushed, given that we delegate the push to the Docker daemon, which interacts directly with the registry. However, what we do control is the logging of the progress. It'd be good to rule out that it's just incorrect reporting of the push progress by switching the log level to 5 (oc set env bc/[bcname] BUILD_LOGLEVEL=5).

Comment 3 Cesar Wong 2016-07-21 15:56:47 UTC
Also, I tested briefly on a local cluster by launching 5 builds concurrently and couldn't get the same behavior as reported in this bug. All build logs reported only 2 layers getting pushed.

Comment 4 Mike Fiedler 2016-07-21 16:31:56 UTC
Cesar, what storage were you using?  NFS/S3/ephemeral?   I was going to compare behavior there next.

Comment 5 Cesar Wong 2016-07-21 17:07:36 UTC
It was ephemeral

Comment 6 Mike Fiedler 2016-07-21 20:43:37 UTC
Created attachment 1182663 [details]
cakephp-example build log at log level 5

Re-created with ephemeral storage and build log level = 5.   An app that normally pushed 2 layers consistently pushed all layers when built concurrently with other apps.

Comment 8 Michal Minar 2016-08-12 13:51:03 UTC
One clarification: each build config pushes to the same repository. Push times are still unpredictable even if the layer exists in the repository for quite some time.

Comment 9 Mike Fiedler 2016-08-12 19:19:07 UTC
Simplified recreate steps - it doesn't even have to be concurrent builds to get the strange results; building in different namespaces seems to do it.

1. Create 3 projects:  test1, test2, test3
2. In each project, oc new-app --template=cakephp-example
3. After the initial build, rebuild each app individually a couple of times.   Verify in the build logs that not all layers are being pushed.
4. Run the builds sequentially like this:
    for i in {1..3}; do oc start-build -n test$i cakephp-example; sleep 5; oc logs -f -n test$i bc/cakephp-example; done

Repeat step 4 a couple of times.  Sometimes the builds push 2 layers, sometimes all layers.

Comment 10 Mike Fiedler 2016-08-12 19:27:14 UTC
Attached logs are for me running step 4 above 3 times (3 builds in each namespace after initial build was successful):

1:

test1 namespace: pushed 4 layers
test2: pushed 2 layers
test3: pushed 4 layers

2:

test1: pushed 2 layers
test2: pushed 4 layers
test3: pushed 2 layers

3:

test1: pushed 4 layers
test2: pushed 2 layers
test3: pushed 4 layers

Note the pattern; not sure whether that is a coincidence.  Will have to test it some more.

Comment 11 Mike Fiedler 2016-08-12 19:27:46 UTC
Created attachment 1190551 [details]
Registry log

Comment 12 Michal Minar 2016-08-14 09:58:35 UTC
(In reply to Mike Fiedler from comment #9)

> Repeat step 4 a couple of times.  Sometimes the builds push 2 layers, sometimes all layers.

Mike, this is actually expected behaviour, although not very convenient. What
happens under the covers is:

    # oc project test1
    $ oc start-build cakephp-example

Docker pushed all the layers and, for each blob, remembered the association:
    <blob> <-> <registry>/test1/cakephp-example

    $ oc project test2
    $ oc start-build cakephp-example

Docker found associations for all but the last layer and attempted to cross-repo mount each such associated blob from
    <registry>/test1/cakephp-example
with a URL like this:
    /v2/test2/cakephp-example/blobs/uploads/?from=test1/cakephp-example&mount=sha256:3d8673bd162aafaf70e94029801a409fa3210363707ea1e7956b6f8ff9351cc8
The Docker registry checked whether the user
    system:serviceaccount:test2:builder
is authorized to pull from the `test1` namespace and determined that it's NOT. So
it returned a 202 status indicating that the cross-repo mount failed. The blob
then needs to be re-uploaded.

If you give system:serviceaccount:test2:builder permission to pull images
from the test1 namespace like this:
    oc policy add-role-to-user -n test1 system:image-puller system:serviceaccount:test2:builder
the blobs will be mounted, so only the last layer will be pushed.

However, if docker remembers multiple associations for a single blob (like in
this case), the role needs to be given to the same user in each namespace,
because docker can pick any association and it picks just one. If the
cross-repo mount fails (the builder cannot pull from test3), the association
will be forgotten. So during the next push, the cross-repo mount from test1
will be tried.

So to be sure that a build in namespace test2 won't push any layer already
existing in the other 2 namespaces, this would have to be run:

    oc policy add-role-to-user -n test1 system:image-puller system:serviceaccount:test2:builder
    oc policy add-role-to-user -n test3 system:image-puller system:serviceaccount:test2:builder

There's not much we can do to address this on the OpenShift side. It's correct
to refuse a cross-repository mount if the user is not authorized to pull from
the source repository. What we could do is let the cross-repo mount succeed if
the target and source repository are in the same namespace. This will be
possible once we have an index of blobs in etcd. There's a related discussion
[1] going on.

[1] https://github.com/openshift/origin/pull/10120

There are a few things the Docker daemon could do to prevent unnecessary blob uploads:

1. It could try to mount from each remembered association before falling back to a full blob upload.
2. It could verify that the blob really doesn't exist in the source repository before forgetting its association.
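
The behaviour described above (a remembered blob association, a refused cross-repo mount, and the association being forgotten afterwards) can be sketched as a small simulation. All class and variable names here are hypothetical; this is only a model of the described behaviour, not Docker's actual implementation:

```python
# Illustrative simulation of the cross-repo mount flow described above.
# Hypothetical names; not the Docker daemon's or registry's real code.

class Registry:
    def __init__(self, pull_grants):
        # pull_grants maps a builder service account to the set of
        # namespaces it is authorized to pull from.
        self.pull_grants = pull_grants

    def cross_mount(self, builder, source_ns):
        # 201 = blob mounted from the source repo,
        # 202 = mount refused, blob must be uploaded instead.
        return 201 if source_ns in self.pull_grants.get(builder, set()) else 202


class DockerDaemon:
    def __init__(self):
        self.associations = {}  # blob digest -> remembered source namespace

    def push_blob(self, registry, builder, target_ns, digest):
        source_ns = self.associations.get(digest)
        if source_ns is not None and source_ns != target_ns:
            if registry.cross_mount(builder, source_ns) == 201:
                return "mounted"
            # A failed mount makes the daemon forget the association.
            del self.associations[digest]
        # Fall back to a full upload, then remember the new association.
        self.associations[digest] = target_ns
        return "uploaded"


registry = Registry({"builder-test2": set()})   # no cross-namespace pull rights
daemon = DockerDaemon()
daemon.associations["sha256:3d86"] = "test1"    # remembered from an earlier push

print(daemon.push_blob(registry, "builder-test2", "test2", "sha256:3d86"))  # uploaded

registry.pull_grants["builder-test2"] = {"test1"}  # grant image-puller on test1
daemon.associations["sha256:3d86"] = "test1"
print(daemon.push_blob(registry, "builder-test2", "test2", "sha256:3d86"))  # mounted
```

With the pull permission granted (second push), the mount succeeds and no bytes are re-uploaded, which is the effect of the oc policy workaround above.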

Comment 13 Michal Minar 2016-08-14 10:00:40 UTC
Nevertheless, the blob re-uploads must not happen if the same user system:serviceaccount:<namespace>:builder pushes to the same repository. Can you please double-check that this was the case with the concurrent pushes?

Comment 15 Michal Minar 2016-08-16 17:06:19 UTC
So I have verified the issue with concurrent builds. If the buildconfigs are
configured to push to the very same image stream, the blobs are not uploaded
on subsequent pushes.

If there are multiple image streams, there's a high chance of blob re-uploads
due to the above.

If builders are given permissions to pull from other namespaces, the mounts
will succeed, thus no re-uploads again.

To summarize again:

1. Push of the image may be sped up by the cross-repo mount feature as long as:

  - the same docker daemon pushed or pulled the base image from some other
    (origin) repository of internal registry
  - builder service account is authorized to read images from the origin
    repository

2. associations between blobs and repositories are remembered upon each
successful pull from or push to a repository

3. associations are forgotten when the pull or cross-repository mount fails
for the blob

4. The Docker daemon does NOT pick source repositories to mount from randomly. It
picks the first (oldest) remembered. The problem is that the subsequent build
may be requested by another builder service account that does not have
permissions on the first remembered namespace, so the association is forgotten
and the blob re-uploaded at the same time.

What can we do about it:

- OpenShift can allow the cross-mount to succeed if the requested blob already
exists in the destination namespace. Currently blocked.

- For improvements on Docker side, see above comment 12.


Mike, based on this information, are you able to make some progress with the
performance tests? Can I help you somehow?

Comment 16 Mike Fiedler 2016-08-16 20:32:10 UTC
Michal, I should be unblocked.  We will verify first thing tomorrow.   Thanks for the detailed information.

Comment 17 Vikas Laad 2016-08-17 14:59:24 UTC
After providing cross-project permissions, pushes are really fast. The average push time before the permissions was around 115 sec for 5 concurrent builds; now it's 3.4 sec.

Build log for the eapproj9 build:

Pushing image 172.24.155.200:5000/eapproj9/eap-app:latest ...
The push refers to a repository [172.24.155.200:5000/eapproj9/eap-app]
Preparing
Preparing
Preparing
Preparing
Preparing
Preparing
Waiting
Mounted from eapproj38/eap-app
Mounted from eapproj38/eap-app
Mounted from eapproj38/eap-app
Mounted from eapproj38/eap-app
Mounted from eapproj19/eap-app
Mounted from eapproj19/eap-app
Mounted from eapproj31/eap-app
Mounted from eapproj31/eap-app
Mounted from eapproj31/eap-app
Mounted from eapproj31/eap-app
Pushing [>                                                  ] 37.35 kB/3.478 MB
Pushing [=>                                                 ] 75.43 kB/3.478 MB
Pushing [=>                                                 ] 113.2 kB/3.478 MB
Pushing [==>                                                ] 178.7 kB/3.478 MB
Pushing [===>                                               ] 244.2 kB/3.478 MB
Pushing [====>                                              ] 309.8 kB/3.478 MB
Pushing [=====>                                             ] 375.3 kB/3.478 MB
Pushing [======>                                            ] 440.8 kB/3.478 MB
Pushing [=======>                                           ] 506.4 kB/3.478 MB
Pushing [========>                                          ] 562.6 kB/3.478 MB
Pushing [========>                                          ] 606.7 kB/3.478 MB

As we can see, it is mounting layers from different projects.

Comment 21 Michal Minar 2016-08-18 07:12:44 UTC
That's because previously, with OpenShift 3.2 and docker 1.9, Docker used to Stat() the blob in the remote registry every time before pushing:

https://github.com/docker/docker/blob/2057d3f3f254534a8c7253ae76204312d3882a82/graph/push_v2.go#L159

In the more recent Docker 1.10.3, the Stat() call is done only if an association exists between the blob being pushed and the destination repository [2]. Sadly, because associations get forgotten, Docker now often loses this association, so the Stat() call for the destination repository isn't always done.

[2] https://github.com/docker/docker/blob/f9d4a2c183cb4ba202babc9f8649ea043d8c84d0/distribution/push_v2.go#L429

This is not something we can fix on the OpenShift side.

Let me clone this for Docker.

I'd certainly keep this open for OpenShift as well, to:

> OpenShift can allow the cross-mount to succeed if the requested blob already exists in the destination namespace. Currently blocked.
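
The version difference can be illustrated roughly as follows; the function names and boolean inputs are hypothetical, and this is only a model of the behaviour described here, not the linked Go code:

```python
# Rough model of the push-time layer check in the two daemon versions
# discussed above. Hypothetical names; not the actual docker source.

def should_upload_docker19(registry_has_blob, _association_remembered):
    # Docker 1.9: always Stat() the blob in the destination repository
    # before pushing, so an already-present layer is never re-uploaded.
    return not registry_has_blob

def should_upload_docker110(registry_has_blob, association_remembered):
    # Docker 1.10.3: the Stat() call happens only when the daemon still
    # remembers an association between the blob and the destination repo.
    if not association_remembered:
        return True  # association forgotten -> blind re-upload
    return not registry_has_blob
```

Under this model, a blob that already exists in the registry is still re-uploaded by 1.10.3 whenever its association has been forgotten, which matches the long pushes observed in this bug.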

Comment 26 Lokesh Mandvekar 2016-08-23 18:41:59 UTC
Moving to docker as per Comment 22

Comment 28 Vikas Laad 2016-08-23 20:51:22 UTC
Tested again; here are the results.
Timings before the upgrade:

Build: p0:cakephp-example
	Total builds: 1 Failures: 0
	Avg build time: 23.0 Max build time: 23.0
	Avg push time: 51.0 Max push time: 51.0

Build: p1:cakephp-example
	Total builds: 1 Failures: 0
	Avg build time: 24.0 Max build time: 24.0
	Avg push time: 61.0 Max push time: 61.0

Build: p2:cakephp-example
	Total builds: 1 Failures: 0
	Avg build time: 34.0 Max build time: 34.0
	Avg push time: 75.0 Max push time: 75.0

Build: p3:cakephp-example
	Total builds: 1 Failures: 0
	Avg build time: 18.0 Max build time: 18.0
	Avg push time: 50.0 Max push time: 50.0

Build: p4:cakephp-example
	Total builds: 1 Failures: 0
	Avg build time: 31.0 Max build time: 31.0
	Avg push time: 57.0 Max push time: 57.0


Failed builds: 0

Invalid builds: 0

Good builds included in stats: 5

Average build time, all good builds: 26.0

Average push time, all good builds: 58.8


After upgrading docker on the master and all the nodes to 1.10.3-46.el7.11, here are the timings:
Build: p0:cakephp-example
        Total builds: 1 Failures: 0
        Avg build time: 14.0 Max build time: 14.0
        Avg push time: 2.0 Max push time: 2.0

Build: p1:cakephp-example
        Total builds: 1 Failures: 0
        Avg build time: 15.0 Max build time: 15.0
        Avg push time: 1.0 Max push time: 1.0

Build: p2:cakephp-example
        Total builds: 1 Failures: 0
        Avg build time: 28.0 Max build time: 28.0
        Avg push time: 5.0 Max push time: 5.0

Build: p3:cakephp-example
        Total builds: 1 Failures: 0
        Avg build time: 28.0 Max build time: 28.0
        Avg push time: 5.0 Max push time: 5.0

Build: p4:cakephp-example
        Total builds: 1 Failures: 0
        Avg build time: 32.0 Max build time: 32.0
        Avg push time: 3.0 Max push time: 3.0


Failed builds: 0

Invalid builds: 0

Good builds included in stats: 5

Average build time, all good builds: 23.4

Average push time, all good builds: 3.2

Comment 29 Vikas Laad 2016-08-23 20:53:00 UTC
Average push time for 5 concurrent builds came down from 58 sec to 3.2 sec. Closing the bug.

Comment 30 Vikas Laad 2016-08-23 22:24:46 UTC
Detailed build logs

Pushing image 172.27.190.236:5000/p1/cakephp-example:latest ...
The push refers to a repository [172.27.190.236:5000/p1/cakephp-example]
Preparing
Preparing
Preparing
Preparing
Layer already exists
Layer already exists
Layer already exists
Layer already exists
Layer already exists
Layer already exists
Pushing [>                                                  ]   126 kB/10.66 MB
Pushing [=>                                                 ] 236.5 kB/10.66 MB
Pushing [=>                                                 ] 367.6 kB/10.66 MB
Pushing [==>                                                ] 498.7 kB/10.66 MB
Pushing [==>                                                ] 629.8 kB/10.66 MB
Pushing [===>                                               ] 760.8 kB/10.66 MB
Pushing [====>                                              ] 891.9 kB/10.66 MB
Pushing [====>                                              ] 1.023 MB/10.66 MB
Pushing [=====>                                             ] 1.154 MB/10.66 MB
Pushing [======>                                            ] 1.285 MB/10.66 MB

Comment 34 errata-xmlrpc 2016-09-15 08:30:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1827.html