Bug 1589994
| Summary: | Large image stream unable to pull image | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
| Component: | Image Registry | Assignee: | Ben Parees <bparees> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Fiedler <mifiedle> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.10.0 | CC: | aos-bugs, bparees, mifiedle, wsun, wzheng |
| Target Milestone: | --- | | |
| Target Release: | 3.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: Pulling an image from the registry requires scanning all images associated with the image's imagestream to find the correct layers. Consequence: When an imagestream contains a large number of images, this operation can take an excessive amount of time and result in a failure to pull the image due to timeouts. Fix: The mechanism for scanning the imagestream to find the relevant layers has been improved to reduce the number of API calls and to scan more efficiently, speeding up the operation. Result: Timeouts should no longer occur when pulling images from imagestreams that contain a large number of images. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-12-21 15:16:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Clayton Coleman
2018-06-11 19:29:58 UTC
The current search algorithm in HasBlob penalizes less frequently updated tags. Because all tag events are sorted by generation, a single frequently updated tag can push every event of a less frequently updated tag to the end of the list. I.e.:

tag1: 1h, 2h, 3h, 4h, 5h, ...
tag2: 1d, 2d, 3d

The first 24 tag1 events will be scanned before the first tag2 event is even considered. The algorithm should probably be:

1. All current tagged images first
2. All non-current tagged images, in order of generation

In the case above we have the equivalent of images pushed from origin builds (every hour) being checked against images pushed from docker-registry builds (every week), so docker-registry is "losing" to all origin images.

Something is really wrong with caching. I confirmed repository caching was on. On a ~5 minute run with several accesses, I got the following stats from the blobDescriptor service:
```
○ grep "starting with digest"             5635
○ grep "could not stat layer link"         332
○ grep "exists in the global blob store"   269
○ grep "found cached blob"                   9
○ grep "verifying presence"                260
○ grep "neither empty nor referenced"      238
```
9/332 cache hits on the repository seems really low. Of the 260 times we attempted to look up the blob in the image stream, we only found an image 22 times (probably head requests?).
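The scan order proposed above (current tagged images first, then historical tag events by generation) can be sketched as follows. This is a minimal illustration, not the actual image-registry implementation: `TagEvent` and its fields are simplified stand-ins for the real imagestream types.

```python
from dataclasses import dataclass

@dataclass
class TagEvent:
    tag: str
    image: str        # image SHA (placeholder string here)
    generation: int   # higher = more recent push
    current: bool     # True if this is the tag's latest event

def scan_order(events):
    """Order tag events for a blob lookup: the current image of every
    tag first, then historical events by descending generation."""
    current = [e for e in events if e.current]
    history = [e for e in events if not e.current]
    history.sort(key=lambda e: e.generation, reverse=True)
    return current + history

# tag1 pushed hourly (24 events), tag2 pushed daily (3 events).
# With generation-only ordering, all 24 tag1 events would be scanned
# before tag2; with this ordering, tag2's current image comes second.
events = (
    [TagEvent("tag1", f"sha-t1-{g}", g, g == 24) for g in range(1, 25)]
    + [TagEvent("tag2", f"sha-t2-{g}", g, g == 3) for g in (1, 2, 3)]
)
order = scan_order(events)
print([e.image for e in order[:3]])  # → ['sha-t1-24', 'sha-t2-3', 'sha-t1-23']
```

Note the design point: the fix that actually merged also reduced API calls, but the ordering change alone is what stops a busy tag from starving a quiet one.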
https://github.com/openshift/image-registry/pull/100 will hopefully make this manageable while additional solutions are pursued post 3.10.

https://github.com/openshift/image-registry/pull/100 has merged to help mitigate this in 3.10, but there are some longer term efforts underway for 3.11: https://github.com/openshift/image-registry/pull/101

Fixed in 3.11.

Ben, could you give some suggestions on how to verify this bug?

I'd recreate the scenario Clayton described in the initial report. Create an imagestream that has a large number of tags (~100), and make each tag have a large number of SHAs (~10 each, representing the image tag having been updated/pushed 10 times). This results in an imagestream with a total of around 1000 images in its history. Once you have that imagestream created, attempt to pull various SHAs from the imagestream. All pulls should succeed and not time out.

Hi Mike, could you help check whether this bug can be verified? Thanks!

Verified on 3.11.0-0.28.0 per comment 8. Created an imagestream with 100 tags, each with 10 SHAs; the imagestream showed a total of 1000 images. Then ran a loop to docker pull all 1000 SHAs from the registry, and all pulls were successful and took the expected amount of time.

Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.
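The verification recipe described in the comments (100 tags × 10 SHAs, then pull every image in the stream's history) can be sketched as below. This is a hedged illustration: the registry host, project name, stream name, and the digest placeholders are all assumptions, not values from the bug; in a real run the digests would come from the imagestream's actual history (e.g. via `oc`).

```python
import subprocess

REGISTRY = "docker-registry.default.svc:5000"   # hypothetical registry host
NAMESPACE = "bigtest"                           # hypothetical project name

def image_refs(stream="bigstream", tags=100, pushes_per_tag=10):
    """Yield one pull reference per image in the stream's history:
    100 tags x 10 pushes each = 1000 images."""
    for t in range(tags):
        for p in range(pushes_per_tag):
            # Placeholder digests; real ones come from the imagestream status.
            yield f"{REGISTRY}/{NAMESPACE}/{stream}@sha256:tag{t}-push{p}"

def pull_all(refs):
    """Pull every historical image; a registry timeout surfaces as a
    non-zero exit from docker, which check=True turns into an error."""
    for ref in refs:
        subprocess.run(["docker", "pull", ref], check=True)

refs = list(image_refs())
print(len(refs))  # → 1000
# pull_all(refs) would be run against a live cluster; all 1000 pulls
# should succeed without timing out once the fix is in place.
```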