Bug 1608410 - Bulk tagging an image stream (creating large numbers of spec tags) results in some tags not being created
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: ImageStreams
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.11.0
Assignee: Ben Parees
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-25 13:26 UTC by Clayton Coleman
Modified: 2018-10-11 07:22 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-11 07:22:15 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2018:2652  0        None      None    None     2018-10-11 07:22:40 UTC

Description Clayton Coleman 2018-07-25 13:26:10 UTC
ci-operator creates a single image stream with 50+ tags in a single call.  On 3.10 this works fine, but when we tried to upgrade api.ci to 3.11, some of the tags did not get imported (the spec tag was set, but the status tag was missing).
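
(For reference, a rough sketch of this request shape: a single oc create whose ImageStream spec carries ~50 tags. The namespace, stream name, and source image below are placeholders, not what ci-operator actually uses.)

{
  cat <<'EOF'
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: stable
  namespace: ci-op-example
spec:
  tags:
EOF
  # Placeholder source image; in practice each tag points at a different image.
  for i in $(seq 1 50); do
    printf '  - name: tag-%d\n    from:\n      kind: DockerImage\n      name: docker.io/library/busybox:latest\n' "$i"
  done
} | oc create -f -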

Investigating now, my first suspect is the collapseStatusTags function added when image layers were added.  But since everything happens in one call, I somewhat doubt that is the cause.

Blocks upgrading api.ci to 3.11.

Comment 1 Clayton Coleman 2018-07-25 15:45:05 UTC
This appears to be a timeout on image stream import.  I suspect the long-running request exception for imagestreamimport got lost in the rebase, OR we suddenly got a bit slower and are now hitting the long-running request timeout (but that timeout shouldn't ever fire here, so I think it's the former).

I0725 15:37:09.364025       1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:37:09.364134       1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
E0725 15:37:39.372550       1 imagestream_controller.go:133] Error syncing image stream "ci-op-i380vmvm/stable": Timeout: request did not complete within allowed duration
I0725 15:37:39.453068       1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:37:39.453092       1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
E0725 15:38:09.464802       1 imagestream_controller.go:133] Error syncing image stream "ci-op-i380vmvm/stable": Timeout: request did not complete within allowed duration
I0725 15:38:09.625433       1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:38:09.625484       1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
E0725 15:38:39.645567       1 imagestream_controller.go:133] Error syncing image stream "ci-op-i380vmvm/stable": Timeout: request did not complete within allowed duration
I0725 15:38:39.965799       1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:38:39.965830       1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...

Comment 2 Clayton Coleman 2018-07-25 15:46:08 UTC
That's a 30s timeout, which shouldn't be applied to image stream import.  I also don't see it showing up in the long-running requests metric:

$ oc get --raw /metrics | grep longrunning | grep -v WATCH
# HELP apiserver_longrunning_gauge Gauge of all active long-running apiserver requests broken out by verb, API resource, and scope. Not all requests are tracked this way.
# TYPE apiserver_longrunning_gauge gauge
apiserver_longrunning_gauge{resource="pods",scope="namespace",subresource="log",verb="GET"} 0

Comment 3 Clayton Coleman 2018-07-25 17:28:09 UTC
This is back to the image team: we are not setting a default, so we're timing out at 30s on create because ?timeout= defaults to 30s.  We can set a longer timeout, but the client doesn't make that easy today.
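
(For reference, the per-request deadline can be raised by passing the ?timeout= query parameter explicitly when posting the imagestreamimport straight to the API. A rough sketch only; the server address, bearer token, namespace, and request body file are placeholders:)

# Sketch: raise the server-side request timeout for a single imagestreamimport create.
curl -k -X POST \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  --data @imagestreamimport.json \
  "https://api.example.com:8443/apis/image.openshift.io/v1/namespaces/ci-op-example/imagestreamimports?timeout=3m"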

Comment 4 Clayton Coleman 2018-07-25 17:30:22 UTC
It looks like it takes a lot longer to import than before.

Comment 5 Ben Parees 2018-07-25 17:55:52 UTC
Is it possible we *should* treat it as a long-running request instead?

Regardless of whether there is a regression in import time, it seems like imagestreamimport can take an arbitrary length of time depending on the number of tags and the speed of the registry we have to pull metadata from.

Comment 6 Clayton Coleman 2018-07-26 01:37:09 UTC
https://github.com/openshift/origin/pull/20419 increases the timeout.

I made changes to ci-operator to bypass this use case for most flows.

The image layers change should reduce the amount of time to find the manifest by digest.

Comment 7 Ben Parees 2018-07-26 01:44:27 UTC
I'd still like to understand what would have taken us from "occasionally breaching 30s" to "takes 3 minutes", assuming that behavior is consistent.

Has the imagestream itself grown (in number of tags)?

Comment 8 Ben Parees 2018-08-03 14:29:36 UTC
I believe all the performance improvements are merged now, primarily:
https://github.com/openshift/image-registry/pull/101

Comment 10 Mike Fiedler 2018-08-30 19:25:59 UTC
Tested creation (via oc create) of an imagestream with 100 imagestreamtags for a docker image with 100 layers.   Creation was successful and a subsequent oc import-image was successful.  Any other tests you'd like to see?

Comment 11 Mike Fiedler 2018-08-30 19:31:56 UTC
The imagestream and all imagestreamtags in comment 10 were created in 2.5 seconds; the oc create command itself took 0.2 seconds.   This is on 3.11.0-0.25.0.
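
(A rough sketch of that verification flow; the stream name, scratch file, and source image are placeholders, and the real test used a docker image with 100 layers:)

# Generate an ImageStream with 100 spec tags, then time the create as in comment 11.
{
  cat <<'EOF'
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: bulk-tags-test
spec:
  tags:
EOF
  for i in $(seq 1 100); do
    printf '  - name: tag-%d\n    from:\n      kind: DockerImage\n      name: docker.io/library/busybox:latest\n' "$i"
  done
} > /tmp/bulk-tags-test.yaml

time oc create -f /tmp/bulk-tags-test.yaml
# Once imports settle, count the status tags (expect 100) and re-import as in comment 10;
# the exact import-image flags depend on how the stream and its tags were defined.
oc get imagestream bulk-tags-test -o jsonpath='{.status.tags[*].tag}' | wc -w
oc import-image bulk-tags-test --all --confirm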

Comment 12 Ben Parees 2018-08-30 19:38:35 UTC
Sounds great to me, thanks Mike.

Comment 13 Mike Fiedler 2018-08-31 12:21:44 UTC
Verified on 3.11.0-0.25.0.   See comment 10 and comment 11 for the verification scenario.

Comment 15 errata-xmlrpc 2018-10-11 07:22:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652

