Bug 1608410 - Bulk tagging an image stream (creating large numbers of spec tags) results in some tags not being created
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.11.0
Assigned To: Ben Parees
QA Contact: Mike Fiedler
Reported: 2018-07-25 09:26 EDT by Clayton Coleman
Modified: 2018-10-11 03:22 EDT
CC List: 4 users

Last Closed: 2018-10-11 03:22:15 EDT
Type: Bug


External Trackers:
Red Hat Product Errata RHBA-2018:2652 (Last Updated: 2018-10-11 03:22 EDT)
Description Clayton Coleman 2018-07-25 09:26:10 EDT
ci-operator creates a single image stream with 50+ tags in a single call.  On 3.10 this works fine, but when we tried to upgrade api.ci to 3.11, some of the tags did not get imported (the spec tag was set, but the status tag was missing).

Investigating now, my first suspect is the collapseStatusTags function added when image layers were added.  But since everything happens in one call, I somewhat doubt that is the cause.

Blocks upgrading api.ci to 3.11.
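(For reference, a minimal sketch of the bulk-tagging pattern described above: an imagestream created with many spec tags in a single call. The namespace, stream name, and source image are placeholders, not the actual ci-operator objects.)

$ # write the fixed header of the ImageStream manifest
$ cat > /tmp/stable-is.yaml <<'EOF'
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: stable
spec:
  tags:
EOF
$ # append 50 spec tags, each pointing at a placeholder source image
$ for i in $(seq 1 50); do
>   printf '  - name: tag-%s\n    from:\n      kind: DockerImage\n      name: docker.io/library/busybox:latest\n' "$i" >> /tmp/stable-is.yaml
> done
$ # create the stream with all 50 spec tags in one API call; the controller then imports them
$ oc create -n ci-op-example -f /tmp/stable-is.yaml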
Comment 1 Clayton Coleman 2018-07-25 11:45:05 EDT
Appears to be a timeout on image stream import. I suspect the long-running request exception for imagestreamimport got lost in the rebase, OR we suddenly got a bit slower and are now hitting the standard request timeout (but that shouldn't ever fire for imports, so I think it's the former).

I0725 15:37:09.364025       1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:37:09.364134       1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
E0725 15:37:39.372550       1 imagestream_controller.go:133] Error syncing image stream "ci-op-i380vmvm/stable": Timeout: request did not complete within allowed duration
I0725 15:37:39.453068       1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:37:39.453092       1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
E0725 15:38:09.464802       1 imagestream_controller.go:133] Error syncing image stream "ci-op-i380vmvm/stable": Timeout: request did not complete within allowed duration
I0725 15:38:09.625433       1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:38:09.625484       1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
E0725 15:38:39.645567       1 imagestream_controller.go:133] Error syncing image stream "ci-op-i380vmvm/stable": Timeout: request did not complete within allowed duration
I0725 15:38:39.965799       1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:38:39.965830       1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
Comment 2 Clayton Coleman 2018-07-25 11:46:08 EDT
That's a 30s timeout, which shouldn't be applied to image stream import.  I also don't see it showing up in long running requests:

$ oc get --raw /metrics | grep longrunning | grep -v WATCH
# HELP apiserver_longrunning_gauge Gauge of all active long-running apiserver requests broken out by verb, API resource, and scope. Not all requests are tracked this way.
# TYPE apiserver_longrunning_gauge gauge
apiserver_longrunning_gauge{resource="pods",scope="namespace",subresource="log",verb="GET"} 0
Comment 3 Clayton Coleman 2018-07-25 13:28:09 EDT
This is back to the image team - we are not setting a default timeout, so the create is timing out at 30s because ?timeout= defaults to 30s. We can set a longer timeout, although the client doesn't make that easy today.
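(For illustration only: the 30s default comes from the per-request ?timeout= query parameter, so a raw call against the imagestreamimport endpoint can ask for a longer server-side deadline, subject to the API server's configured maximum. The host, namespace, and payload file below are placeholders.)

$ TOKEN=$(oc whoami -t)
$ # POST the ImageStreamImport with an explicit 5-minute deadline instead of
$ # relying on the ~30s default described above
$ curl -ks -X POST \
>   -H "Authorization: Bearer $TOKEN" \
>   -H "Content-Type: application/json" \
>   --data @imagestreamimport.json \
>   "https://master.example.com:8443/apis/image.openshift.io/v1/namespaces/ci-op-example/imagestreamimports?timeout=5m"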
Comment 4 Clayton Coleman 2018-07-25 13:30:22 EDT
It looks like it takes a lot longer to import than before.
Comment 5 Ben Parees 2018-07-25 13:55:52 EDT
Is it possible we *should* treat it as a long-running request instead?

Regardless of whether there is a regression in import time, it seems like imagestreamimport can take an arbitrary length of time depending on the number of tags and the speed of the registry we have to pull metadata from.
Comment 6 Clayton Coleman 2018-07-25 21:37:09 EDT
https://github.com/openshift/origin/pull/20419 increases the timeout.

I made changes to ci-operator to bypass this use case for most flows.

The image layers change should reduce the amount of time to find the manifest by digest.
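(As a hedged aside, in case it helps with verification: the image layers work exposes a per-stream layers subresource whose cached blob/manifest metadata can be inspected directly. The namespace and stream name here are placeholders.)

$ # dump the cached layer and manifest metadata for the stream, keyed by image digest
$ oc get --raw /apis/image.openshift.io/v1/namespaces/ci-op-example/imagestreams/stable/layers | python -m json.tool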
Comment 7 Ben Parees 2018-07-25 21:44:27 EDT
I'd still like to understand what would have taken us from "occasionally breaching 30s" to "takes 3 minutes", assuming that behavior is consistent.

Has the imagestream itself grown (in number of tags)?
Comment 8 Ben Parees 2018-08-03 10:29:36 EDT
I believe all the performance improvements are merged now, primarily:
https://github.com/openshift/image-registry/pull/101
Comment 10 Mike Fiedler 2018-08-30 15:25:59 EDT
Tested creation (via oc create) of an imagestream with 100 imagestreamtags for a docker image with 100 layers. Creation was successful and a subsequent oc import-image was also successful. Any other tests you'd like to see?
Comment 11 Mike Fiedler 2018-08-30 15:31:56 EDT
The imagestream and all imagestreamtags in comment 10 were created in 2.5 seconds - the oc create command itself took 0.2 seconds. This is on 3.11.0-0.25.0.
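(For anyone re-running this check, a rough sketch of the timing and tag-count verification; the manifest file and stream name are placeholders matching the scenario in comment 10.)

$ # time just the oc create call for the 100-tag imagestream (comment 11 saw ~0.2s)
$ time oc create -f imagestream-100-tags.yaml
$ # compare how many status tags have been populated against the 100 spec tags
$ oc get imagestream stable -o jsonpath='{.spec.tags[*].name}' | wc -w
$ oc get imagestream stable -o jsonpath='{.status.tags[*].tag}' | wc -w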
Comment 12 Ben Parees 2018-08-30 15:38:35 EDT
Sounds great to me, thanks Mike.
Comment 13 Mike Fiedler 2018-08-31 08:21:44 EDT
Verified on 3.11.0-0.25.0.   See comment 10 and comment 11 for the verification scenario.
Comment 15 errata-xmlrpc 2018-10-11 03:22:15 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652
