ci-operator creates a single image stream with 50+ tags in one call. On 3.10 this works fine, but when we tried to upgrade api.ci to 3.11, some of the tags were not imported (the spec tag was set, but the status tag was missing). Investigating now; my first suspect is the collapseStatusTags function added when image layers were added, but since everything happens in one call, I somewhat doubt that is the cause. This blocks upgrading api.ci to 3.11.
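For context, here is a minimal sketch of what that symptom looks like in the API object, assuming the openshift/api image/v1 Go types; the tag names are illustrative, not the actual ci-operator tags. It lists spec tags that have no imported status tag ("spec tag set, status tag missing"):

package main

import (
	"fmt"

	imagev1 "github.com/openshift/api/image/v1"
)

// missingStatusTags returns the spec tags of an image stream that have no
// corresponding status tag, i.e. tags whose import never completed.
func missingStatusTags(is *imagev1.ImageStream) []string {
	imported := map[string]bool{}
	for _, t := range is.Status.Tags {
		if len(t.Items) > 0 { // at least one tag event means the import landed
			imported[t.Tag] = true
		}
	}
	var missing []string
	for _, t := range is.Spec.Tags {
		if !imported[t.Name] {
			missing = append(missing, t.Name)
		}
	}
	return missing
}

func main() {
	// Toy example: two spec tags, only one of which has a status entry.
	is := &imagev1.ImageStream{
		Spec:   imagev1.ImageStreamSpec{Tags: []imagev1.TagReference{{Name: "cli"}, {Name: "hyperkube"}}},
		Status: imagev1.ImageStreamStatus{Tags: []imagev1.NamedTagEventList{{Tag: "cli", Items: []imagev1.TagEvent{{}}}}},
	}
	fmt.Println(missingStatusTags(is)) // prints [hyperkube]
}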
Appears to be a timeout on image stream import. I suspect the long-running request exception for imagestreamimport got lost in the rebase, OR we suddenly got a bit slower and then hit the long-running request timeout (but that should never fire, so I think it's the former).

I0725 15:37:09.364025 1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:37:09.364134 1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
E0725 15:37:39.372550 1 imagestream_controller.go:133] Error syncing image stream "ci-op-i380vmvm/stable": Timeout: request did not complete within allowed duration
I0725 15:37:39.453068 1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:37:39.453092 1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
E0725 15:38:09.464802 1 imagestream_controller.go:133] Error syncing image stream "ci-op-i380vmvm/stable": Timeout: request did not complete within allowed duration
I0725 15:38:09.625433 1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:38:09.625484 1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
E0725 15:38:39.645567 1 imagestream_controller.go:133] Error syncing image stream "ci-op-i380vmvm/stable": Timeout: request did not complete within allowed duration
I0725 15:38:39.965799 1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:38:39.965830 1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
That's a 30s timeout, which shouldn't be applied to image stream import. I also don't see it showing up in long running requests:

$ oc get --raw /metrics | grep longrunning | grep -v WATCH
# HELP apiserver_longrunning_gauge Gauge of all active long-running apiserver requests broken out by verb, API resource, and scope. Not all requests are tracked this way.
# TYPE apiserver_longrunning_gauge gauge
apiserver_longrunning_gauge{resource="pods",scope="namespace",subresource="log",verb="GET"} 0
This is back to the image team: we are not setting a default, so we're timing out at 30s on create because ?timeout= defaults to 30s. We can set a longer timeout, but the client doesn't make that easy today.
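For illustration, a sketch of how a client could set a longer per-request timeout by dropping down to the REST client, assuming the openshift/client-go image clientset and a recent client-go (older versions call Do() without a context). The namespace, KUBECONFIG handling, and import spec are placeholders, not the actual ci-operator code:

package main

import (
	"context"
	"log"
	"os"
	"time"

	imagev1 "github.com/openshift/api/image/v1"
	imageclient "github.com/openshift/client-go/image/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
)

// importWithTimeout issues an ImageStreamImport with an explicit ?timeout=
// query parameter, overriding the 30s apiserver default mentioned above.
func importWithTimeout(client imageclient.Interface, ns string, isi *imagev1.ImageStreamImport) (*imagev1.ImageStreamImport, error) {
	result := &imagev1.ImageStreamImport{}
	err := client.ImageV1().RESTClient().
		Post().
		Namespace(ns).
		Resource("imagestreamimports").
		Timeout(3 * time.Minute). // rest.Request.Timeout sets ?timeout= on the request
		Body(isi).
		Do(context.TODO()).
		Into(result)
	return result, err
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		log.Fatal(err)
	}
	client, err := imageclient.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	// Placeholder import spec and namespace; a real caller would populate Spec.Images.
	isi := &imagev1.ImageStreamImport{
		ObjectMeta: metav1.ObjectMeta{Name: "stable"},
		Spec:       imagev1.ImageStreamImportSpec{Import: true},
	}
	if _, err := importWithTimeout(client, "ci-op-example", isi); err != nil {
		log.Fatal(err)
	}
}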
It looks like it takes a lot longer to import than before.
Is it possible we *should* treat it as a long-running request instead? Regardless of whether there is a regression in import time, imagestreamimport can take an arbitrary length of time depending on the number of tags and the speed of the registry we have to pull metadata from.
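To make that idea concrete, here is a minimal sketch of a long-running request check that exempts imagestreamimport creates from the default request timeout, using the upstream apiserver filter types. This is illustrative wiring only, not the exception origin actually carried (or lost in the rebase), and the verb/subresource sets are the usual watch/exec-style exemptions rather than exact defaults:

package longrunning

import (
	"net/http"

	"k8s.io/apimachinery/pkg/util/sets"
	apirequest "k8s.io/apiserver/pkg/endpoints/request"
	"k8s.io/apiserver/pkg/server/filters"
)

// NewLongRunningCheck wraps a basic long-running check and additionally exempts
// POSTs to imagestreamimports, whose duration depends on how many tags must be
// imported and how quickly the upstream registries answer.
func NewLongRunningCheck() apirequest.LongRunningRequestCheck {
	basic := filters.BasicLongRunningRequestCheck(
		sets.NewString("watch"),
		sets.NewString("proxy", "exec", "attach", "log", "portforward"),
	)
	return func(r *http.Request, info *apirequest.RequestInfo) bool {
		if basic(r, info) {
			return true
		}
		return info != nil &&
			info.IsResourceRequest &&
			info.APIGroup == "image.openshift.io" &&
			info.Resource == "imagestreamimports" &&
			info.Verb == "create"
	}
}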
https://github.com/openshift/origin/pull/20419 increases the timeout. I made changes to ci-operator to bypass this use case for most flows. The image layers change should reduce the amount of time to find the manifest by digest.
I'd still like to understand what would have taken us from "occasionally breaching 30s" to "takes 3 minutes", assuming that behavior is consistent. Has the imagestream itself grown (in number of tags)?
I believe all the performance improvements are merged now, primarily: https://github.com/openshift/image-registry/pull/101
Tested creation (via oc create) of an imagestream with 100 imagestreamtags for a docker image with 100 layers. Creation was successful and a subsequent oc import-image succeeded. Any other tests you'd like to see?
Imagestream + all imagestreamtags in comment 10 created in 2.5 seconds; the oc create command itself took 0.2 seconds. This is on 3.11.0-0.25.0.
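For reference, a sketch of how a manifest like the one used in comments 10 and 11 could be generated, assuming the openshift/api image/v1 types; the imagestream name and the source image are placeholders (any public image with many layers works for the test):

package main

import (
	"encoding/json"
	"fmt"
	"os"

	imagev1 "github.com/openshift/api/image/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	is := imagev1.ImageStream{
		TypeMeta:   metav1.TypeMeta{APIVersion: "image.openshift.io/v1", Kind: "ImageStream"},
		ObjectMeta: metav1.ObjectMeta{Name: "many-tags"},
	}
	// 100 spec tags all pointing at the same external image.
	for i := 0; i < 100; i++ {
		is.Spec.Tags = append(is.Spec.Tags, imagev1.TagReference{
			Name: fmt.Sprintf("tag-%d", i),
			From: &corev1.ObjectReference{
				Kind: "DockerImage",
				// Placeholder image reference.
				Name: "docker.io/example/hundred-layers:latest",
			},
		})
	}
	out, err := json.MarshalIndent(&is, "", "  ")
	if err != nil {
		os.Exit(1)
	}
	fmt.Println(string(out))
}

Piping the output to oc create -f - reproduces the creation step, and oc import-image then exercises the import path as described in comment 10.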
Sounds great to me, thanks Mike.
Verified on 3.11.0-0.25.0. See comment 10 and comment 11 for the verification scenario.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652