Bug 1608410
| Summary: | Bulk tagging an image stream (creating large numbers of spec tags) results in some tags not being created | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
| Component: | ImageStreams | Assignee: | Ben Parees <bparees> |
| Status: | CLOSED ERRATA | QA Contact: | Mike Fiedler <mifiedle> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 3.11.0 | CC: | aos-bugs, jokerman, mifiedle, mmccomas |
| Target Milestone: | --- | ||
| Target Release: | 3.11.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-11 07:22:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Clayton Coleman
2018-07-25 13:26:10 UTC
Appears to be a timeout on image stream import. I suspect the long running request exception for imagestreamimport got lost in the rebase, OR we suddenly got a bit slower and then hit the long running request timeout (but that shouldn't ever fire, so I think it's the former).

I0725 15:37:09.364025 1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:37:09.364134 1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
E0725 15:37:39.372550 1 imagestream_controller.go:133] Error syncing image stream "ci-op-i380vmvm/stable": Timeout: request did not complete within allowed duration
I0725 15:37:39.453068 1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:37:39.453092 1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
E0725 15:38:09.464802 1 imagestream_controller.go:133] Error syncing image stream "ci-op-i380vmvm/stable": Timeout: request did not complete within allowed duration
I0725 15:38:09.625433 1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:38:09.625484 1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
E0725 15:38:39.645567 1 imagestream_controller.go:133] Error syncing image stream "ci-op-i380vmvm/stable": Timeout: request did not complete within allowed duration
I0725 15:38:39.965799 1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:38:39.965830 1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...

That's a 30s timeout, which shouldn't be applied to image stream import. I also don't see it showing up in long running requests:
$ oc get --raw /metrics | grep longrunning | grep -v WATCH
# HELP apiserver_longrunning_gauge Gauge of all active long-running apiserver requests broken out by verb, API resource, and scope. Not all requests are tracked this way.
# TYPE apiserver_longrunning_gauge gauge
apiserver_longrunning_gauge{resource="pods",scope="namespace",subresource="log",verb="GET"} 0
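Both observations in the description can be checked mechanically from the quoted text. The following is a minimal sketch, not part of the original report: it measures the gap between each "Importing stream" line and the following "Error syncing" line in the controller log, and it mirrors the `grep longrunning | grep -v WATCH` pipeline on the metrics output. The function names are hypothetical, and the resource label `imagestreamimports` is an assumption about how such a request would appear in the gauge if it were tracked.

```python
import re
from datetime import datetime

# Log prefix as it appears in the report: severity letter, MMDD, time with microseconds,
# process id, file:line]. Only the time of day and the message are captured.
LOG_RE = re.compile(r'^[IWEF]\d{4} (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+\] (.*)$')

def attempt_durations(log_text):
    """Seconds from each 'Importing stream' line to the next 'Error syncing' line."""
    durations, started = [], None
    for line in log_text.splitlines():
        m = LOG_RE.match(line.strip())
        if not m:
            continue
        # Time of day only: assumes a single attempt does not span midnight.
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        msg = m.group(2)
        if msg.startswith("Importing stream"):
            started = ts
        elif msg.startswith("Error syncing image stream") and started is not None:
            durations.append((ts - started).total_seconds())
            started = None
    return durations

def longrunning_non_watch(metrics_text):
    """Like `grep longrunning | grep -v WATCH`, but also skipping '#' comment lines."""
    return [
        line for line in metrics_text.splitlines()
        if "longrunning" in line and "WATCH" not in line and not line.startswith("#")
    ]

# Sample input copied from the description above.
LOG = """\
I0725 15:37:09.364025 1 imagestream_controller.go:158] Queued import of stream ci-op-i380vmvm/stable...
I0725 15:37:09.364134 1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
E0725 15:37:39.372550 1 imagestream_controller.go:133] Error syncing image stream "ci-op-i380vmvm/stable": Timeout: request did not complete within allowed duration
I0725 15:37:39.453092 1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
E0725 15:38:09.464802 1 imagestream_controller.go:133] Error syncing image stream "ci-op-i380vmvm/stable": Timeout: request did not complete within allowed duration
I0725 15:38:09.625484 1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
E0725 15:38:39.645567 1 imagestream_controller.go:133] Error syncing image stream "ci-op-i380vmvm/stable": Timeout: request did not complete within allowed duration
I0725 15:38:39.965830 1 imagestream_controller.go:235] Importing stream ci-op-i380vmvm/stable partial=true...
"""

METRICS = """\
# HELP apiserver_longrunning_gauge Gauge of all active long-running apiserver requests broken out by verb, API resource, and scope. Not all requests are tracked this way.
# TYPE apiserver_longrunning_gauge gauge
apiserver_longrunning_gauge{resource="pods",scope="namespace",subresource="log",verb="GET"} 0
"""

print([round(d, 1) for d in attempt_durations(LOG)])  # three attempts, each about 30 s
samples = longrunning_non_watch(METRICS)
# Assumed label value; no such sample appears in the output quoted in this report.
print(any('resource="imagestreamimports"' in s for s in samples))
```

Every failed attempt in the quoted log lasts just over 30 seconds, and the only non-WATCH long-running sample is the pod log request, which matches the two statements in the description.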
This is back to the image team: we are not setting a default, so we're timing out at 30s on create because ?timeout= defaults to 30s. We can set a longer timeout. The client doesn't make that easy today.

It looks like it takes a lot longer to import than before.

Is it possible we *should* treat it as a long running request instead? Regardless of whether there is a regression in import time, it seems like imagestreamimport can take an arbitrary length of time depending on the number of tags and the speed of the registry we have to pull metadata from.

https://github.com/openshift/origin/pull/20419 increases the timeout.

I made changes to ci-operator to bypass this use case for most flows. The image layers change should reduce the amount of time to find the manifest by digest.

I'd still like to understand what would have taken us from "occasionally breaching 30s" to "takes 3 minutes", assuming that behavior is consistent. Has the imagestream itself grown (in number of tags)?

I believe all the performance improvements are merged now, primarily: https://github.com/openshift/image-registry/pull/101

Tested creation (via oc create) of an imagestream with 100 imagestreamtags for a docker image with 100 layers. Creation was successful and the subsequent oc import-image was successful. Any other tests you'd like to see?

Imagestream + all imagestreamtags in comment 10 created in 2.5 seconds; the oc create command itself took 0.2 seconds. This is on 3.11.0-0.25.0.

Sounds great to me, thanks Mike.

Verified on 3.11.0-0.25.0. See comment 10 and comment 11 for the verification scenario.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652 |
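On the `?timeout=` default mentioned in the comments: as a rough illustration only, the sketch below builds a create URL for an image stream import with an explicit `timeout` query parameter. It makes no network call. The REST path, the helper name `import_url`, the example host, and the `5m` value are all assumptions for illustration; none of them is taken from this bug or from the linked pull request, and whether the server honors a longer value depends on its own limits.

```python
from urllib.parse import urlencode

def import_url(api_server, namespace, timeout):
    """Build a POST target for an image stream import with an explicit timeout."""
    # Assumed REST path for the imagestreamimports resource; not quoted in this bug.
    path = f"/apis/image.openshift.io/v1/namespaces/{namespace}/imagestreamimports"
    # Without this parameter the request falls back to the 30s default described above.
    return f"{api_server.rstrip('/')}{path}?{urlencode({'timeout': timeout})}"

# Hypothetical host and timeout value; the namespace is the one from the quoted log.
url = import_url("https://api.example.invalid:8443", "ci-op-i380vmvm", "5m")
print(url)
```

The namespace in the usage line is the one from the controller log in the description; everything else in the call is a placeholder.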