Bug 1785390
| Summary: | [api.ci 4.x blocker] containers/image: Per-image pull locking for bandwidth efficiency | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | W. Trevor King <wking> |
| Component: | Node | Assignee: | Valentin Rothberg <vrothber> |
| Status: | CLOSED ERRATA | QA Contact: | Sunil Choudhary <schoudha> |
| Severity: | high | Priority: | high |
| Version: | 4.4 | Target Release: | 4.5.0 |
| Hardware: | Unspecified | OS: | Unspecified |
| Fixed In Version: | cri-o-1.18.1-1.dev.rhaos4.5.git60ac541.el8 | Type: | Bug |
| Last Closed: | 2020-07-13 17:12:48 UTC | Bug Blocks: | 1838167 |
| CC: | aos-bugs, ccoleman, jokerman, lsm5, nagrawal, pthomas, rphillips, vrothber | | |
This is going to completely destroy api.ci when we move to 4.x clusters (I didn't realize we didn't have this). We currently pull 5-15 GB of images per PR job; if we double or triple those big layer pulls, we could completely overwhelm the cluster. This needs serious attention. Bumping to high; I need some attention given to this in 4.4.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
Promoting [1,2] to a bug, for easier Red Hat prioritization/tracking.

Description

If there is an existing libpod pull in flight for a remote image, new pulls of that image should block until the in-flight pull completes (it may error out), to avoid shipping the same bits over the network twice.

Steps to reproduce the issue:

In one terminal:

```
$ podman pull docker.io/library/centos:6
Trying to pull docker.io/library/centos:6...
Getting image source signatures
Copying blob sha256:9bfcefca2b8da38bbfb8b6178a75f05245688b83fda45578bcdf51f56e4a5a9e
 66.60 MB / 66.60 MB [=====================================================] 13s
Copying config sha256:0cbf37812bff083eb2325468c10aaf82011527c049d66106c3c74298ed239aaf
 2.60 KB / 2.60 KB [========================================================] 0s
Writing manifest to image destination
Storing signatures
0cbf37812bff083eb2325468c10aaf82011527c049d66106c3c74298ed239aaf
```

In another terminal, launched once blob sha256:9bfce... is maybe 10 MB into its pull:

```
$ podman run --rm docker.io/library/centos:6 echo hi
Trying to pull docker.io/library/centos:6...
Getting image source signatures
Copying blob sha256:9bfcefca2b8da38bbfb8b6178a75f05245688b83fda45578bcdf51f56e4a5a9e
 66.60 MB / 66.60 MB [======================================================] 8s
Copying config sha256:0cbf37812bff083eb2325468c10aaf82011527c049d66106c3c74298ed239aaf
 2.60 KB / 2.60 KB [========================================================] 0s
Writing manifest to image destination
Storing signatures
hi
```

Describe the results you received:

As the console output shows, both commands pulled the same layers in parallel over the network.

Describe the results you expected:

I'd rather have seen the second command print a message about blocking on the existing pull, idle while that pull went through, and then run the command using the blobs pushed into local storage by the first pull.
[1]: https://github.com/containers/libpod/issues/1911
[2]: https://github.com/containers/image/pull/611