Bug 2044633
| Summary: | [RFE] Ensure layer isn't unnecessarily re-pushed in c/image | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Robb Manes <rmanes> |
| Component: | buildah | Assignee: | Aditya R <arajan> |
| Status: | CLOSED MIGRATED | QA Contact: | atomic-bugs <atomic-bugs> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 8.5 | CC: | cpippin, dornelas, dwalsh, ltitov, mitr, nalin, pthomas, tsweeney, umohnani, vrothber |
| Target Milestone: | rc | Keywords: | FutureFeature, MigratedToJIRA |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-09-11 18:37:44 UTC | Type: | Story |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Robb Manes 2022-01-24 20:53:52 UTC
The layer blobs have to be reconstructed when we push them. While we can reconstruct an uncompressed layer blob fairly reliably, the copy of a layer blob that we pull from a registry is compressed, and recompressing it will frequently produce a compressed blob whose digest differs from that of the compressed blob we originally pulled. When base images and built images live in different registries, each registry's compressed version of a given uncompressed layer blob may therefore have a different digest.

To handle cases like this, and the general "is the blob already there" case, clients maintain a blob info cache: they record the digests of the blobs they've "seen", which repositories they saw those blobs in, and the correlation between uncompressed digests and their (possibly multiple) compressed digests.

When the first client made its second push attempt, its cache contained a record indicating that, even though it would ordinarily need to recompress the uncompressed blob it had locally, the registry already held a blob with the same content in compressed form. The first client updated the manifest it was preparing to write so that the manifest referenced that compressed blob, and then skipped recompressing and uploading. The second client had none of that information about the destination repository; once it had checked the repository for a blob with the digest of the uncompressed blob and found none, it had exhausted what it knew about that repository, so it pushed the blob it had, recompressing as it went.
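The bookkeeping described above can be sketched in Go. This is a minimal illustration of the idea, not the real c/image `BlobInfoCache` interface; the type and method names here are hypothetical.

```go
package main

import "fmt"

// Digest stands in for a content digest string such as "sha256:...".
// (A hypothetical simplification of the real go-digest type.)
type Digest string

// blobInfoCache is a sketch of the cache the bug describes: which
// compressed variants correspond to an uncompressed blob, and which
// repositories each compressed variant has been seen in.
type blobInfoCache struct {
	compressedVariants map[Digest]map[Digest]bool // uncompressed -> compressed variants
	knownLocations     map[Digest]map[string]bool // compressed -> repositories seen in
}

func newBlobInfoCache() *blobInfoCache {
	return &blobInfoCache{
		compressedVariants: map[Digest]map[Digest]bool{},
		knownLocations:     map[Digest]map[string]bool{},
	}
}

// RecordSeen notes that a compressed variant of an uncompressed blob
// was observed in a repository (e.g. while pulling a base image).
func (c *blobInfoCache) RecordSeen(uncompressed, compressed Digest, repo string) {
	if c.compressedVariants[uncompressed] == nil {
		c.compressedVariants[uncompressed] = map[Digest]bool{}
	}
	c.compressedVariants[uncompressed][compressed] = true
	if c.knownLocations[compressed] == nil {
		c.knownLocations[compressed] = map[string]bool{}
	}
	c.knownLocations[compressed][repo] = true
}

// ReusableDigest returns a compressed digest already known to exist in
// destRepo, if any, so the client can reference it in the manifest
// instead of recompressing and re-uploading the layer.
func (c *blobInfoCache) ReusableDigest(uncompressed Digest, destRepo string) (Digest, bool) {
	for compressed := range c.compressedVariants[uncompressed] {
		if c.knownLocations[compressed][destRepo] {
			return compressed, true
		}
	}
	return "", false
}

func main() {
	cache := newBlobInfoCache()
	cache.RecordSeen("sha256:uncomp", "sha256:gzipA", "registry.example.com/base")

	// First client's situation: a reusable compressed variant is known.
	d, ok := cache.ReusableDigest("sha256:uncomp", "registry.example.com/base")
	fmt.Println(d, ok)

	// Second client's situation: no record for the destination repo,
	// so it falls back to recompressing and pushing.
	_, ok = cache.ReusableDigest("sha256:uncomp", "other.example.com/app")
	fmt.Println(ok)
}
```

The proposed tweak in the next comment amounts to relaxing `ReusableDigest` so that it also tries compressed variants seen in *other* registries, verifying their presence with an existence check against the destination.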
We could tweak the cache so that, instead of checking the registry for a blob with the same digest as the uncompressed layer blob it wants to push, and then checking for compressed blobs it has previously seen in that registry whose digests correspond to the uncompressed layer blob's digest, a client would check the registry for blobs with digests matching any compressed version of that blob it had seen anywhere, including the registry from which it pulled a base image used to build the image it is now attempting to push to a different registry. That check could fail more often, of course, so it could be slower for more people, but I think it would avoid pushing the layer in the case you're laying out.

Yes; a simple existence check for known compressed digests (with some heuristic limit on the number of checks) would work in this case. (Alternatively, right now, `types.SystemContext.DockerRegistryPushPrecomputeDigests` could _in fewer cases_ avoid an upload: only when we happen to compress the blob to exactly the on-registry representation, and that circumstance can change at any time for any reason, so it can't be relied upon if users _need_ the efficiency for some reason. On a fast network it's also _slower_ than uploading the data and having the registry detect a duplicate. So I don't recommend using or exposing this option.)

@mitr My customer is proposing this approach:
- before the actual layer push, perform a query to check whether the layer exists in the target repo, as described in the HTTP API docs: https://docs.docker.com/registry/spec/api/#pushing-an-image

See under "Existing Layers": "The existence of a layer can be checked via a HEAD request to the blob store API. ... When this response is received, the client can assume that the layer is already available in the registry under the given name and should take no further action to upload the layer."
This check should make sense for a 'compressed-on-destination-registry' blob. Also, can you elaborate more on the subtleties you mentioned? Thanks in advance!

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker; all future work related to this report will be managed there. The migrated issue is linked in the "Links" section, and its key begins with "RHEL-" followed by an integer.