Bug 1806707
| Summary: | podman pull doesn't maintain manifest information on disk correctly |
|---|---|
| Product: | Red Hat Enterprise Linux 8 |
| Component: | podman |
| Version: | 8.1 |
| Status: | CLOSED NOTABUG |
| Severity: | medium |
| Priority: | unspecified |
| Reporter: | Alex Schultz <aschultz> |
| Assignee: | Matthew Heon <mheon> |
| QA Contact: | atomic-bugs <atomic-bugs> |
| Docs Contact: | |
| CC: | bbaude, bdobreli, dornelas, dwalsh, jligon, jnovy, lsm5, mheon, mitr, nalin, sbaker |
| Flags: | pm-rhel: mirror+ |
| Target Milestone: | rc |
| Target Release: | --- |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | |
| Fixed In Version: | |
| Doc Type: | If docs needed, set a value |
| Doc Text: | |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2020-06-03 19:51:00 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Bug Depends On: | |
| Bug Blocks: | 1186913, 1804045 |
| Attachments: | |
Description
Alex Schultz
2020-02-24 19:29:14 UTC
I think this may be confusion between digest and image ID? The digest is the SHA256 of the layer, but it is not the identifier that Podman and c/storage use to identify the image; that's a random 256-bit integer. You can pull and refer to images by their SHA256 digest, but most output (inspect, images, etc.) will refer to them by the image ID.

So we're taking the manifest and looking for the digest in the layers.json and not finding it in compressed-diff-digest or diff-digest. In this case "sha256:933041e5135b6d2775e5def32a99ce221eefab64b05ce9b300f51612b434b4b8" is not in the layers.json.

Can I ask why you're directly working with layers.json? I don't believe we document that file as a public interface, so I don't think we're prepared to make any stability guarantees about it and its contents. Is what you're trying to do not available from the CLI? Are there performance reasons to avoid Podman or Buildah commands here?

(In reply to Matthew Heon from comment #1)
> I think this may be confusion between digest and image ID? The digest is
> SHA256 of the layer, but it is not the identifier that Podman and c/storage
> use to identify the image; that's a random 256-bit integer. You can pull and
> refer to images by their SHA256 digest, but most output (inspect, images,
> etc) will refer to them by the image ID.

For Docker v2s2 and OCI images, the image ID is computed as the digest of the image's configuration blob, and the image's digest (so far as there is such a thing) is the digest of the image's manifest. The computations are more involved for Docker v2s1 images, but for a given image the values for each should always be consistent.

We've had to implement our own container registry which requires us to serve up layers. In order to upload the layers from a locally stored container, we're using the existing manifest on disk to find and export the layers.

Alex,

First of all, why do you care?
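As a side note on the image-ID vs. image-digest distinction discussed above: for Docker v2s2/OCI images the two identifiers come from digesting two different JSON blobs. A minimal sketch (the blob contents here are made up for illustration and are not a real image):

```python
import hashlib
import json

# Hypothetical image configuration blob (the JSON that the manifest's
# "config" descriptor points at). Contents are made up for illustration.
config_blob = json.dumps({"architecture": "amd64", "os": "linux"}).encode()

# For Docker v2s2 / OCI images, the image ID is the digest of the
# configuration blob...
image_id = "sha256:" + hashlib.sha256(config_blob).hexdigest()

# ...while the image digest (as used for pull-by-digest) is the digest
# of the manifest, a *different* JSON document that references the config.
manifest_blob = json.dumps({
    "schemaVersion": 2,
    "config": {"digest": image_id, "size": len(config_blob)},
}).encode()
image_digest = "sha256:" + hashlib.sha256(manifest_blob).hexdigest()

# The two values are digests of different documents, so they differ.
assert image_id != image_digest
```

This is why `podman images`/`inspect` output (image IDs) need not match anything that appears as a blob digest in a registry-side manifest.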
The data stored in /var/lib/containers/storage is not a documented ABI with any expectation of stability. What is the end-user-visible impact of this situation (whether or not it is correct)?

- https://bugzilla.redhat.com/show_bug.cgi?id=1804045#c0 shows a docker:/… syntax that causes an incorrect upload URL to be formed (by some client that is not checking for a valid tag format; / is invalid in tags)
- https://bugzilla.redhat.com/show_bug.cgi?id=1804045#c2 shows a blob upload initiation at POST /v2/trilio/trilio-datamover/blobs/uploads/ failing with 404

The two are fairly different failures, and it’s not obvious to me how they relate at all to layer IDs.

---

Second, the assumption that layer blob (“compressed”) digests are maintained in the layer metadata so that layers can be found using that data is, as of the current implementation, plainly incorrect.

Layers are deduplicated by their IDs (~ usually the sequence of DiffIDs of the layer and all its parents), so the “same” layer can easily be pulled from two or more different sources, with two or more layer blob (“compressed”) digests; but there is only one "compressed-diff-digest" value per layer, so if any differently-compressed duplicate of a layer is downloaded, that duplicate's different compressed digest will not be recorded anywhere.

(And we are intentionally not editing the stored manifests on deduplication; we want to preserve the originals along with signatures. The manifests don’t exist in local storage to look up local layers.)

---

(In reply to Alex Schultz from comment #5)
> We've had to implement our own container registry which requires us to serve
> up layers. In order to upload the layers from a locally stored container,
> we're using the existing manifest on disk to find and export the layers.

Do I understand correctly that you have a docker/distribution server that serves the contents of a local containers-storage store?
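The “sequence of DiffIDs of the layer and all its parents” mentioned above is essentially the OCI image-spec ChainID computation, which depends only on uncompressed-layer digests. A rough sketch (the DiffID values in the example are made up):

```python
import hashlib

def chain_id(diff_ids):
    """OCI image-spec ChainID: one identifier for a layer *and* all of
    its parents, derived only from uncompressed-layer (DiffID) digests:
      ChainID([L1])      = DiffID(L1)
      ChainID([L1..Ln])  = sha256(ChainID([L1..Ln-1]) + " " + DiffID(Ln))
    """
    current = diff_ids[0]
    for diff_id in diff_ids[1:]:
        current = "sha256:" + hashlib.sha256(
            (current + " " + diff_id).encode()
        ).hexdigest()
    return current

# Made-up DiffIDs for illustration; real ones are sha256 digests of the
# uncompressed layer tarballs.
diffs = ["sha256:" + "a" * 64, "sha256:" + "b" * 64]

# A single-layer chain is just that layer's DiffID.
assert chain_id(diffs[:1]) == diffs[0]
```

Because no compressed digest enters this computation, two pulls of the “same” layers with different compression deduplicate to the same storage entry, which is exactly why only one compressed digest per layer can be recorded.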
1) Still, the only reasonable way to access the store is via the containers/storage library. So, that implies Go, and at that point you might as well use containers/image — ideally via the top-level copy.Image pipeline to copy images as units, or, if not that and you have to serve the docker/distribution protocol, doing everything that copy.Image does (in this area, note LayerInfosForCopy at least).

2) For such a registry, the compressed digests seem very useless as a concept anyway, because the c/storage store only stores uncompressed layers, and it’s not in general possible to recreate the original compressed layers from the uncompressed state (as compression implementations change over time).

> What is the end-user-visible impact of this situation (whether or not it is
> correct)?
> - https://bugzilla.redhat.com/show_bug.cgi?id=1804045#c0 shows a docker:/…
> syntax that causes an incorrect upload URL to be formed (by some client that
> is not checking for a valid tag format, / are invalid in tags)

This was due to improper usage of CLI parameters.

> - https://bugzilla.redhat.com/show_bug.cgi?id=1804045#c2 shows a blob upload
> initiation at POST /v2/trilio/trilio-datamover/blobs/uploads/ failing with
> 404

Yes, the code checks if we can perform an upload; if it gets a 404 then it cannot upload. It's the local implementation where we need to construct blobs and manage them on the local file system.

> The two are fairly different failures, and it’s not obvious to me how they
> relate at all to layer IDs.

Neither of these errors is related to this specific bug. The issue is later, in that we need to be able to take a local container and publish it.

> ---
>
> Second, the assumption that layer blob (“compressed”) digests are maintained
> in the layer metadata so that layers can be found using that data is, as of
> the current implementation, plainly incorrect.
> Layers are deduplicated by their IDs (~ usually the sequence of DiffIDs of
> the layer and all its parents), so the “same” layer can easily be pulled
> from two or more different sources, with two or more layer blob
> (“compressed”) digests; but there is only one "compressed-diff-digest" value
> per layer, so if any differently-compressed layer duplicate is downloaded,
> that duplicate different compressed digest will not be recorded anywhere.

This has been working fine for all of the other containers we build/manage when fetching from the Red Hat registry. I'm uncertain what is different about the reported container. In this instance we didn't already have any of the layers for this image, so I'm not sure why the digests would have changed.

> (And we are intentionally not editing the stored manifests on deduplication,
> we want to preserve the originals along with signatures. The manifests don’t
> exist in local storage to look up local layers.)

I'm not sure why the de-duplication wouldn't also be based on the layers themselves (thus reusing the same digest IDs). Are you folks doing further de-duplication on the data itself vs. at a layer/blob level?

> ---
>
> (In reply to Alex Schultz from comment #5)
> > We've had to implement our own container registry which requires us to serve
> > up layers. In order to upload the layers from a locally stored container,
> > we're using the existing manifest on disk to find and export the layers.
>
> Do I understand correctly that you have a docker/distribution server
> that serves the contents of a local containers-storage store?

No. We have implemented a registry that serves blobs up in a read-only format similar to the Docker registry API. However, because it's read-only (no push is allowed), we load it by managing blobs and using the manifest metadata for lookups.

> 1) Still, the only reasonable way to access the store is via the
> containers/storage library.
> So, that implies Go, and at that point you
> might as well use containers/image — ideally via the top-level copy.Image
> pipeline to copy images as units, or if not that and you have to serve the
> docker/distribution protocol, doing everything that copy.Image does (in this
> area, note LayerInfosForCopy at least).

The project is Python only. Much of this code was written prior to the existence of many of the export abilities of buildah/podman.

> 2) For such a registry, the compressed digests seem very useless as a
> concept anyway, because the c/storage store only stores uncompressed layers,
> and it’s not in general possible to recreate the original compressed layers
> from the uncompressed state (as compression implementations change over
> time).

That's not been our experience.

FWIW, here's our export code to write our blobs:
https://github.com/openstack/tripleo-common/blob/master/tripleo_common/image/image_export.py#L80

Our code to extract the layers is:
https://github.com/openstack/tripleo-common/blob/bbb3d193b7bbc96f9aad77fdd5d050715d54f121/tripleo_common/image/image_uploader.py#L2000

I'm well aware this is not ideal; we need to be able to export the local images that may be built so that we can provide them with a Docker API interface.

(In reply to Alex Schultz from comment #8)
> FWIW, here's our export code to write our blobs.
> https://github.com/openstack/tripleo-common/blob/master/tripleo_common/image/
> image_export.py#L80
>
> Our code to extract the layers is
> https://github.com/openstack/tripleo-common/blob/
> bbb3d193b7bbc96f9aad77fdd5d050715d54f121/tripleo_common/image/image_uploader.
> py#L2000
>
> I'm well aware this is not ideal, we need to be able to export the local
> images that may be built so that we can provide them with a docker api
> interface.

If you can migrate this to running `skopeo copy containers-storage:image-name-or-id ...`, I would highly recommend that.
The current implementation makes some assumptions that I don't think can or should be depended on.

We previously had a skopeo version and it was unusable due to slow speed.

(In reply to Alex Schultz from comment #8)
> > Second, the assumption that layer blob (“compressed”) digests are maintained
> > in the layer metadata so that layers can be found using that data is, as of
> > the current implementation, plainly incorrect.
> >
> > Layers are deduplicated by their IDs (~ usually the sequence of DiffIDs of
> > the layer and all its parents), so the “same” layer can easily be pulled
> > from two or more different sources, with two or more layer blob
> > (“compressed”) digests; but there is only one "compressed-diff-digest" value
> > per layer, so if any differently-compressed layer duplicate is downloaded,
> > that duplicate different compressed digest will not be recorded anywhere.
>
> This has been working fine for all of the other containers we build/manage
> when fetching from the red hat registry.

The implementation got lucky.

---

> I'm uncertain what is different
> about the reported container.

I think the implementation just isn’t lucky any longer. But if you want to recreate this at leisure:

> $ skopeo copy --dest-compress --dest-compress-format=gzip docker://busybox dir:t1
> $ skopeo copy --dest-compress --dest-compress-format=zstd docker://busybox dir:t2
> $ podman pull dir:t1
> $ podman pull dir:t2

In the current implementation (which is NOT a commitment and NOT a behavior to be relied upon), the layer metadata comes from t1 (gzip compression) and the manifest from t2 (zstd compression).

---

> In this instance we didn't already have any
> of the layers for this image so I'm not sure why the digests would have
> changed.
(If you want to dig into this, collecting all image manifests and the full layer database might be useful — and even that might not be sufficient if some of the layers were previously downloaded by pulling an image that was later deleted without deleting the layers. If this really happens when the layer is not duplicated, i.e. you can reproduce this starting with an empty container storage, pulling a single image from a remote registry, and doing nothing else to the storage, that would be surprising and somewhat interesting. But the ultimate answer is very likely going to be “don’t read those files directly” all the same.)

---

> > (And we are intentionally not editing the stored manifests on deduplication,
> > we want to preserve the originals along with signatures. The manifests don’t
> > exist in local storage to look up local layers.)
>
> I'm not sure why the de-duplication wouldn't also be based on layers
> themselves (thus reusing the same digest IDs).

I’m not sure what “based on layers themselves” means. A single DiffID == uncompressed layer digest can correspond to many different compressed digests. If you are suggesting that the deduplication should happen based on compressed digest values instead of uncompressed digest values, 1) that would waste space for no benefit, and 2) locally-built images don’t have compressed digests; they only start to exist after compression, i.e. when pushing the image to a registry.

---

> > (In reply to Alex Schultz from comment #5)
>
> No. We have implemented a registry that serves blobs up in a read-only
> format similar to the docker registry api. However because it's read-only
> (no push is allowed), we load it by managing blobs and using the manifest
> metadata for lookups.

As mentioned before, the manifest metadata is not maintained to be authoritative for references to an image’s layers in local storage.

> > 1) Still, the only reasonable way to access the store is via the
> > containers/storage library.
> > So, that implies Go, and at that point you
> > might as well use containers/image — ideally via the top-level copy.Image
> > pipeline to copy images as units, or if not that and you have to serve the
> > docker/distribution protocol, doing everything that copy.Image does (in this
> > area, note LayerInfosForCopy at least).
>
> The project is python only. Much of this code was written prior to the
> existence of many of the export abilities of buildah/podman.

Buildah/Podman push/pull images via containers/image, so the code to copy images in/out has necessarily existed before. Sure, various CLI options and push/pull optimizations have only been added over time.

> > 2) For such a registry, the compressed digests seem very useless as a
> > concept anyway, because the c/storage store only stores uncompressed layers,
> > and it’s not in general possible to recreate the original compressed layers
> > from the uncompressed state (as compression implementations change over
> > time).
>
> That's not been our experience.

You’ve been lucky, and you aren’t lucky any more.

> FWIW, here's our export code to write our blobs.
> https://github.com/openstack/tripleo-common/blob/master/tripleo_common/image/
> image_export.py#L80
>
> Our code to extract the layers is
> https://github.com/openstack/tripleo-common/blob/
> bbb3d193b7bbc96f9aad77fdd5d050715d54f121/tripleo_common/image/image_uploader.
> py#L2000

If I understand _copy_layer_local_to_registry, that is actually prepared for the result having a different digest (it calls _copy_stream_to_registry with verify_digest=false, and export_stream renames the created file in that case to match). It’s “just” _copy_local_to_registry that 1) makes invalid assumptions about the relationships between images/manifests/layers (which seem to happen in other places as well) and 2) throws away the digest returned by _copy_layer_local_to_registry.
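The compressor-dependence at the heart of this can be demonstrated without any registry: the uncompressed digest (DiffID) of a layer is stable, while the compressed ("blob") digest varies with the compressor and its settings. A rough sketch, using zlib compression levels as a stand-in for the gzip/zstd pair in the reproduction above (the layer bytes are made up):

```python
import hashlib
import zlib

def sha256(data):
    return "sha256:" + hashlib.sha256(data).hexdigest()

# A stand-in for an uncompressed layer tarball (contents made up).
layer = b"example layer contents\n" * 4096

# The DiffID (digest of the uncompressed bytes) is stable...
diff_id = sha256(layer)

# ...but the compressed blob digest depends on the compressor settings,
# so the "same" layer served by two sources can carry two different
# compressed digests while deduplicating to one stored layer.
blob_a = zlib.compress(layer, 1)
blob_b = zlib.compress(layer, 9)

assert sha256(zlib.decompress(blob_a)) == diff_id
assert sha256(zlib.decompress(blob_b)) == diff_id
assert sha256(blob_a) != sha256(blob_b)
```

This is also why the original compressed blob cannot in general be regenerated from the stored uncompressed layer: reproducing it byte-for-byte would require the exact compressor implementation and settings used originally.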
Looking at https://github.com/openstack/tripleo-common/blob/bbb3d193b7bbc96f9aad77fdd5d050715d54f121/tripleo_common/image/image_uploader.py#L1971 , that’s just not going to work. It ISN’T POSSIBLE to make this work. Compare https://github.com/containers/image/pull/157 .

(In reply to Alex Schultz from comment #10)
> We previously had a skopeo version and it was unusable due to slow speed.

FWIW there is https://github.com/NicolasT/static-container-registry ; but if Skopeo is slow for some reason, this probably isn’t going to be any faster.

(In reply to Alex Schultz from comment #8)
> FWIW, here's our export code to write our blobs.
> https://github.com/openstack/tripleo-common/blob/master/tripleo_common/image/
> image_export.py#L80

For the record, I can’t see anything locking the files against concurrent modification; this may well be faster, but it’s also unsafe unless you can somehow ensure that nothing else is accessing the store.
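On that last point, a hedged sketch of what per-blob locking could look like on Linux. The helper name and layout are hypothetical, and advisory flock locks only serialize processes that cooperate by taking the same lock; they do nothing about podman/buildah modifying the underlying store:

```python
import fcntl
import os

def write_blob_locked(path, data):
    """Hypothetical helper: write a blob while holding an exclusive
    advisory lock on a sidecar ".lock" file, so concurrent exporters
    serialize with each other. Readers are only protected if they take
    a shared lock (LOCK_SH) on the same sidecar file."""
    lock_path = path + ".lock"
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until acquired
        try:
            # Write to a temp file first, then rename into place, so a
            # lock-less reader never observes a half-written blob.
            tmp_path = path + ".tmp"
            with open(tmp_path, "wb") as blob:
                blob.write(data)
            os.rename(tmp_path, path)  # atomic replace on POSIX
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Even with this, nothing prevents a concurrent `podman pull` or image deletion from changing the store being exported, which is part of why accessing the store through containers/storage (or skopeo) is the supported route.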