Bug 1850781 - Cincinnati memory consumption does not scale with multiple releases
Summary: Cincinnati memory consumption does not scale with multiple releases
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OpenShift Update Service
Version: 4.6
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Pratik Mahajan
QA Contact: Yang Yang
URL: https://issues.redhat.com/browse/OTA-214
Whiteboard:
Duplicates: 1924648
Depends On:
Blocks:
Reported: 2020-06-24 21:11 UTC by Chad Crum
Modified: 2023-03-09 01:27 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-03-09 01:27:49 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cincinnati pull 475 | 0 | None | open | Bug 1850781: [memory] define scope while running graph-builder to free up memory | 2021-07-02 22:41:48 UTC

Comment 3 W. Trevor King 2020-11-20 17:13:50 UTC
Moving under OCP, now that we are back to being a sub-component instead of a parallel (in Bugzilla) project.

Comment 5 W. Trevor King 2021-03-03 20:51:47 UTC
*** Bug 1924648 has been marked as a duplicate of this bug. ***

Comment 9 Yang Yang 2021-09-29 01:42:10 UTC
The OOM issue is not reproduced with osus-v4.9.0-1, but dropping both the release images and the referenced images into the same repository causes a graph-builder creation issue. Re-opening it.

Comment 10 W. Trevor King 2021-10-20 16:59:41 UTC
Copying out from [1]:

This [2] seems like the spot to insert the label guard, unless we need to drop down into the manifest-referenced config.  Looks like dkregistry 0.5.0 doesn't provide convenient access to the config structure [3].  Poking around on Quay:

$ curl -s https://quay.io/api/v1/repository/openshift-release-dev/ocp-release/tag/ | jq -r '.tags[] | select(.expiration == null) | .manifest_digest + " " + .name' | head -n1
sha256:77f249996a21dac32ca466d9b846753c9e5c96bf0898216dc5c0b0093d86508a 4.6.22-ppc64le
$ curl -s https://quay.io/api/v1/repository/openshift-release-dev/ocp-release/manifest/sha256:77f249996a21dac32ca466d9b846753c9e5c96bf0898216dc5c0b0093d86508a | jq '.manifest_data | fromjson | {mediaType, config}'
{
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 1715,
    "digest": "sha256:13351fed391fcc6aad6360d7238899ca0c75bb759a84443577160deab0bc7ee6"
  }
}
$ curl -sL https://quay.io/v2/openshift-release-dev/ocp-release/blobs/sha256:13351fed391fcc6aad6360d7238899ca0c75bb759a84443577160deab0bc7ee6 | jq -r '.config.Labels["io.openshift.release"]'
4.6.22

So, yes, this is available via the config structure, but we may need to teach dkregistry about config-fetching in order to make use of it.  Then we teach Cincinnati to only walk layers when the io.openshift.release label is set.  That saves us the memory hit of walking all the non-release-image layers, and lets Cincinnati handle registry repos that include arbitrary numbers of those non-release images alongside the release images it actually cares about.

[1]: https://issues.redhat.com/browse/OTA-214
[2]: https://github.com/openshift/cincinnati/blob/9a55fde2c8a7746d1b8e99d72e3ce5daa7aba837/cincinnati/src/plugins/internal/graph_builder/release_scrape_dockerv2/registry/mod.rs#L294
[3]: https://docs.rs/dkregistry/0.5.0/dkregistry/v2/manifest/enum.Manifest.html#implementations
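
The label guard described above could look roughly like the following. This is a minimal Python sketch of the decision logic only, not Cincinnati's or dkregistry's code (both are Rust). The function names are hypothetical, the manifest sample is copied from the jq output above, and the two config blobs are trimmed stand-ins that keep only the field the jq query reads. Fetching the manifest and the config blob from the registry is left out.

```python
import json

RELEASE_LABEL = "io.openshift.release"

# Manifest excerpt copied from the Quay output in this comment.
SAMPLE_MANIFEST = json.dumps({
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": {
        "mediaType": "application/vnd.docker.container.image.v1+json",
        "size": 1715,
        "digest": "sha256:13351fed391fcc6aad6360d7238899ca0c75bb759a84443577160deab0bc7ee6",
    },
})

# Trimmed, illustrative config blobs (assumption: real blobs carry many more fields).
RELEASE_CONFIG = json.dumps({"config": {"Labels": {RELEASE_LABEL: "4.6.22"}}})
NON_RELEASE_CONFIG = json.dumps({"config": {"Labels": None}})


def config_digest(manifest_json):
    """Return the digest of the config blob referenced by a schema 2 manifest."""
    return json.loads(manifest_json)["config"]["digest"]


def release_version(config_json):
    """Return the io.openshift.release label value, or None when it is not set."""
    config = json.loads(config_json)
    # "config" or "Labels" may be missing or null on arbitrary images.
    labels = (config.get("config") or {}).get("Labels") or {}
    return labels.get(RELEASE_LABEL)


def should_walk_layers(config_json):
    """Guard: only images that carry the release label are worth a layer walk."""
    return release_version(config_json) is not None
```

With a guard like this, a non-release image costs one small config fetch (1715 bytes in the example above) instead of a walk over all of its layers.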

Comment 12 Ash Westbrook 2022-07-21 18:06:34 UTC
This is still present on the current release. While the skopeo workaround, https://github.com/openshift/cincinnati-operator/blob/master/docs/disconnected-updateservice-operator.md#oomkilled, works, it will need to be run after every oc-mirror sync to grab the latest release images, and it will eventually fail for the same reasons as the number of release images increases.

When can we expect this to be fixed?

Comment 13 W. Trevor King 2022-07-28 08:00:02 UTC
(In reply to Ash Westbrook from comment #12)
> ...will eventually fail for the same reasons as the number of release images increases.

Our current production Cincinnati pods are consuming around 400 MiB of memory, and that's running in quay.io/openshift-release-dev/ocp-release with something like 4k tagged release images (and very few tags pointing at non-release images).  The scaling for non-release images is so bad because Cincinnati is walking each layer hoping to find the files that mark that image as a release image, and release layers can be large.  But for actual release images, the metadata is (almost) always in the top-most layer, and that layer is very small.

So we still want to fix this because isolating release images in their own registry repo is annoying, but I'm not too worried about melting down under the weight of the release images alone.  Am I missing something?
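
The scaling difference described above can be illustrated with a toy model. This is a hedged Python sketch, not Cincinnati's implementation: the layers are uncompressed in-memory tarballs rather than blobs streamed from a registry, the marker path `release-manifests/release-metadata` is assumed for illustration, and `make_layer` and `find_metadata` are hypothetical helpers.

```python
import io
import tarfile

# Assumed marker file for a release image (illustrative).
METADATA_PATH = "release-manifests/release-metadata"


def make_layer(files):
    """Build an uncompressed tar layer in memory from a {path: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def find_metadata(layers):
    """Walk layers top-most first (input is ordered bottom-most first).

    Returns (metadata bytes or None, total bytes of layers examined).
    """
    examined = 0
    for layer in reversed(layers):
        examined += len(layer)
        with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as tar:
            for member in tar:
                if member.name == METADATA_PATH:
                    return tar.extractfile(member).read(), examined
    return None, examined


# One large base layer shared by both toy images.
base = make_layer({"usr/bin/tool": b"x" * 1_000_000})
release_top = make_layer({METADATA_PATH: b'{"version": "4.6.22"}'})
other_top = make_layer({"etc/app.conf": b"key=value"})

release_image = [base, release_top]
other_image = [base, other_top]

release_meta, release_cost = find_metadata(release_image)
other_meta, other_cost = find_metadata(other_image)
```

In this model the release image is resolved after reading only its small top layer, while the non-release image forces a read of every layer, including the large base, before the walk can give up.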

Comment 15 Shiftzilla 2023-03-09 01:27:49 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9687

