Bug 1850781
| Summary: | Cincinnati memory consumption does not scale with multiple releases | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Chad Crum <ccrum> |
| Component: | OpenShift Update Service | Assignee: | Pratik Mahajan <pmahajan> |
| OpenShift Update Service sub component: | operand | QA Contact: | Yang Yang <yanyang> |
| Status: | CLOSED DEFERRED | Docs Contact: | |
| Severity: | medium | | |
| Priority: | low | CC: | awestbro, bleanhar, ccrum, jiajliu, lmohanty, wking, yanyang |
| Version: | 4.6 | | |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| URL: | https://issues.redhat.com/browse/OTA-214 | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-03-09 01:27:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Comment 3
W. Trevor King
2020-11-20 17:13:50 UTC
*** Bug 1924648 has been marked as a duplicate of this bug. ***

The OOM issue is not reproduced with osus-v4.9.0-1. But dropping both release images and the referenced images into the same repository causes a graph-builder creation issue. Re-opening it.

Copying out from [1]:

This [2] seems like the spot to insert the label guard, unless we need to drop down into the manifest-referenced config. Looks like dkregistry 0.5.0 doesn't provide convenient access to the config structure [3]. Poking around on Quay:

  $ curl -s https://quay.io/api/v1/repository/openshift-release-dev/ocp-release/tag/ | jq -r '.tags[] | select(.expiration == null) | .manifest_digest + " " + .name' | head -n1
  sha256:77f249996a21dac32ca466d9b846753c9e5c96bf0898216dc5c0b0093d86508a 4.6.22-ppc64le
  $ curl -s https://quay.io/api/v1/repository/openshift-release-dev/ocp-release/manifest/sha256:77f249996a21dac32ca466d9b846753c9e5c96bf0898216dc5c0b0093d86508a | jq '.manifest_data | fromjson | {mediaType, config}'
  {
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 1715,
      "digest": "sha256:13351fed391fcc6aad6360d7238899ca0c75bb759a84443577160deab0bc7ee6"
    }
  }
  $ curl -sL https://quay.io/v2/openshift-release-dev/ocp-release/blobs/sha256:13351fed391fcc6aad6360d7238899ca0c75bb759a84443577160deab0bc7ee6 | jq -r '.config.Labels["io.openshift.release"]'
  4.6.22

So, yes, this is available via the config structure, but we may need to teach dkregistry about config-fetching in order to make use of it. Then we teach Cincinnati to only walk layers when the io.openshift.release label is set. That saves us the memory hit of walking all the non-release-image layers, and Cincinnati will be able to handle registry repos that include arbitrary numbers of those non-release images alongside the release images it actually cares about.

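The label guard proposed above could look roughly like the following. This is only an illustrative Python sketch, not Cincinnati's Rust code and not the dkregistry API: `release_version`, `should_walk_layers`, `fetch_blob`, and the in-memory `BLOBS` store are hypothetical names, the config blob is trimmed down to the one label queried above, and the non-release digests are made up. `fetch_blob` stands in for a `GET /v2/<repo>/blobs/<digest>` request like the last curl call.

```python
import json

RELEASE_LABEL = "io.openshift.release"

def release_version(manifest, fetch_blob):
    """Return the io.openshift.release label of an image, or None.

    `manifest` is a parsed schema-2 manifest; `fetch_blob` is a
    hypothetical callable mapping a blob digest to its JSON text.
    Only the small config blob is fetched, never the layers.
    """
    digest = (manifest.get("config") or {}).get("digest")
    if digest is None:
        return None
    config_blob = json.loads(fetch_blob(digest))
    # Labels may be missing or null for images built without labels.
    labels = (config_blob.get("config") or {}).get("Labels") or {}
    return labels.get(RELEASE_LABEL)

def should_walk_layers(manifest, fetch_blob):
    """The guard: only walk layers for images carrying the release label."""
    return release_version(manifest, fetch_blob) is not None

# Manifest excerpt as shown by the Quay query above.
RELEASE_MANIFEST = {
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": {
        "mediaType": "application/vnd.docker.container.image.v1+json",
        "size": 1715,
        "digest": "sha256:13351fed391fcc6aad6360d7238899ca0c75bb759a84443577160deab0bc7ee6",
    },
}

# Hypothetical non-release image; the digest is invented for illustration.
OTHER_DIGEST = "sha256:" + "0" * 64
OTHER_MANIFEST = {
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": {"digest": OTHER_DIGEST},
}

# In-memory stand-in for the registry's blob endpoint (trimmed sample data).
BLOBS = {
    RELEASE_MANIFEST["config"]["digest"]: json.dumps(
        {"config": {"Labels": {RELEASE_LABEL: "4.6.22"}}}
    ),
    OTHER_DIGEST: json.dumps({"config": {"Labels": None}}),
}

def fetch_blob(digest):
    return BLOBS[digest]

if __name__ == "__main__":
    for name, manifest in [("release", RELEASE_MANIFEST), ("other", OTHER_MANIFEST)]:
        print(name, should_walk_layers(manifest, fetch_blob))
```

With a guard like this, each non-release image costs one small config fetch instead of a walk through all of its layers, which is the memory saving the comment describes.
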
[1]: https://issues.redhat.com/browse/OTA-214
[2]: https://github.com/openshift/cincinnati/blob/9a55fde2c8a7746d1b8e99d72e3ce5daa7aba837/cincinnati/src/plugins/internal/graph_builder/release_scrape_dockerv2/registry/mod.rs#L294
[3]: https://docs.rs/dkregistry/0.5.0/dkregistry/v2/manifest/enum.Manifest.html#implementations

This is still present on the current release. While the skopeo workaround, https://github.com/openshift/cincinnati-operator/blob/master/docs/disconnected-updateservice-operator.md#oomkilled, works, it will need to be run after every oc-mirror sync to grab the latest release images, and it will eventually fail for the same reasons as the number of release images increases. When can we expect this to be fixed?

(In reply to Ash Westbrook from comment #12)
> ...will eventually fail for the same reasons as the number of release images increases.

Our current production Cincinnati pods are consuming around 400 MiB of memory, and that's running in quay.io/openshift-release-dev/ocp-release with something like 4k tagged release images (and very few tags pointing at non-release images). The scaling for non-release images is so bad because Cincinnati is walking each layer hoping to find the files that mark that image as a release image, and release layers can be large. But for actual release images, the metadata is (almost) always in the top-most layer, and that layer is very small. So we still want to fix this, because isolating release images in their own registry repo is annoying, but I'm not too worried about melting down under the weight of the release images alone. Am I missing something?

OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9687