Bug 1851543
Summary: | Image digest written to mirror for the elasticsearch operator does not match expected image digest | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Courtney Ruhm <cruhm> |
Component: | Documentation | Assignee: | Michael Burke <mburke> |
Status: | CLOSED NOTABUG | QA Contact: | Anping Li <anli> |
Severity: | medium | Docs Contact: | Latha S <lmurthy> |
Priority: | high | ||
Version: | 4.3.z | CC: | adellape, akhaire, asheth, hfukumot, jcoscia, jdelft, johnny.vanveen, lmeyer, lmurthy, nrevo, pkovar, yuxzhu |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | 4.6.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Enhancement | |
Doc Text: |
Feature:
OCP supports multi-arch operators by having operator manifests refer to container images at a manifest list (the target system resolves this to the image manifest for its own platform).
(Warning, "manifest" is overloaded here. Operator manifests are metadata about the operator including its CSV. A "manifest list" or "image manifest" refers to docker entities.)
Reason:
This enables supporting clusters with multiple platforms with a single operator manifest (rather than releasing and consuming a different operator manifest for each platform).
Result:
Mirroring operators to a private location requires mirroring the manifest list, not just images for the target cluster platform. Most registries require container images for all of the platforms to be mirrored as well in order to mirror the manifest list.
*
*** These changes are relevant for 4.2 through 4.5 (using latter as base): ***
*
https://docs.openshift.com/container-platform/4.5/operators/admin/olm-restricted-networks.html
No change in "Building an Operator catalog image" step 2. "oc adm catalog build" does not appear to handle manifest lists, so a filter is still required there.
Configuring OperatorHub for restricted networks step 2. "oc adm catalog mirror" also does not appear to handle manifest lists (there is an error with --filter-by-os=".*") so I believe it will implicitly filter according to the client system if no filter is given.
But someone who can actually run this should tell us whether it's even possible to use this without --manifests-only since it doesn't appear to directly support mirroring manifest lists at all and they must use "oc image mirror" in the next step to get the job done. In that case, this command doc should be altered to remove [--filter-by-os="<os>/<arch>"] entirely and require --manifests-only.
Step 3b "oc image mirror" should look like this:
$ oc image mirror \
--filter-by-os=".*" \
[-a ${REG_CREDS}] \
-f ./redhat-operators-manifests/mapping.txt
This may be a good place to note that images for all arches are mirrored even if you only need one arch. The only case where they could get away with not doing that is if they don't need any of our multi-arch operators, and I'm not sure that's a use case worth restructuring this whole doc for.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2022-05-23 15:10:00 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Courtney Ruhm
2020-06-26 21:39:42 UTC
Putting to low, as not a blocker for 4.5.

This issue relates to digests for 4.3.x EO images. registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:150902be5584eb0f4f5fb1e367a95dc829e3e6d46779cf5a3257eb2c5fbbf80a is not found because we didn't ship https://errata.devel.redhat.com/advisory/55232. However due to the current faulty OLM operator metadata push workflow, the reference had been put into distgit then later 4.4 or 4.2 OLM operator metadata push updated that reference to distgit. I think this issue can be closed because later OLM operator metadata pushes have updated the reference so it is no longer current.

later 4.4 or 4.2 OLM operator metadata push updated that reference to distgit -> later 4.4 or 4.2 OLM operator metadata push updated that reference in app registry.

As mentioned above this is not a bug.

Reopening this bug since we have found another instance for the same error. In this case, the issue happened under a OCP 4.5.4 cluster while trying to install the CL stack. Customer validated yesterday that `ose-elasticsearch-operator` with SHA 772ade6ee79fd75d04a9a1c2bb8c92c2900c305e5f5c868e59b99643c9b515d9 was available from `registry.redhat.io`, although, a few days back, when case was opened on 7/8, the image was not available/found.

This is the message from the case when customer tried to pull the ES operator image.

~~~
~ % docker pull registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:772ade6ee79fd75d04a9a1c2bb8c92c2900c305e5f5c868e59b99643c9b515d9
Error response from daemon: error parsing HTTP 404 response body: invalid character '<' looking for beginning of value: "<HTML><HEAD><TITLE>Error</TITLE></HEAD><BODY>\nAn error occurred while processing your request.<p>\nReference #132.8ce50b17.1596815551.81f218\n</BODY></HTML>\n"
~~~

After the image was available yesterday, customer noticed that the image Digest changed for that image when mirrored to their internal registry (NEXUS).
Images for elasticsearch-operator on channel 4.5

[
"registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:772ade6ee79fd75d04a9a1c2bb8c92c2900c305e5f5c868e59b99643c9b515d9",
"registry.redhat.io/openshift4/ose-elasticsearch-proxy@sha256:c3dcaabf92984f305b5374018c58cac18525418969b9ae2122bfdb9d9e7f81d5",
"registry.redhat.io/openshift4/ose-logging-elasticsearch6@sha256:76a2e6073f0cd2a9c02fcdd4578e477136da51c955dc7583fe72702bd16f4515",
"registry.redhat.io/openshift4/ose-logging-kibana6@sha256:0b7f2c19f44b73739c8c3925d2f305047c979a943fa942b4bea2ab776943f4cf",
"registry.redhat.io/openshift4/ose-oauth-proxy@sha256:6db2efb24cf572508af74ee155b2437c60ebb40f52619320038dec2ab5553413"
]

Example:

* This is a custom script customer built

~~~
./check_digest.sh 772ade6ee79fd75d04a9a1c2bb8c92c2900c305e5f5c868e59b99643c9b515d9
SOURCE: sha256:772ade6ee79fd75d04a9a1c2bb8c92c2900c305e5f5c868e59b99643c9b515d9
DEST: sha256:65ac828ee5bff80547fd2045b4d0730cadaa16e22f7b07a757817305891b5fdd
~~~

As you can see the image on the SOURCE registry (Red Hat) is different from the DEST registry (NEXUS)

This happened to all images for elasticsearch-operator and cluster-logging.

As a workaround, customer had to edit the `ClusterServiceVersion` for both Operators to point to the digest value in their mirror registry.

"mistakes were made" last week and 4.5 metadata was out of sync with the images available until Monday. Things should be synced now, else file more bugs.
> customer noticed that the image Digest changed for that image when mirrored to their internal registry (NEXUS).
What they have done is sync the single architecture image instead of the manifest list (operator manifests point at manifest list shasums).
--------------
$ oc image info registry-proxy.engineering.redhat.com/rh-osbs/openshift-ose-elasticsearch-operator@sha256:772ade6ee79fd75d04a9a1c2bb8c92c2900c305e5f5c868e59b99643c9b515d9
error: the image is a manifest list and contains multiple images - use --filter-by-os to select from:
OS DIGEST
linux/amd64 sha256:65ac828ee5bff80547fd2045b4d0730cadaa16e22f7b07a757817305891b5fdd
--------------
Assuming they are using oc image mirror they need to include the `--filter-by-os=.*` parameter to get the manifest list as well.
I don't think this is a bug (now), but feel free to continue to reopen if it needs something.
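The mismatch discussed in this comment follows from how registries address content: a digest is the SHA-256 of the raw manifest bytes, so a manifest list and each per-platform image manifest it references necessarily carry different digests. In this report the DEST digest the customer saw equals the linux/amd64 entry that `oc image info` lists, while the operator metadata points at the manifest list. The Python sketch below illustrates that relationship only; the manifest contents and the `digest` and `classify` helpers are hypothetical names invented for illustration, not part of `oc` or of any registry API.

```python
import hashlib
import json


def digest(manifest_bytes: bytes) -> str:
    """Content digest as registries compute it: SHA-256 over the raw manifest bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


def classify(mirrored_digest: str, manifest_list_bytes: bytes) -> str:
    """Report whether a mirrored digest is the manifest list or one of its platform images."""
    if mirrored_digest == digest(manifest_list_bytes):
        return "manifest list"
    for entry in json.loads(manifest_list_bytes)["manifests"]:
        if entry["digest"] == mirrored_digest:
            platform = entry["platform"]
            return f"single-platform image ({platform['os']}/{platform['architecture']})"
    return "unrelated"


# Hypothetical single-platform image manifest; all field values are placeholders.
image_manifest = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": {"digest": "sha256:" + "a" * 64},
    "layers": [{"digest": "sha256:" + "b" * 64}],
}).encode()
amd64_digest = digest(image_manifest)

# Hypothetical manifest list that references the image above by its digest.
manifest_list = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {"digest": amd64_digest,
         "platform": {"os": "linux", "architecture": "amd64"}},
    ],
}).encode()
list_digest = digest(manifest_list)

# What operator metadata pins versus what a single-arch mirror ends up holding.
print(classify(list_digest, manifest_list))   # manifest list
print(classify(amd64_digest, manifest_list))  # single-platform image (linux/amd64)
```

Under this model, a mirror that copies only the linux/amd64 image can never serve the digest the operator metadata references, which is consistent with the advice above to pass `--filter-by-os=.*` so that the manifest list itself is mirrored.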
Thanks for the pointer Luke. Customer tested with the `--filter-by-os=.*` parameter and the procedure worked. I believe we could update the documentation [1] adding this parameter. What do you think ?

[1] https://docs.openshift.com/container-platform/4.5/operators/olm-restricted-networks.html#olm-restricted-networks-operatorhub_olm-restricted-networks (step 3.b)

(In reply to Hideshi Fukumoto from comment #14)
> @Luke,
>
> Thank you for your help.
> I've re-read your comment on this issue, and I understand that
>
> 1. The documentation should be improved.
> => OK, please improve the doc ASAP in this bug.
> If you need another bug to do it, please let us know.

I think this is the bug to do that :) I am not the expert on this but I will ask experts to review.
Editing "Docs text" to indicate the changes.

>
> 2. There is currently no way to reduce the storage space by not copying unnecessary
> architecture's image/something-else, and it's the current specification.
> => However, our customer wants to avoid wasting disk space and transfer time by
> copying unnecessary architecture's images.
> Can you handle it in this bug ?
> If not, should we file new RFE request on it?

That should be a new RFE. And I have to say up front, it will require either a significant architectural change or a registry that's more relaxed about data integrity.

If you can find a registry that will host the manifest list without requiring all the manifests to be present, then you can save the space for the superfluous arches. Otherwise, we'll either need an architectural revision to make the operator manifests support multiple arches, or publish multiple manifests to multiple channels, or something else I haven't thought of.

This problem doesn't go away with bundle format in 4.6+. The commands probably change but references are still to manifest lists.

(In reply to Luke Meyer from comment #20)
@luke Thanks for your answers.
> (In reply to Hideshi Fukumoto from comment #14)
> > @Luke,
> >
> > Thank you for your help.
> > I've re-read your comment on this issue, and I understand that
> >
> > 1. The documentation should be improved.
> > => OK, please improve the doc ASAP in this bug.
> > If you need another bug to do it, please let us know.
>
> I think this is the bug to do that :) I am not the expert on this but I will
> ask experts to review.
> Editing "Docs text" to indicate the changes.

I personally believe that the content in the "Docs text" is used in the next Release Note to reflect as "Bug fixes", "Known issues", "Enhancement".
Therefore, in this case, I think it's better to change the explanation in "OCP Operators" guide[1] directly (and append some additional notes if necessary)

| Step 3b "oc image mirror" should look like this:
|
| $ oc image mirror \
| --filter-by-os=".*" \
| [-a ${REG_CREDS}] \
| -f ./redhat-operators-manifests/mapping.txt

[1] https://docs.openshift.com/container-platform/4.5/operators/olm-restricted-networks.html#olm-restricted-networks-operatorhub_olm-restricted-networks (step 3.b)

> > 2. There is currently no way to reduce the storage space by not copying
> > unnecessary
> > architecture's image/something-else, and it's the current specification.
> > => However, our customer wants to avoid wasting disk space and transfer
> > time by
> > copying unnecessary architecture's images.
> > Can you handle it in this bug ?
> > If not, should we file new RFE request on it?
>
> That should be a new RFE.

Okay, I'll file a new RFE on it.

> And I have to say up front, it will require either
> a significant architectural change or a registry that's more relaxed about
> data integrity.
>
> If you can find a registry that will host the manifest list without
> requiring all the manifests to be present, then you can save the space for
> the superfluous arches.

I'm sorry but I do not understand your idea above.
So, if it's a realistic procedure for our customers to implement at this time, please let us know about it.

@Luke One of my CU has a restricted environment. OCP Version is 4.5.8. CU was facing challenges while installing elasticsearch operator, hence CU followed the article[1]. After following the article[1], CU was able to successfully install elasticsearch operator. Now, CU is facing issues for other operators as well. I would like to know: is this expected behavior? For CU, manually deleting the CSV and then deleting the deployment is not a practical solution. Can you advise?

[1] https://access.redhat.com/solutions/5200741

Hideshi -- Forgive my ignorance on this issue. It is not entirely clear to me what the changes you requested are, especially considering that the section you commented on has been removed.

It seems that the information now appears in
* "Mirroring an Operator catalog" [1] for 4.6-4.8
* "Mirroring Operator catalogs for use with disconnected clusters" in the "Mirroring catalog contents to airgapped registries" subsection [2] for 4.9+.

You are suggesting that, for multi-arch environments, the
----
$ oc adm catalog mirror \
<index_image> \
<mirror_registry>:<port>/<namespace> \
[-a ${REG_CREDS}] \
[--insecure] \
[--index-filter-by-os='<platform>/<arch>'] \
[--manifests-only]
----
should be
----
$ oc adm catalog mirror \
<index_image> \
<mirror_registry>:<port>/<namespace> \
[-a ${REG_CREDS}] \
[--insecure] \
[--manifests-only]
----
And
----
$ oc adm catalog mirror \
<index_image> \
file:///local/index \
-a ${REG_CREDS} \
--insecure \
--index-filter-by-os='<platform>/<arch>'
----
Should be:
----
$ oc image mirror \
--filter-by-os=".*" \
[-a ${REG_CREDS}] \
-f ./redhat-operators-manifests/mapping.txt
----

However, the `oc adm catalog mirror` is in a distinct section, Mirroring catalog contents to registries on the same network [3], in the 4.9+ docs.

Any thoughts on how to proceed?
Thank you,
Michael

[1] https://docs.openshift.com/container-platform/4.8/operators/admin/olm-restricted-networks.html#olm-mirror-catalog_olm-restricted-networks
[2] https://docs.openshift.com/container-platform/4.10/installing/disconnected_install/installing-mirroring-installation-images.html#olm-mirror-catalog-colocated_installing-mirroring-installation-images
[3] https://docs.openshift.com/container-platform/4.10/installing/disconnected_install/installing-mirroring-installation-images.html#olm-mirror-catalog-colocated_installing-mirroring-installation-images

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days