Bug 1833422

Summary: missing 4.3.8 tag for operator-registry image
Product: OpenShift Container Platform
Reporter: Rutvik <rkshirsa>
Component: OLM
Assignee: Evan Cordell <ecordell>
OLM sub component: OperatorHub
QA Contact: Jian Zhang <jiazha>
Status: CLOSED NOTABUG
Severity: medium
Priority: high
CC: aos-bugs, baptiste.millemathias, bluddy, cswanson, danili, ecordell, fbrychta, jmalde, jokerman, nhale, rkshirsa, vlaad
Version: 4.3.z
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Clones: 1840889
Last Closed: 2020-06-19 13:17:12 UTC
Type: Bug

Description Rutvik 2020-05-08 16:09:00 UTC
Description of problem:

There is a case where the customer wanted to build the OLM catalog for their disconnected environment.


Situation 1:

According to the referenced documents [0, 1], SHA digests are recommended over tags.
However, building the app registry catalog for a disconnected environment with the command below fails with an HTTP 404 "File not found" error. We also tried a tag such as v4.3.8 and got the same error.

We then tried to pull the same image by the SHA digest that corresponds to the cluster version (4.3.8 in this case) and saw the same error.

# oc adm catalog build --appregistry-org redhat-operators --from=registry.redhat.io/openshift4/ose-operator-registry@sha256:a414f6308db72f88e9d2e95018f0cc4db71c6b12b2ec0f44587488f0a16efc42 --to=registry.test.com:5000/olm/redhat-operators:v1

ERROR:
~~~
error: unable to read image registry.redhat.io/openshift4/ose-operator-registry@sha256:a414f6308db72f88e9d2e95018f0cc4db71c6b12b2ec0f44587488f0a16efc42: error parsing HTTP 404 response body: invalid character 'F' looking for beginning of value: "File not found.\""
~~~

-----------------------

Situation 2:

When the "--from" flag is omitted from the command, the build works: OLM is up and running, and afterwards they were able to build the operators as well.

~~~
# oc adm catalog build --appregistry-org redhat-operators --to=registry.test.com:5000/olm/redhat-operators:v1
~~~

The question is: is it fine not to specify the registry.redhat.io source image?

-----------------------

Situation 3:

- When we pull the image with the X.Y version tag (e.g. v4.3), the pull works, but with the full X.Y.Z tag (v4.3.8) or its SHA digest it fails with the same error.

podman or docker pull registry.redhat.io/openshift4/ose-operator-registry:v4.3.8 ---> Does not work

podman or docker pull registry.redhat.io/openshift4/ose-operator-registry@sha256:a414f6308db72f88e9d2e95018f0cc4db71c6b12b2ec0f44587488f0a16efc42 ---> Does not work

podman or docker pull registry.redhat.io/openshift4/ose-operator-registry:v4.3 ---> works
podman or docker pull registry.redhat.io/openshift4/ose-operator-registry:v4.4 ---> works
podman or docker pull registry.redhat.io/openshift4/ose-operator-registry:latest ---> works


Why is there such an inconsistency when the SHA digest is used?

What is the best approach to building the catalog image for a disconnected environment?

Is podman mandatory on the registry host? Most of the operations also appear to succeed with docker. The customer had issues installing podman, so they proceeded with docker. When I tested the same pull operation with podman, I got the same error.


[0] https://docs.openshift.com/container-platform/4.3/operators/olm-restricted-networks.html#olm-understanding-operator-catalog-images_olm-restricted-networks

[1] https://access.redhat.com/articles/4975041 (FAQ: Image SHA Digests vs. Image Tags)



Version-Release number of selected component (if applicable):
OCP v4.3.8

How reproducible:
Frequently

Comment 2 Ben Luddy 2020-05-14 18:24:57 UTC
1 & 3) I don't know why, but there is apparently no image for registry.redhat.io/openshift4/ose-operator-registry with a v4.3.8 tag. It will take more time to discover why that's the case, but in the meantime, registry.redhat.io/openshift4/ose-operator-registry:v4.3.7 should be a good workaround, as there were no changes to operator-registry between 4.3.7 and 4.3.8.

2) Providing a suitable base image via the --from flag is strongly recommended. By default, it will use the bleeding-edge tag quay.io/openshift/origin-operator-registry:latest. Eventually, this default may be selected more intelligently (e.g. based on the oc or cluster version), and that problem is being tracked elsewhere (https://bugzilla.redhat.com/show_bug.cgi?id=1827544).
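The workaround and the recommendation above can be combined by always passing an explicit base image. The sketch below only assembles and prints the `oc adm catalog build` command; it does not run it. The registry host `registry.test.com:5000` and the `v1` tag are taken from the report, and the helper name `build_catalog_cmd` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: prints an `oc adm catalog build` command with an explicit
# --from base image instead of relying on the default (origin-operator-registry:latest).
# The base image defaults to the v4.3.7 tag suggested as a workaround for the
# missing v4.3.8 tag; a third argument overrides it.
build_catalog_cmd() {
    org="$1"    # app registry organization, e.g. redhat-operators
    dest="$2"   # target image in the mirror registry
    base="${3:-registry.redhat.io/openshift4/ose-operator-registry:v4.3.7}"
    printf 'oc adm catalog build --appregistry-org %s --from=%s --to=%s\n' \
        "$org" "$base" "$dest"
}

build_catalog_cmd redhat-operators registry.test.com:5000/olm/redhat-operators:v1
```

Printing first makes it easy to review exactly which base image will be used before running the command by hand.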

As for the problem with the v4.3 tag no longer working, this may be due to the fact that v4.3 tracks the latest z-stream, and the most recent 4.3.z images are now multi-arch images. Did the error look something like this?

> error: unable to parse image registry.redhat.io/openshift4/ose-operator-registry:v4.3: unknown image manifest of type *manifestlist.DeserializedManifestList from manifest sha256:34588f4b73e97b6fb96fcdf01ecfc0986ebb51b7965b94e7aee88a184be2efdf

Take a look at the difference between v4.3 and v4.3.7:

$ docker manifest inspect registry.redhat.io/openshift4/ose-operator-registry:v4.3
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1371,
         "digest": "sha256:80a7c1a68adbba29e5c5dcf1f9b72b920b4fe39a5c743f2f2b4238baacca1acd",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1371,
         "digest": "sha256:e19d965f4c371e54c5ef7b6b8651abc2b5cd4a2a27819b99825a427efc3bb2af",
         "platform": {
            "architecture": "ppc64le",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1371,
         "digest": "sha256:7b41441fdafb07a6a2e0d49eb1db2ecf2b826f526c5338c9cc4d7cdf85b6711d",
         "platform": {
            "architecture": "s390x",
            "os": "linux"
         }
      }
   ]
}

$ docker manifest inspect registry.redhat.io/openshift4/ose-operator-registry:v4.3.7
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1371,
         "digest": "sha256:1506cdcc08ae29e1c25e9555aa13c191e38efea945b80e38a83f8641a441b73c",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      }
   ]
}
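Both outputs above are manifest lists. The difference is that v4.3 lists three platforms, while v4.3.7 lists only linux/amd64, which is presumably why oc can use v4.3.7 directly but needs a platform chosen for v4.3. A minimal sketch for checking this from a saved `docker manifest inspect` output, assuming the pretty-printed layout shown above (one key per line, "architecture" before "os"); the function names are hypothetical, and a JSON tool such as jq would be more robust where available.

```sh
#!/bin/sh
# Sketch: inspect a manifest list saved with
#   docker manifest inspect <image> > manifest.json
# Assumes the pretty-printed layout shown above, where each platform object
# has "architecture" on its own line, followed by "os" on its own line.

# Prints the number of platform entries in the manifest list.
count_platforms() {
    grep -c '"architecture"' "$1"
}

# Prints one "<os>/<architecture>" string per entry, in file order.
list_platforms() {
    awk -F'"' '/"architecture"/ { arch = $4 } /"os"/ { print $4 "/" arch }' "$1"
}

# Succeeds when the list has more than one entry, i.e. when a platform
# presumably has to be chosen explicitly (e.g. with --filter-by-os).
needs_platform_choice() {
    [ "$(count_platforms "$1")" -gt 1 ]
}
```

For example, saving the v4.3 output above to v4.3.json and running `list_platforms v4.3.json` should print linux/amd64, linux/ppc64le and linux/s390x.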

No workaround should be necessary if using the v4.3.7 base image, but the newer images can be used via the --filter-by-os flag to "oc adm catalog build". Values for --filter-by-os are regular expressions that will be used to match strings of the form "<platform>/<architecture>[/<variant>]", so if you wanted to base the catalog on the linux/amd64 image, you might pass --filter-by-os='amd64' or --filter-by-os='linux/amd64'.
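The matching rule can be tried out locally. This is a minimal sketch that uses `grep -E` as a stand-in for oc's own matcher, assuming (as the `--filter-by-os='amd64'` example implies) that the expression may match anywhere in the string rather than having to match all of it. The platform strings are the three entries of the v4.3 manifest list above, and `match_platforms` is a hypothetical name.

```sh
#!/bin/sh
# Stand-in for --filter-by-os matching: grep -E is used here only to illustrate
# unanchored regular-expression matching against "<os>/<architecture>" strings.
platforms='linux/amd64
linux/ppc64le
linux/s390x'

# Prints the platform strings that a given filter expression would match.
match_platforms() {
    printf '%s\n' "$platforms" | grep -E -- "$1"
}

match_platforms 'amd64'           # linux/amd64
match_platforms 'linux/amd64'     # linux/amd64
match_platforms '^linux/amd64$'   # linux/amd64 (anchored, exact match)
```

A loose expression such as `linux/.*` would match all three entries, so a specific or anchored value is the safer choice when exactly one image is wanted.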

I'll ask around to try to figure out what happened to the v4.3.8 tag. A quick check shows that other images are missing v4.3.8 tags too.

Comment 8 Baptiste.MILLEMATHIAS@amadeus.com 2020-05-25 14:21:53 UTC
Hello,

I don't know if it's related, but I got the very same error message when trying to build an operator catalog; it used to work fine a few weeks ago.

I'm running an OpenShift 4.3.1 cluster, but with oc 4.3.20.

oc adm catalog build -a $HOME/pull.secret.json --appregistry-org redhat-operators --to=myrepo.corp.tld/olm/redhat-operators:${version} --from=registry.redhat.io/openshift4/ose-operator-registry:v4.3
error: unable to parse image registry.redhat.io/openshift4/ose-operator-registry:v4.3: unknown image manifest of type *manifestlist.DeserializedManifestList from manifest sha256:9265fd170ee380ead9c232e55fb882b871290d83dd1352328c718059ee8b50aa

Doing the same with registry.redhat.io/openshift4/ose-operator-registry:v4.3.7, for instance, does not exhibit the problem. Using v4.3.20 shows the same issue with the same digest.

So basically anyone following the procedure in the official Red Hat OCP documentation will encounter the issue (it says to use v4.3, which currently resolves to v4.3.20).

Comment 9 Evan Cordell 2020-05-27 19:39:57 UTC
I have cloned this to https://bugzilla.redhat.com/show_bug.cgi?id=1840889 to track fixing the docs for this to indicate `--filter-by-os` needs to be used.

Also note that in 4.5+ versions of oc, this value is defaulted, so the error will not be seen in the common case.

Comment 10 Luke Meyer 2020-05-29 14:46:20 UTC
https://access.redhat.com/errata/RHBA-2020:0858 (release 4.3.8) shipped operator-registry-container-v4.3.7-202003161611. The component versions/tags won't reliably match the release versions and in general should not be used; the shasums from the OperatorHub are the authoritative reference.
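Since digests rather than tags are the authoritative reference, it can help to check that a reference really is digest-pinned before passing it to `--from`. A minimal sketch with a hypothetical `is_digest_ref` helper; it checks only the `<repository>@sha256:<64 hex characters>` shape and does not contact the registry, so it cannot tell whether the digest actually exists there.

```sh
#!/bin/sh
# Hypothetical helper: succeeds only if the argument looks like a
# digest-pinned image reference, "<repository>@sha256:<64 lowercase hex chars>".
# Format check only; nothing is fetched from the registry.
is_digest_ref() {
    printf '%s\n' "$1" | grep -Eq '^[^@[:space:]]+@sha256:[0-9a-f]{64}$'
}

ref=registry.redhat.io/openshift4/ose-operator-registry@sha256:a414f6308db72f88e9d2e95018f0cc4db71c6b12b2ec0f44587488f0a16efc42
if is_digest_ref "$ref"; then
    echo "digest-pinned: $ref"
else
    echo "not digest-pinned: $ref" >&2
fi
```

A tag reference such as `ose-operator-registry:v4.3.7`, or a digest with extra characters fused onto its end, fails the check.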

It seems like a bit of a gap to me that we can't discover from the advisory what the contents of the extras were. https://access.redhat.com/solutions/4919021 has only the payload containers.

Returning to OperatorHub to determine any further steps.

Comment 14 Filip Brychta 2020-06-17 10:35:26 UTC
It is also failing for v4.4 tag:
error: unable to parse image registry.redhat.io/openshift4/ose-operator-registry:v4.4: unknown image manifest of type *manifestlist.DeserializedManifestList from manifest sha256:b66567091a4d95f6d17a967540e54415fa7543582bc0b38ace90199009d08840

Comment 18 Evan Cordell 2020-07-08 15:44:42 UTC
> If that is not the case then is it fine if they build the catalog image just with this tag ose-operator-registry:v4.3 or ose-operator-registry:v4.4 and keep it the same throughout the other minor releases (X.Y.z)?

This process should be fine, since the latest z should contain only backwards-compatible bugfixes.
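If the catalog is built from the X.Y tag, that tag can be derived from the full cluster version. A minimal sketch with a hypothetical `minor_tag` helper. Per the earlier comments, the v4.3 and v4.4 tags are multi-arch manifest lists, so with oc versions older than 4.5 this tag would also need `--filter-by-os`.

```sh
#!/bin/sh
# Hypothetical helper: turns a full cluster version such as "4.3.8" into the
# X.Y tag ("v4.3") that tracks the latest z-stream of that release.
minor_tag() {
    major="${1%%.*}"      # "4.3.8" -> "4"
    rest="${1#*.}"        # "4.3.8" -> "3.8"
    minor="${rest%%.*}"   # "3.8"   -> "3"
    printf 'v%s.%s\n' "$major" "$minor"
}

minor_tag 4.3.8    # v4.3
minor_tag 4.4.0    # v4.4
```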

Also note that newer versions of `oc` can accept a release image as input for `oc adm catalog` and will use it to select the base image from the same release. This is always overridable.