Bug 1868445 - Operator registry images not available for ppc64le/s390x
Summary: Operator registry images not available for ppc64le/s390x
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.6.0
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard: multi-arch
Depends On:
Blocks: 1850692 ocp-46-z-tracker
Reported: 2020-08-12 17:55 UTC by Prashanth Sundararaman
Modified: 2021-08-02 18:25 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-08 11:08:19 UTC
Target Upstream Version:
Embargoed:



Description Prashanth Sundararaman 2020-08-12 17:55:09 UTC
Description of problem:
After deploying a 4.6 cluster on ppc64le, I noticed that the marketplace operator was pulling in x86 images for:

-certified operator
-community operator
-redhat marketplace
-redhat operators

This seems to be related to a recent change in moving from OperatorSources to CatalogSources: https://github.com/operator-framework/operator-marketplace/pull/318

Comment 2 Luke Meyer 2020-08-20 15:48:06 UTC
It seems unlikely anyone is building and shipping multiarch operators other than ART, and we haven't shipped any for 4.6 of course. Are these ART-built operators from previous versions, or others that are creating operators?

Other operators are probably just not even thinking about MA support. Do they have metadata to signal that it is/isn't supposed to be used on specific arches?

A specific example operator that's a problem might help here.

Comment 3 Jeremy Poulin 2020-08-20 21:21:49 UTC
I can't speak for Prashanth's use case, but I ran into this just now when doing the image-stream verification:
Aug 20 20:19:00.979 W ns/openshift-marketplace pod/redhat-operators-wz5ql node/jpoulin-ocp-cg9qx-worker-0-27gps reason/Failed Failed to pull image "registry.redhat.io/redhat/redhat-operator-index:v4.6": rpc error: code = Unknown desc = Error choosing image instance: no image found in manifest list for architecture s390x, variant "", OS linux

It's weird to me that we need to push something like nfd, logging, elastic etc. in order to get the operator index manifested. Shouldn't it pull the index above and just provide us an empty list of operators?

Comment 4 Luke Meyer 2020-08-21 20:47:08 UTC
I don't _think_ there are going to be different bundles published for different arches (we release a bundle with shasums that point at manifest lists so they're arch-agnostic). But that does raise the question of how OLM is going to know to filter out ones that aren't relevant for a given arch. It does seem like OLM ought to show an empty catalog rather than x86 content.

If it's waiting on ART to publish something this will just sit until 4.6 is GA. I'll pass it to OLM for comment.

Comment 5 Jeremy Poulin 2020-08-21 21:14:19 UTC
The lack of working marketplace images seems to be a blocker to the image-ecosystem CI tests, which in turn block 4.6 on P/Z. Ideally we can solve this before GA. In the meantime I will investigate the failing tests to see to what degree the dependencies are coupled.

The key thing is that at this point, it looks like we need non-x86 builds of these images in the openshift-marketplace namespace:
-certified operator
-community operator
-redhat marketplace
-redhat operators

I imagine those multi-arch images would discover that there is no *content* in the form of the SHA bundles provided by ART. But right now, the lack of any images at all is inhibiting progress on key deliverables, so hopefully there is a pre-GA solution.

Comment 6 Prashanth Sundararaman 2020-08-24 12:48:20 UTC
Ok - just to clarify a bit more. For the certified operators, redhat marketplace and redhat operators, I am seeing that no non-x86 images are available:

[core@psundara-ocp-bspmb-master-0 ~]$ podman pull registry.redhat.io/redhat/certified-operator-index:v4.6
Trying to pull registry.redhat.io/redhat/certified-operator-index:v4.6...
  no image found in manifest list for architecture ppc64le, variant "", OS linux
Error: error pulling image "registry.redhat.io/redhat/certified-operator-index:v4.6": unable to pull registry.redhat.io/redhat/certified-operator-index:v4.6: unable to pull image: Error choosing an image from manifest list docker://registry.redhat.io/redhat/certified-operator-index:v4.6: no image found in manifest list for architecture ppc64le, variant "", OS linux
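The "no image found in manifest list" error above comes from the client looking for an entry in the manifest list that matches the node's platform. The following is a minimal sketch of that lookup, not the actual podman or CRI-O code. It assumes the standard manifest list layout (a "manifests" array whose entries carry a "platform" object and a "digest"); the EXAMPLE data, its placeholder digest, and the function names are hypothetical.

```python
# Hypothetical manifest list that only carries an amd64 entry.
# The digest is a placeholder, not a real image digest.
EXAMPLE = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {
            "digest": "sha256:" + "0" * 64,
            "platform": {"architecture": "amd64", "os": "linux"},
        },
    ],
}


def available_platforms(manifest_list):
    """Return the (os, architecture) pairs published in a manifest list."""
    return [
        (entry["platform"]["os"], entry["platform"]["architecture"])
        for entry in manifest_list.get("manifests", [])
    ]


def select_manifest(manifest_list, arch, os_name="linux"):
    """Return the digest of the entry matching arch/os, or raise LookupError."""
    for entry in manifest_list.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("architecture") == arch and platform.get("os") == os_name:
            return entry["digest"]
    raise LookupError(
        f"no image found in manifest list for architecture {arch}, OS {os_name}"
    )
```

With this example data, select_manifest(EXAMPLE, "amd64") returns the placeholder digest, while asking for "ppc64le" or "s390x" raises LookupError, which mirrors the pull failure shown above.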


For the community operators I am seeing an exec format error meaning it is fetching an x86 image:

[psundara@ibm-p8-14 ~]$ ./oc logs community-operators-k9j52 -n openshift-marketplace
standard_init_linux.go:210: exec user process caused "exec format error"
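An "exec format error" like this usually means the binary in the image was built for a different architecture than the node. One way to check is to read the ELF header of the entrypoint binary once it has been extracted from the image. The sketch below is only an illustration under that assumption: the function names are hypothetical, and the table covers just the machine types relevant to this bug.

```python
import struct

# e_machine values from the ELF specification (subset relevant here).
ELF_MACHINES = {62: "x86_64", 21: "ppc64", 22: "s390", 183: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Describe the target machine encoded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    little = header[5] == 1
    byte_order = "<" if little else ">"
    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    name = ELF_MACHINES.get(machine, "unknown(%d)" % machine)
    endian = "little-endian" if little else "big-endian"
    return "%s (%s)" % (name, endian)


def elf_machine_of_file(path: str) -> str:
    """Read the ELF header from a file on disk and describe its machine type."""
    with open(path, "rb") as handle:
        return elf_machine(handle.read(20))
```

On a ppc64le node a native binary would report "ppc64 (little-endian)", and on s390x "s390 (big-endian)"; a result of "x86_64 (little-endian)" would be consistent with an x86 image having been pulled.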

Comment 10 Holger Wolf 2020-09-08 08:33:15 UTC
This blocks Z testing for OCP 4.6, and we need to understand how the operators will be provided, first for testing and later for our customers.
This is a regression from the 4.3 timeframe onward.

Comment 11 Yaakov Selkowitz 2020-09-08 11:08:19 UTC
Holger, this is a public bug, but the required resources are only available internally prior to GA, and therefore the answer was given in a private comment.  Any partner engineers that haven't already received the answer should reach out through the usual channels to be directed accordingly.

Comment 12 Vance 2021-07-22 01:06:13 UTC
I'm seeing this in OCP 4.7.

The community-operators pod in the openshift-marketplace namespace is failing with:

standard_init_linux.go:219: exec user process caused: exec format error

Comment 13 Yaakov Selkowitz 2021-07-22 01:49:34 UTC
(In reply to Vance from comment #12)
> I'm seeing this in OCP 4.7.
> 
> The community-operators pod in the openshift-marketplace namespace is
> failing with:
> 
> standard_init_linux.go:219: exec user process caused: exec format error

Indeed, but please open a new BZ (and CC me on it).

Comment 14 René 2021-07-22 07:34:41 UTC
Hi,

same issue here. Please provide the new BZ id here.

Best Regards
René

Comment 15 Vance 2021-08-02 18:25:26 UTC
It is running okay now, I will defer any new BZ, thanks.

