Bug 1888738 - quay.io/openshift/origin-must-gather:latest is not a multi-arch, manifest-list image
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.7.0
Assignee: Maciej Szulik
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-15 16:02 UTC by Jeremy Poulin
Modified: 2021-02-24 15:26 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:26:23 UTC
Target Upstream Version:
Embargoed:

Links
System ID Private Priority Status Summary Last Updated
Github openshift oc pull 627 0 None closed Bug 1888738: fall-back must-gather to official RH supported image 2021-02-11 20:24:55 UTC
Github openshift release pull 13167 0 None closed Bug 1888738: mirror images, keeping manifest lists 2021-02-11 20:24:55 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:26:53 UTC

Description Jeremy Poulin 2020-10-15 16:02:16 UTC
Description of problem:
One thing we discovered when testing a disconnected upgrade from 4.5 to 4.6 is that running oc adm must-gather actually tried to run the x86_64 must-gather image.

This appears to be an issue with mirroring, since must-gather uses the correct image when run in a "connected" install environment.

See https://bugzilla.redhat.com/show_bug.cgi?id=1888065#c5 for the relevant log.

Comment 1 W. Trevor King 2020-10-15 17:05:47 UTC
> This appears to be an issue with mirroring...

Can you post complete 'oc adm release mirror ...' and other steps you used to mirror the release image, as well as an 'oc adm release info --pullspecs $MIRRORED_PULLSPEC'?

Comment 2 Jeremy Poulin 2020-10-15 17:57:59 UTC
Hi Kyle, can you provide the information requested in the comment above?

Thanks!

Comment 3 Prashanth Sundararaman 2020-10-15 17:59:44 UTC
After talking to the Power team, which did the disconnected installs, it looks like the image being fetched is from origin:

# oc adm must-gather
[must-gather      ] OUT unable to resolve the imagestream tag openshift/must-gather:latest
[must-gather      ] OUT
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/origin-must-gather:latest
[must-gather      ] OUT namespace/openshift-must-gather-2x2kl created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-kwwxc created
[must-gather      ] OUT pod for plug-in image quay.io/openshift/origin-must-gather:latest created
[must-gather-29kz6] OUT gather did not start: unable to pull image: ImagePullBackOff: Back-off pulling image "quay.io/openshift/origin-must-gather:latest"
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-kwwxc deleted
[must-gather      ] OUT namespace/openshift-must-gather-2x2kl deleted
error: gather did not start for pod must-gather-29kz6: unable to pull image: ImagePullBackOff: Back-off pulling image "quay.io/openshift/origin-must-gather:latest"

It looks like the pullspecs for must-gather are still pointing to quay.io even though the images are mirrored. Is this similar to https://bugzilla.redhat.com/show_bug.cgi?id=1777890 ?
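
A rough way to spot this situation from a captured run is sketched below. It keys on the two log lines shown above; the function name is hypothetical, and it assumes the "unable to resolve the imagestream tag" line is only printed when oc falls back to its built-in image, which is what this log suggests but is not confirmed here.

```shell
# Hypothetical helper (not part of oc): reads 'oc adm must-gather' output on
# stdin and reports which image was used and whether the run fell back to the
# built-in default image.
must_gather_image_source() {
    log=$(cat)
    # Line format copied from the log in this bug, "plugin-in" spelling included.
    image=$(printf '%s\n' "$log" \
        | sed -n 's/.*Using must-gather plugin-in image: //p' \
        | head -n 1)
    case $log in
        *'unable to resolve the imagestream tag'*)
            # Assumption: this message only appears before a fallback.
            echo "fallback $image" ;;
        *)
            echo "not-fallback $image" ;;
    esac
}

# Usage, against a saved log file:
#   must_gather_image_source < must-gather.log
```

On the output above this reports the quay.io/openshift/origin-must-gather:latest image as a fallback.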

Comment 4 Jeremy Poulin 2020-10-15 18:04:49 UTC
Actually, upon further investigation, Psundara got to the bottom of the issue:
https://coreos.slack.com/archives/CFFJUNP6C/p1602766832331400?thread_ts=1602699361.317000&cid=CFFJUNP6C

To quote that thread:
> ok..looks like this is a known issue...not sure if there is any fix for this - the problem is that the pullspecs point to quay.io even in the disconnected case:
> https://coreos.slack.com/archives/CKXD7GR9B/p1586256571049400?thread_ts=1586205790.048700&cid=CKXD7GR9B

> https://bugzilla.redhat.com/show_bug.cgi?id=1777890 - is marked as won't fix which is similar to must-gather

I think this may be a documentation issue then.
I think we need docs to specify that in a disconnected install on P/Z, you need to run with the image flag override, or else the quay.io image is tried, and then the fallback origin x86 image is used.

# oc adm must-gather --image=registry.alisha-46.example.com:5000/ocp4/openshift4@sha256:66b418fb1aad1fc264c0e549b4f89e2fe44aa32522229d759215632b56afb43c
[must-gather      ] OUT Using must-gather plugin-in image: registry.alisha-46.example.com:5000/ocp4/openshift4@sha256:66b418fb1aad1fc264c0e549b4f89e2fe44aa32522229d759215632b56afb43c
[must-gather      ] OUT namespace/openshift-must-gather-hv777 created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-nm5q2 created
[must-gather      ] OUT pod for plug-in image registry.alisha-46.example.com:5000/ocp4/openshift4@sha256:66b418fb1aad1fc264c0e549b4f89e2fe44aa32522229d759215632b56afb43c created
[must-gather-dptwj] POD Wrote inspect data to must-gather.
[must-gather-dptwj] POD Gathering data for ns/openshift-cluster-version...
[must-gather-dptwj] POD Wrote inspect data to must-gather.
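
For anyone scripting the override above, here is a minimal sketch of deriving the mirror pullspec from an upstream by-digest pullspec, such as the must-gather entry in the 'oc adm release info --pullspecs' output. The function name is hypothetical, and it assumes the mirror keeps the original digest and holds all release content in a single repository, as in the example above.

```shell
# Hypothetical helper: rewrite a by-digest pullspec to point at a mirror
# repository, keeping the digest unchanged.
# Assumption: the mirrored copy has the same sha256 digest as the source.
to_mirror_pullspec() {
    pullspec=$1      # e.g. <upstream-repo>@sha256:<digest>
    mirror_repo=$2   # e.g. registry.alisha-46.example.com:5000/ocp4/openshift4
    case $pullspec in
        *@sha256:*) ;;
        *)
            echo "not a by-digest pullspec: $pullspec" >&2
            return 1 ;;
    esac
    # Drop everything up to and including the last '@', then re-prefix.
    echo "${mirror_repo}@${pullspec##*@}"
}

# Usage (placeholders, not real values):
#   oc adm must-gather --image="$(to_mirror_pullspec "$UPSTREAM_PULLSPEC" "$MIRROR_REPO")"
```

Tag-based pullspecs such as quay.io/openshift/origin-must-gather:latest are rejected on purpose, since a tag says nothing about what the mirror actually holds.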

Comment 5 W. Trevor King 2020-10-15 18:29:02 UTC
Actually the linked comment has:

[must-gather      ] OUT unable to resolve the imagestream tag openshift/must-gather:latest
[must-gather      ] OUT
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/origin-must-gather:latest

That looks like it's not a multi-arch, manifest-list image.

$ curl -sIH 'Accept: application/vnd.oci.image.index.v1+json;q=1, application/vnd.docker.distribution.manifest.list.v2+json;q=0.9, *;q=0.1' https://quay.io/v2/openshift/origin-must-gather/manifests/latest | grep Content-Type
Content-Type: application/vnd.docker.distribution.manifest.v1+json

We should probably publish it as a manifest list, but it means that the problem is unlikely to be a mirroring issue.  The fact that your cluster fell back to the canonical image suggests you are not actually in a restricted network.  Or maybe you are, and have an ImageContentSourcePolicy in place for quay.io/openshift/origin-must-gather?  Also in this space is bug 1823839.
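
The Content-Type check above can be wrapped in a small classifier. This is only a sketch: the function name is hypothetical, and the media types listed are the standard OCI and Docker manifest types, not anything specific to quay.io.

```shell
# Hypothetical helper: classify a registry manifest Content-Type value as a
# multi-arch manifest list/index or a single-arch manifest.
manifest_kind() {
    # Strip a trailing carriage return (HTTP headers) and any ";param" suffix.
    media_type=$(printf '%s' "$1" | tr -d '\r' | sed 's/;.*//')
    case $media_type in
        application/vnd.oci.image.index.v1+json) echo multi-arch ;;
        application/vnd.docker.distribution.manifest.list.v2+json) echo multi-arch ;;
        application/vnd.oci.image.manifest.v1+json) echo single-arch ;;
        application/vnd.docker.distribution.manifest.v2+json) echo single-arch ;;
        application/vnd.docker.distribution.manifest.v1+json) echo single-arch ;;
        application/vnd.docker.distribution.manifest.v1+prettyjws) echo single-arch ;;
        *) echo unknown ;;
    esac
}

# Usage with the header value returned by the curl command above:
#   manifest_kind 'application/vnd.docker.distribution.manifest.v1+json'
```

Fed the Content-Type from the curl output above, it prints single-arch, which matches the conclusion that the fallback image is not a manifest list.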

Comment 6 Maciej Szulik 2020-10-19 10:27:56 UTC
This is still the case where you don't have must-gather mirrored. IIRC docs have a BZ for this, which I can't find atm. Specifically:

[must-gather      ] OUT unable to resolve the imagestream tag openshift/must-gather:latest

says you don't have that tag in your local image streams; fixing that is the simplest possible approach for you to pursue.

Comment 8 W. Trevor King 2020-10-19 17:42:46 UTC
> This is still the case where you don't have must-gather mirrored.

I'm pretty sure the must-gather is mirrored, but they are bumping into bug 1823839.  I agree that they should be able to use the local ImageStream.  But that's orthogonal to this bug, which is about the fallback image being amd64-only.

Comment 9 Maciej Szulik 2020-10-20 08:59:09 UTC
(In reply to W. Trevor King from comment #8)
> > This is still the case where you don't have must-gather mirrored.
> 
> I'm pretty sure the must-gather is mirrored, but they are bumping into bug
> 1823839.  I agree that they should be able to use the local ImageStream. 
> But that's orthogonal to this bug, which is about the fallback image being
> amd64-only.

If we're talking about published images, I'm sending this over to ART team which is responsible for publishing artifacts externally.

Comment 11 W. Trevor King 2020-10-24 03:44:04 UTC
The backing 'oc' code does attempt to find the release-image-referenced must-gather via an ImageStream, but that is currently unreliable (bug 1823839).  ART builds official images for must-gather today for each arch, right?  Can't we wrap those up in a multi-arch, manifest-list image and push that somewhere standardized?  If not, I think oc should drop the central-pullspec fallback.

Comment 12 Maciej Szulik 2020-10-26 09:14:42 UTC
(In reply to W. Trevor King from comment #11)
> The backing 'oc' code does attempt to find the release-image-referenced
> must-gather via an ImageStream, but that is currently unreliable (bug
> 1823839).  ART builds official images for must-gather today for each arch,
> right?  Can't we wrap those up in a multi-arch, manifest-list image and push
> that somewhere standardized?  If not, I think oc should drop the
> central-pullspec fallback.

Before dropping the fallback from oc, or replacing it with a supported image (for example registry.redhat.io/rhel8/support-tools, as used by oc debug),
let me first try to figure out the status of the quay.io/openshift/* images.

Comment 13 W. Trevor King 2020-10-27 20:04:35 UTC
The bot moved this to POST, but linked no PR?  Manually linking the PR it seemed to be considering, although my concern in this bug is with the quay.io/openshift fallback image, which is orthogonal to anything that folks may be mirroring as part of the release image (release images and the images they reference are hosted under quay.io/openshift-release-dev).

Comment 14 Maciej Szulik 2020-10-28 09:20:45 UTC
Trevor, using that quay.io/openshift/origin-must-gather image is the last resort, if everything else fails.
I prefer to do it that way rather than having a hard fail when we can't find anything. At most we can
be more verbose about which image we're picking, but from what I see in the must-gather logs
it's one of the first things we print.

Comment 15 W. Trevor King 2020-10-28 14:04:41 UTC
I agree, and this bug is about that hard-fail-avoidance fallback.  Issues that lead us to the fallback, like 1823839, should get separate bugs.  This one is about ensuring that when we hit the fallback, we get an image that works for all of our arches and which has been signed with an official RH key.

Comment 16 Jeremy Poulin 2020-10-28 18:30:11 UTC
I don't have an environment from which to duplicate this issue.
If there is still information being sought for this, please needinfo krmoser.com.

Comment 22 errata-xmlrpc 2021-02-24 15:26:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

