Bug 1845644 - openshift-marketplace: Pods found with invalid container images not present in release payload (v2)

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	OLM
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.5.0
Assignee:	Kevin Rizza
QA Contact:	Salvatore Colangelo
Docs Contact:
URL:
Whiteboard:
Depends On:	1833207
Blocks:	1847740

Reported:	2020-06-09 17:54 UTC by Ben Parees
Modified:	2020-07-13 17:44 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1833207
Environment:
Last Closed:	2020-07-13 17:43:40 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	operator-framework operator-marketplace pull 314	0	None	closed	[release-4.5] Bug 1845644: Ensure correct registry image	2020-10-27 12:33:42 UTC
Red Hat Product Errata	RHBA-2020:2409	0	None	None	None	2020-07-13 17:44:06 UTC

Description Ben Parees 2020-06-09 17:54:01 UTC

+++ This bug was initially created as a clone of Bug #1833207 +++

Description of problem:

This is like bug 1821783, but since that one is already CLOSED ERRATA, I am filing a new bug.  I'm still seeing CI failures in the update-chain jobs due to:

  fail [github.com/openshift/origin/test/extended/operators/images.go:154]: May  7 07:06:24.907: Pods found with invalid container images not present in release payload: openshift-marketplace/certified-operators-847568b454-nsqsv/certified-operators image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0bbb83b21f43ec97f52d10fc9d088cd57e8a6c970ad8a699e718081734b415d1
openshift-marketplace/community-operators-ddbdbc749-jz75c/community-operators image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0bbb83b21f43ec97f52d10fc9d088cd57e8a6c970ad8a699e718081734b415d1

and similar.  CI search for the past 2d [1].  Example jobs [2,3,4].  This mechanism is breaking at least the following test:

[sig-arch] Managed cluster should ensure pods use downstream images from our release image with proper ImagePullPolicy [Suite:openshift/conformance/parallel]

I dunno what happened with bug 1821783.  Seems like it went straight from NEW -> ON_QA without any code changes or stated motivation?  Can someone who understands OLM explain what's going on with these failures?

[1]: https://search.apps.build01.ci.devcluster.openshift.com/?name=upgrade&search=Pods%20found%20with%20invalid%20container%20images%20not%20present%20in%20release%20payload
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.2-to-4.3-to-4.4-to-4.5-ci/59
[3]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.1-to-4.2-to-4.3-to-4.4-nightly/74
[4]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.3-to-4.4-to-4.5-to-4.6-ci/38

--- Additional comment from Evan Cordell on 2020-05-08 13:06:33 UTC ---

> I dunno what happened with bug 1821783.  Seems like it went straight from NEW -> ON_QA without any code changes or stated motivation?  Can someone who understands OLM explain what's going on with these failures?

It looks like it was verified that, on a new install, the images are correct. I agree that this is an issue that needs to be investigated / addressed.


The image is specified here: https://github.com/operator-framework/operator-marketplace/blob/97ae9930ea7cfaa27248b2cecaf312d177b16629/manifests/09_operator.yaml#L45

and should be replaced during release because of: https://github.com/operator-framework/operator-marketplace/blob/97ae9930ea7cfaa27248b2cecaf312d177b16629/manifests/image-references#L9-L12

OperatorSource pods are reconciled on a timer, so I suspect that just waiting a bit longer would force marketplace to roll out an update with a new image. The fix should be that we explicitly check if the operatorsource pod has the `image` that marketplace is configured with.
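
To illustrate the check proposed above, here is a minimal sketch in Python. It is not the code from the marketplace operator or from the linked pull request; the names `Container`, `needs_image_update`, and `reconcile` are hypothetical, and the model ignores everything about a pod except its container images.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical, simplified view of one container in a registry pod."""
    name: str
    image: str


def needs_image_update(containers, configured_image):
    """Return True if any container runs an image other than the configured one."""
    return any(c.image != configured_image for c in containers)


def reconcile(containers, configured_image):
    """Return the container list a reconcile pass would apply.

    If every container already uses the configured image, the input list is
    returned unchanged. Otherwise a new list is returned with every image
    replaced, which is the point where a real operator would trigger a rollout.
    """
    if not needs_image_update(containers, configured_image):
        return containers
    return [Container(c.name, configured_image) for c in containers]
```

With an explicit comparison like this, a stale image would be detected on the next reconcile pass regardless of whether anything else about the OperatorSource changed.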

--- Additional comment from Eric Paris on 2020-05-08 14:31:30 UTC ---

This bug has been set to target the 4.5.0 release without specifying a severity. As part of triage when determining the importance of bugs a severity should be specified. Since these bugs have not been properly triaged we are removing the target release. Teams will need to add a severity before deferring these bugs again.

--- Additional comment from Evan Cordell on 2020-05-18 20:21:22 UTC ---

Since this issue should reconcile itself in real clusters, it is not a blocker. Moving to 4.6

--- Additional comment from Nick Hale on 2020-06-02 17:31:13 UTC ---

Marking for upcoming sprint.

Comment 1 Ben Parees 2020-06-09 17:55:17 UTC

Cloning this to 4.5 and marking it high; it is breaking our 4.2->4.5 upgrade CI testing.

https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.2-to-4.3-to-4.4-to-4.5-ci/91


[sig-arch] Managed cluster should ensure pods use downstream images from our release image with proper ImagePullPolicy [Suite:openshift/conformance/parallel] (4m22s)
fail [github.com/openshift/origin/test/extended/operators/images.go:154]: Jun  8 07:55:57.781: Pods found with invalid container images not present in release payload: openshift-marketplace/certified-operators-745b8b4946-mbv8d/certified-operators image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eb18cb01725f9a26a5bf851bb2e7a91588e70876b18f73854eaaa953aab7f6ad
openshift-marketplace/community-operators-fd8b68876-kkclh/community-operators image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eb18cb01725f9a26a5bf851bb2e7a91588e70876b18f73854eaaa953aab7f6ad
openshift-marketplace/redhat-marketplace-6b9b964456-b657m/redhat-marketplace image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2569a3f79728be9174d720159e3dc0d2c6127b6961c6c8ea00f7187310bf8348
openshift-marketplace/redhat-operators-86dbd57d46-c6h4f/redhat-operators image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eb18cb01725f9a26a5bf851bb2e7a91588e70876b18f73854eaaa953aab7f6ad
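
When triaging output like the failure above, it can help to group the offending pods by image. The following Python sketch assumes each entry has the form `namespace/pod/container image=<pullspec>`, as in the log lines quoted in this bug; the function name `group_by_image` and the regular expression are illustrative and are not part of the origin test suite.

```python
import re
from collections import defaultdict

# One entry per offending container: namespace/pod/container image=<pullspec>
ENTRY = re.compile(
    r"(?P<namespace>[^/\s]+)/(?P<pod>[^/\s]+)/(?P<container>[^/\s]+)"
    r" image=(?P<image>\S+)"
)

# Entries copied from the failure message in this comment.
SAMPLE = """\
openshift-marketplace/certified-operators-745b8b4946-mbv8d/certified-operators image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eb18cb01725f9a26a5bf851bb2e7a91588e70876b18f73854eaaa953aab7f6ad
openshift-marketplace/community-operators-fd8b68876-kkclh/community-operators image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eb18cb01725f9a26a5bf851bb2e7a91588e70876b18f73854eaaa953aab7f6ad
openshift-marketplace/redhat-marketplace-6b9b964456-b657m/redhat-marketplace image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2569a3f79728be9174d720159e3dc0d2c6127b6961c6c8ea00f7187310bf8348
openshift-marketplace/redhat-operators-86dbd57d46-c6h4f/redhat-operators image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eb18cb01725f9a26a5bf851bb2e7a91588e70876b18f73854eaaa953aab7f6ad
"""


def group_by_image(failure_text):
    """Map each image pullspec to the list of namespace/pod names using it."""
    groups = defaultdict(list)
    for match in ENTRY.finditer(failure_text):
        groups[match["image"]].append(f'{match["namespace"]}/{match["pod"]}')
    return dict(groups)


if __name__ == "__main__":
    for image, pods in group_by_image(SAMPLE).items():
        print(image.split("@")[-1][:19], len(pods), "pod(s)")
```

Grouping this way makes it visible that the redhat-marketplace pod in this run references a different digest than the other three pods.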

Comment 7 Salvatore Colangelo 2020-06-16 09:41:43 UTC

1, Cluster version is 4.5.0-0.nightly-2020-06-15-194331
 
[scolange@scolange ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-06-15-194331   True        False         6h26m   Cluster version is 4.5.0-0.nightly-2020-06-15-194331


The marketplace-operator version contains the fix PR:
[scolange@scolange ~]$ oc exec marketplace-operator-7b46798c6c-smw7v -- marketplace-operator --version
time="2020-06-16T09:20:48Z" level=info msg="Go Version: go1.13.4"
time="2020-06-16T09:20:48Z" level=info msg="Go OS/Arch: linux/amd64"
time="2020-06-16T09:20:48Z" level=info msg="operator-sdk Version: v0.8.0"
Marketplace source git commit: b1ba97ff4cf3999fd8fcdc2c97700d5291dca1f0

2, Check whether these OperatorSource images are downstream.

[scolange@scolange ~]$ oc get pod
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-7bc7ccb7f8-hmllm    1/1     Running   0          6h37m
community-operators-87bbf6587-sx7wj     1/1     Running   0          6h37m
marketplace-operator-7b46798c6c-smw7v   1/1     Running   0          6h37m
qe-app-registry-6cb6579b66-56dpp        1/1     Running   0          6h27m
redhat-marketplace-9587c6c97-pgpss      1/1     Running   0          6h37m
redhat-operators-76895bb4bc-6prxh       1/1     Running   0          6h37m


[scolange@scolange ~]$ oc exec certified-operators-7bc7ccb7f8-hmllm -- cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)

[scolange@scolange ~]$ oc exec community-operators-87bbf6587-sx7wj -- cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)

[scolange@scolange ~]$ oc exec marketplace-operator-7b46798c6c-smw7v -- cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)

[scolange@scolange ~]$ oc exec redhat-marketplace-9587c6c97-pgpss -- cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)

[scolange@scolange ~]$ oc exec redhat-operators-76895bb4bc-6prxh -- cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)


All of them are downstream. LGTM, marking as verified.
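
The /etc/redhat-release checks above show which RHEL release each container image is built on. A comparison against the release payload itself could be sketched as follows in Python. This is an illustration under assumptions, not the logic of the origin conformance test: the function name `pods_outside_payload` is hypothetical, the pod list is assumed to be the JSON printed by `oc get pods -n openshift-marketplace -o json`, and `payload_images` is assumed to be a set of image pull specs from the release payload, obtained separately.

```python
import json


def pods_outside_payload(pod_list_json, payload_images):
    """Return (namespace, pod, container, image) for images not in the payload.

    pod_list_json:  JSON text of a Kubernetes pod list (an object with "items").
    payload_images: set of image pull specs considered part of the release.
    """
    pod_list = json.loads(pod_list_json)
    offenders = []
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        for container in pod["spec"]["containers"]:
            if container["image"] not in payload_images:
                offenders.append(
                    (meta["namespace"], meta["name"],
                     container["name"], container["image"])
                )
    return offenders
```

An empty result means every container image in the listed pods appears in the supplied payload set; pods created outside the release, such as test registries, would be expected to show up as offenders.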

Comment 8 errata-xmlrpc 2020-07-13 17:43:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409