Bug 1833207

Summary: openshift-marketplace: Pods found with invalid container images not present in release payload (v2)
Product: OpenShift Container Platform
Component: OLM
OLM sub component: OperatorHub
Version: 4.5
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Reporter: W. Trevor King <wking>
Assignee: Evan Cordell <ecordell>
QA Contact: Jian Zhang <jiazha>
CC: nhale, pamoedom, vdinh
Keywords: UpcomingSprint
Type: Bug
Last Closed: 2020-10-27 15:58:53 UTC
Bug Blocks: 1845644

Description W. Trevor King 2020-05-08 04:52:26 UTC
Description of problem:

This is like bug 1821783, but since that bug is already CLOSED ERRATA, I'm filing a new one.  I'm still seeing CI failures in the update-chain jobs due to:

  fail [github.com/openshift/origin/test/extended/operators/images.go:154]: May  7 07:06:24.907: Pods found with invalid container images not present in release payload: openshift-marketplace/certified-operators-847568b454-nsqsv/certified-operators image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0bbb83b21f43ec97f52d10fc9d088cd57e8a6c970ad8a699e718081734b415d1
openshift-marketplace/community-operators-ddbdbc749-jz75c/community-operators image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0bbb83b21f43ec97f52d10fc9d088cd57e8a6c970ad8a699e718081734b415d1

and similar.  CI search results for the past two days are at [1], with example jobs at [2,3,4].  This mechanism is breaking at least the following test:

[sig-arch] Managed cluster should ensure pods use downstream images from our release image with proper ImagePullPolicy [Suite:openshift/conformance/parallel]
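
For readers unfamiliar with the check, the failure message implies a comparison along the following lines. This is only a rough Python sketch, not the code in images.go (which is Go); the function name, the dict layout (modelled on pod JSON), and the payload pullspec are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set


def find_invalid_images(pods: Iterable[Dict], payload_images: Set[str]) -> List[str]:
    """Return one 'namespace/pod/container image=<pullspec>' entry for every
    container whose image is not part of the release payload.

    Hypothetical helper: it mirrors the shape of the failure message above,
    not the real test implementation.
    """
    invalid = []
    for pod in pods:
        meta = pod["metadata"]
        for container in pod["spec"]["containers"]:
            image = container["image"]
            if image not in payload_images:
                invalid.append(
                    f'{meta["namespace"]}/{meta["name"]}/{container["name"]} image={image}'
                )
    return invalid


# Sample data: the pod and image come from the failure above; the payload
# entry is a made-up placeholder standing in for the new release's pullspecs.
stale_image = (
    "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:"
    "0bbb83b21f43ec97f52d10fc9d088cd57e8a6c970ad8a699e718081734b415d1"
)
pods = [
    {
        "metadata": {
            "namespace": "openshift-marketplace",
            "name": "certified-operators-847568b454-nsqsv",
        },
        "spec": {"containers": [{"name": "certified-operators", "image": stale_image}]},
    }
]
payload_images = {"example.invalid/payload@sha256:placeholder"}

for entry in find_invalid_images(pods, payload_images):
    print(entry)
```

A pod left over from an earlier release in the update chain would show up in this list even though its image was valid for the release that created it.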

I dunno what happened with bug 1821783.  Seems like it went straight from NEW -> ON_QA without any code changes or stated motivation?  Can someone who understands OLM explain what's going on with these failures?

[1]: https://search.apps.build01.ci.devcluster.openshift.com/?name=upgrade&search=Pods%20found%20with%20invalid%20container%20images%20not%20present%20in%20release%20payload
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.2-to-4.3-to-4.4-to-4.5-ci/59
[3]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.1-to-4.2-to-4.3-to-4.4-nightly/74
[4]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.3-to-4.4-to-4.5-to-4.6-ci/38

Comment 1 Evan Cordell 2020-05-08 13:06:33 UTC
> I dunno what happened with bug 1821783.  Seems like it went straight from NEW -> ON_QA without any code changes or stated motivation?  Can someone who understands OLM explain what's going on with these failures?

It looks like it was verified that, on a new install, the images are correct. I agree that this is an issue that needs to be investigated / addressed.


The image is specified here: https://github.com/operator-framework/operator-marketplace/blob/97ae9930ea7cfaa27248b2cecaf312d177b16629/manifests/09_operator.yaml#L45

and should be replaced during release because of: https://github.com/operator-framework/operator-marketplace/blob/97ae9930ea7cfaa27248b2cecaf312d177b16629/manifests/image-references#L9-L12

OperatorSource pods are reconciled on a timer, so I suspect that just waiting a bit longer would force marketplace to roll out an update with the new image. The fix should be to explicitly check whether the OperatorSource pod has the `image` that marketplace is configured with.
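
As a rough illustration of the proposed check (not the actual marketplace change, which is written in Go), the reconcile step could compare each container image in the OperatorSource deployment against the configured image and update any that differ. The function names and dict layout below are hypothetical, and the sketch assumes every container in the deployment is meant to run the same configured image.

```python
from typing import Dict, List


def containers_needing_update(deployment: Dict, configured_image: str) -> List[str]:
    """Return the names of containers whose image differs from the configured one."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    return [c["name"] for c in containers if c.get("image") != configured_image]


def ensure_image(deployment: Dict, configured_image: str) -> bool:
    """Point every stale container at the configured image.

    Returns True if the deployment was modified, which is the signal that a
    rollout is needed; returns False if it was already up to date.
    """
    stale = set(containers_needing_update(deployment, configured_image))
    for container in deployment["spec"]["template"]["spec"]["containers"]:
        if container["name"] in stale:
            container["image"] = configured_image
    return bool(stale)


# Placeholder pullspecs, for illustration only.
deployment = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "certified-operators", "image": "example.invalid/registry@sha256:old"}
                ]
            }
        }
    }
}
changed = ensure_image(deployment, "example.invalid/registry@sha256:new")
print("rollout needed:", changed)
```

Running the comparison on every reconcile, rather than only when the OperatorSource spec changes, is what would catch a pod that survived an upgrade with the previous release's image.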

Comment 8 Jian Zhang 2020-06-11 08:44:41 UTC
1. The cluster version is 4.6.0-0.nightly-2020-06-11-041445, and the marketplace-operator build contains the fix PR:
[root@preserve-olm-env data]# oc exec marketplace-operator-5cf4488cfc-6b8t8 -- marketplace-operator --version
Marketplace source git commit: a00763fa951dad170d671eb6ddc69f8dcab13c6e
time="2020-06-11T08:40:20Z" level=info msg="Go Version: go1.13.4"
time="2020-06-11T08:40:20Z" level=info msg="Go OS/Arch: linux/amd64"
time="2020-06-11T08:40:20Z" level=info msg="operator-sdk Version: v0.8.0"

2. Check whether the OperatorSource images are downstream.

[root@preserve-olm-env data]# oc get pods
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-68487fcd8d-5vl58    1/1     Running   0          45m
community-operators-6b8c84d5c5-bl54c    1/1     Running   0          44m
marketplace-operator-5cf4488cfc-6b8t8   1/1     Running   0          45m
redhat-marketplace-fb6b559c5-n4wh6      1/1     Running   0          44m
redhat-operators-78645b7dbb-pb4t2       1/1     Running   0          45m

[root@preserve-olm-env data]# oc exec certified-operators-68487fcd8d-5vl58  -- cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)

[root@preserve-olm-env data]# oc exec community-operators-6b8c84d5c5-bl54c -- cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)

[root@preserve-olm-env data]# oc exec redhat-operators-78645b7dbb-pb4t2  -- cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)

[root@preserve-olm-env data]# oc exec redhat-marketplace-fb6b559c5-n4wh6   -- cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)

[root@preserve-olm-env data]# oc exec marketplace-operator-5cf4488cfc-6b8t8    -- cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)

All of them are downstream. LGTM, marking as verified.
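
The manual check above could be scripted roughly as follows. This is a sketch under the assumption, taken from the verification steps, that an /etc/redhat-release string beginning with "Red Hat Enterprise Linux" indicates a downstream image; the function name is hypothetical and the non-RHEL entry in the usage example is invented.

```python
from typing import Dict, List

RHEL_PREFIX = "Red Hat Enterprise Linux"


def non_downstream_pods(release_by_pod: Dict[str, str]) -> List[str]:
    """Return the sorted names of pods whose release string is not RHEL."""
    return sorted(
        pod
        for pod, release in release_by_pod.items()
        if not release.strip().startswith(RHEL_PREFIX)
    )


# Release strings as collected with `oc exec <pod> -- cat /etc/redhat-release`.
observed = {
    "certified-operators-68487fcd8d-5vl58": "Red Hat Enterprise Linux Server release 7.8 (Maipo)",
    "community-operators-6b8c84d5c5-bl54c": "Red Hat Enterprise Linux Server release 7.8 (Maipo)",
    "redhat-operators-78645b7dbb-pb4t2": "Red Hat Enterprise Linux Server release 7.8 (Maipo)",
    "redhat-marketplace-fb6b559c5-n4wh6": "Red Hat Enterprise Linux Server release 7.8 (Maipo)",
    "marketplace-operator-5cf4488cfc-6b8t8": "Red Hat Enterprise Linux Server release 7.8 (Maipo)",
}
print(non_downstream_pods(observed))
```

An empty list corresponds to the "all downstream" result reported here.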

Comment 11 errata-xmlrpc 2020-10-27 15:58:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196