Bug 1877835

Summary: Red Hat Operators Production Index-Image is not getting refreshed when new content is available
Product: OpenShift Container Platform Reporter: Oren Cohen <ocohen>
Component: OLM Assignee: Daniel Sover <dsover>
OLM sub component: OLM QA Contact: Jian Zhang <jiazha>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: medium CC: dsover, ecordell, krizza, ocohen, stirabos
Version: 4.5   
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 16:39:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1878359    

Description Oren Cohen 2020-09-10 14:28:44 UTC
Description of problem:
When an OCP 4.5 cluster is configured with a catalog source pointing to the production index image:
registry.redhat.io/redhat/redhat-operator-index:v4.5
the catalog does not receive content updates (e.g. a new CSV version) when the floating image tag is updated.

Furthermore, deleting and recreating the catalog source does not update the packagemanifest with the most up-to-date content.

Reason: The index image with the specified tag (v4.5) already exists on the cluster node and is not pulled again, even if a newer image exists on the registry server.

Workaround: delete the old image from the node itself, using "crictl rmi ..."
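
The stale-content behaviour and the workaround can be illustrated with a toy model. This is only a sketch: the `Node` class, its `start_pod` and `rmi` methods, and the digest strings are hypothetical and do not correspond to any kubelet, CRI-O or crictl API. It assumes the usual Kubernetes semantics, where IfNotPresent reuses an image already cached on the node and Always resolves the reference against the registry again.

```python
class Node:
    """Toy model of a node's local image cache (hypothetical, not a real API)."""

    def __init__(self):
        self.cache = {}  # image reference -> digest stored on the node

    def start_pod(self, ref, registry, pull_policy):
        """Return the digest the pod would run with under the given policy."""
        if pull_policy == "Always" or ref not in self.cache:
            # Resolve the reference against the registry and refresh the cache.
            self.cache[ref] = registry[ref]
        return self.cache[ref]

    def rmi(self, ref):
        """Model of the workaround: drop the cached image from the node."""
        self.cache.pop(ref, None)


ref = "registry.redhat.io/redhat/redhat-operator-index:v4.5"
registry = {ref: "sha256:old"}  # made-up digests, for illustration only
node = Node()

first = node.start_pod(ref, registry, "IfNotPresent")      # initial pull
registry[ref] = "sha256:new"                                # floating tag moves
stale = node.start_pod(ref, registry, "IfNotPresent")      # cached copy is reused
node.rmi(ref)                                               # workaround
after_rmi = node.start_pod(ref, registry, "IfNotPresent")  # image is pulled again
```

In this model, recreating the pod under IfNotPresent keeps returning the old digest until the cached entry is removed, which matches the observation that deleting and recreating the catalog source does not help.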

Version-Release number of selected component (if applicable):
OCP 4.5.7
OLM 0.15.1

How reproducible:
Reproduced on at least 4 clusters during the release of OpenShift Virtualization 2.4.1.

Steps to Reproduce:
1. Described above.

Actual results:
The production index image is not pulled when new content is available on the registry server.

Expected results:
The production index image is pulled automatically and processed by OLM when new content is available on the registry server.

Additional info:

Comment 2 Simone Tiraboschi 2020-09-10 15:17:00 UTC
Please be aware that we created a custom catalog source to consume that index image on OCP 4.5 and we didn't specify any UpdateStrategy on it:
https://github.com/operator-framework/operator-marketplace/blob/master/pkg/apis/olm/v1alpha1/catalogsource_types.go#L80-L83

Maybe the issue is just the lack of a sane default there.

Comment 10 Jian Zhang 2020-09-14 07:51:07 UTC
Cluster version is 4.6.0-0.nightly-2020-09-12-230035, which contains the fix PR.
[root@preserve-olm-env data]# oc -n openshift-operator-lifecycle-manager exec catalog-operator-694676c897-8rdgt -- olm --version
OLM version: 0.16.1
git commit: 6d26c16166b232561132985e1132fce4b4d36532

1. For a CatalogSource that references its image by tag, the pod's imagePullPolicy is Always.
[root@preserve-olm-env data]# oc get pods redhat-operators-k9tlq -o yaml|grep imagePullPolicy
            f:imagePullPolicy: {}
    imagePullPolicy: Always

[root@preserve-olm-env data]# oc get catalogsource redhat-operators -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
...
spec:
  displayName: Red Hat Operators
  icon:
    base64data: ""
    mediatype: ""
  image: registry.redhat.io/redhat/redhat-operator-index:v4.6
  priority: -100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m0s

2. Create a CatalogSource that references its image by digest. Its pod's imagePullPolicy is IfNotPresent.

[root@preserve-olm-env data]# cat cs-etcd.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: etcd2-test
  namespace: openshift-marketplace
spec:
  displayName: Jian Test
  publisher: Jian
  sourceType: grpc
  image: quay.io/olmqe/etcd-index@sha256:ee23a1fd8a76e1ed95219577fe764c843ae932735181f26d7d75ae268c13526e
  updateStrategy:
    registryPoll:
      interval: 10m

[root@preserve-olm-env data]# oc create -f cs-etcd.yaml 
catalogsource.operators.coreos.com/etcd2-test created

[root@preserve-olm-env data]# oc get pods
NAME                                    READY   STATUS    RESTARTS   AGE
...
etcd2-test-dwqjd                        1/1     Running   0          32s

[root@preserve-olm-env data]# oc get pods etcd2-test-dwqjd -o yaml|grep imagePullPolicy
            f:imagePullPolicy: {}
    imagePullPolicy: IfNotPresent

LGTM, marking it as verified.
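
The two checks above suggest that the pull policy follows the form of the image reference: Always for a tag, IfNotPresent for a digest. A minimal sketch of that rule, assuming a digest reference can be recognised by an `@sha256:` part; the function name `pull_policy_for` is hypothetical and this is not OLM's actual code:

```python
def pull_policy_for(image: str) -> str:
    """Pick a pull policy from the image reference form (illustrative rule only)."""
    # A digest reference is immutable, so a cached copy can never be stale.
    if "@sha256:" in image:
        return "IfNotPresent"
    # A tag can move to new content, so the registry is consulted each time.
    return "Always"


tag_policy = pull_policy_for("registry.redhat.io/redhat/redhat-operator-index:v4.6")
digest_policy = pull_policy_for(
    "quay.io/olmqe/etcd-index@sha256:"
    "ee23a1fd8a76e1ed95219577fe764c843ae932735181f26d7d75ae268c13526e"
)
```

Under this rule, the tagged index image from step 1 maps to Always and the digest image from step 2 maps to IfNotPresent, consistent with the pod output shown.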

Comment 13 errata-xmlrpc 2020-10-27 16:39:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196