Bug 1881584 - remove update check from previous polling implementation
Summary: remove update check from previous polling implementation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Daniel Sover
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks: 1882444
Reported: 2020-09-22 17:52 UTC by Daniel Sover
Modified: 2020-10-27 16:44 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1882444
Environment:
Last Closed: 2020-10-27 16:44:46 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | operator-framework operator-lifecycle-manager pull 1773 | 0 | None | closed | Bug 1881584: fix check from previous polling implementation | 2020-11-25 09:34:13 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:44:48 UTC

Comment 2 Jian Zhang 2020-09-25 11:03:48 UTC

[root@preserve-olm-env data]# cat cs-poll-test.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: poll-test
  namespace: openshift-marketplace
spec:
  displayName: Jian Test
  publisher: Jian
  sourceType: grpc
  image: quay.io/olmqe/etcd-index:0.9.2
  poll: 10m

[root@preserve-olm-env data]# oc create -f cs-poll-test.yaml 
catalogsource.operators.coreos.com/poll-test created
[root@preserve-olm-env data]# oc get catalogsource
NAME                  DISPLAY                TYPE   PUBLISHER      AGE
certified-operators   Certified Operators    grpc   Red Hat        3h39m
community-operators   Community Operators    grpc   Red Hat        3h39m
poll-test             Jian Test              grpc   Jian           4s
qe-app-registry       Production Operators   grpc   OpenShift QE   3h21m
redhat-marketplace    Red Hat Marketplace    grpc   Red Hat        3h39m
redhat-operators      Red Hat Operators      grpc   Red Hat        3h39m
[root@preserve-olm-env data]# 
[root@preserve-olm-env data]# 
[root@preserve-olm-env data]# oc get packagemanifest|grep Jian
etcd                                        Jian Test              29s

[root@preserve-olm-env data]# oc get pods
NAME                                                              READY   STATUS      RESTARTS   AGE
24c6f4b3f063ca9db570182fb4c4784b9b9390003ed9a31ea4a0003270xw4zk   0/1     Completed   0          112m
cc8c72b382bd1ac5ca006964332b9f198745bc52b7851422b254d4dcc5qjc9v   0/1     Completed   0          111m
certified-operators-7sws2                                         1/1     Running     0          3h19m
community-operators-bpvld                                         1/1     Running     0          3h19m
f1ee3a8dd6fef253f3083a53de9a2144b4678abf006c49106c4d8085d48nlpk   0/1     Completed   0          112m
marketplace-operator-66c9f5b5ff-5qjp8                             1/1     Running     0          3h17m
poll-test-bjp5t                                                   1/1     Running     0          33s
qe-app-registry-frw4n                                             1/1     Running     0          91m
redhat-marketplace-sln8r                                          1/1     Running     0          3h19m
redhat-operators-dw5mg                                            1/1     Running     0          3h19m

[root@preserve-olm-env data]# oc get pods poll-test-bjp5t -o yaml |grep image
            f:image: {}
            f:imagePullPolicy: {}
  - image: quay.io/olmqe/etcd-index:0.9.2
    imagePullPolicy: Always
  imagePullSecrets:
    image: quay.io/olmqe/etcd-index:0.9.2
    imageID: quay.io/olmqe/etcd-index@sha256:ee23a1fd8a76e1ed95219577fe764c843ae932735181f26d7d75ae268c13526e

Build and push a new index image.

[root@preserve-olm-env data]# opm index add --overwrite-latest -b quay.io/olmqe/etcd-bundle:0.9.2-share -f quay.io/olmqe/etcd-index:0.9.2 -t quay.io/olmqe/etcd-index:0.9.2 -c docker
INFO[0000] building the index                            bundles="[quay.io/olmqe/etcd-bundle:0.9.2-share]"
...

[root@preserve-olm-env data]# podman push quay.io/olmqe/etcd-index:0.9.2
Getting image source signatures
Copying blob 87b4d8523d6d done  
Copying blob 5a2855009e89 done  
Copying blob b5c842f44e0c done  
Copying blob 4150c4f2e6df skipped: already exists  
Copying blob 50644c29ef5a skipped: already exists  
Copying config 15ec4c23e7 done  
Writing manifest to image destination
Writing manifest to image destination
Storing signatures

However, the CatalogSource pod (poll-test-bjp5t) had still not been updated after about 20 minutes.

mac:~ jianzhang$ date
Fri Sep 25 18:42:35 CST 2020
mac:~ jianzhang$ oc get pods
NAME                                                              READY   STATUS      RESTARTS   AGE
24c6f4b3f063ca9db570182fb4c4784b9b9390003ed9a31ea4a0003270xw4zk   0/1     Completed   0          142m
cc8c72b382bd1ac5ca006964332b9f198745bc52b7851422b254d4dcc5qjc9v   0/1     Completed   0          142m
certified-operators-7sws2                                         1/1     Running     0          3h50m
community-operators-bpvld                                         1/1     Running     0          3h50m
f1ee3a8dd6fef253f3083a53de9a2144b4678abf006c49106c4d8085d48nlpk   0/1     Completed   0          143m
marketplace-operator-66c9f5b5ff-5qjp8                             1/1     Running     0          3h47m
poll-test-bjp5t                                                   1/1     Running     0          31m
qe-app-registry-frw4n                                             1/1     Running     0          122m
redhat-marketplace-sln8r                                          1/1     Running     0          3h50m
redhat-operators-dw5mg                                            1/1     Running     0          3h50m


mac:~ jianzhang$ oc get pods poll-test-bjp5t -o yaml|grep image
            f:image: {}
            f:imagePullPolicy: {}
  - image: quay.io/olmqe/etcd-index:0.9.2
    imagePullPolicy: Always
  imagePullSecrets:
    image: quay.io/olmqe/etcd-index:0.9.2
    imageID: quay.io/olmqe/etcd-index@sha256:ee23a1fd8a76e1ed95219577fe764c843ae932735181f26d7d75ae268c13526e

mac:~ jianzhang$ oc get packagemanifest|grep Jian
etcd                                        Jian Test              32m

mac:~ jianzhang$ date
Fri Sep 25 19:03:01 CST 2020

Comment 3 Evan Cordell 2020-09-25 18:46:50 UTC
It looks like the spec for the test CatalogSource is a little off:

```
spec:
  displayName: Jian Test
  publisher: Jian
  sourceType: grpc
  image: quay.io/olmqe/etcd-index:0.9.2
  poll: 10m
```

The `poll` field is not recognized by OLM. See the CatalogSource spec here:

https://docs.openshift.com/container-platform/4.5/rest_api/operatorhub_apis/catalogsource-operators-coreos-com-v1alpha1.html#specification

It should be:

```
spec:
  updateStrategy:
    registryPoll:
      interval: 10m
```
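
A quick way to catch this kind of mistake before creating the CatalogSource is to scan the manifest for the polling fields. The sketch below is only a text match, not a schema validation; the function name `check_poll_fields` is hypothetical, and it assumes the manifest is passed on stdin.

```shell
# Hypothetical helper: reads a CatalogSource manifest on stdin and reports
# which polling field it uses. Plain text matching only, not schema validation.
check_poll_fields() {
  manifest=$(cat)
  if printf '%s\n' "$manifest" | grep -Eq '^[[:space:]]*registryPoll:'; then
    echo "registryPoll found"
  elif printf '%s\n' "$manifest" | grep -Eq '^[[:space:]]*poll:'; then
    echo "unsupported 'poll' field found"
  else
    echo "no polling configured"
  fi
}

# Usage (file name taken from comment 2):
#   check_poll_fields < cs-poll-test.yaml
```

Because the check is anchored on the field name at the start of a line, `registryPoll:` is not mistaken for the unsupported `poll:` key.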

Comment 5 Jian Zhang 2020-09-27 02:38:30 UTC
Hi Evan,

Yes, sorry, my mistake: I used an old spec.


mac:~ jianzhang$ cat cs-poll.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: poll-test
spec:
  displayName: Jian Test
  sourceType: grpc
  image: quay.io/olmqe/etcd-index:0.9.2
  updateStrategy:
    registryPoll:
      interval: 10m

mac:~ jianzhang$ oc get catalogsource poll-test -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
...
spec:
  displayName: Jian Test
  image: quay.io/olmqe/etcd-index:0.9.2
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m

mac:~ jianzhang$ oc get pods poll-test-265f4 -o yaml|grep image
            f:image: {}
            f:imagePullPolicy: {}
  - image: quay.io/olmqe/etcd-index:0.9.2
    imagePullPolicy: Always
  imagePullSecrets:
    image: quay.io/olmqe/etcd-index:0.9.2
    imageID: quay.io/olmqe/etcd-index@sha256:5c78b3e89530d8ad13f07d5b0f91d89caea803554aaad89e72cb2a4aa2321b48

mac:~ jianzhang$ date
Sun Sep 27 09:49:18 CST 2020
mac:~ jianzhang$ oc get packagemanifest|grep Jian



Build and push a new index image.
[root@preserve-olm-env data]# opm index add --overwrite-latest -b quay.io/olmqe/etcd-bundle:0.9.2 -f quay.io/olmqe/etcd-index:0.9.2 -t quay.io/olmqe/etcd-index:0.9.2 -c podman
INFO[0000] building the index                            bundles="[quay.io/olmqe/etcd-bundle:0.9.2]"
...

[root@preserve-olm-env data]# podman push quay.io/olmqe/etcd-index:0.9.2
Getting image source signatures
...


The CatalogSource image was updated, and the etcd operator is now available.
mac:~ jianzhang$ oc get pods poll-test-mx7h4 -o yaml|grep image
            f:image: {}
            f:imagePullPolicy: {}
        f:imagePullSecrets:
            f:image: {}
            f:imagePullPolicy: {}
  - image: quay.io/olmqe/etcd-index:0.9.2
    imagePullPolicy: Always
  imagePullSecrets:
    image: quay.io/olmqe/etcd-index:0.9.2
    imageID: quay.io/olmqe/etcd-index@sha256:713031208758396ae94f727f53bdddae17be3662979f9203e6f0f066fff6372f

mac:~ jianzhang$ oc get packagemanifest|grep Jian
etcd                                        Jian Test              33m

But there is a problem: the digest of quay.io/olmqe/etcd-index:0.9.2 is f2086851f1bdd1f13137313ae0de4e45022a79f4b3a3f36f43adcb2386c12f95, not 713031208758396ae94f727f53bdddae17be3662979f9203e6f0f066fff6372f. See the screenshot: https://user-images.githubusercontent.com/15416633/94354409-3a286480-00ad-11eb-8d0a-ad16cd920e93.png
I'm not sure where the second digest comes from.

[root@preserve-olm-env data]# docker pull quay.io/olmqe/etcd-index@sha256:713031208758396ae94f727f53bdddae17be3662979f9203e6f0f066fff6372f
Error response from daemon: manifest for quay.io/olmqe/etcd-index@sha256:713031208758396ae94f727f53bdddae17be3662979f9203e6f0f066fff6372f not found: manifest unknown: manifest unknown
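
The digest comparison above can also be scripted. The sketch below assumes the pod's imageID has the form `repo@sha256:<digest>`, as in the output above. The function names are hypothetical, and the registry digest is supplied by hand (here, the value quoted from the registry).

```shell
# Hypothetical helper: strip everything up to and including "@sha256:".
# If the marker is absent, the input is printed unchanged.
digest_of() {
  printf '%s\n' "${1##*@sha256:}"
}

# Succeeds when the pod's imageID carries the expected registry digest.
same_digest() {
  [ "$(digest_of "$1")" = "$2" ]
}

# Values copied from this comment.
pod_image_id='quay.io/olmqe/etcd-index@sha256:713031208758396ae94f727f53bdddae17be3662979f9203e6f0f066fff6372f'
registry_digest='f2086851f1bdd1f13137313ae0de4e45022a79f4b3a3f36f43adcb2386c12f95'

if same_digest "$pod_image_id" "$registry_digest"; then
  echo "digests match"
else
  echo "digests differ"
fi
```

With the values from this comment the script prints "digests differ". A mismatch does not by itself show that the wrong image is running: as noted later in this thread, the imageID reported on the cluster is not guaranteed to be consistent with the digest shown by the registry.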

Comment 6 Jian Zhang 2020-09-28 07:55:59 UTC
Hi Evan, Daniel

In the current CatalogSource update design, we compare the "pod.Status.ContainerStatuses[0].ImageID" of the serving pod with that of the update pod. Once the polling interval elapses, we create the update pod and then check its imageID. But creating the pod takes time, which means that with "interval: 10m" the update actually happens around 12 minutes later. That is not precise for the user. I'm not sure why it is designed this way. Why not check the digest of the index image directly instead of creating the update pod?
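
The behavior described above can be sketched as follows. This illustrates only the comparison and the timing, and is not OLM's actual code. The function names and the 2-minute pod startup figure are assumptions; the startup figure is inferred from the roughly 12 minutes reported for a 10m interval.

```shell
# Hypothetical sketch: an update is needed when the update pod's imageID
# differs from the serving pod's imageID.
needs_update() {
  serving_image_id=$1
  update_image_id=$2
  [ "$serving_image_id" != "$update_image_id" ]
}

# Effective delay in minutes: the polling interval plus the time the update
# pod takes to start and report its imageID (assumed value, see above).
effective_delay_minutes() {
  interval_minutes=$1
  pod_startup_minutes=$2
  echo $((interval_minutes + pod_startup_minutes))
}

# imageIDs taken from the pod output in comment 5 (before and after the push).
serving='quay.io/olmqe/etcd-index@sha256:5c78b3e89530d8ad13f07d5b0f91d89caea803554aaad89e72cb2a4aa2321b48'
update='quay.io/olmqe/etcd-index@sha256:713031208758396ae94f727f53bdddae17be3662979f9203e6f0f066fff6372f'

if needs_update "$serving" "$update"; then
  echo "replace serving pod"
fi

effective_delay_minutes 10 2
```

With these inputs the script prints "replace serving pod" followed by 12, which matches the delay observed for "interval: 10m".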

Comment 9 Jian Zhang 2020-09-29 01:46:03 UTC
Hi Daniel,

Thanks for your explanation! I see now: the key reason is that we cannot ensure the imageID is consistent between the registry and OCP.
LGTM. Verified based on comments 5 and 6.

Comment 12 errata-xmlrpc 2020-10-27 16:44:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

