Bug 1909928

Summary: Fail to create updateservice with osus operator due to OPERAND_IMAGE is not correct
Product: OpenShift Container Platform
Component: OpenShift Update Service
Sub component: operator
Version: 4.6
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Type: Bug
Reporter: liujia <jiajliu>
Assignee: Lalatendu Mohanty <lmohanty>
QA Contact: liujia <jiajliu>
Docs Contact: Kathryn Alexander <kalexand>
CC: ableisch, bleanhar, lmohanty, yanyang
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2021-07-14 16:07:03 UTC

Description liujia 2020-12-22 04:50:32 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
Creating an updateservice with the OSUS operator fails. OPERAND_IMAGE currently refers to quay.io instead of registry.redhat.io.
# ./oc get po
NAME                                      READY   STATUS             RESTARTS   AGE
updateservice-operator-74f9dc44fb-t8ktx   1/1     Running            0          31m
updateservice-sample-6cc7b5c78-qwqxc      0/2     ImagePullBackOff   0          11m

Events:
  Type     Reason          Age                    From                                                  Message
  ----     ------          ----                   ----                                                  -------
  Normal   Scheduled       7h59m                                                                        Successfully assigned mytest/updateservice-sample-6cc7b5c78-qwqxc to jliu468-ksntb-w-b-1.c.openshift-qe.internal
  Normal   AddedInterface  7h59m                  multus                                                Add eth0 [10.128.2.21/23]
  Normal   Pulling         7h59m                  kubelet, jliu468-ksntb-w-b-1.c.openshift-qe.internal  Pulling image "jliu468.mirror-registry.qe.gcp.devcluster.openshift.com:5000/osus/cincinnati-graph-data-container:1.0"
  Normal   Pulled          7h59m                  kubelet, jliu468-ksntb-w-b-1.c.openshift-qe.internal  Successfully pulled image "jliu468.mirror-registry.qe.gcp.devcluster.openshift.com:5000/osus/cincinnati-graph-data-container:1.0" in 7.969827992s
  Normal   Created         7h59m                  kubelet, jliu468-ksntb-w-b-1.c.openshift-qe.internal  Created container graph-data
  Normal   Started         7h59m                  kubelet, jliu468-ksntb-w-b-1.c.openshift-qe.internal  Started container graph-data
  Warning  Failed          7h58m                  kubelet, jliu468-ksntb-w-b-1.c.openshift-qe.internal  Error: ImagePullBackOff
  Warning  Failed          7h58m                  kubelet, jliu468-ksntb-w-b-1.c.openshift-qe.internal  Failed to pull image "quay.io/cincinnati/cincinnati:latest": rpc error: code = Unknown desc = error pinging docker registry quay.io: Get "https://quay.io/v2/": dial tcp 34.192.143.146:443: i/o timeout
  Warning  Failed          7h58m                  kubelet, jliu468-ksntb-w-b-1.c.openshift-qe.internal  Error: ErrImagePull
  Normal   BackOff         7h58m (x2 over 7h58m)  kubelet, jliu468-ksntb-w-b-1.c.openshift-qe.internal  Back-off pulling image "quay.io/cincinnati/cincinnati:latest"
  Warning  Failed          7h58m (x2 over 7h58m)  kubelet, jliu468-ksntb-w-b-1.c.openshift-qe.internal  Error: ImagePullBackOff
  Normal   BackOff         7h58m                  kubelet, jliu468-ksntb-w-b-1.c.openshift-qe.internal  Back-off pulling image "quay.io/cincinnati/cincinnati:latest"
  Normal   Pulling         7h57m (x2 over 7h59m)  kubelet, jliu468-ksntb-w-b-1.c.openshift-qe.internal  Pulling image "quay.io/cincinnati/cincinnati:latest"


Version of all relevant components (if applicable):
bundle image: v1.0-2
operator: v4.6.0-1

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
yes

Can this issue be reproduced from the UI?
yes

If this is a regression, please provide more details to justify this:
no

Steps to Reproduce:
1. Install OCP v4.6.8 (disconnected cluster).
2. Install the OSUS operator from OperatorHub.
3. Create an updateservice with the OSUS operator.

Actual results:
Creating the updateservice fails.

Expected results:
Updateservice is created successfully.

Additional info:
# ./oc get csv update-service-operator.v4.6.0 -oyaml|grep -A2 OPERAND_IMAGE
                - name: OPERAND_IMAGE
                  value: quay.io/cincinnati/cincinnati:latest
                image: registry.redhat.io/openshift-update-service/openshift-update-service-rhel8-operator@sha256:59e6e4cf633c2a4143693962d21564d6d5fec2b9078a66500d3d2f8476e6daca
OPERAND_IMAGE should also refer to a to-be-released registry/repo. AFAIK, quay.io/cincinnati/cincinnati is an upstream repo.
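
To make the registry mismatch easier to spot, here is a minimal sketch that extracts the registry host from a pullspec. The helper name registry_host is hypothetical. It assumes the usual container reference convention: the first path component is treated as a registry only if it contains a dot or a colon, or is "localhost". The docker.io fallback for short names is also an assumption, since the real behaviour depends on the node's registry configuration.

```shell
# Hypothetical helper: print the registry host part of an image pullspec.
# Assumption: a first path component counts as a registry only if it
# contains "." or ":" or is exactly "localhost"; otherwise the name is
# treated as unqualified and docker.io is printed as the assumed default.
registry_host() (
  spec=$1
  case "$spec" in
    */*) first=${spec%%/*} ;;   # text before the first slash
    *)   first= ;;              # no slash at all: no registry component
  esac
  case "$first" in
    *.*|*:*|localhost) printf '%s\n' "$first" ;;
    *) printf '%s\n' "docker.io" ;;
  esac
)

registry_host "quay.io/cincinnati/cincinnati:latest"
registry_host "registry.redhat.io/openshift-update-service/openshift-update-service-rhel8-operator"
```

On a disconnected cluster, any host printed here that is neither mirrored nor reachable will end in ImagePullBackOff, as in the events above.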

Comment 1 Brenton Leanhardt 2021-02-02 17:51:03 UTC
Hi Liu Jia, we think this is fixed in the most recent bundle.  Would you mind checking?

Comment 2 liujia 2021-02-03 01:26:34 UTC
Sure, I will check all OSUS bugs against the recent images this week.

Comment 3 liujia 2021-02-03 08:32:15 UTC
The issue still exists.

OSUS operator image: v4.6.0-4
Bundle image: v1.0-10

Creating an updateservice with the OSUS operator still fails. OPERAND_IMAGE is not correct (it now refers to "openshift-update-service").
# ./oc get po
NAME                                      READY   STATUS             RESTARTS   AGE
updateservice-operator-84d88c6477-86m4j   1/1     Running            0          3h56m
updateservice-sample-776b4d8dbd-nmxc8     0/2     ImagePullBackOff   0          5m27s

Events:
  Type     Reason          Age                 From               Message
  ----     ------          ----                ----               -------
  Normal   Scheduled       8h                  default-scheduler  Successfully assigned openshift-update-service/updateservice-sample-776b4d8dbd-nmxc8 to jliu-a46-m9xlz-w-b-1.c.openshift-qe.internal
  Normal   AddedInterface  8h                  multus             Add eth0 [10.128.2.21/23]
  Normal   Pulled          8h                  kubelet            Container image "jliu-a46.mirror-registry.qe.gcp.devcluster.openshift.com:5000/rh-osbs/cincinnati-graph-data-container:v4.6.0" already present on machine
  Normal   Created         8h                  kubelet            Created container graph-data
  Normal   Started         8h                  kubelet            Started container graph-data
  Warning  Failed          8h                  kubelet            Failed to pull image "openshift-update-service": rpc error: code = Unknown desc = error pinging docker registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": dial tcp 35.174.73.84:443: i/o timeout
  Warning  Failed          7h59m               kubelet            Failed to pull image "openshift-update-service": rpc error: code = Unknown desc = error pinging docker registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": dial tcp 54.236.165.68:443: i/o timeout
  Warning  Failed          7h59m (x2 over 8h)  kubelet            Error: ErrImagePull
  Normal   BackOff         7h58m (x4 over 8h)  kubelet            Back-off pulling image "openshift-update-service"
  Warning  Failed          7h58m (x4 over 8h)  kubelet            Error: ImagePullBackOff
  Normal   BackOff         7h58m (x2 over 8h)  kubelet            Back-off pulling image "openshift-update-service"
  Warning  Failed          7h58m (x2 over 8h)  kubelet            Error: ImagePullBackOff
  Normal   Pulling         7h58m (x3 over 8h)  kubelet            Pulling image "openshift-update-service"

# ./oc describe deployment.apps/updateservice-sample|grep -B3 Image
                updateservice.operator.openshift.io/graph-builder-config-hash: 32258a7a01794001c6e47794df05cfe9a95d6cd02a24da368368e6e2456b09af
  Init Containers:
   graph-data:
    Image:        jliu-a46.mirror-registry.qe.gcp.devcluster.openshift.com:5000/rh-osbs/cincinnati-graph-data-container:v4.6.0
--
      /var/lib/cincinnati/graph-data from cincinnati-graph-data (rw)
  Containers:
   graph-builder:
    Image:       openshift-update-service
--
      /var/lib/cincinnati/graph-data from cincinnati-graph-data (rw)
      /var/lib/cincinnati/registry-credentials from pull-secret (ro)
   policy-engine:
    Image:       openshift-update-service

Worked around the issue by replacing "openshift-update-service" with the correct OSUS container image in the deployment so that other tests can proceed.
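
One possible way to apply that replacement is sketched below. The helper name workaround_cmd is hypothetical, and the image value is a placeholder, not the real pullspec. The container names graph-builder and policy-engine come from the describe output above. The helper only prints the `oc set image` command so it can be reviewed first. Note that the operator may overwrite a manual edit like this when it next reconciles the deployment.

```shell
# Hypothetical helper: print (do not run) an `oc set image` command that
# points both operand containers at the same replacement image.
# $1 = deployment name, $2 = replacement operand image
workaround_cmd() (
  printf 'oc set image deployment/%s graph-builder=%s policy-engine=%s\n' \
    "$1" "$2" "$2"
)

# Placeholder image: substitute the operand image mirrored in your registry.
workaround_cmd updateservice-sample "mirror.example.com:5000/rh-osbs/operand:placeholder"
```

Run the printed command in the namespace that holds the deployment (openshift-update-service in this comment).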

Comment 4 Lalatendu Mohanty 2021-03-09 16:30:17 UTC
I believe this is fixed in the latest build, i.e. "cincinnati-operator-metadata-container-v1.0-20". Moving it to ON_QA.

Comment 5 liujia 2021-03-10 03:56:16 UTC
OSUS operator image: v4.6.0-6
Bundle image: v1.0-20
Operand image: v4.6.0-8

The updateservice still cannot be deployed successfully.

# ./oc get po
NAME                                     READY   STATUS             RESTARTS   AGE
updateservice-operator-888d8784c-hnchb   1/1     Running            0          19h
updateservice-sample-74d9bd6f84-dzgxz    0/2     ImagePullBackOff   0          11m

Events:
  Type     Reason          Age                   From               Message
  ----     ------          ----                  ----               -------
  Normal   Scheduled       2m14s                 default-scheduler  Successfully assigned openshift-update-service/updateservice-sample-74d9bd6f84-dzgxz to jliu-46-tgtv2-w-a-0.c.openshift-qe.internal
  Normal   AddedInterface  2m13s                 multus             Add eth0 [10.128.2.21/23]
  Normal   Pulled          2m13s                 kubelet            Container image "jliu-46.mirror-registry.qe.gcp.devcluster.openshift.com:5000/rh-osbs/cincinnati-graph-data-container:1.0" already present on machine
  Normal   Created         2m12s                 kubelet            Created container graph-data
  Normal   Started         2m12s                 kubelet            Started container graph-data
  Warning  Failed          100s (x2 over 2m11s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff         100s (x4 over 2m12s)  kubelet            Back-off pulling image "registry-proxy.engineering.redhat.com/rh-osbs/cincinnati-openshift-update-service:v4.6.0"
  Normal   BackOff         100s (x2 over 2m11s)  kubelet            Back-off pulling image "registry-proxy.engineering.redhat.com/rh-osbs/cincinnati-openshift-update-service:v4.6.0"
  Warning  Failed          85s (x3 over 2m12s)   kubelet            Failed to pull image "registry-proxy.engineering.redhat.com/rh-osbs/cincinnati-openshift-update-service:v4.6.0": rpc error: code = Unknown desc = error pinging docker registry registry-proxy.engineering.redhat.com: Get "https://registry-proxy.engineering.redhat.com/v2/": dial tcp: lookup registry-proxy.engineering.redhat.com on 169.254.169.254:53: no such host
  Warning  Failed          85s (x3 over 2m12s)   kubelet            Error: ErrImagePull
  Warning  Failed          85s (x5 over 2m12s)   kubelet            Error: ImagePullBackOff
  Normal   Pulling         85s (x3 over 2m12s)   kubelet            Pulling image "registry-proxy.engineering.redhat.com/rh-osbs/cincinnati-openshift-update-service:v4.6.0"

ICSP was created and applied successfully.
# ./oc get imagecontentsourcepolicies.operator.openshift.io image-policy-aosqe -ojson|jq .spec.repositoryDigestMirrors[-1]
{
  "mirrors": [
    "jliu-46.mirror-registry.qe.gcp.devcluster.openshift.com:5000"
  ],
  "source": "registry-proxy.engineering.redhat.com"
}

# ./oc get csv update-service-operator.v4.6.0 -oyaml|grep -A1 OPERAND_IMAGE
                - name: OPERAND_IMAGE
                  value: registry-proxy.engineering.redhat.com/rh-osbs/cincinnati-openshift-update-service:v4.6.0

This is most likely because the CSV uses an image tag rather than a digest, and ICSP mirroring only applies to digest-based pullspecs. So we still need to update the operand image's value to use a digest instead of a tag.
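
A quick way to check for this condition is sketched below. The helper name is_digest_pinned is hypothetical, and the check is a plain string match on "@sha256:", not a full pullspec parser.

```shell
# Hypothetical helper: succeed if the pullspec is pinned by digest,
# i.e. it contains "@sha256:"; fail for tag-based or untagged pullspecs.
is_digest_pinned() {
  case "$1" in
    *@sha256:*) return 0 ;;
    *)          return 1 ;;
  esac
}

# The OPERAND_IMAGE value from the CSV in this comment.
operand="registry-proxy.engineering.redhat.com/rh-osbs/cincinnati-openshift-update-service:v4.6.0"

if is_digest_pinned "$operand"; then
  echo "digest-pinned: the ICSP mirror can apply"
else
  echo "tag-based: the ICSP mirror will not apply"
fi
```

For the value in this comment the check takes the tag-based branch, which is consistent with the kubelet trying registry-proxy.engineering.redhat.com directly instead of the mirror.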

Comment 6 Lalatendu Mohanty 2021-03-10 22:03:11 UTC
We use a tag because Brew/OSBS will find the .image pullspec, replace the floating tag with the SHA256 digest, and replace the registry, namespace, and repo.

For example :

    old: registry-proxy.engineering.redhat.com/rh-osbs/openshift4-ose-ansible-service-broker-operator:v4.3
    new: registry.redhat.io/openshift4/ose-ansible-service-broker-operator@sha256:abcdef...

I am looking into the issue.
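
The tag-to-digest part of the rewrite described in this comment can be illustrated as follows. This is a sketch only and not OSBS code. The helper names strip_tag and pin_by_digest are hypothetical, the digest must be supplied by the caller, and the registry/namespace/repo substitution is deliberately left out because that mapping is not given here.

```shell
# Hypothetical helper: drop a trailing :tag or @digest from a pullspec.
# Only the final path component is inspected, so a registry port such as
# host:5000 is left intact.
strip_tag() (
  spec=$1
  last=${spec##*/}          # final path component, e.g. "repo:v4.3"
  prefix=${spec%"$last"}    # everything before it (may be empty)
  last=${last%%@*}          # remove any @sha256:... digest
  last=${last%%:*}          # remove any :tag
  printf '%s%s\n' "$prefix" "$last"
)

# Hypothetical helper: build a digest-pinned pullspec.
# $1 = tag-based pullspec, $2 = digest in the form sha256:<hex>
pin_by_digest() (
  printf '%s@%s\n' "$(strip_tag "$1")" "$2"
)

strip_tag "registry-proxy.engineering.redhat.com/rh-osbs/cincinnati-openshift-update-service:v4.6.0"
```

Passing the result and a digest to pin_by_digest yields a pullspec of the form repo@sha256:<hex>, which is the form ICSP mirroring can match.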

Comment 7 Lalatendu Mohanty 2021-03-16 16:36:32 UTC
This is fixed with metadata bundle image v1.0-21. Moving it to ON_QA. The build is attached to the errata.

Comment 8 liujia 2021-03-17 02:56:12 UTC
OSUS operator image: v4.6.0-6
Bundle image: v1.0-21
Operand image: v4.6.0-8

updateservice was deployed successfully.

# ./oc get po
NAME                                      READY   STATUS    RESTARTS   AGE
my-cincy-8f888cfdb-bqnvw                  2/2     Running   0          6m2s
updateservice-operator-587b9dfb9d-82bww   1/1     Running   0          17h


# ./oc get csv update-service-operator.v4.6.0 -oyaml|grep -A1 OPERAND_IMAGE
                - name: OPERAND_IMAGE
                  value: registry-proxy.engineering.redhat.com/rh-osbs/cincinnati-openshift-update-service@sha256:0f93e726a1b13b71a627ec24223694618b411ee08280751dbd9e9cdf89c7fddd

Verified the bug.