Bug 1626425

Summary:	[olm] run failed when using the downstream image of olm
Product:	OpenShift Container Platform	Reporter:	Jian Zhang <jiazha>
Component:	OLM	Assignee:	Evan Cordell <ecordell>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Jian Zhang <jiazha>
Severity:	high	Docs Contact:
Priority:	urgent
Version:	3.11.0	CC:	chezhang, dyan, jfan, wsun, zitang
Target Milestone:	---
Target Release:	3.11.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-12-21 15:23:37 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Jian Zhang 2018-09-07 09:45:57 UTC

Description of problem:
Got the following error when using the downstream OLM image:
E0907 09:29:43.857199       1 queueinformer_operator.go:121] Sync "operator-lifecycle-manager/ocs" failed: failed to update catalog source ocs status: the server could not find the requested resource (put catalogsources.operators.coreos.com ocs)


Version-Release number of selected component (if applicable):
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-operator-lifecycle-manager:v3.11

How reproducible:
always

Steps to Reproduce:
1. Install the OLM component using the openshift-ansible-3.11.0-0.28.0 branch of openshift-ansible.
2. Replace the upstream image with the downstream image.
# oc edit deployment
[root@qe-jiazha-testmaster-etcd-1 ~]# oc get pods -o yaml | grep image
      image: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-operator-lifecycle-manager:v3.11
      imagePullPolicy: IfNotPresent
    imagePullSecrets:
      image: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-operator-lifecycle-manager:v3.11
      imageID: docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-operator-lifecycle-manager@sha256:ad85f3223e21aea490c9fb499969e8006655fe6c6e209866326b75ca66c7b05a
      image: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-operator-lifecycle-manager:v3.11
      imagePullPolicy: IfNotPresent
    imagePullSecrets:
      image: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-operator-lifecycle-manager:latest
      imageID: docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-operator-lifecycle-manager@sha256:ad85f3223e21aea490c9fb499969e8006655fe6c6e209866326b75ca66c7b05a

3. Check the catalog-operator logs.
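For reference, step 2 amounts to changing the container image in each OLM Deployment. Below is a minimal sketch of the relevant part of the `catalog-operator` Deployment after the edit. It assumes the container is named `catalog-operator`, which this report does not show; only the image value comes from the report.

```yaml
# Fragment of the catalog-operator Deployment as opened by `oc edit deployment`.
# Assumption: the container name; the image and pull policy match the output in step 2.
spec:
  template:
    spec:
      containers:
      - name: catalog-operator
        image: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-operator-lifecycle-manager:v3.11
        imagePullPolicy: IfNotPresent
```

The same change applies to the `alm-operator` Deployment, whose pod also shows the downstream image above.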

Actual results:
[root@qe-jiazha-testmaster-etcd-1 ~]# oc get pods
NAME                                READY     STATUS    RESTARTS   AGE
alm-operator-566c45b4c8-fvvtn       1/1       Running   0          42s
catalog-operator-58bdf7bb8c-98h79   1/1       Running   0          7m
[root@qe-jiazha-testmaster-etcd-1 ~]# oc logs -f catalog-operator-58bdf7bb8c-98h79
time="2018-09-07T09:29:43Z" level=info msg="Using in-cluster kube client config"
time="2018-09-07T09:29:43Z" level=info msg="Using in-cluster kube client config"
time="2018-09-07T09:29:43Z" level=info msg="connection established. cluster-version: v1.11.0+d4cacc0"
time="2018-09-07T09:29:43Z" level=info msg="Operator ready"
time="2018-09-07T09:29:43Z" level=info msg="starting informers..."
time="2018-09-07T09:29:43Z" level=info msg="waiting for caches to sync..."
time="2018-09-07T09:29:43Z" level=info msg="operator-lifecycle-manager/ocs added"
time="2018-09-07T09:29:43Z" level=info msg="starting workers..."
time="2018-09-07T09:29:43Z" level=info msg="getting from queue" key=operator-lifecycle-manager/ocs queue=catsrc
time="2018-09-07T09:29:43Z" level=info msg="retrying operator-lifecycle-manager/ocs"
E0907 09:29:43.857199       1 queueinformer_operator.go:121] Sync "operator-lifecycle-manager/ocs" failed: failed to update catalog source ocs status: the server could not find the requested resource (put catalogsources.operators.coreos.com ocs)
time="2018-09-07T09:29:43Z" level=info msg="getting from queue" key=operator-lifecycle-manager/ocs queue=catsrc
time="2018-09-07T09:29:43Z" level=info msg="retrying operator-lifecycle-manager/ocs"
E0907 09:29:43.888573       1 queueinformer_operator.go:121] Sync "operator-lifecycle-manager/ocs" failed: failed to update catalog source ocs status: the server could not find the requested resource (put catalogsources.operators.coreos.com ocs)
...

Expected results:
The catalog-operator syncs the CatalogSource without errors.

Additional info:
The catalog-operator works correctly with the upstream image:
quay.io/coreos/catalog@sha256:20886d49205aa8d8fd53f1c85fad6a501775226da25ef14f51258b7066e91064

Comment 1 Evan Cordell 2018-09-10 22:14:28 UTC

I believe this is now fixed in master.

Comment 2 Jian Zhang 2018-09-11 05:19:46 UTC

I installed the OLM component from the latest openshift-ansible master branch and still got the same error. Details:

[root@qe-juzhao-311-gce-1-master-etcd-1 ~]# oc get pods
NAME                               READY     STATUS    RESTARTS   AGE
alm-operator-7bccff7988-8hqxm      1/1       Running   0          30s
catalog-operator-f655bccb9-rkdm2   1/1       Running   0          27s
[root@qe-juzhao-311-gce-1-master-etcd-1 ~]# oc logs -f catalog-operator-f655bccb9-rkdm2
time="2018-09-11T03:30:07Z" level=info msg="Using in-cluster kube client config"
time="2018-09-11T03:30:08Z" level=info msg="Using in-cluster kube client config"
time="2018-09-11T03:30:08Z" level=info msg="connection established. cluster-version: v1.11.0+d4cacc0"
time="2018-09-11T03:30:08Z" level=info msg="Operator ready"
time="2018-09-11T03:30:08Z" level=info msg="starting informers..."
time="2018-09-11T03:30:08Z" level=info msg="waiting for caches to sync..."
time="2018-09-11T03:30:08Z" level=info msg="operator-lifecycle-manager/rh-operators added"
time="2018-09-11T03:30:08Z" level=info msg="starting workers..."
time="2018-09-11T03:30:08Z" level=info msg="getting from queue" key=operator-lifecycle-manager/rh-operators queue=catsrc
time="2018-09-11T03:30:08Z" level=info msg="retrying operator-lifecycle-manager/rh-operators"
E0911 03:30:08.271076       1 queueinformer_operator.go:121] Sync "operator-lifecycle-manager/rh-operators" failed: failed to update catalog source rh-operators status: the server could not find the requested resource (put catalogsources.operators.coreos.com rh-operators)
time="2018-09-11T03:30:08Z" level=info msg="getting from queue" key=operator-lifecycle-manager/rh-operators queue=catsrc
time="2018-09-11T03:30:08Z" level=info msg="retrying operator-lifecycle-manager/rh-operators"
E0911 03:30:08.372938       1 queueinformer_operator.go:121] Sync "operator-lifecycle-manager/rh-operators" failed: failed to update catalog source rh-operators status: the server could not find the requested resource (put catalogsources.operators.coreos.com rh-operators)
time="2018-09-11T03:30:08Z" level=info msg="getting from queue" key=operator-lifecycle-manager/rh-operators queue=catsrc
time="2018-09-11T03:30:08Z" level=info msg="retrying operator-lifecycle-manager/rh-operators"
E0911 03:30:08.443381       1 queueinformer_operator.go:121] Sync "operator-lifecycle-manager/rh-operators" failed: failed to update catalog source rh-operators status: the server could not find the requested resource (put catalogsources.operators.coreos.com rh-operators)
time="2018-09-11T03:30:08Z" level=info msg="getting from queue" key=operator-lifecycle-manager/rh-operators queue=catsrc
time="2018-09-11T03:30:08Z" level=info msg="retrying operator-lifecycle-manager/rh-operators"
E0911 03:30:08.523392       1 queueinformer_operator.go:121] Sync "operator-lifecycle-manager/rh-operators" failed: failed to update catalog source rh-operators status: the server could not find the requested resource (put catalogsources.operators.coreos.com rh-operators)
time="2018-09-11T03:30:08Z" level=info msg="getting from queue" key=operator-lifecycle-manager/rh-operators queue=catsrc
time="2018-09-11T03:30:08Z" level=info msg="retrying operator-lifecycle-manager/rh-operators"
E0911 03:30:08.598387       1 queueinformer_operator.go:121] Sync "operator-lifecycle-manager/rh-operators" failed: failed to update catalog source rh-operators status: the server could not find the requested resource (put catalogsources.operators.coreos.com rh-operators)
time="2018-09-11T03:30:08Z" level=info msg="getting from queue" key=operator-lifecycle-manager/rh-operators queue=catsrc



[root@qe-juzhao-311-gce-1-master-etcd-1 ~]# oc get pods -o yaml | grep image
      image: registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager:v3.11
      imagePullPolicy: IfNotPresent
    imagePullSecrets:
      image: registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager:v3.11
      imageID: docker-pullable://registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager@sha256:b3656f79465c21b0843739dbe3456fadeabfcaf3014551f85cb91f57da559948
      image: registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager:v3.11
      imagePullPolicy: IfNotPresent
    imagePullSecrets:
      image: registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager:v3.11
      imageID: docker-pullable://registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager@sha256:b3656f79465c21b0843739dbe3456fadeabfcaf3014551f85cb91f57da559948

Comment 3 Evan Cordell 2018-09-11 15:09:48 UTC

I was able to reproduce this by using the `registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager:v3.11` image and applying the manifests from the olm role in openshift-ansible.

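The root cause is that openshift-ansible shipped an older version of the CatalogSource CRD that did not have the status subresource enabled. The catalog-operator updates CatalogSource status through the `/status` endpoint, so without that subresource the API server returns "the server could not find the requested resource".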

I have opened a PR to fix this, which I will also cherry-pick onto 3.11: https://github.com/openshift/openshift-ansible/pull/10004
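To illustrate the fix, here is a minimal sketch of a CatalogSource CRD with the status subresource enabled. It assumes the `apiextensions.k8s.io/v1beta1` CRD API (the one available in Kubernetes 1.11) and the `v1alpha1` CatalogSource version. It is not the exact manifest from the PR above.

```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: catalogsources.operators.coreos.com
spec:
  group: operators.coreos.com
  version: v1alpha1          # assumed CatalogSource API version
  scope: Namespaced
  names:
    plural: catalogsources
    singular: catalogsource
    kind: CatalogSource
    listKind: CatalogSourceList
  subresources:
    # Serves .../catalogsources/<name>/status; without this block,
    # status updates from the catalog-operator get a 404 from the API server.
    status: {}
```

On a running cluster, `oc get crd catalogsources.operators.coreos.com -o yaml` shows whether `spec.subresources.status` is present.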

Comment 4 Jian Zhang 2018-09-12 09:01:15 UTC

Evan,

Thanks! I installed it separately from the latest master branch, and it works as expected. LGTM.

[root@qe-jiazha-311-gce-1-master-etcd-1 ~]# oc get pods
NAME                                READY     STATUS    RESTARTS   AGE
catalog-operator-76c846684c-kxjgt   1/1       Running   0          4h
olm-operator-5b7f7c4556-6xlqm       1/1       Running   0          4h

[root@qe-jiazha-311-gce-1-master-etcd-1 ~]# oc get pods -o yaml | grep image
      image: registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager:v3.11
      imagePullPolicy: IfNotPresent
    imagePullSecrets:
      image: registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager:v3.11
      imageID: docker-pullable://registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager@sha256:b3656f79465c21b0843739dbe3456fadeabfcaf3014551f85cb91f57da559948
      image: registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager:v3.11
      imagePullPolicy: IfNotPresent
    imagePullSecrets:
      image: registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager:v3.11
      imageID: docker-pullable://registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager@sha256:b3656f79465c21b0843739dbe3456fadeabfcaf3014551f85cb91f57da559948

INSTALLER STATUS **************************************************************************************************************************************************************************************************
Initialization  : Complete (0:00:59)
OLM Install     : Complete (0:02:37)
Wednesday 12 September 2018  12:29:19 +0800 (0:00:00.038)       0:03:36.767 *** 
=============================================================================== 
olm : Copy manifests to temp directory -------------------------------------------------------------------------------------------------------------------------------------------------------------------- 91.94s
Gathering Facts ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 32.08s
olm : Apply rh-operators ConfigMap manifest --------------------------------------------------------------------------------------------------------------------------------------------------------------- 10.41s
olm : Set olm-operator template ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 6.99s
olm : Set catalog-operator template ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 6.85s
Run variable sanity checks --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 5.18s
Gather Cluster facts --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 4.04s
olm : create operator-lifecycle-manager project ------------------------------------------------------------------------------------------------------------------------------------------------------------ 3.98s
olm : Apply aggregate-olm-view ClusterRole manifest -------------------------------------------------------------------------------------------------------------------------------------------------------- 3.73s
Initialize openshift.node.sdn_mtu -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 3.51s
olm : Apply olm-operator Deployment manifest --------------------------------------------------------------------------------------------------------------------------------------------------------------- 3.27s
olm : Apply aggregate-olm-edit ClusterRole manifest -------------------------------------------------------------------------------------------------------------------------------------------------------- 3.23s
olm : Apply ocs CatalogSource manifest --------------------------------------------------------------------------------------------------------------------------------------------------------------------- 3.09s
olm : Apply clusterserviceversions.operators.coreos.com CustomResourceDefinition manifest ------------------------------------------------------------------------------------------------------------------ 2.93s
olm : Apply subscriptions.operators.coreos.com CustomResourceDefinition manifest --------------------------------------------------------------------------------------------------------------------------- 2.88s
olm : Apply catalog-operator Deployment manifest ----------------------------------------------------------------------------------------------------------------------------------------------------------- 2.81s
olm : Apply catalogsources.operators.coreos.com CustomResourceDefinition manifest -------------------------------------------------------------------------------------------------------------------------- 2.73s
olm : Apply installplans.operators.coreos.com CustomResourceDefinition manifest ---------------------------------------------------------------------------------------------------------------------------- 2.73s
olm : Apply olm-operator-serviceaccount ServiceAccount manifest -------------------------------------------------------------------------------------------------------------------------------------------- 2.68s
olm : Apply operator-lifecycle-manager ClusterRole manifest ------------------------------------------------------------------------------------------------------------------------------------------------ 2.58s

The latest openshift-ansible commit I tested:
[jzhang@localhost openshift-ansible]$ git log
commit 7a493317fe5e889db84c62eaa6f8b31d88385eda
Merge: 2c62a3b 4d6fbd0
Author: OpenShift Merge Robot <openshift-merge-robot.github.com>
Date:   Tue Sep 11 15:59:52 2018 -0700

    Merge pull request #10011 from brancz/fix-tag-removal
    
    cluster-monitoring: Fix regex_replace to remove image tag

Comment 5 Luke Meyer 2018-12-21 15:23:37 UTC

Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.