Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1620329

Summary: should replace the upstream images with the downstream images in the OLM code
Product: OpenShift Container Platform Reporter: Jian Zhang <jiazha>
Component: Installer Assignee: Evan Cordell <ecordell>
Status: CLOSED CURRENTRELEASE QA Contact: Jian Zhang <jiazha>
Severity: high Docs Contact:
Priority: urgent    
Version: 3.11.0 CC: aos-bugs, ecordell, jokerman, mmccomas, wsun
Target Milestone: ---   
Target Release: 3.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-12-21 15:23:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jian Zhang 2018-08-23 03:16:19 UTC
Description of problem:
At present, we use the upstream images to deploy the OLM component in openshift-ansible (listed below); we should replace them with the downstream images.

https://github.com/openshift/openshift-ansible/blob/master/roles/olm/files/13-catalog-operator.deployment.yaml#L30
https://github.com/openshift/openshift-ansible/blob/master/roles/olm/files/12-alm-operator.deployment.yaml#L27
https://github.com/openshift/openshift-ansible/blob/master/roles/olm/files/08-ocs.configmap.yaml#L5823
https://github.com/openshift/openshift-ansible/blob/master/roles/olm/files/08-ocs.configmap.yaml#L6008
https://github.com/openshift/openshift-ansible/blob/master/roles/olm/files/08-ocs.configmap.yaml#L6019
https://github.com/openshift/openshift-ansible/blob/master/roles/olm/files/08-ocs.configmap.yaml#L6291
https://github.com/openshift/openshift-ansible/blob/master/roles/olm/files/08-ocs.configmap.yaml#L6302
https://github.com/openshift/openshift-ansible/blob/master/roles/olm/files/08-ocs.configmap.yaml#L6316
https://github.com/openshift/openshift-ansible/blob/master/roles/olm/files/08-ocs.configmap.yaml#L6580
https://github.com/openshift/openshift-ansible/blob/master/roles/olm/files/08-ocs.configmap.yaml#L6837
https://github.com/openshift/openshift-ansible/blob/master/roles/olm/files/08-ocs.configmap.yaml#L7123
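For context, each of the lines above pins an upstream quay.io image in a Kubernetes manifest, and the fix is to swap the `image:` field for the downstream build. A minimal before/after sketch, using image strings reported elsewhere in this bug (the exact downstream registry may differ per environment):

```yaml
# Before: upstream image pinned by digest
# (e.g. roles/olm/files/13-catalog-operator.deployment.yaml)
containers:
  - name: catalog-operator
    image: quay.io/coreos/catalog@sha256:20886d49205aa8d8fd53f1c85fad6a501775226da25ef14f51258b7066e91064

# After: downstream image from the internal registry
containers:
  - name: catalog-operator
    image: registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager:v3.11
```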

  
Version-Release number of selected component (if applicable):
openshift-ansible master branch

How reproducible:
always

Steps to Reproduce:
1. Install the OLM component via the openshift-ansible. 
2. Create subscriptions to create the "etcd-operator" and "Prometheus-operator".
3. Check their images.

Actual results:
image: quay.io/coreos/olm@sha256:44b445850b3e612c062424c3727bb85048ec8e71407b39985786d29aa20f5c79
image: quay.io/coreos/catalog@sha256:20886d49205aa8d8fd53f1c85fad6a501775226da25ef14f51258b7066e91064
image: quay.io/coreos/prometheus-operator@sha256:3daa69a8c6c2f1d35dcf1fe48a7cd8b230e55f5229a1ded438f687debade5bcf
...

Expected results:
These upstream images should be replaced with the downstream images in the openshift-ansible files once the downstream images are ready.

Additional info:



Comment 1 Jian Zhang 2018-09-03 05:22:48 UTC
FYI, PR: https://github.com/openshift/openshift-ansible/pull/9864

Comment 2 Jian Zhang 2018-09-06 08:57:54 UTC
Added the "testblocker" keyword since this blocks installation of OLM.

Comment 3 Jian Zhang 2018-09-06 10:07:36 UTC
I encountered some problems when installing OLM via the openshift-ansible master branch, which has PR https://github.com/openshift/openshift-ansible/pull/9864 merged in.
I want to sync with you here:

1. The install/uninstall task cannot be invoked; I filed bug 1625875 to track it.

2. Why do we depend on "openshift_cluster_monitoring_operator_namespace"? IMO, we should use the "operator-lifecycle-manager" namespace. Or is this a bug?
Here: https://github.com/openshift/openshift-ansible/blob/master/roles/olm/tasks/install.yaml#L26 
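A hypothetical sketch of the suggestion above: give the olm role its own namespace variable instead of reusing the monitoring operator's. The variable name `olm_namespace` is illustrative, not the actual one in the role:

```yaml
# roles/olm/defaults/main.yml (illustrative)
olm_namespace: operator-lifecycle-manager

# roles/olm/tasks/install.yaml (illustrative)
- name: create operator-lifecycle-manager project
  oc_project:
    name: "{{ olm_namespace }}"
    state: present
```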

3. I also encountered the errors below; is this a bug?
TASK [olm : Set olm-operator template] ****************************************************************************************************************************************************************************
Thursday 06 September 2018  16:52:24 +0800 (0:01:18.862)       0:02:23.990 **** 
fatal: [qe-jiazha-master-etcd-1.0906-c4y.qe.rhcloud.com]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'l_osm_registry_url' is undefined"}
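The AnsibleUndefinedVariable failure above means `l_osm_registry_url` has no value when the olm role's template task runs. One common guard is a Jinja2 `default` filter; this is a hypothetical sketch (the `src`/`dest` paths and fallback registry are assumptions for illustration), not the actual fix that landed:

```yaml
# Hypothetical guard so the task uses a fallback registry instead of
# failing with AnsibleUndefinedVariable when l_osm_registry_url is unset.
- name: Set olm-operator template
  template:
    src: olm-operator.yaml.j2
    dest: "{{ mktemp.stdout }}/olm-operator.yaml"
  vars:
    olm_image: "{{ l_osm_registry_url | default('registry.reg-aws.openshift.com:443/openshift3') }}/ose-operator-lifecycle-manager:v3.11"
```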

Comment 4 Evan Cordell 2018-09-07 02:16:28 UTC
This is fixed in this PR pending merge: https://github.com/openshift/openshift-ansible/pull/9949

Comment 5 Jian Zhang 2018-09-10 05:58:28 UTC
I installed the OLM component separately via the openshift-ansible-3.11.0-0.33.0 branch because of bug 1626812.

The OLM installation succeeded, but the alm-operator pod failed. Logs below:
[root@qe-juzhao-311-gce-1-master-etcd-1 ~]# oc get pods
NAME                               READY     STATUS             RESTARTS   AGE
alm-operator-7bccff7988-lskfh      0/1       CrashLoopBackOff   6          7m
catalog-operator-f655bccb9-qgbf4   1/1       Running            0          7m
[root@qe-juzhao-311-gce-1-master-etcd-1 ~]# oc logs -f alm-operator-7bccff7988-lskfh
time="2018-09-10T05:52:30Z" level=info msg="Using in-cluster kube client config"
time="2018-09-10T05:52:30Z" level=info msg="Using in-cluster kube client config"
time="2018-09-10T05:52:30Z" level=fatal msg="error configuring operator: namespaces is forbidden: User \"system:serviceaccount:operator-lifecycle-manager:olm-operator-serviceaccount\" cannot list namespaces at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io \"system:controller:operator-lifecycle-manager\" not found"

I believe something is wrong with the RBAC policy.

Comment 6 Jian Zhang 2018-09-10 07:29:14 UTC
Evan,

I believe the root cause is that the "clusterrole" deployment is missing from the install task: https://github.com/openshift/openshift-ansible/blob/master/roles/olm/tasks/install.yaml

INSTALLER STATUS **************************************************************************************************************************************************************************************************
Initialization  : Complete (0:00:57)
OLM Install     : Complete (0:02:47)
Monday 10 September 2018  13:46:33 +0800 (0:00:00.049)       0:03:45.196 ****** 
=============================================================================== 
olm : Copy manifests to temp directory ------------------------------------------------------------------------------------------------------------------------------------------------------------------- 102.94s
Gathering Facts ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 30.84s
olm : Apply rh-operators ConfigMap manifest --------------------------------------------------------------------------------------------------------------------------------------------------------------- 12.86s
olm : Set olm-operator template ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 7.03s
olm : Set catalog-operator template ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 6.97s
Run variable sanity checks --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 4.51s
Gather Cluster facts --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 4.46s
Initialize openshift.node.sdn_mtu -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 4.09s
olm : Apply clusterserviceversions.operators.coreos.com CustomResourceDefinition manifest ------------------------------------------------------------------------------------------------------------------ 3.87s
olm : create operator-lifecycle-manager project ------------------------------------------------------------------------------------------------------------------------------------------------------------ 3.36s
olm : Apply olm-operator Deployment manifest --------------------------------------------------------------------------------------------------------------------------------------------------------------- 3.04s
olm : Apply catalog-operator Deployment manifest ----------------------------------------------------------------------------------------------------------------------------------------------------------- 3.03s
olm : Apply aggregate-olm-edit ClusterRole manifest -------------------------------------------------------------------------------------------------------------------------------------------------------- 2.91s
olm : Apply ocs CatalogSource manifest --------------------------------------------------------------------------------------------------------------------------------------------------------------------- 2.87s
olm : Apply subscriptions.operators.coreos.com CustomResourceDefinition manifest --------------------------------------------------------------------------------------------------------------------------- 2.84s
olm : Apply olm-operator-serviceaccount ServiceAccount manifest -------------------------------------------------------------------------------------------------------------------------------------------- 2.82s
olm : Apply aggregate-olm-view ClusterRole manifest -------------------------------------------------------------------------------------------------------------------------------------------------------- 2.78s
olm : Apply catalogsources.operators.coreos.com CustomResourceDefinition manifest -------------------------------------------------------------------------------------------------------------------------- 2.71s
olm : Apply olm-operator-binding-operator-lifecycle-manager ClusterRoleBinding manifest -------------------------------------------------------------------------------------------------------------------- 2.69s
olm : Apply installplans.operators.coreos.com CustomResourceDefinition manifest ---------------------------------------------------------------------------------------------------------------------------- 2.62s

 
[root@qe-juzhao-311-gce-1-master-etcd-1 ~]# oc get clusterrole system:controller:operator-lifecycle-manager -o yaml
Error from server (NotFound): clusterroles.authorization.openshift.io "system:controller:operator-lifecycle-manager" not found
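For reference, this is a sketch of what the missing ClusterRole and its binding would look like. The actual rules shipped by openshift-ansible are broader; this shows only the permission the log complains about (listing namespaces cluster-wide), with the role and binding names taken from the error message and installer output above:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:controller:operator-lifecycle-manager
rules:
  # The olm-operator log shows it needs to list namespaces at cluster scope.
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: olm-operator-binding-operator-lifecycle-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:controller:operator-lifecycle-manager
subjects:
  - kind: ServiceAccount
    name: olm-operator-serviceaccount
    namespace: operator-lifecycle-manager
```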

Comment 7 Evan Cordell 2018-09-10 22:13:15 UTC
This is fixed in openshift-ansible master.

Comment 8 Jian Zhang 2018-09-11 05:45:19 UTC
Yes, I think the fixed PR is: https://github.com/openshift/openshift-ansible/pull/9974

I used the latest master branch to test it, and installation works as expected. We can see the OLM images have been changed to the downstream images. Verified. But we still encounter bug 1626425.

[root@qe-juzhao-311-gce-1-master-etcd-1 ~]# oc get pods -o yaml -n  operator-lifecycle-manager | grep image 
      image: registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager:v3.11
      imagePullPolicy: IfNotPresent
    imagePullSecrets:
      image: registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager:v3.11
      imageID: docker-pullable://registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager@sha256:b3656f79465c21b0843739dbe3456fadeabfcaf3014551f85cb91f57da559948
      image: registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager:v3.11
      imagePullPolicy: IfNotPresent
    imagePullSecrets:
      image: registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager:v3.11
      imageID: docker-pullable://registry.reg-aws.openshift.com:443/openshift3/ose-operator-lifecycle-manager@sha256:b3656f79465c21b0843739dbe3456fadeabfcaf3014551f85cb91f57da559948


The latest git commit for the master branch.
[jzhang@localhost openshift-ansible]$ git log
commit 472b3687f35f48da78ee51d7bf8051f911f6c66c
Merge: 50f6d45 7a0551a
Author: OpenShift Merge Robot <openshift-merge-robot.github.com>
Date:   Mon Sep 10 19:04:16 2018 -0700

    Merge pull request #9972 from smarterclayton/sdn_prep
    
    Prepare to split openshift-sdn out of the openshift binary

commit 50f6d45d800a12998c26f232853877e597b49a1a
Merge: f635dd1 9cbf039

Comment 9 Jian Zhang 2018-09-11 05:50:15 UTC
PS: as we synced over email, the etcd-operator and Prometheus-operator provided by OLM still use the upstream images for 3.11.

Comment 10 Luke Meyer 2018-12-21 15:23:12 UTC
Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.