Bug 1866444

Summary: There are 2 duplicated running pods for certified-operators/community-operators/redhat-marketplace/redhat-operators in the project openshift-marketplace
Product: OpenShift Container Platform
Reporter: yhui
Component: OLM
Assignee: Daniel Sover <dsover>
OLM sub component: OLM
QA Contact: yhui
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: medium
CC: anbhatta, dsover, jiazha, krizza, lgallett
Version: 4.6
Keywords: UpcomingSprint
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-27 16:25:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1878297

Description yhui 2020-08-05 15:07:39 UTC
Description of problem:
On an OCP 4.6 cluster, each of the catalog sources certified-operators, community-operators, redhat-marketplace, and redhat-operators has two running pods in the openshift-marketplace project.
[root@preserve-olm-env daily-test]# oc get pod -n openshift-marketplace
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-mwwcb               1/1     Running   0          12m
certified-operators-wgj9f               1/1     Running   0          67m
community-operators-hmt56               1/1     Running   0          27m
community-operators-m6jr4               1/1     Running   0          12m
marketplace-operator-554c465f85-94zrj   1/1     Running   0          72m
qe-app-registry-5q6zm                   1/1     Running   0          74m
redhat-marketplace-bjxlx                1/1     Running   0          12m
redhat-marketplace-k2n7k                1/1     Running   0          74m
redhat-operators-45ztg                  1/1     Running   0          12m
redhat-operators-jzqms                  1/1     Running   0          74m



Version-Release number of selected component (if applicable):
[root@preserve-olm-env daily-test]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-05-082458   True        False         83m     Cluster version is 4.6.0-0.nightly-2020-08-05-082458
[root@preserve-olm-env daily-test]# oc exec catalog-operator-5d748db9c6-5pt47 -n openshift-operator-lifecycle-manager -- olm --version
OLM version: 0.16.0
git commit: 163608d60f37cc3496736bfc4ec72ca01dc7083a


How reproducible:
Always


Steps to Reproduce:
1. Install OCP 4.6 cluster.
2. Check the pods in the project openshift-marketplace.


Actual results:
Each of the catalog sources certified-operators, community-operators, redhat-marketplace, and redhat-operators has two running pods in the openshift-marketplace project.


Expected results:
There should be exactly one running pod for each of certified-operators, community-operators, redhat-marketplace, and redhat-operators in the openshift-marketplace project.
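
The check above can be scripted. What follows is a minimal sketch, not part of OLM or the `oc` CLI: a hypothetical `find_duplicate_pods` shell function that reads `oc get pod` table output on stdin, strips the generated name suffix from each pod, and prints every name prefix that has more than one Running pod together with the count. It assumes the default column layout shown in the description (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper (not an oc subcommand): reads `oc get pod` table output
# on stdin and prints each pod-name prefix with more than one Running pod.
find_duplicate_pods() {
  awk '
    $3 == "Running" {
      prefix = $1
      sub(/-[^-]+$/, "", prefix)   # drop the generated suffix, e.g. "-mwwcb"
      count[prefix]++
    }
    END {
      for (p in count)
        if (count[p] > 1) print p, count[p]
    }
  ' | sort
}
```

Possible usage against a live cluster: `oc get pod -n openshift-marketplace | find_duplicate_pods`. For the listing in the description this would report the four catalog prefixes, each with a count of 2. Because of the polling behavior, a count of 2 taken from a single snapshot may be transient and is not by itself proof of a leak.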


Additional info:
For debugging, the cluster kubeconfig is available at https://mastern-jenkins-csb-openshift-qe.cloud.paas.psi.redhat.com/job/Launch%20Environment%20Flexy/105277/artifact/workdir/install-dir/auth/kubeconfig/*view*/.

Comment 1 yhui 2020-08-05 15:20:53 UTC
I then created a Subscription in the test-operators project. No InstallPlan, CSV, or pods were created.

[root@preserve-olm-env daily-test]# oc create ns test-operators
namespace/test-operators created
[root@preserve-olm-env daily-test]# oc create -f - <<EOF
> apiVersion: operators.coreos.com/v1
> kind: OperatorGroup
> metadata:
>   name: test-operators-og
>   namespace: test-operators
> spec:
>   targetNamespaces:
>   - test-operators  
> EOF
operatorgroup.operators.coreos.com/test-operators-og created

[root@preserve-olm-env daily-test]# oc create -f - <<EOF
> apiVersion: operators.coreos.com/v1alpha1
> kind: Subscription
> metadata:
>   generation: 1
>   name: amq-streams
>   namespace: test-operators
> spec:
>   channel: stable
>   installPlanApproval: Automatic
>   name: amq-streams
>   source: redhat-operators
>   sourceNamespace: openshift-marketplace
>   startingCSV: amqstreams.v1.2.0
> EOF
subscription.operators.coreos.com/amq-streams created

[root@preserve-olm-env daily-test]# oc get sub -n test-operators
NAME          PACKAGE       SOURCE             CHANNEL
amq-streams   amq-streams   redhat-operators   stable

[root@preserve-olm-env daily-test]# oc get ip -n test-operators
No resources found in test-operators namespace.

[root@preserve-olm-env daily-test]# oc get csv -n test-operators
No resources found in test-operators namespace.

Comment 2 lgallett 2020-08-06 13:46:06 UTC
This is expected behavior as part of the polling feature of these CatalogSources. The old pods should be cleaned up during the next sync cycle. The missing InstallPlan is a separate issue caused by the use of a stale registry image.
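
In upstream OLM, registry polling is configured per CatalogSource through the `spec.updateStrategy.registryPoll.interval` field. The sketch below assumes the catalogs on this cluster set that field, which this thread does not confirm. `catalog_poll_interval` is a hypothetical helper name, and an empty result would mean no polling strategy is set on that CatalogSource.

```shell
# Hypothetical helper: prints the registry polling interval configured on a
# CatalogSource (empty output if the field is not set).
# Assumes `oc` is already logged in to the cluster.
# $1 = CatalogSource name, $2 = namespace (defaults to openshift-marketplace)
catalog_poll_interval() {
  oc get catalogsource "$1" -n "${2:-openshift-marketplace}" \
    -o jsonpath='{.spec.updateStrategy.registryPoll.interval}'
}
```

Possible usage: `catalog_poll_interval redhat-operators`.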

Comment 3 yhui 2020-08-07 03:38:52 UTC
OK. How long is the sync interval? Can it be set in the configuration file or by some other method?
The old pods have been running for more than an hour and have not been cleaned up, which seems strange.

Comment 8 Daniel Sover 2020-09-03 17:32:55 UTC
*** Bug 1870367 has been marked as a duplicate of this bug. ***

Comment 10 yhui 2020-09-14 09:19:28 UTC
Version:
[hui@localhost verification-tests]$ oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.ci-2020-09-12-172918   True        False         36h     Cluster version is 4.6.0-0.ci-2020-09-12-172918
[hui@localhost verification-tests]$ oc exec olm-operator-67d84fff6c-ndgnc -n openshift-operator-lifecycle-manager -- olm --version
OLM version: 0.16.1
git commit: 6d26c16166b232561132985e1132fce4b4d36532

Steps to test:
1. Install OCP 4.6 cluster.

2. Check the catalogsource pods in the project openshift-marketplace.
[hui@localhost verification-tests]$ oc get pod -n openshift-marketplace
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-jz4pt               1/1     Running   0          7m53s
certified-operators-r7g7n               1/1     Running   0          16h
community-operators-krrb6               1/1     Running   0          7m53s
community-operators-tgsc7               1/1     Running   0          116m
marketplace-operator-54789d9cf4-8cd5h   1/1     Running   0          36h
qe-app-registry-gvcj8                   1/1     Running   0          36h
qe-app-registry-wfwxw                   1/1     Running   0          7m53s
redhat-marketplace-94fmf                1/1     Running   0          36h
redhat-marketplace-trgfg                1/1     Running   0          7m53s
redhat-operators-t6bk5                  1/1     Running   0          7m53s
redhat-operators-wg6h9                  1/1     Running   0          36h

3. Per Comment 2 and Comment 7, none of the newer catalog pods is more than 15 minutes old, which matches the original feature design.

The bug is verified.
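
The age check used in this verification can be sketched as a script. `youngest_pod_age` is a hypothetical helper, not an `oc` feature: it reads `oc get pod` table output on stdin, converts the AGE column to seconds, and for each pod-name prefix prints the age of the youngest Running pod and whether it is within a limit. The default limit of 900 seconds reflects the 15 minutes mentioned above. The function does not know which prefixes are catalogs, so non-catalog pods such as marketplace-operator will normally show as over the limit and should be ignored.

```shell
# Hypothetical helper: for each pod-name prefix, print the age in seconds of
# its youngest Running pod and "ok" if that age is within the limit.
# $1 = limit in seconds (defaults to 900, i.e. 15 minutes)
youngest_pod_age() {
  awk -v limit="${1:-900}" '
    # Convert an age string such as "7m53s", "116m" or "36h" to seconds.
    function age_seconds(age,    total, num, unit) {
      total = 0
      while (match(age, /^[0-9]+[dhms]/)) {
        num = substr(age, 1, RLENGTH - 1) + 0
        unit = substr(age, RLENGTH, 1)
        if (unit == "d") total += num * 86400
        else if (unit == "h") total += num * 3600
        else if (unit == "m") total += num * 60
        else total += num
        age = substr(age, RLENGTH + 1)
      }
      return total
    }
    $3 == "Running" {
      prefix = $1
      sub(/-[^-]+$/, "", prefix)   # drop the generated suffix
      secs = age_seconds($5)
      if (!(prefix in youngest) || secs < youngest[prefix])
        youngest[prefix] = secs
    }
    END {
      for (p in youngest)
        print p, youngest[p], (youngest[p] <= limit + 0 ? "ok" : "over")
    }
  ' | sort
}
```

Possible usage: `oc get pod -n openshift-marketplace | youngest_pod_age 900`.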

Comment 13 errata-xmlrpc 2020-10-27 16:25:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196