Bug 1678654 - [marketplace] the default catalogsourceconfig and marketplace’s pod will be deleted automatically
Keywords:
Status: CLOSED DUPLICATE of bug 1679309
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.1.0
Assignee: Aravindh Puthiyaparambil
QA Contact: Fan Jia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-02-19 10:27 UTC by Fan Jia
Modified: 2019-03-12 14:26 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-20 21:04:17 UTC
Target Upstream Version:
Embargoed:



Description Fan Jia 2019-02-19 10:27:30 UTC
Description of problem:
The default CatalogSourceConfigs and the marketplace pod are deleted automatically, and when the marketplace operator reloads it reports "clusteroperators.config.openshift.io" is forbidden.

Version-Release number of selected component (if applicable):
clusterversion:4.0.0-0.nightly-2019-02-18-224151
marketplace image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fa1bfe505ba77054fd42aa8d2af7094dbe3a19242639e18b6924b564f583799a
olm : quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a8ca6bf86ff96fc7487ed4d80b7d8f9fa51c6a0fbc3b6ec95b3e73ea2f7fdf2a

How reproducible:
always

Steps to Reproduce:
1. install the cluster

Actual results:
1. The marketplace pod and the CatalogSourceConfigs `certified-operators`, `community-operators`, and `redhat-operators` disappear for an unknown reason, and the first reload of the marketplace operator logs `"clusteroperators.config.openshift.io" is forbidden`. After the pod is deleted manually, the second reload reconciles the pod and the CatalogSourceConfigs successfully.


Expected results:
1. No resources are deleted.

Additional info:
1. Logs before the marketplace crash:
`time="2019-02-19T08:29:28Z" level=info msg="Out of sync, scheduling for reconciliation from 'Purging' phase" name=certified-operators namespace=openshift-marketplace type=OperatorSource
time="2019-02-19T08:29:28Z" level=info msg="Purging all resource(s)" name=certified-operators namespace=openshift-marketplace type=OperatorSource
time="2019-02-19T08:33:24Z" level=info msg="Purging all resource(s)" name=certified-operators namespace=openshift-marketplace type=OperatorSource
time="2019-02-19T08:33:24Z" level=info msg="Finalizer removed, now garbage collector will clean it up." name=community-operators targetNamespace=openshift-marketplace type=CatalogSourceConfig
time="2019-02-19T08:33:24Z" level=info msg="Reconciling CatalogSourceConfig openshift-marketplace/certified-operators\n"
time="2019-02-19T08:33:24Z" level=info msg="Reconciling CatalogSourceConfig openshift-marketplace/community-operators\n"
time="2019-02-19T08:33:24Z" level=info msg="Reconciling CatalogSourceConfig openshift-marketplace/redhat-operators\n"
time="2019-02-19T08:33:24Z" level=info msg="Finalizer removed, now garbage collector will clean it up." name=redhat-operators targetNamespace=openshift-marketplace type=CatalogSourceConfig
time="2019-02-19T08:33:24Z" level=info msg="Reconciling CatalogSourceConfig openshift-marketplace/redhat-operators\n"
time="2019-02-19T08:33:24Z" level=info msg="Finalizer removed, now garbage collector will clean it up." name=redhat-operators targetNamespace=openshift-marketplace type=CatalogSourceConfig
time="2019-02-19T08:33:24Z" level=info msg="Reconciling CatalogSourceConfig openshift-marketplace/redhat-operators\n"
2019/02/19 08:33:25 <nil>`


2. Logs from the first reload of marketplace:
`
2019/02/18 07:23:47 Go Version: go1.10.8
2019/02/18 07:23:47 Go OS/Arch: linux/amd64
2019/02/18 07:23:47 operator-sdk Version: v0.3.0
time="2019-02-18T07:23:47Z" level=warning msg="ClusterOperator API not present: customresourcedefinitions.apiextensions.k8s.io \"clusteroperators.config.openshift.io\" is forbidden: User \"system:serviceaccount:openshift-marketplace:marketplace-operator\" cannot get resource \"customresourcedefinitions\" in API group \"apiextensions.k8s.io\" at the cluster scope"
2019/02/18 07:23:47 Registering Components.
2019/02/18 07:23:47 Starting the Cmd.

E0218 07:23:57.948887       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:openshift-marketplace:marketplace-operator" cannot list resource "secrets" in API group "" in the namespace "openshift-marketplace"
time="2019-02-18T07:23:58Z" level=info msg="Reconciling CatalogSourceConfig openshift-marketplace/certified-operators\n"

`
3. Logs from the second reload of marketplace:
`2019/02/19 10:17:21 Go Version: go1.10.8
2019/02/19 10:17:21 Go OS/Arch: linux/amd64
2019/02/19 10:17:21 operator-sdk Version: v0.3.0
2019/02/19 10:17:21 Registering Components.
2019/02/19 10:17:21 Starting the Cmd.
time="2019-02-19T10:17:21Z" level=info msg="[sync] Operator source sync loop will start after 10m0s"
time="2019-02-19T10:17:21Z" level=info msg="[sync] CatalogSourceConfig sync loop will start after 10m0s"
time="2019-02-19T10:17:21Z" level=info msg="Found existing ClusterOperator"
time="2019-02-19T10:17:21Z" level=info msg="Setting ClusterOperator condition: Available message: Operator running"
time="2019-02-19T10:17:34Z" level=info msg="Created Deployment certified-operators with registry command: [appregistry-server -s openshift-marketplace/certified-operators -o couchbase-enterprise,mongodb-enterprise]" name=certified-operators targetNamespace=openshift-marketplace type=CatalogSourceConfig
time="2019-02-19T10:17:34Z" level=info msg="Created Service certified-operators" name=certified-operators targetNamespace=openshift-marketplace type=CatalogSourceConfig
time="2019-02-19T10:17:34Z" level=info msg="Creating CatalogSource certified-operators" name=certified-operators targetNamespace=openshift-marketplace type=CatalogSourceConfig
time="2019-02-19T10:17:34Z" level=info msg="Created CatalogSource certified-operators" name=certified-operators targetNamespace=openshift-marketplace type=CatalogSourceConfig
time="2019-02-19T10:17:34Z" level=info msg="The object has been successfully reconciled" name=certified-operators targetNamespace=openshift-marketplace type=CatalogSourceConfig
time="2019-02-19T10:17:34Z" level=info msg="Reconciling CatalogSourceConfig openshift-marketplace/certified-operators\n"
time="2019-02-19T10:17:34Z" level=info msg="No action taken, the object has already been reconciled" name=certified-operators targetNamespace=openshift-marketplace type=CatalogSourceConfig
`

Comment 1 Aravindh Puthiyaparambil 2019-02-19 19:08:43 UTC
(1. logs before the marketplace crash)
The scenario here is that the marketplace-operator pod crashed. There is nothing in the logs to indicate why this crash happened.

(2. the logs for the first time reload of marketplace)
Given that the marketplace-operator pod is part of a Deployment, another instance is launched. The errors "User \"system:serviceaccount:openshift-marketplace:marketplace-operator\" cannot get resource \"customresourcedefinitions\" in API group \"apiextensions.k8s.io\" at the cluster scope" and "Failed to list *v1.Secret: secrets is forbidden: User \"system:serviceaccount:openshift-marketplace:marketplace-operator\" cannot list resource \"secrets\" in API group \"\" in the namespace \"openshift-marketplace\"" indicate that the "ClusterRole" or "ClusterRoleBinding" for the "marketplace-operator" has disappeared.

One theory I have is that whatever entity deleted the "ClusterRole" or "ClusterRoleBinding" also deleted the Deployment. The CVO then recreated the Deployment before recreating the "ClusterRole" or "ClusterRoleBinding".

(3. the logs for the second time reload of marketplace)
During this reload it looks like the "ClusterRole" or "ClusterRoleBinding" for the "marketplace-operator" has been created again, allowing it to successfully come up and recreate the resources required.

So we need to figure out:
1. Why did the "marketplace-operator" crash in the first place?
2. Why did the "ClusterRole" or "ClusterRoleBindings" for the "marketplace-operator" disappear?


As a side note, please be aware that the `CatalogSourceConfigs` and their child resources associated with "OperatorSources" will be deleted and recreated to sync with Quay on every "marketplace-operator" restart. We plan to fix this bug soon, but that is not related to this issue.
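
The "forbidden" messages quoted from the first reload each name a user, verb, resource, API group, and scope, so the missing permissions can be listed mechanically from a log dump. The following is only a minimal sketch in Python: the function name `missing_permissions` and the regular expression are illustrative assumptions based solely on the two message shapes seen in this report, not on any marketplace or Kubernetes tooling.

```python
import re

# Pattern inferred from the two "forbidden" messages in this report only
# (assumption: other authorization errors may be worded differently).
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)" '
    r'(?:at the cluster scope|in the namespace "(?P<namespace>[^"]+)")'
)


def missing_permissions(log_text):
    """Return one dict per 'forbidden' message found in log_text."""
    # Quotes inside msg="..." appear escaped as \" in the operator log;
    # undo the escaping so a single pattern covers both forms.
    normalized = log_text.replace('\\"', '"')
    found = []
    for match in FORBIDDEN_RE.finditer(normalized):
        entry = match.groupdict()
        # A namespace is only captured for namespaced denials.
        entry["scope"] = "namespace" if entry["namespace"] else "cluster"
        found.append(entry)
    return found


if __name__ == "__main__":
    import sys

    # Usage: python script.py < marketplace-operator.log
    for perm in missing_permissions(sys.stdin.read()):
        print(perm["user"], perm["verb"], perm["resource"],
              repr(perm["group"]), perm["scope"], perm["namespace"] or "")
```

Run against the first-reload log, this should report one cluster-scoped denial (`get` on `customresourcedefinitions`) and one namespaced denial (`list` on `secrets`), which is consistent with the role or binding being absent when the pod started.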

Comment 2 Fan Jia 2019-02-20 06:15:55 UTC
All the resources (OLM's packageserver, marketplace, all the pods) remain stable after the cluster-version-operator is stopped.

Comment 3 Fan Jia 2019-02-20 09:25:58 UTC
OLM shows the same behavior: it loses resources such as the packageserver and serviceaccount (https://bugzilla.redhat.com/show_bug.cgi?id=1678606).

Comment 4 Aravindh Puthiyaparambil 2019-02-20 21:04:17 UTC

*** This bug has been marked as a duplicate of bug 1679309 ***

