Bug 1868770 - catalogSource named "redhat-operators" deleted in a disconnected cluster
Summary: catalogSource named "redhat-operators" deleted in a disconnected cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Anik
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks: 1895952
Reported: 2020-08-13 19:05 UTC by Asher Shoshan
Modified: 2021-02-24 15:16 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: https://github.com/operator-framework/operator-marketplace/pull/336
Consequence: Before 4.6, when the marketplace operator had the OperatorSource CRD, customers with a disconnected cluster could disable the default OperatorSources in the openshift-marketplace namespace and create CatalogSources with the same names as the default sources. In 4.6, the OperatorSource CRD was deprecated and CatalogSources were promoted to first-class citizens for marketplace. As a result, openshift-marketplace had default CatalogSources that were managed by the OperatorHub API. In a disconnected environment, when a cluster admin disabled the default CatalogSources via the OperatorHub API and then tried to create a CatalogSource with the same name as one of the default sources, the OperatorHub API removed the custom CatalogSource. If the default CatalogSources were not disabled via the OperatorHub API and changes were made to a default CatalogSource (for example, changing spec.image to point to an internal registry for the disconnected environment), the spec was restored to the default spec.
Fix: Allow creation, update, and deletion of CatalogSources with the same name as the default sources if those sources are disabled via the OperatorHub API.
Result: Once the cluster is ready in the disconnected environment, disabling a default CatalogSource via the OperatorHub API allows a new CatalogSource to be created with the same name as that default CatalogSource. The CatalogSource can also be updated or deleted without OperatorHub API intervention. If the default source is re-enabled via the OperatorHub API, the default spec is restored for the CatalogSource.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:15:36 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github operator-framework operator-marketplace pull 359 0 None closed Bug 1868770: Allow catsrc with default catsrc name in disconnected env 2021-02-18 21:32:50 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:16:17 UTC

Description Asher Shoshan 2020-08-13 19:05:42 UTC
Description of problem:

When deploying a disconnected/restricted OCP 4.6 cluster with the default OperatorHub sources disabled and a catalogSource.yaml for "redhat-operators" included in the installer manifests, this CatalogSource is deleted at some point towards the end of the installation.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Start an OCP 4.6 disconnected cluster installation
2. Generate the manifests (openshift-install create manifests)
3. Put the following YAML files in <install-dir>/openshift
---
apiVersion: config.openshift.io/v1
kind: OperatorHub
metadata:
  name: cluster
spec:
  disableAllDefaultSources: true
---
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: <local-registry>/olm/redhat-operators:latest  
  displayName: redhat-operators-disconnected
  publisher: Red Hat

4. Deploy the cluster (openshift-install create cluster)
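
The manifest staging in step 3 can be sketched as a small script. This is only an illustration under stated assumptions: the file names operatorhub.yaml and catalogsource.yaml, and the default values of INSTALL_DIR and LOCAL_REGISTRY, are hypothetical placeholders that do not come from this report. The script only writes files and never calls the installer.

```shell
#!/bin/sh
# Sketch: stage the two manifests from step 3 into <install-dir>/openshift.
# INSTALL_DIR and LOCAL_REGISTRY are placeholders (assumptions), override as needed.
set -eu

INSTALL_DIR="${INSTALL_DIR:-./install-dir}"
LOCAL_REGISTRY="${LOCAL_REGISTRY:-registry.example.com:5000}"

# In a real run, "openshift-install create manifests --dir $INSTALL_DIR"
# has already created this directory; mkdir -p keeps the sketch standalone.
mkdir -p "$INSTALL_DIR/openshift"

# Disable all default sources through the OperatorHub config object.
cat > "$INSTALL_DIR/openshift/operatorhub.yaml" <<EOF
apiVersion: config.openshift.io/v1
kind: OperatorHub
metadata:
  name: cluster
spec:
  disableAllDefaultSources: true
EOF

# Custom CatalogSource that reuses the default name "redhat-operators".
cat > "$INSTALL_DIR/openshift/catalogsource.yaml" <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: ${LOCAL_REGISTRY}/olm/redhat-operators:latest
  displayName: redhat-operators-disconnected
  publisher: Red Hat
EOF

echo "staged manifests in $INSTALL_DIR/openshift"
# Step 4 (not run here): openshift-install create cluster --dir "$INSTALL_DIR"
```

Running it with INSTALL_DIR pointing at the directory used for "openshift-install create manifests" reproduces the setup described in the steps.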

Actual results:
The CatalogSource is deleted.


Expected results:
The CatalogSource stays as is (as in OCP 4.5).


Additional info:

Comment 1 Evan Cordell 2020-08-17 13:43:30 UTC
This is likely best solved with documentation, and potentially a change to `oc` to mirror the catalog image and suggest the manifests to use.

In 4.6, it would be simplest to not disable the default CatalogSources, and instead set an ImageContentSourcePolicy entry remapping the default `registry.redhat.io` catalog image to your internal `<local-registry>/olm/redhat-operators:latest`

Comment 2 Asher Shoshan 2020-08-18 06:40:35 UTC
(In reply to Evan Cordell from comment #1)
> This is likely best solved with documentation, and potentially a change to
> `oc` to mirror the catalog image and suggest the manifests to use.
> 
> In 4.6, it would be simplest to not disable the default CatalogSources, and
> instead set an ImageContentSourcePolicy entry remapping the default
> `registry.redhat.io` catalog image to your internal
> `<local-registry>/olm/redhat-operators:latest`

Please note: if I create the CatalogSource after deploying the disconnected cluster, it is not deleted.

An ICSP must be used in a disconnected cluster anyway. But not disabling the defaults only works with the full catalog. If I prune the catalog, I won't be able to edit "redhat-operators" and change the image name until I disable the default, right?

btw, why we are not doing this ICSP way (only without disabling the default) in 4.4, 4.5?

Comment 3 Kevin Rizza 2020-08-20 19:25:33 UTC
When you say this works post deployment, could you be more specific? Are you saying that you're applying this YAML during cluster bootstrap?

Comment 4 Asher Shoshan 2020-08-24 06:46:48 UTC
Yes, I'm applying this during cluster bootstrap, i.e. I put the catalogSource.yaml in <cluster-dir>/openshift (after openshift-install create manifests). Then at some point this catalog is deleted.
After cluster deployment, I can re-apply the YAML, and this time the catalog is not deleted.

Comment 6 Kevin Rizza 2020-10-21 17:21:55 UTC
> btw, why we are not doing this ICSP way (only without disabling the default) in 4.4, 4.5?

This is because in 4.4 and 4.5 the default operators weren't backed by pregenerated catalog images at all -- they were pointing to an external endpoint that generated the catalog metadata at runtime.

I'm closing this bug as NOTABUG since the default workflow in 4.6+ doesn't require disabling these sources at all anymore. Docs should reflect this change in 4.6

Comment 19 Jian Zhang 2020-11-09 07:02:51 UTC
Cluster version is 4.7.0-0.nightly-2020-11-08-225909

[root@preserve-olm-env data]# oc exec marketplace-operator-574f95d4c5-mvnzw -- marketplace-operator --version
Marketplace source git commit: cc91a681f58587ec7748ec3c1d12142f7e7f8638
time="2020-11-09T03:55:59Z" level=info msg="Go Version: go1.15.2"
time="2020-11-09T03:55:59Z" level=info msg="Go OS/Arch: linux/amd64"
time="2020-11-09T03:55:59Z" level=info msg="operator-sdk Version: v0.8.0"

1, Disable the default CatalogSource resources.
[root@preserve-olm-env data]# oc patch operatorhub cluster -p '{"spec": {"disableAllDefaultSources": true}}' --type=merge
operatorhub.config.openshift.io/cluster patched

2, Create a custom CatalogSource with a default CatalogSource object name.
[root@preserve-olm-env data]# cat cs-redhat.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-operators
  namespace: openshift-marketplace
spec:
  displayName: Red Hat Operators
  icon:
    base64data: ""
    mediatype: ""
  image: quay.io/openshift-qe-optional-operators/ocp4-index:latest
  priority: -100
  publisher: Jian
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m0s

[root@preserve-olm-env data]# oc get catalogsource
NAME               DISPLAY             TYPE   PUBLISHER   AGE
redhat-operators   Red Hat Operators   grpc   Jian        8s

[root@preserve-olm-env data]# oc get pods
NAME                                    READY   STATUS              RESTARTS   AGE
marketplace-operator-574f95d4c5-mvnzw   1/1     Running             0          3h37m
redhat-operators-jpcl9                  0/1     ContainerCreating   0          3s

This custom CatalogSource (publisher is Jian) is created successfully.

3, After the default OperatorHub sources are re-enabled, the default CatalogSource objects are back. The custom CatalogSource object is overridden.

[root@preserve-olm-env data]# oc patch operatorhub cluster -p '{"spec": {"disableAllDefaultSources": false}}' --type=merge
operatorhub.config.openshift.io/cluster patched
[root@preserve-olm-env data]# oc get catalogsource
NAME                  DISPLAY               TYPE   PUBLISHER   AGE
certified-operators   Certified Operators   grpc   Red Hat     3s
community-operators   Community Operators   grpc   Red Hat     3s
redhat-marketplace    Red Hat Marketplace   grpc   Red Hat     3s
redhat-operators      Red Hat Operators     grpc   Red Hat     2m14s
[root@preserve-olm-env data]# oc get pods
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-5gtnx               0/1     Running   0          10s
community-operators-jl4mz               0/1     Running   0          10s
marketplace-operator-574f95d4c5-gd86v   1/1     Running   0          50m
redhat-marketplace-f55bx                1/1     Running   0          10s
redhat-operators-mxdgd                  0/1     Running   0          10s

LGTM; there is just a log output format issue, see: https://github.com/operator-framework/operator-marketplace/pull/361
Verifying this bug first.
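
The verification steps in this comment can be sketched as a script. It is an illustration, not the QA procedure itself: the helper names (set_defaults_disabled, catalog_publisher, publisher_is) and the RUN_VERIFY switch are hypothetical, and the publisher is read with -o jsonpath rather than from the table output shown above. It assumes cs-redhat.yaml is the file from step 2 and that oc is logged in to the cluster. Re-enabling the defaults may take a moment to reconcile, so the last check may need to be repeated.

```shell
#!/bin/sh
# Sketch: replay the verification with small helpers around "oc".
# Helper names and RUN_VERIFY are illustrative assumptions.
set -eu

NS=openshift-marketplace

# Toggle all default sources through the OperatorHub API ($1 is true or false).
set_defaults_disabled() {
  oc patch operatorhub cluster --type=merge \
    -p "{\"spec\": {\"disableAllDefaultSources\": $1}}"
}

# Print spec.publisher of the named CatalogSource.
catalog_publisher() {
  oc get catalogsource "$1" -n "$NS" -o jsonpath='{.spec.publisher}'
}

# Succeed only if CatalogSource $1 currently has publisher $2.
publisher_is() {
  [ "$(catalog_publisher "$1")" = "$2" ]
}

# Touch a cluster only when explicitly asked to.
if [ "${RUN_VERIFY:-0}" = "1" ]; then
  set_defaults_disabled true
  oc apply -f cs-redhat.yaml
  if publisher_is redhat-operators "Jian"; then
    echo "custom CatalogSource kept"
  else
    echo "custom CatalogSource missing or overwritten" >&2
  fi

  set_defaults_disabled false
  if publisher_is redhat-operators "Red Hat"; then
    echo "default spec restored"
  else
    echo "default spec not restored yet" >&2
  fi
fi
```

Invoking it as "RUN_VERIFY=1 sh verify.sh" (the file name is arbitrary) runs the sequence against the current cluster; without RUN_VERIFY it only defines the helpers.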

Comment 22 errata-xmlrpc 2021-02-24 15:15:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

