Bug 1732580 - CatalogSources should be permissive to errors in manifests
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.1.z
Assignee: Evan Cordell
QA Contact: Bruno Andrade
URL:
Whiteboard:
Depends On: 1732579
Blocks:
Reported: 2019-07-23 19:18 UTC by Evan Cordell
Modified: 2019-11-15 06:28 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-25 07:27:53 UTC
Target Upstream Version:




Links
System | ID | Priority | Status | Summary | Last Updated
Github | operator-framework operator-registry pull 81 | 'None' | closed | [release-4.1] Bug 1732580: Best-effort loading | 2020-01-27 12:58:04 UTC
Red Hat Product Errata | RHBA-2019:2820 | None | None | None | 2019-09-25 07:28:02 UTC

Description Evan Cordell 2019-07-23 19:18:16 UTC
Description of problem:

Operator-Registry is a component of OLM that serves operator metadata for a cluster. The marketplace-operator builds these servers at runtime: it reads manifests from a remote appregistry store and parses them for local use. Any number of issues can arise while parsing these manifests, and any one of them causes the pod serving this data to fail with an error message.

Instead of failing, operator-registry should be permissive in the parsing of manifests, so that an entire catalog isn't out of commission if there are errors in the source data.
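
The permissive behavior requested here can be sketched roughly as follows. This is only an illustration in Python, not the operator-registry implementation (which is written in Go); `parse_manifest`, `load_catalog`, and the required-field check are hypothetical names and rules chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LoadResult:
    loaded: dict = field(default_factory=dict)  # package name -> parsed manifest
    errors: dict = field(default_factory=dict)  # package name -> error message


def parse_manifest(raw: dict) -> dict:
    """Hypothetical strict parser: rejects manifests missing required keys."""
    for key in ("packageName", "channels"):
        if key not in raw:
            raise ValueError(f"missing required field {key!r}")
    return raw


def load_catalog(raw_manifests: dict, permissive: bool = True) -> LoadResult:
    """Load every manifest; in permissive mode, record failures and continue."""
    result = LoadResult()
    for name, raw in raw_manifests.items():
        try:
            result.loaded[name] = parse_manifest(raw)
        except ValueError as err:
            if not permissive:
                raise  # strict mode: one bad manifest fails the whole catalog
            result.errors[name] = str(err)
    return result
```

With `permissive=False` a single bad manifest aborts the load, which mirrors the failure described above; with `permissive=True` the good packages are still served and the errors are kept for logging.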


How reproducible:

Always


Steps to Reproduce:

(as an example)
1. Push manifests to an appregistry repo (like community-operators) that has a ClusterServiceVersion with a "replaces" field set to "thisWillFail"
2. Create an OperatorSource pointing to this appregistry repo
3. Note the failing pod in the cluster.
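
The bad input in step 1 is a "replaces" reference that points at a CSV not present in the catalog. A minimal sketch of detecting that condition is shown here; the function name and the plain-dict input shape are assumptions made for illustration, and the sketch says nothing about what operator-registry does with such entries once it finds them.

```python
def find_dangling_replaces(replaces_by_csv):
    """Return the CSV names whose 'replaces' target is not in the catalog.

    replaces_by_csv maps a CSV name to the name it replaces, or None when
    the CSV replaces nothing.
    """
    known = set(replaces_by_csv)
    return sorted(
        name
        for name, target in replaces_by_csv.items()
        if target is not None and target not in known
    )
```

For example, a catalog where etcdoperator.v0.9.4 replaces "thisWillFail" would report etcdoperator.v0.9.4 as dangling.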

Actual results:

The CatalogSource pod never becomes healthy.


Expected results:

The CatalogSource loads, with bad manifests removed.

Comment 4 Bruno Andrade 2019-09-06 03:39:15 UTC
Verification failed.

Actual results:
The CatalogSource pod becomes healthy and the package from this catalog source is visible, but the catalog source never becomes healthy for OLM's checks, as shown in step 5.

Cluster version: 4.1.0-0.nightly-2019-09-05-144858
OLM Version:
oc exec catalog-operator-6c69f84c96-xhblp -n openshift-operator-lifecycle-manager -- olm -version
OLM version: 0.9.0
git commit: 23cb7ef

Steps used to reproduce:

1) Create a CSV in operator-registry with an invalid definition, such as a replacement that does not exist. The registry builder must be set to permissive mode:

time="2019-09-06T02:43:47Z" level=info msg=directory dir=manifests file=etcd load=package
time="2019-09-06T02:43:47Z" level=warning msg="permissive mode enabled" error="error loading manifests from directory: error loading package into db: etcdoperator.v0.9.6 specifies replacement that couldn't be found"
Successfully built 3ba4d6da16bf
Successfully tagged quay.io/bandrade/etcd-operator:bug-1732580-2
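
To confirm from build output that permissive mode swallowed an error, the key=value log lines above can be picked apart with a small helper. This is a sketch that assumes the log lines keep the quoting style shown above; `parse_log_line` and `permissive_warnings` are illustrative names, not part of any tool.

```python
import shlex


def parse_log_line(line):
    """Split a key=value log line into a dict, honoring double-quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def permissive_warnings(lines):
    """Return the error text of every 'permissive mode enabled' warning."""
    errors = []
    for line in lines:
        if not line.startswith("time="):
            continue  # skip non-log output such as "Successfully built ..."
        fields = parse_log_line(line)
        if fields.get("level") == "warning" and fields.get("msg") == "permissive mode enabled":
            errors.append(fields.get("error", ""))
    return errors
```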

2) Create a CatalogSource pointing to this image

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: etcd-bug-operator
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/bandrade/etcd-operator:bug-1732580-2
  displayName: ETCD Bug Operators
  publisher: bandrade
EOF

3) Check whether the CatalogSource pod is healthy:

oc get pods -n openshift-marketplace
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-79699cb769-n4wbb    1/1     Running   0          7h43m
community-operators-5cbcbbb774-r7wnl    1/1     Running   0          7h43m
etcd-bug-operator-fwnr6                 1/1     Running   0          4m20s
marketplace-operator-747ffdcb4c-w7cd7   1/1     Running   0          7h44m
redhat-operators-7d79bb998-qdqnx        1/1     Running   0          7h43m

oc logs -f etcd-bug-operator-fwnr6 -n openshift-marketplace
time="2019-09-06T02:50:32Z" level=info msg="serving registry" database=bundles.db port=5005
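
The pod check in this step can be automated with a short script over the `oc get pods` text output. The following is a sketch that assumes the column layout shown above (NAME, READY, STATUS with no empty cells); `unready_pods` is a hypothetical helper name. Note that a Running pod only shows that the registry server started, not that OLM's own health check passes.

```python
def unready_pods(table_text):
    """Return names of pods that are not fully ready or not Running."""
    rows = [line.split() for line in table_text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    name_i = header.index("NAME")
    ready_i = header.index("READY")
    status_i = header.index("STATUS")
    bad = []
    for row in body:
        ready, total = row[ready_i].split("/")
        if ready != total or row[status_i] != "Running":
            bad.append(row[name_i])
    return bad
```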

4) Check if the packagemanifest is available:

oc get packagemanifest -n openshift-marketplace | grep "ETCD Bug Operators"
etcd-bz                                     ETCD Bug Operators    6h6m


oc get packagemanifest etcd-bz -n openshift-marketplace -o yaml 
apiVersion: packages.operators.coreos.com/v1
kind: PackageManifest
metadata:
  creationTimestamp: "2019-09-06T03:20:29Z"
  labels:
    catalog: etcd-bug-operator
    catalog-namespace: openshift-marketplace
    provider: CNCF
    provider-url: ""
  name: etcd-bz
  namespace: openshift-marketplace
  selfLink: /apis/packages.operators.coreos.com/v1/namespaces/openshift-marketplace/packagemanifests/etcd-bz
spec: {}
status:
  catalogSource: etcd-bug-operator
  catalogSourceDisplayName: ETCD Bug Operators
  catalogSourceNamespace: openshift-marketplace
  catalogSourcePublisher: bandrade
  channels:
  - currentCSV: etcdoperator.v0.9.4-clusterwide
    currentCSVDesc:
      annotations:
        alm-examples: |
          [
            {
              "apiVersion": "etcd.database.coreos.com/v1beta2",
              "kind": "EtcdCluster",
              "metadata": {
                "name": "example",
                "annotations": {
                  "etcd.database.coreos.com/scope": "clusterwide"
                }
              },
              "spec": {
                "size": 3,
                "version": "3.2.13"
              }
            },
            {
              "apiVersion": "etcd.database.coreos.com/v1beta2",
              "kind": "EtcdRestore",
              "metadata": {
                "name": "example-etcd-cluster-restore"
              },
              "spec": {
                "etcdCluster": {
                  "name": "example-etcd-cluster"
                },
                "backupStorageType": "S3",
                "s3": {
                  "path": "<full-s3-path>",
                  "awsSecret": "<aws-secret>"
                }
              }
            },
            {
              "apiVersion": "etcd.database.coreos.com/v1beta2",
              "kind": "EtcdBackup",
              "metadata": {
                "name": "example-etcd-cluster-backup"
              },
              "spec": {
                "etcdEndpoints": ["<etcd-cluster-endpoints>"],
                "storageType":"S3",
                "s3": {
                  "path": "<full-s3-path>",
                  "awsSecret": "<aws-secret>"
                }
              }
            }
          ]
        capabilities: Full Lifecycle
        categories: Database
        containerImage: quay.io/coreos/etcd-operator@sha256:66a37fd61a06a43969854ee6d3e21087a98b93838e284a6086b13917f96b0d9b
        createdAt: "2019-02-28T01:03:00Z"
        description: Create and maintain highly-available etcd clusters on Kubernetes
        repository: https://github.com/coreos/etcd-operator
        tectonic-visibility: ocs
      description: |
      [...]
        mediatype: image/png
      installModes:
      - supported: true
        type: OwnNamespace
      - supported: false
        type: SingleNamespace
      - supported: false
        type: MultiNamespace
      - supported: true
        type: AllNamespaces
      provider:
        name: CNCF
      version: 0.9.4-clusterwide
    name: clusterwide-alpha
  - currentCSV: etcdoperator.v0.9.6
    currentCSVDesc:
      annotations:
        alm-examples: |
          [
            {
              "apiVersion": "etcd.database.coreos.com/v1beta2",
              "kind": "EtcdCluster",
              "metadata": {
                "name": "example"
              },
              "spec": {
                "size": 3,
                "version": "3.2.13"
              }
            },
            {
              "apiVersion": "etcd.database.coreos.com/v1beta2",
              "kind": "EtcdRestore",
              "metadata": {
                "name": "example-etcd-cluster-restore"
              },
              "spec": {
                "etcdCluster": {
                  "name": "example-etcd-cluster"
                },
                "backupStorageType": "S3",
                "s3": {
                  "path": "<full-s3-path>",
                  "awsSecret": "<aws-secret>"
                }
              }
            },
            {
              "apiVersion": "etcd.database.coreos.com/v1beta2",
              "kind": "EtcdBackup",
              "metadata": {
                "name": "example-etcd-cluster-backup"
              },
              "spec": {
                "etcdEndpoints": ["<etcd-cluster-endpoints>"],
                "storageType":"S3",
                "s3": {
                  "path": "<full-s3-path>",
                  "awsSecret": "<aws-secret>"
                }
              }
            }
          ]
        capabilities: Full Lifecycle
        categories: Database
        containerImage: quay.io/coreos/etcd-operator@sha256:66a37fd61a06a43969854ee6d3e21087a98b93838e284a6086b13917f96b0d9b
        createdAt: "2019-02-28T01:03:00Z"
        description: Create and maintain highly-available etcd clusters on Kubernetes
        repository: https://github.com/coreos/etcd-operator
        tectonic-visibility: ocs
      description: [...]
        mediatype: image/png
      installModes:
      - supported: true
        type: OwnNamespace
      - supported: true
        type: SingleNamespace
      - supported: false
        type: MultiNamespace
      - supported: false
        type: AllNamespaces
      provider:
        name: CNCF
      version: 0.9.6
    name: singlenamespace-alpha
  defaultChannel: singlenamespace-alpha
  packageName: etcd-bz
  provider:
    name: CNCF
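
When reading a PackageManifest as long as the one above, the useful part is usually the mapping from channel name to current CSV. A minimal sketch of extracting it is given here, assuming the manifest has already been loaded into a Python dict (for instance from `oc get packagemanifest etcd-bz -o json`); `channel_heads` is an illustrative name.

```python
def channel_heads(package_manifest):
    """Map each channel name to its currentCSV from a PackageManifest dict."""
    status = package_manifest.get("status", {})
    return {ch["name"]: ch["currentCSV"] for ch in status.get("channels", [])}
```

Applied to the output above, this gives clusterwide-alpha -> etcdoperator.v0.9.4-clusterwide and singlenamespace-alpha -> etcdoperator.v0.9.6, which are the channel and CSV names a Subscription has to reference.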


5) Try to create a subscription with that package

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: etcd-bug
  namespace: openshift-operators
spec:
  source: etcd-bug-operator
  sourceNamespace: openshift-marketplace
  name: etcd-bz
  startingCSV: etcdoperator.v0.9.4-clusterwide
  channel: clusterwide-alpha
EOF

The subscription is not created because the CatalogSource did not become healthy:

oc logs -f catalog-operator-75f6b677f5-kxfqm -n openshift-operator-lifecycle-manager

time="2019-09-06T02:50:12Z" level=info msg="building connection to registry" currentSource="{etcd-bug-operator openshift-marketplace}" id=XtQMd source=etcd-bug-operator
time="2019-09-06T02:50:12Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{etcd-bug-operator openshift-marketplace}" id=XtQMd source=etcd-bug-operator
time="2019-09-06T02:50:34Z" level=info msg="building connection to registry" currentSource="{etcd-bug-operator openshift-marketplace}" id=1pJ9S source=etcd-bug-operator
time="2019-09-06T02:50:34Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{etcd-bug-operator openshift-marketplace}" id=1pJ9S source=etcd-bug-operator

Comment 6 Bruno Andrade 2019-09-12 18:25:13 UTC
My apologies, I used the wrong steps to reproduce this issue. I should have pushed the bad manifest to appregistry instead of creating an operator-registry image. I will review again with the proper steps and post the results soon.

Comment 7 Bruno Andrade 2019-09-13 15:54:39 UTC
The CatalogSource is healthy even with bad manifests, and the related bad upgrade graph was removed properly. Considering that, marking as VERIFIED.

Cluster Version: 4.1.0-0.nightly-2019-09-11-214406
OLM version: 0.9.0
git commit: d7e8c9d

Steps used to reproduce:

1) Create an Application Repository at https://quay.io/new/
Store your Quay token in the QUAY_TOKEN variable, following this guide: https://github.com/operator-framework/community-operators/blob/master/docs/testing-operators.md#quay-login

2) Clone the community-operators repo to your local machine.

$ git clone git@github.com:operator-framework/community-operators.git

3) Edit the manifests/etcd/etcdoperator.v0.9.4.clusterserviceversion.yaml file, changing the line "replaces: etcdoperator.v0.9.2" to "replaces: thisWillFail".
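
The edit in step 3 can also be scripted, which helps when repeating the test. The sketch below does a line-based text substitution rather than a full YAML parse, and assumes the file contains exactly one "replaces:" line; `set_replaces` is a hypothetical helper name.

```python
import re

# Matches a whole "replaces: <value>" line, keeping its indentation in group 1.
_REPLACES_LINE = re.compile(r"^([ \t]*replaces:[ \t]*)\S+[ \t]*$", re.MULTILINE)


def set_replaces(csv_yaml, new_target):
    """Return csv_yaml with the value of its single 'replaces:' line swapped."""
    updated, count = _REPLACES_LINE.subn(lambda m: m.group(1) + new_target, csv_yaml)
    if count != 1:
        raise ValueError(f"expected exactly one 'replaces:' line, found {count}")
    return updated
```

Reading the file, calling `set_replaces(text, "thisWillFail")`, and writing the result back reproduces the manual edit.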

4) Add the following environment variables according to your Quay namespace:

export OPERATOR_DIR=etcd/
export QUAY_NAMESPACE=bandrade
export PACKAGE_NAME=etcd
export PACKAGE_VERSION=1.0.0
export TOKEN=$QUAY_TOKEN

5) Push the operator manifests to quay appregistry, as below:

operator-courier push "$OPERATOR_DIR" "$QUAY_NAMESPACE" "$PACKAGE_NAME" "$PACKAGE_VERSION" "$TOKEN"
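
If the push is driven from a script, it is worth failing early when one of the variables from step 4 is unset, since an empty positional argument would otherwise shift the remaining ones. This is a sketch only: the argument order is copied from the command above, and `courier_push_argv` is a hypothetical helper name.

```python
import os

# Order matches the positional arguments of the push command shown above.
REQUIRED_VARS = ("OPERATOR_DIR", "QUAY_NAMESPACE", "PACKAGE_NAME", "PACKAGE_VERSION", "TOKEN")


def courier_push_argv(env=None):
    """Build the operator-courier push argument list from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("unset variables: " + ", ".join(missing))
    return ["operator-courier", "push"] + [env[name] for name in REQUIRED_VARS]
```

The resulting list can be passed to `subprocess.run(argv, check=True)` to perform the actual push.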

6) Create the OperatorSource, changing registryNamespace to the one you are using:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  name: catalogsource-test
  namespace: openshift-marketplace
spec:
  endpoint: https://quay.io/cnr
  registryNamespace: bandrade
  type: appregistry
  displayName: Custom
EOF

6) Check OperatorSource health:

oc get operatorsource -n openshift-marketplace
NAME                  TYPE          ENDPOINT              REGISTRY              DISPLAYNAME           PUBLISHER   STATUS      MESSAGE                                       AGE
catalogsource-test    appregistry   https://quay.io/cnr   bandrade              Custom                            Succeeded   The object has been successfully reconciled   11s

oc get packagemanifest | grep "Custom"
etcd                                        Custom                48s
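
The `oc get operatorsource` output above has an empty PUBLISHER cell and a MESSAGE cell containing spaces, so splitting rows on whitespace would misalign the columns. One way around this, sketched here under the assumption that every cell starts at the same offset as its header, is to slice each row at the header positions; `parse_columns` is an illustrative name.

```python
import re


def parse_columns(table_text):
    """Parse a padded text table into dicts, slicing rows at header offsets."""
    lines = table_text.rstrip("\n").splitlines()
    header = lines[0]
    names = header.split()
    starts = [m.start() for m in re.finditer(r"\S+", header)]
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    rows = []
    for line in lines[1:]:
        rows.append({name: line[s:e].strip() for name, s, e in zip(names, starts, ends)})
    return rows
```

A status check then becomes a lookup such as `row["STATUS"] == "Succeeded"` for the row whose NAME is catalogsource-test.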


7) Create a namespace:
oc create ns test-operators

8) Create the Operator Group

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: test-operators-og
  namespace: test-operators
spec:
  targetNamespaces:
  - test-operators
EOF

9) Create the CatalogSourceConfig
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: CatalogSourceConfig
metadata:
  name: installed-custom-test-operators
  namespace: openshift-marketplace
spec:
  csDisplayName: Custom Operators
  csPublisher: Custom
  packages: etcd
  targetNamespace: test-operators
EOF

Create the subscription, as below:
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  labels:
    csc-owner-name: installed-custom-test-operators
    csc-owner-namespace: openshift-marketplace
  name: etcd
  namespace: test-operators
spec:
  channel: singlenamespace-alpha
  name: etcd
  source: installed-custom-test-operators
  sourceNamespace: test-operators
EOF


10) Check the csv status.
oc get csv -n test-operators
NAME                  DISPLAY   VERSION   REPLACES       PHASE
etcdoperator.v0.9.4   etcd      0.9.4     thisWillFail   Succeeded

Comment 9 errata-xmlrpc 2019-09-25 07:27:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2820

