Description of problem:
Operator-Registry is a component of OLM that serves operator metadata for a cluster. These servers are built at runtime by the marketplace-operator, which reads manifests from a remote appregistry store and parses them for local use. Any number of issues can arise while parsing these manifests, and any one of them causes the pod serving this data to fail with an error message. Instead of failing, operator-registry should parse manifests permissively, so that an entire catalog isn't out of commission when there are errors in the source data.

How reproducible:
Always

Steps to Reproduce (as an example):
1. Push manifests to an appregistry repo (like community-operators) that has a ClusterServiceVersion with a "replaces" field set to "thisWillFail"
2. Create an OperatorSource pointing to this appregistry repo
3. Note the failing pod in the cluster.

Actual results:
The CatalogSource pod never becomes healthy.

Expected results:
The CatalogSource loads, with bad manifests removed.
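To illustrate the requested behavior, here is a minimal Python sketch of permissive loading. It is not operator-registry's actual implementation: the `Bundle` type and the `load_catalog` function are hypothetical names introduced for this example, and the only rule modeled is the one from the reproduction steps, a "replaces" value that names no known bundle.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bundle:
    """Hypothetical stand-in for one parsed ClusterServiceVersion."""
    name: str
    replaces: Optional[str] = None


def load_catalog(bundles: List[Bundle], permissive: bool = True) -> Tuple[List[Bundle], List[str]]:
    """Load bundles, skipping (permissive) or rejecting (strict) bad ones.

    Returns the bundles that were kept and one warning per skipped bundle.
    """
    known = {b.name for b in bundles}
    loaded: List[Bundle] = []
    warnings: List[str] = []
    for bundle in bundles:
        if bundle.replaces and bundle.replaces not in known:
            msg = f"{bundle.name} specifies replacement that couldn't be found"
            if not permissive:
                # Strict mode: a single bad manifest aborts the whole catalog.
                raise ValueError(msg)
            # Permissive mode: record the problem and keep going.
            warnings.append(msg)
            continue
        loaded.append(bundle)
    return loaded, warnings
```

With `permissive=False` the first dangling "replaces" raises and nothing is served, which mirrors the failure described above. With the default, the good bundles are still returned and the problem is reported as a warning. The sketch checks each bundle once and does not re-validate bundles that replace a skipped one.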
Verify failed.

Actual results:
The CatalogSource pod becomes healthy and the package from this catalog source is visible, but the CatalogSource does not become healthy for OLM's checks, as shown in step 5.

Cluster version: 4.1.0-0.nightly-2019-09-05-144858

OLM version:

oc exec catalog-operator-6c69f84c96-xhblp -n openshift-operator-lifecycle-manager -- olm -version
OLM version: 0.9.0
git commit: 23cb7ef

Steps used to reproduce:

1) Create a CSV in operator-registry that has wrong definitions, such as a replacement that does not exist. The registry builder must be set to permissive mode:

time="2019-09-06T02:43:47Z" level=info msg=directory dir=manifests file=etcd load=package
time="2019-09-06T02:43:47Z" level=warning msg="permissive mode enabled" error="error loading manifests from directory: error loading package into db: etcdoperator.v0.9.6 specifies replacement that couldn't be found"
Successfully built 3ba4d6da16bf
Successfully tagged quay.io/bandrade/etcd-operator:bug-1732580-2

2) Create a CatalogSource pointing to this registry image:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: etcd-bug-operator
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/bandrade/etcd-operator:bug-1732580-2
  displayName: ETCD Bug Operators
  publisher: bandrade
EOF

3) Check if the catalog source pod is healthy:

oc get pods -n openshift-marketplace
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-79699cb769-n4wbb    1/1     Running   0          7h43m
community-operators-5cbcbbb774-r7wnl    1/1     Running   0          7h43m
etcd-bug-operator-fwnr6                 1/1     Running   0          4m20s
marketplace-operator-747ffdcb4c-w7cd7   1/1     Running   0          7h44m
redhat-operators-7d79bb998-qdqnx        1/1     Running   0          7h43m

oc logs -f etcd-bug-operator-fwnr6 -n openshift-marketplace
time="2019-09-06T02:50:32Z" level=info msg="serving registry" database=bundles.db port=5005

4) Check if the packagemanifest is available:

oc get packagemanifest -n openshift-marketplace | grep "ETCD Bug Operators"
etcd-bz   ETCD Bug Operators   6h6m

oc get packagemanifest etcd-bz -n openshift-marketplace -o yaml
apiVersion: packages.operators.coreos.com/v1
kind: PackageManifest
metadata:
  creationTimestamp: "2019-09-06T03:20:29Z"
  labels:
    catalog: etcd-bug-operator
    catalog-namespace: openshift-marketplace
    provider: CNCF
    provider-url: ""
  name: etcd-bz
  namespace: openshift-marketplace
  selfLink: /apis/packages.operators.coreos.com/v1/namespaces/openshift-marketplace/packagemanifests/etcd-bz
spec: {}
status:
  catalogSource: etcd-bug-operator
  catalogSourceDisplayName: ETCD Bug Operators
  catalogSourceNamespace: openshift-marketplace
  catalogSourcePublisher: bandrade
  channels:
  - currentCSV: etcdoperator.v0.9.4-clusterwide
    currentCSVDesc:
      annotations:
        alm-examples: |
          [
            {
              "apiVersion": "etcd.database.coreos.com/v1beta2",
              "kind": "EtcdCluster",
              "metadata": {
                "name": "example",
                "annotations": {
                  "etcd.database.coreos.com/scope": "clusterwide"
                }
              },
              "spec": {
                "size": 3,
                "version": "3.2.13"
              }
            },
            {
              "apiVersion": "etcd.database.coreos.com/v1beta2",
              "kind": "EtcdRestore",
              "metadata": {
                "name": "example-etcd-cluster-restore"
              },
              "spec": {
                "etcdCluster": {
                  "name": "example-etcd-cluster"
                },
                "backupStorageType": "S3",
                "s3": {
                  "path": "<full-s3-path>",
                  "awsSecret": "<aws-secret>"
                }
              }
            },
            {
              "apiVersion": "etcd.database.coreos.com/v1beta2",
              "kind": "EtcdBackup",
              "metadata": {
                "name": "example-etcd-cluster-backup"
              },
              "spec": {
                "etcdEndpoints": ["<etcd-cluster-endpoints>"],
                "storageType":"S3",
                "s3": {
                  "path": "<full-s3-path>",
                  "awsSecret": "<aws-secret>"
                }
              }
            }
          ]
        capabilities: Full Lifecycle
        categories: Database
        containerImage: quay.io/coreos/etcd-operator@sha256:66a37fd61a06a43969854ee6d3e21087a98b93838e284a6086b13917f96b0d9b
        createdAt: "2019-02-28T01:03:00Z"
        description: Create and maintain highly-available etcd clusters on Kubernetes
        repository: https://github.com/coreos/etcd-operator
        tectonic-visibility: ocs
      description: |
        [...]
        mediatype: image/png
      installModes:
      - supported: true
        type: OwnNamespace
      - supported: false
        type: SingleNamespace
      - supported: false
        type: MultiNamespace
      - supported: true
        type: AllNamespaces
      provider:
        name: CNCF
      version: 0.9.4-clusterwide
    name: clusterwide-alpha
  - currentCSV: etcdoperator.v0.9.6
    currentCSVDesc:
      annotations:
        alm-examples: |
          [
            {
              "apiVersion": "etcd.database.coreos.com/v1beta2",
              "kind": "EtcdCluster",
              "metadata": {
                "name": "example"
              },
              "spec": {
                "size": 3,
                "version": "3.2.13"
              }
            },
            {
              "apiVersion": "etcd.database.coreos.com/v1beta2",
              "kind": "EtcdRestore",
              "metadata": {
                "name": "example-etcd-cluster-restore"
              },
              "spec": {
                "etcdCluster": {
                  "name": "example-etcd-cluster"
                },
                "backupStorageType": "S3",
                "s3": {
                  "path": "<full-s3-path>",
                  "awsSecret": "<aws-secret>"
                }
              }
            },
            {
              "apiVersion": "etcd.database.coreos.com/v1beta2",
              "kind": "EtcdBackup",
              "metadata": {
                "name": "example-etcd-cluster-backup"
              },
              "spec": {
                "etcdEndpoints": ["<etcd-cluster-endpoints>"],
                "storageType":"S3",
                "s3": {
                  "path": "<full-s3-path>",
                  "awsSecret": "<aws-secret>"
                }
              }
            }
          ]
        capabilities: Full Lifecycle
        categories: Database
        containerImage: quay.io/coreos/etcd-operator@sha256:66a37fd61a06a43969854ee6d3e21087a98b93838e284a6086b13917f96b0d9b
        createdAt: "2019-02-28T01:03:00Z"
        description: Create and maintain highly-available etcd clusters on Kubernetes
        repository: https://github.com/coreos/etcd-operator
        tectonic-visibility: ocs
      description: [...]
        mediatype: image/png
      installModes:
      - supported: true
        type: OwnNamespace
      - supported: true
        type: SingleNamespace
      - supported: false
        type: MultiNamespace
      - supported: false
        type: AllNamespaces
      provider:
        name: CNCF
      version: 0.9.6
    name: singlenamespace-alpha
  defaultChannel: singlenamespace-alpha
  packageName: etcd-bz
  provider:
    name: CNCF

5) Try to create a subscription with that package:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: etcd-bug
  namespace: openshift-operators
spec:
  source: etcd-bug-operator
  sourceNamespace: openshift-marketplace
  name: etcd-bz
  startingCSV: etcdoperator.v0.9.4-clusterwide
  channel: clusterwide-alpha
EOF

The subscription is not created because the CatalogSource did not become healthy:

oc logs -f catalog-operator-75f6b677f5-kxfqm -n openshift-operator-lifecycle-manager
time="2019-09-06T02:50:12Z" level=info msg="building connection to registry" currentSource="{etcd-bug-operator openshift-marketplace}" id=XtQMd source=etcd-bug-operator
time="2019-09-06T02:50:12Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{etcd-bug-operator openshift-marketplace}" id=XtQMd source=etcd-bug-operator
time="2019-09-06T02:50:34Z" level=info msg="building connection to registry" currentSource="{etcd-bug-operator openshift-marketplace}" id=1pJ9S source=etcd-bug-operator
time="2019-09-06T02:50:34Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{etcd-bug-operator openshift-marketplace}" id=1pJ9S source=etcd-bug-operator
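The catalog-operator log lines above are key=value pairs with optional double-quoted values, so repeated health-check attempts can be counted per source with a few lines of Python. This is only a sketch for triaging such logs; `parse_log_line` and `unhealthy_sources` are hypothetical helper names, and the assumption that every line follows this key=value layout comes from the excerpts in this report, not from any documented log format guarantee.

```python
import shlex
from collections import Counter
from typing import Dict, Iterable


def parse_log_line(line: str) -> Dict[str, str]:
    """Split one key=value log line into a dict.

    shlex handles the double-quoted values, including ones that
    contain spaces or an apostrophe.
    """
    fields: Dict[str, str] = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


def unhealthy_sources(lines: Iterable[str]) -> Counter:
    """Count "hasn't yet become healthy" messages per catalog source."""
    counts: Counter = Counter()
    for line in lines:
        fields = parse_log_line(line)
        if "hasn't yet become healthy" in fields.get("msg", ""):
            counts[fields.get("source", "<unknown>")] += 1
    return counts
```

Feeding the four lines from step 5 into `unhealthy_sources` counts two failed health checks for etcd-bug-operator, which matches the observation that the CatalogSource never became healthy from OLM's point of view.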
My apologies, I used the wrong steps to reproduce this issue. I should have pushed the bad manifest to appregistry instead of creating an operator-registry image. I will verify again with the proper steps and post the results soon.
The CatalogSource is healthy even with bad manifests, and the related bad upgrade graph entries were removed properly. Considering that, marking as VERIFIED.

Cluster version: 4.1.0-0.nightly-2019-09-11-214406
OLM version: 0.9.0
git commit: d7e8c9d

Steps used to reproduce:

1) Create an Application Repository at https://quay.io/new/. Store your Quay token in the QUAY_TOKEN variable, following this guide: https://github.com/operator-framework/community-operators/blob/master/docs/testing-operators.md#quay-login

2) Clone the community-operators repo locally:

$ git clone git@github.com:operator-framework/community-operators.git

3) Edit manifests/etcd/etcdoperator.v0.9.4.clusterserviceversion.yaml, changing the following line from:

replaces: etcdoperator.v0.9.2

to:

replaces: thisWillFail

4) Set the following environment variables according to your Quay namespace:

export OPERATOR_DIR=etcd/
export QUAY_NAMESPACE=bandrade
export PACKAGE_NAME=etcd
export PACKAGE_VERSION=1.0.0
export TOKEN=$QUAY_TOKEN

5) Push the operator manifests to the Quay appregistry:

operator-courier push "$OPERATOR_DIR" "$QUAY_NAMESPACE" "$PACKAGE_NAME" "$PACKAGE_VERSION" "$TOKEN"

6) Create the OperatorSource, changing registryNamespace to the one you are using:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  name: catalogsource-test
  namespace: openshift-marketplace
spec:
  endpoint: https://quay.io/cnr
  registryNamespace: bandrade
  type: appregistry
  displayName: Custom
EOF

7) Check the OperatorSource health:

oc get operatorsource -n openshift-marketplace
NAME                 TYPE          ENDPOINT              REGISTRY   DISPLAYNAME   PUBLISHER   STATUS      MESSAGE                                       AGE
catalogsource-test   appregistry   https://quay.io/cnr   bandrade   Custom                    Succeeded   The object has been successfully reconciled   11s

oc get packagemanifest | grep "Custom"
etcd   Custom   48s

8) Create a namespace:

oc create ns test-operators

9) Create the OperatorGroup:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: test-operators-og
  namespace: test-operators
spec:
  targetNamespaces:
  - test-operators
EOF

10) Create the CatalogSourceConfig:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: CatalogSourceConfig
metadata:
  name: installed-custom-test-operators
  namespace: openshift-marketplace
spec:
  csDisplayName: Custom Operators
  csPublisher: Custom
  packages: etcd
  targetNamespace: test-operators
EOF

11) Create the subscription:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  labels:
    csc-owner-name: installed-custom-test-operators
    csc-owner-namespace: openshift-marketplace
  name: etcd
  namespace: test-operators
spec:
  channel: singlenamespace-alpha
  name: etcd
  source: installed-custom-test-operators
  sourceNamespace: test-operators
EOF

12) Check the CSV status:

oc get csv -n test-operators
NAME                  DISPLAY   VERSION   REPLACES       PHASE
etcdoperator.v0.9.4   etcd      0.9.4     thisWillFail   Succeeded
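The final check can be scripted when the verification has to be repeated. The sketch below parses the tabular text printed by "oc get csv" and lists any CSV whose PHASE is not Succeeded. `parse_table` and `not_succeeded` are hypothetical helper names, and the parser assumes that no cell contains whitespace, which holds for the output shown here but not for every "oc get" table (an empty or multi-word column would shift the fields).

```python
from typing import Dict, List


def parse_table(text: str) -> List[Dict[str, str]]:
    """Turn whitespace-separated tabular output into one dict per row.

    Assumption: every cell is a single token with no embedded spaces.
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def not_succeeded(rows: List[Dict[str, str]]) -> List[str]:
    """Return the names of CSVs whose PHASE column is not Succeeded."""
    return [row["NAME"] for row in rows if row.get("PHASE") != "Succeeded"]


# Output copied from the last step of this comment.
CSV_OUTPUT = """
NAME                  DISPLAY   VERSION   REPLACES       PHASE
etcdoperator.v0.9.4   etcd      0.9.4     thisWillFail   Succeeded
"""

rows = parse_table(CSV_OUTPUT)
failed = not_succeeded(rows)
```

For the output above, `failed` is empty even though the REPLACES column still reads thisWillFail, which is the result this verification relies on.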
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2820