Bug 1817833
Summary: | Catalog-operator crashed when a CatalogSource object doesn't have the `address` and `image` fields | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Jian Zhang <jiazha> | |
Component: | OLM | Assignee: | Nick Hale <nhale> | |
OLM sub component: | OLM | QA Contact: | Jian Zhang <jiazha> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | urgent | |||
Priority: | urgent | CC: | nhale, pruan, schoudha, tflannag, vlaad, wking | |
Version: | 4.3.z | |||
Target Milestone: | --- | |||
Target Release: | 4.5.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause: Invalid CatalogSource configurations were causing a nil-pointer exception and a panic.
Consequence: The catalog-operator pod would crash every time an invalid CatalogSource was reconciled.
Fix: Add runtime nil checks and CatalogSource validation.
Result: Invalid CatalogSources are given a representative condition, and the catalog-operator pod no longer crashes.
|
: | 1818850 (view as bug list) | Environment: | ||
Last Closed: | 2020-08-04 18:07:12 UTC | Type: | Bug | |
Bug Blocks: | 1818850 |
Description
Jian Zhang
2020-03-27 04:15:31 UTC
CatalogSource:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2020-03-26T10:03:27Z"
  generation: 1
  labels:
    olm-visibility: hidden
    openshift-marketplace: "true"
    opsrc-datastore: "true"
    opsrc-owner-name: qe-app-registry
    opsrc-owner-namespace: openshift-marketplace
  name: qe-app-registry
  namespace: openshift-marketplace
  resourceVersion: "238817"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-marketplace/catalogsources/qe-app-registry
  uid: 006e2ae6-3146-4996-920c-43d810c721e4
spec:
  address: qe-app-registry.openshift-marketplace.svc:50051
  icon:
    base64data: ""
    mediatype: ""
  sourceType: grpc
status:
  connectionState:
    address: qe-app-registry.openshift-marketplace.svc:50051
    lastConnect: "2020-03-26T19:02:31Z"
    lastObservedState: READY
  registryService:
    createdAt: "2020-03-26T10:03:27Z"
    protocol: grpc

I see a different error in the catalog pod:

time="2020-03-27T14:02:04Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=/apis/operators.coreos.com/v1alpha1/namespaces/local-storage/subscriptions/local-storage-operator
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x15a4748]

goroutine 307 [running]:
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription.(*catalogHealthReconciler).healthy(0xc000585540, 0xc000ffc3f0, 0xc000ffc1f8, 0xc00347a0e0, 0x0)
	/build/pkg/controller/operators/catalog/subscription/reconciler.go:185 +0x48
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription.(*catalogHealthReconciler).health(0xc000585540, 0xc0002fc560, 0xc000ffc3f0, 0xc0002fc780, 0x0, 0x0)
	/build/pkg/controller/operators/catalog/subscription/reconciler.go:159 +0x39
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription.(*catalogHealthReconciler).catalogHealth(0xc000585540, 0xc0000c5e50, 0xd, 0x1ca2fa0, 0xc00094d970, 0x1529800700c701, 0xc0033f70e8, 0xc0033f7060)
	/build/pkg/controller/operators/catalog/subscription/reconciler.go:137 +0x2ba
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription.(*catalogHealthReconciler).Reconcile(0xc000585540, 0x1c729a0, 0xc00043c140, 0x7f5ee5367630, 0xc00094d950, 0x7f5ee5367630, 0xc00094d950, 0x0, 0x0)
	/build/pkg/controller/operators/catalog/subscription/reconciler.go:82 +0x21c
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/kubestate.ReconcilerChain.Reconcile(0xc00052cfc0, 0x3, 0x4, 0x1c729a0, 0xc00043c140, 0x7f5ee5367250, 0xc00094d910, 0x0, 0x0, 0x70763750545a6742, ...)
	/build/pkg/lib/kubestate/kubestate.go:128 +0xa5
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription.(*subscriptionSyncer).Sync(0xc0004518f0, 0x1c729a0, 0xc00043c140, 0x1c4e620, 0xc0004b6ba0, 0xc003811c01, 0x0)
	/build/pkg/controller/operators/catalog/subscription/syncer.go:75 +0x596
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*QueueInformer).Sync(...)
	/build/pkg/lib/queueinformer/queueinformer.go:36
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).processNextWorkItem(0xc0000d73f0, 0x1c729a0, 0xc00043c140, 0xc00054bf20, 0x45c900)
	/build/pkg/lib/queueinformer/queueinformer_operator.go:287 +0x32b
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).worker(0xc0000d73f0, 0x1c729a0, 0xc00043c140, 0xc00054bf20)
	/build/pkg/lib/queueinformer/queueinformer_operator.go:231 +0x49
created by github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).start
	/build/pkg/lib/queueinformer/queueinformer_operator.go:221 +0x455

This is hitting an edge case in the catalog operator, triggered by the following catalogsource in the test cluster:

```
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"operators.coreos.com/v1alpha1","kind":"CatalogSource","metadata":{"annotations":{},"name":"installed-redhat-metering-operators-openshift-metering","namespace":"openshift-marketplace"},"spec":{"packages":"metering-ocp","sourceType":"grpc","targetNamespace":"openshift-metering"}}
  creationTimestamp: "2020-03-26T19:18:35Z"
  generation: 1
  name: installed-redhat-metering-operators-openshift-metering
  namespace: openshift-marketplace
  resourceVersion: "245879"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-marketplace/catalogsources/installed-redhat-metering-operators-openshift-metering
  uid: 8d2b53d3-b49e-45d8-99f2-abcb6eab1936
spec:
  sourceType: grpc
status:
  message: no reconciler for source type grpc
  reason: RegistryServerError
```

The problematic section is:

```
spec:
  sourceType: grpc
```

If sourceType is grpc, we expect to also have either an `address` field or an `image` field.

There is a bug, in that the catalog operator should never crash.
We will address the underlying issue and backport to 4.3.

However, it is highly unlikely that a user will run into this. Most users do not create catalog sources directly; instead, catalog sources are created via OperatorSource, or via the instructions for disconnected environments, which clearly indicate that the `image` field is required.

Removing regression based on previous comment.

Hi Evan/Tim,
It looks like this PR https://github.com/operator-framework/operator-metering/pull/1143 was the culprit. Should we just revert this PR to unblock this issue?

Thanks, Tim, for reverting it: https://github.com/operator-framework/operator-metering/pull/1155

This payload is available.

mac:~ jianzhang$ oc adm release info registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-04-01-045338 --commits |grep lifecycle
  operator-lifecycle-manager  https://github.com/operator-framework/operator-lifecycle-manager  ae9f66d08ee9ccc9cb7a7bf2b8d7adc1ef462142

1. Create an OCP 4.5 cluster that includes the fix PR.

mac:~ jianzhang$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-04-01-045338   True        False         47m     Cluster version is 4.5.0-0.nightly-2020-04-01-045338
mac:~ jianzhang$ oc exec catalog-operator-59c94cf4c9-b4kms -- olm --version
OLM version: 0.14.2
git commit: ae9f66d08ee9ccc9cb7a7bf2b8d7adc1ef462142

2. Check the default CatalogSources and the OLM pods; they work well.

mac:~ jianzhang$ oc get pods
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-59c94cf4c9-b4kms   1/1     Running   0          73m
olm-operator-8769584b8-kgtg8        1/1     Running   0          73m
packageserver-5fb8cc9974-9gg7j      1/1     Running   0          68m
packageserver-5fb8cc9974-p4hmz      1/1     Running   0          68m
mac:~ jianzhang$ oc get pods -n openshift-marketplace
NAME                                   READY   STATUS    RESTARTS   AGE
certified-operators-8bcf9b7fd-s5krq    1/1     Running   0          64m
community-operators-f764b9999-vcq6q    1/1     Running   0          64m
marketplace-operator-899fd465f-8598b   1/1     Running   0          65m
redhat-marketplace-777b89d698-kzhjc    1/1     Running   0          64m
redhat-operators-794cd6b88c-s8z68      1/1     Running   0          64m
mac:~ jianzhang$ oc get packagemanifest
NAME                  CATALOG               AGE
event-streams-topic   Community Operators   63m
triggermesh           Community Operators   63m
...

3. Create a CatalogSource object (grpc) without `image` and `address`.

mac:~ jianzhang$ cat cs-1817833.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: bug-no-image
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  displayName: Jian Operators
  publisher: jian
mac:~ jianzhang$ oc create -f cs-1817833.yaml
catalogsource.operators.coreos.com/bug-no-image created
mac:~ jianzhang$ oc get pods -n openshift-marketplace
NAME                                   READY   STATUS    RESTARTS   AGE
certified-operators-8bcf9b7fd-s5krq    1/1     Running   0          67m
community-operators-f764b9999-vcq6q    1/1     Running   0          67m
marketplace-operator-899fd465f-8598b   1/1     Running   0          68m
redhat-marketplace-777b89d698-kzhjc    1/1     Running   0          67m
redhat-operators-794cd6b88c-s8z68      1/1     Running   0          67m
mac:~ jianzhang$ oc get catalogsource -n openshift-marketplace
NAME                  DISPLAY               TYPE   PUBLISHER   AGE
bug-no-image          Jian Operators        grpc   jian        71s
certified-operators   Certified Operators   grpc   Red Hat     68m
community-operators   Community Operators   grpc   Red Hat     68m
redhat-marketplace    Red Hat Marketplace   grpc   Red Hat     68m
redhat-operators      Red Hat Operators     grpc   Red Hat     68m
mac:~ jianzhang$ oc get catalogsource -n openshift-marketplace bug-no-image -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2020-04-01T09:44:11Z"
  generation: 1
  name: bug-no-image
  namespace: openshift-marketplace
  resourceVersion: "37750"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-marketplace/catalogsources/bug-no-image
  uid: fda0b201-e806-4ee8-b6a7-e458a6b0e94e
spec:
  displayName: Jian Operators
  publisher: jian
  sourceType: grpc
status:
  message: 'image and address unset: at least one must be set for sourcetype: grpc'
  reason: SpecInvalidError
mac:~ jianzhang$ oc get pods
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-59c94cf4c9-b4kms   1/1     Running   0          81m
olm-operator-8769584b8-kgtg8        1/1     Running   0          81m
packageserver-5fb8cc9974-9gg7j      1/1     Running   0          76m
packageserver-5fb8cc9974-p4hmz      1/1     Running   0          76m

The OLM pods work well and the error message is reported. Looks good.

4. Install an operator, for example etcd; it works well.

mac:~ jianzhang$ oc get sub -n default
NAME   PACKAGE   SOURCE                CHANNEL
etcd   etcd      community-operators   singlenamespace-alpha
mac:~ jianzhang$ oc get csv -n default
NAME                  DISPLAY   VERSION   REPLACES   PHASE
etcdoperator.v0.9.4   etcd      0.9.4                Succeeded
mac:~ jianzhang$ oc get pods -n default
NAME                           READY   STATUS    RESTARTS   AGE
etcd-operator-966558f8-w9pmm   3/3     Running   0          45s
mac:~ jianzhang$ oc get pods -n default
NAME                           READY   STATUS    RESTARTS   AGE
etcd-operator-966558f8-w9pmm   3/3     Running   0          58s
mac:~ jianzhang$ oc get pods
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-59c94cf4c9-b4kms   1/1     Running   0          99m
olm-operator-8769584b8-kgtg8        1/1     Running   0          99m
packageserver-5fb8cc9974-9gg7j      1/1     Running   0          94m
packageserver-5fb8cc9974-p4hmz      1/1     Running   0          94m

5. Create a CatalogSource object (configmap) without `image` and `address`.

mac:~ jianzhang$ oc create -f cs-1817833-configmap.yaml
catalogsource.operators.coreos.com/bug-no-image-cm created
mac:~ jianzhang$ cat cs-1817833-configmap.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: bug-no-image-cm
  namespace: openshift-marketplace
spec:
  sourceType: configmap
  displayName: Jian Operators
  publisher: jian
mac:~ jianzhang$ oc get catalogsource -n openshift-marketplace bug-no-image-cm -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2020-04-01T10:07:25Z"
  generation: 1
  name: bug-no-image-cm
  namespace: openshift-marketplace
  resourceVersion: "45041"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-marketplace/catalogsources/bug-no-image-cm
  uid: f3a8f51f-b527-4787-9a5e-0429faca4527
spec:
  displayName: Jian Operators
  publisher: jian
  sourceType: configmap
status:
  message: 'configmap name unset: must be set for sourcetype: configmap'
  reason: SpecInvalidError
mac:~ jianzhang$ oc get pods
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-59c94cf4c9-b4kms   1/1     Running   0          103m
olm-operator-8769584b8-kgtg8        1/1     Running   0          103m
packageserver-5fb8cc9974-9gg7j      1/1     Running   0          98m
packageserver-5fb8cc9974-p4hmz      1/1     Running   0          99m

The OLM pods work well and the error message is reported. Looks good.

6. Create a CatalogSource object without `image`, `address`, and `sourceType`.

mac:~ jianzhang$ cat cs-1817833-empty.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: bug-empty
  namespace: openshift-marketplace
spec:
  displayName: Jian Operators
  publisher: jian
mac:~ jianzhang$ oc create -f cs-1817833-empty.yaml
The CatalogSource "bug-empty" is invalid: spec.sourceType: Required value

LGTM, verify it.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
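The verification steps exercise three invalid specs and record the message each one produces. As a rough illustration of that kind of spec validation, here is a minimal Go sketch that returns the same message strings quoted in the transcripts. The `CatalogSourceSpec` struct, its field names, the `validateSpec` function, and the fallback message for unrecognised source types are all assumptions made for this example and are not taken from the OLM source. Note also that in the transcript the missing `sourceType` case is rejected at `oc create` time rather than surfaced in the object's status; the sketch folds it into one function only for compactness.

```go
package main

import (
	"errors"
	"fmt"
)

// CatalogSourceSpec is a hypothetical, pared-down spec used for illustration.
type CatalogSourceSpec struct {
	SourceType string
	Address    string
	Image      string
	ConfigMap  string
}

// validateSpec returns an error describing why the spec is invalid,
// or nil when the fields required for its source type are present.
func validateSpec(spec CatalogSourceSpec) error {
	switch spec.SourceType {
	case "grpc":
		// grpc needs a way to reach a registry: an address or an image.
		if spec.Image == "" && spec.Address == "" {
			return errors.New("image and address unset: at least one must be set for sourcetype: grpc")
		}
	case "configmap":
		// configmap needs the name of the ConfigMap that holds the catalog.
		if spec.ConfigMap == "" {
			return errors.New("configmap name unset: must be set for sourcetype: configmap")
		}
	case "":
		// In the transcript this rejection happens at creation time.
		return errors.New("spec.sourceType: Required value")
	default:
		// Illustrative fallback; the wording is an assumption.
		return fmt.Errorf("unrecognised sourcetype: %s", spec.SourceType)
	}
	return nil
}

func main() {
	specs := []CatalogSourceSpec{
		{SourceType: "grpc"},      // step 3: grpc without image or address
		{SourceType: "configmap"}, // step 5: configmap without a name
		{},                        // step 6: no sourceType at all
		{SourceType: "grpc", Address: "qe-app-registry.openshift-marketplace.svc:50051"},
	}
	for _, spec := range specs {
		if err := validateSpec(spec); err != nil {
			fmt.Printf("invalid: %v\n", err)
			continue
		}
		fmt.Println("valid")
	}
}
```

Validating up front means an invalid object can be marked with a reason such as `SpecInvalidError` once, instead of reaching code paths that assume the fields are set.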