Bug 1943937 - CatalogSource incorrect parsing validation
Summary: CatalogSource incorrect parsing validation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.7
Hardware: All
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Anik
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-28 15:07 UTC by David Hernández Fernández
Modified: 2022-10-11 09:31 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2073748
Environment:
Last Closed: 2022-08-10 10:36:17 UTC
Target Upstream Version:
Embargoed:




Links
System                   ID                                                            Private   Priority   Status   Summary   Last Updated
Github                   https://github.com/openshift/operator-framework-olm/pull/285   0         None       None     None      2022-04-14 09:20:00 UTC
Red Hat Product Errata   RHSA-2022:5069                                                0         None       None     None      2022-08-10 10:36:37 UTC

Description David Hernández Fernández 2021-03-28 15:07:28 UTC
Description of problem: CatalogSource field values are not validated when the resource is created.

Version-Release number of selected component (if applicable): OpenShift 4.7

How reproducible:

Create a CatalogSource with the following YAML. The invalid input in this example is "interval: 45mError code".

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-operator-catalog
  namespace: openshift-marketplace
spec:
  displayName: ibm-operator-catalog 
  publisher: IBM Content
  sourceType: grpc
  image: docker.io/ibmcom/ibm-operator-catalog
  updateStrategy:
    registryPoll:
      interval: 45mError code

The CatalogSource is created, but the marketplace operator cannot handle the invalid interval, so its logs fill up with this error:

reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: Failed to list *v1alpha1.CatalogSource: v1alpha1.CatalogSourceList.Items: []v1alpha1.CatalogSource: v1alpha1.CatalogSource.v1alpha1.CatalogSource.Spec: v1alpha1.CatalogSourceSpec.UpdateStrategy: v1alpha1.UpdateStrategy.RegistryPoll: v1alpha1.RegistryPoll.Interval: unmarshalerDecoder: time: unknown unit "mCopy code" in duration "45mError code", error found in #10 byte of ...|Copy code"}}}},{"api|..., bigger context ...|rategy":{"registryPoll":{"interval":"45mCopy code"}}}},{"apiVersion":"operators.coreos.com/v1alpha1"|...
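
The log above suggests the failure happens while the whole CatalogSourceList is decoded: the interval is parsed by a custom unmarshaler, so one malformed value fails the entire list call. Below is a minimal Go sketch of that behaviour. It uses simplified stand-in types (`pollDuration`, `catalogSource` and `decodeList` are hypothetical names, not the OLM API types) and the standard library's `encoding/json` rather than the decoder named in the log.

```go
package main

import (
	"encoding/json"
	"time"
)

// pollDuration is a hypothetical stand-in for a duration-typed API field.
// It parses the JSON string with time.ParseDuration while decoding.
type pollDuration struct {
	time.Duration
}

// UnmarshalJSON returns the parse error, which aborts the surrounding decode.
func (d *pollDuration) UnmarshalJSON(b []byte) error {
	var raw string
	if err := json.Unmarshal(b, &raw); err != nil {
		return err
	}
	parsed, err := time.ParseDuration(raw)
	if err != nil {
		return err
	}
	d.Duration = parsed
	return nil
}

// catalogSource is a simplified, hypothetical shape. The real resource nests
// the field under spec.updateStrategy.registryPoll.interval.
type catalogSource struct {
	Name     string       `json:"name"`
	Interval pollDuration `json:"interval"`
}

// decodeList decodes a JSON array of catalog sources. A single item with an
// unparsable interval makes the whole call return an error.
func decodeList(data []byte) ([]catalogSource, error) {
	var items []catalogSource
	if err := json.Unmarshal(data, &items); err != nil {
		return nil, err
	}
	return items, nil
}
```

With these types, `decodeList` returns an error for a list that mixes a valid `45m` item with `45mError code`, which mirrors how one bad object can break listing for every CatalogSource.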

Actual results: The value is not validated. All of the catalog source pods are constantly recreated as soon as they become ready, which means no operators are available in OperatorHub.

There should be a validating webhook on catalogsources to catch this sort of issue, to prevent cluster errors. 
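
As an illustration of the kind of upfront check such a webhook could run, here is a minimal Go sketch. `validateInterval` is a hypothetical helper, not part of OLM or of any existing webhook. It assumes only that the interval is meant to be a Go duration string, as the error message indicates. Rejecting non-positive values is an extra assumption.

```go
package main

import (
	"fmt"
	"time"
)

// validateInterval is a hypothetical admission-style check. It rejects values
// that time.ParseDuration cannot parse, and also non-positive durations
// (the latter is an assumption about what a sensible poll interval is).
func validateInterval(raw string) error {
	const field = "spec.updateStrategy.registryPoll.interval"
	d, err := time.ParseDuration(raw)
	if err != nil {
		return fmt.Errorf("%s: invalid value %q: %w", field, raw, err)
	}
	if d <= 0 {
		return fmt.Errorf("%s: invalid value %q: must be greater than zero", field, raw)
	}
	return nil
}
```

Called with "45m" the helper returns nil. Called with "45mError code" it returns an error that names the offending field, so the manifest could be rejected before it reaches the cluster.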

Expected results: Values are validated against the expected format.


Additional info:

Comment 1 Kevin Rizza 2021-03-29 14:04:02 UTC
I don't think that adding a validating webhook is something that would be accepted as a backportable patch, but it certainly seems like other catalog sources shouldn't be forcefully recreated in this case. It's probably reasonable for us to backport a fix for that issue and include something on the status, and try to come up with some more holistic upfront validation in a future release.

Comment 2 David Hernández Fernández 2021-04-01 17:29:42 UTC
I think the focus, at least, should be that a syntax error in one catalog source does not affect other catalog sources. Let me know if you need anything else.

Comment 9 Kevin Rizza 2022-01-05 19:08:50 UTC
The change is open; the pull requests still need review.

Comment 13 Per da Silva 2022-04-10 07:00:30 UTC
PR has been merged - ready for QA

Comment 14 Jian Zhang 2022-04-11 07:41:32 UTC
1, Create an OCP cluster that contains the fix PR: https://github.com/openshift/operator-framework-olm/pull/279

mac:~ jianzhang$ oc adm release info registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-04-11-055105 -a .dockerconfigjson --commits|grep olm
  operator-lifecycle-manager                     https://github.com/openshift/operator-framework-olm                         ed7cf0db6fe1f5e91990ca2c02593ba7d1e3cc2e
  operator-registry                              https://github.com/openshift/operator-framework-olm                         ed7cf0db6fe1f5e91990ca2c02593ba7d1e3cc2e

mac:~ jianzhang$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-11-055105   True        False         42m     Cluster version is 4.11.0-0.nightly-2022-04-11-055105

2, Create a CatalogSource that contains the syntax error, as follows:
mac:~ jianzhang$ cat cs-issue.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-operator-catalog
  namespace: openshift-marketplace
spec:
  displayName: ibm-operator-catalog 
  publisher: IBM Content
  sourceType: grpc
  image: docker.io/ibmcom/ibm-operator-catalog
  updateStrategy:
    registryPoll:
      interval: 45mError code
mac:~ jianzhang$ oc create -f cs-issue.yaml 
catalogsource.operators.coreos.com/ibm-operator-catalog created

3, Check the marketplace-operator logs and the CatalogSource status.
I can see errors in the marketplace-operator logs, but no reason or message is set on the faulty CatalogSource, as follows:

W0411 07:02:07.511359       1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1alpha1.CatalogSource: time: unknown unit "mError code" in duration "45mError code"
E0411 07:02:07.511403       1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1alpha1.CatalogSource: failed to list *v1alpha1.CatalogSource: time: unknown unit "mError code" in duration "45mError code"


mac:~ jianzhang$  oc get catalogsource ibm-operator-catalog -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2022-04-11T07:01:30Z"
  generation: 1
  name: ibm-operator-catalog
  namespace: openshift-marketplace
  resourceVersion: "41957"
  uid: 8c217d55-44ca-45e2-829a-da0f90c2d9a4
spec:
  displayName: ibm-operator-catalog
  image: docker.io/ibmcom/ibm-operator-catalog
  publisher: IBM Content
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 45mError code
status:
  connectionState:
    address: ibm-operator-catalog.openshift-marketplace.svc:50051
    lastConnect: "2022-04-11T07:01:55Z"
    lastObservedState: READY
  latestImageRegistryPoll: "2022-04-11T07:19:47Z"
  registryService:
    createdAt: "2022-04-11T07:01:31Z"
    port: "50051"
    protocol: grpc
    serviceName: ibm-operator-catalog
    serviceNamespace: openshift-marketplace

Changing the status to ASSIGNED.


PS: 
> I think the focus, at least, should be that a syntax error in one catalog source does not affect other catalog sources.
> We should preferably return an error here, but the immediate fix would be to ensure that this doesn't cause problems for other catalogs on the cluster.

I also tried to reproduce this bug on a cluster without the fix PR, but I couldn't. The other catalog sources work well.

mac:~ jianzhang$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-08-205307   True        False         7h59m   Cluster version is 4.11.0-0.nightly-2022-04-08-205307

1, Create the faulty CatalogSource.
mac:~ jianzhang$ oc get catalogsource -n openshift-marketplace
NAME                   DISPLAY                TYPE   PUBLISHER      AGE
certified-operators    Certified Operators    grpc   Red Hat        8h
community-operators    Community Operators    grpc   Red Hat        8h
ibm-operator-catalog   ibm-operator-catalog   grpc   IBM Content    67m
qe-app-registry        Production Operators   grpc   OpenShift QE   7h57m
redhat-marketplace     Red Hat Marketplace    grpc   Red Hat        8h
redhat-operators       Red Hat Operators      grpc   Red Hat        8h

mac:~ jianzhang$ oc get catalogsource -n openshift-marketplace ibm-operator-catalog -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2022-04-11T06:30:17Z"
  generation: 1
  name: ibm-operator-catalog
  namespace: openshift-marketplace
  resourceVersion: "195259"
  uid: eb81a878-7bf3-459e-bd50-c70bc04fc179
spec:
  displayName: ibm-operator-catalog
  image: docker.io/ibmcom/ibm-operator-catalog
  publisher: IBM Content
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 45mError code
status:
  connectionState:
    address: ibm-operator-catalog.openshift-marketplace.svc:50051
    lastConnect: "2022-04-11T06:36:51Z"
    lastObservedState: READY
  latestImageRegistryPoll: "2022-04-11T07:24:49Z"
  registryService:
    createdAt: "2022-04-11T06:30:17Z"
    port: "50051"
    protocol: grpc
    serviceName: ibm-operator-catalog
    serviceNamespace: openshift-marketplace

mac:~ jianzhang$ oc get pods -n openshift-marketplace
NAME                                                              READY   STATUS      RESTARTS   AGE
02e3403ec6e01f1c5f1ed01afd671c795f8412982e73971271b39b1d16rh5tc   0/1     Completed   0          7h56m
725fb0713557581cb01780e1cdfbc0d7492ca604a54b9e773fb39be15ewdqdl   0/1     Completed   0          7h56m
certified-operators-9l2bc                                         1/1     Running     0          59s
community-operators-5vmwt                                         1/1     Running     0          8h
e8c9651078ae45ddb2807e3a07727d459b82d7def5572a7b7ccaae332b2lxqd   0/1     Completed   0          4m22s
ibm-operator-catalog-7mr5q                                        1/1     Running     0          66m
marketplace-operator-59f5d78dcf-ddsk9                             1/1     Running     0          8h
qe-app-registry-5j67j                                             1/1     Running     0          6h38m
redhat-marketplace-bdxhj                                          1/1     Running     0          8h
redhat-operators-5j5cq                                            1/1     Running     0          8h

> Actual results: No parsing. All of the catalog source pods are constantly being recreated as soon as they become ready which means no operators are available in the operator hub.

Sorry, I didn't hit this. It seems all the other catalog source pods worked well.

I tried subscribing to an operator provided by another CatalogSource, and it worked well.
mac:~ jianzhang$ oc get sub -n jian
NAME         PACKAGE   SOURCE                CHANNEL
etcd-0.9.4   etcd      community-operators   singlenamespace-alpha
mac:~ jianzhang$ oc get ip -n jian
NAME            CSV                   APPROVAL    APPROVED
install-w78hb   etcdoperator.v0.9.4   Automatic   true
mac:~ jianzhang$ oc get csv -n jian
NAME                               DISPLAY                            VERSION     REPLACES              PHASE
elasticsearch-operator.5.4.0-143   OpenShift Elasticsearch Operator   5.4.0-143                         Succeeded
etcdoperator.v0.9.4                etcd                               0.9.4       etcdoperator.v0.9.2   Succeeded

Comment 15 Anik 2022-04-11 12:33:46 UTC
@pegoncal looks like the PR https://github.com/openshift/operator-framework-olm/pull/279/commits doesn't contain the commit for the fix for this bz (https://github.com/operator-framework/operator-lifecycle-manager/pull/2447/commits). So the fix hasn't been pulled downstream yet from upstream, and there needs to be another downstream sync PR that pulls in the commits for the fix. 


I verified the test Jian was running works as expected on olm's main branch: 

```
status:
  connectionState:
    address: operatorhubio-catalog.olm.svc:50051
    lastConnect: "2022-04-11T12:27:29Z"
    lastObservedState: READY
  message: 'error parsing spec.updateStrategy.registryPoll.interval. Using the default
    value of 15m0s instead. Error: time: unknown unit "mError code" in duration "45mError
    code"'
  reason: InvalidIntervalError
  registryService:
    createdAt: "2022-04-11T12:27:01Z"
    port: "50051"
    protocol: grpc
    serviceName: operatorhubio-catalog
    serviceNamespace: olm

```
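
The status above points at a parse-with-fallback approach: try to parse the raw string and, on failure, fall back to the default while recording a reason and message. Below is a minimal Go sketch of that idea. `resolveInterval`, `pollStatus` and `defaultInterval` are hypothetical names used for illustration, and this is not the code from the upstream PR.

```go
package main

import (
	"fmt"
	"time"
)

// defaultInterval mirrors the 15m0s default reported in the status message.
const defaultInterval = 15 * time.Minute

// pollStatus is a hypothetical container for what ends up on the status.
type pollStatus struct {
	Reason  string
	Message string
}

// resolveInterval parses raw and falls back to defaultInterval when parsing
// fails, returning a status that explains the fallback instead of an error.
func resolveInterval(raw string) (time.Duration, pollStatus) {
	d, err := time.ParseDuration(raw)
	if err != nil {
		return defaultInterval, pollStatus{
			Reason: "InvalidIntervalError",
			Message: fmt.Sprintf("error parsing spec.updateStrategy.registryPoll.interval. "+
				"Using the default value of %s instead. Error: %s", defaultInterval, err),
		}
	}
	return d, pollStatus{}
}
```

Because the bad value is turned into a status entry rather than a decode error, the object stays readable and polling continues at the default interval.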

Comment 16 Per da Silva 2022-04-11 13:03:51 UTC
I must have done something wrong. I'm really sorry =(
I'm pulling it in in this sync PR: https://github.com/openshift/operator-framework-olm/pull/285

Comment 19 Jian Zhang 2022-04-14 10:15:29 UTC
1, Create an OCP 4.11 cluster that contains the fix PR.
mac:~ jianzhang$ oc adm release info registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-04-14-080015 -a .dockerconfigjson --commits|grep olm
  operator-lifecycle-manager                     https://github.com/openshift/operator-framework-olm                         698c23184c1c3440dc2f591be7ecf3d99fb0d227
  operator-registry                              https://github.com/openshift/operator-framework-olm                         698c23184c1c3440dc2f591be7ecf3d99fb0d227

mac:~ jianzhang$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-14-080015   True        False         12m     Cluster version is 4.11.0-0.nightly-2022-04-14-080015

2, Create the faulty CatalogSource.
mac:~ jianzhang$ oc create -f cs-issue.yaml 
catalogsource.operators.coreos.com/ibm-operator-catalog created
mac:~ jianzhang$ cat cs-issue.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-operator-catalog
  namespace: openshift-marketplace
spec:
  displayName: ibm-operator-catalog 
  publisher: IBM Content
  sourceType: grpc
  image: docker.io/ibmcom/ibm-operator-catalog
  updateStrategy:
    registryPoll:
      interval: 45mError code

3, Check the marketplace-operator logs and the CatalogSource status.

W0414 10:12:28.684940       1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1alpha1.CatalogSource: time: unknown unit "mError code" in duration "45mError code"
E0414 10:12:28.684970       1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1alpha1.CatalogSource: failed to list *v1alpha1.CatalogSource: time: unknown unit "mError code" in duration "45mError code"
time="2022-04-14T10:12:43Z" level=info msg="[status] Previous and current ClusterOperator Status are the same, the ClusterOperator Status will not be updated."

mac:~ jianzhang$ oc get catalogsource ibm-operator-catalog -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2022-04-14T10:11:47Z"
  generation: 1
  name: ibm-operator-catalog
  namespace: openshift-marketplace
  resourceVersion: "33761"
  uid: 358507b1-effe-456f-b6d6-5739acd5921f
spec:
  displayName: ibm-operator-catalog
  image: docker.io/ibmcom/ibm-operator-catalog
  publisher: IBM Content
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 45mError code
status:
  connectionState:
    address: ibm-operator-catalog.openshift-marketplace.svc:50051
    lastConnect: "2022-04-14T10:12:10Z"
    lastObservedState: READY
  message: 'error parsing spec.updateStrategy.registryPoll.interval. Using the default
    value of 15m0s instead. Error: time: unknown unit "mError code" in duration "45mError
    code"'
  reason: InvalidIntervalError
  registryService:
    createdAt: "2022-04-14T10:11:47Z"
    port: "50051"
    protocol: grpc
    serviceName: ibm-operator-catalog
    serviceNamespace: openshift-marketplace

I can see the error message in the status. LGTM, marking it as verified.

Comment 21 errata-xmlrpc 2022-08-10 10:36:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

