Bug 1989711

Summary: Invalid olm.maxOpenShiftVersion properties have unclear/undefined behavior in OLM
Product: OpenShift Container Platform
Reporter: Nick Hale <nhale>
Component: OLM
Assignee: Nick Hale <nhale>
OLM sub component: OLM
QA Contact: Jian Zhang <jiazha>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: jiazha, tflannag
Version: 4.9
Keywords: Triaged
Target Release: 4.8.z
Hardware: Unspecified
OS: Unspecified
Clone Of: 1989704
Last Closed: 2021-08-16 18:32:12 UTC
Bug Depends On: 1989704

Description Nick Hale 2021-08-03 18:38:03 UTC
+++ This bug was initially created as a clone of Bug #1989704 +++

Description of problem:

The behavior of OLM when installed operators specify olm.maxOpenShiftVersion:

- with an empty value
- with a poorly formed value; e.g. bad semver
- more than once; ambiguous max

is ill-defined and inconsistent: OLM silently allows cluster upgrades in the face of invalid values.

Version-Release number of selected component (if applicable): 

OpenShift: 4.9.0-0.nightly-2021-07-27-125952
OLM: version: 0.18.3, git commit: 3475b1d5d8d481394ba90b2823645bed0fb2b076

How reproducible: Always

Steps to Reproduce:

1. Write an operator that specifies an invalid, empty, and/or multiple olm.maxOpenShiftVersion properties
2. Install that operator on a cluster

(Here's a demo that walks through configuring properties, building bundles, and installing on a cluster: https://youtu.be/pa2-YKo07cw?list=PLw471UtJC4H39vNAduRz6CixDetHwljKs)

Expected results:

OLM should set the Upgradeable condition of the operator-lifecycle-manager ClusterOperator resource to False -- i.e. block cluster upgrades -- whenever:

- an installed operator has an invalid olm.maxOpenShiftVersion property; e.g. empty or malformed semver
- an installed operator has declared more than one olm.maxOpenShiftVersion property
- cluster information is unavailable; e.g. the desired version of the cluster is undefined (non-functional requirement)

Additionally, the message of that condition should identify the operator(s) (and reason for) blocking an upgrade.
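
The fail-closed rules listed above can be sketched as follows. This is only an illustration in Python, not OLM's actual code: the function names, the tolerant version pattern, and the message wording are all assumptions, and the sketch covers only the invalid-input cases, not the comparison against the next cluster version.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed tolerant pattern: MAJOR.MINOR with optional PATCH and pre-release.
_VERSION = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:-[0-9A-Za-z.-]+)?$")


def check_operator(max_versions: List[str]) -> Optional[str]:
    """Return a reason if the declared property values are invalid, else None."""
    if len(max_versions) > 1:
        return "more than one olm.maxOpenShiftVersion property declared"
    for value in max_versions:
        if not value.strip():
            return "empty olm.maxOpenShiftVersion value"
        if not _VERSION.match(value.strip()):
            return f"malformed olm.maxOpenShiftVersion value {value!r}"
    return None


def upgradeable(operators: Dict[str, List[str]],
                desired_version: Optional[str]) -> Tuple[bool, str]:
    """Fail closed: any invalid input yields (False, message)."""
    if not desired_version:
        return False, "desired cluster version is undefined"
    problems = []
    for name in sorted(operators):
        reason = check_operator(operators[name])
        if reason:
            # Name the operator and the reason so the message is actionable.
            problems.append(f"{name}: {reason}")
    if problems:
        return False, "; ".join(problems)
    return True, ""
```

Here `operators` maps a hypothetical operator name to the list of olm.maxOpenShiftVersion values it declares, so an operator with two entries or a blank entry makes the whole result False with a message naming it.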

Additional info:

This behavior was undefined but originally assumed to fail-open; i.e. invalid properties should be ignored (see https://bugzilla.redhat.com/show_bug.cgi?id=1986753). However, it was decided during review that failing-closed (i.e. invalid properties should block upgrades) with appropriate error messages leaves OpenShift upgrades less susceptible to operator publishing errors.

Upstream PR (merged): https://github.com/operator-framework/operator-lifecycle-manager/pull/2302

--- Additional comment from Nick Hale on 2021-08-03 18:35:18 UTC ---

Comment 5 Jian Zhang 2021-08-10 03:26:28 UTC
[cloud-user@preserve-olm-env jian]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-08-09-135211   True        False         88s     Cluster version is 4.8.0-0.nightly-2021-08-09-135211

[cloud-user@preserve-olm-env jian]$ oc -n openshift-operator-lifecycle-manager  exec deploy/catalog-operator -- olm --version
OLM version: 0.17.0
git commit: 2ab4171de3930b89b6b186724c29bffcf6f39ce2


1, Add olm.properties to the etcd-operator CSVs:

1) For etcd-operator v0.9.2, add an empty value: `olm.properties: '[{"type": "olm.maxOpenShiftVersion", "value": " "}]'`. An empty value is invalid, so it should block the 4.9 upgrade.
2) For etcd-operator v0.9.4, add `'[{"type": "olm.maxOpenShiftVersion", "value": "4.9"}]'`. The max version is 4.9 but the cluster's next version is 4.10.0; since 4.9 < 4.10.0, it should block the 4.9 upgrade.
3) For etcd-operator v0.9.2-cluster, add the value `olm.properties: '[{"type": "olm.maxOpenShiftVersion", "value": "4.10.0"}]'`. Since 4.10.0-xxx < 4.10.0, it should NOT block the 4.9 upgrade.
4) For etcd-operator v0.9.4-cluster, add two values: `olm.properties: '[{"type": "olm.maxOpenShiftVersion", "value": "4.10.0"},{"type": "olm.maxOpenShiftVersion", "value": "4.11.0"}]'`. Declaring the property twice is invalid, so it should block the 4.9 cluster upgrade.
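
Before building the bundles, it can help to check that each olm.properties annotation is well-formed JSON and to see how many olm.maxOpenShiftVersion entries it carries. The following is a minimal Python sketch for that; `max_version_values` is a hypothetical helper, not part of OLM or any of its tooling.

```python
import json
from typing import List

PROPERTY_TYPE = "olm.maxOpenShiftVersion"


def max_version_values(annotation: str) -> List[str]:
    """Return every olm.maxOpenShiftVersion value in an olm.properties annotation.

    Raises ValueError when the annotation is not a JSON list of objects.
    """
    # json.JSONDecodeError is a subclass of ValueError.
    properties = json.loads(annotation)
    if not isinstance(properties, list):
        raise ValueError("olm.properties must be a JSON list")
    values = []
    for prop in properties:
        if not isinstance(prop, dict):
            raise ValueError("each property must be a JSON object")
        if prop.get("type") == PROPERTY_TYPE:
            values.append(prop.get("value", ""))
    return values


two = ('[{"type": "olm.maxOpenShiftVersion", "value": "4.10.0"},'
       '{"type": "olm.maxOpenShiftVersion", "value": "4.11.0"}]')
print(max_version_values(two))
```

A result with more than one element corresponds to case 4), a single blank string to case 1), and a malformed annotation (for example, one with a missing closing brace) raises ValueError instead of reaching the cluster.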

2, Create a CatalogSource that consumes this index image.

[cloud-user@preserve-olm-env jian]$ cat cs-max.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: max-operators
  namespace: openshift-marketplace
spec:
  displayName: Jian Operators
  image: quay.io/olmqe/etcd-index:upgrade-max3
  priority: -200
  publisher: Jian
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m0s
[cloud-user@preserve-olm-env jian]$ oc create -f cs-max.yaml 
catalogsource.operators.coreos.com/max-operators created

3, Subscribe to etcd-operator v0.9.2 with Manual approval, as follows, and approve it.

[cloud-user@preserve-olm-env jian]$ cat og.yaml 
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: default-og
  namespace: default
spec:
  targetNamespaces:
  - default
[cloud-user@preserve-olm-env jian]$ oc create -f og.yaml 
operatorgroup.operators.coreos.com/default-og created

[cloud-user@preserve-olm-env jian]$ cat sub-etcd.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: etcd
  namespace: default
spec:
  channel: singlenamespace-alpha
  installPlanApproval: Manual
  name: etcd
  source: max-operators
  sourceNamespace: openshift-marketplace
  startingCSV: etcdoperator.v0.9.2

[cloud-user@preserve-olm-env jian]$ oc create -f sub-etcd.yaml 
subscription.operators.coreos.com/etcd created

4, The empty value is an invalid value, so it should block the upgrade.
[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                  DISPLAY   VERSION   REPLACES   PHASE
etcdoperator.v0.9.2   etcd      0.9.2                Succeeded
[cloud-user@preserve-olm-env jian]$ oc get csv -o yaml|grep "maxOpenShiftVersion"
      olm.properties: '[{"type": "olm.maxOpenShiftVersion", "value": " "}]'
      operatorframework.io/properties: '{"properties":[{"type":"olm.gvk","value":{"group":"etcd.database.coreos.com","kind":"EtcdBackup","version":"v1beta2"}},{"type":"olm.gvk","value":{"group":"etcd.database.coreos.com","kind":"EtcdCluster","version":"v1beta2"}},{"type":"olm.gvk","value":{"group":"etcd.database.coreos.com","kind":"EtcdRestore","version":"v1beta2"}},{"type":"olm.maxOpenShiftVersion","value":"


5, Approve the upgrade to v0.9.4, and then check the status of the Upgradeable condition.
[cloud-user@preserve-olm-env jian]$ oc get csv -o yaml|grep "maxOpenShiftVersion"
      olm.properties: '[{"type": "olm.maxOpenShiftVersion", "value": "4.9"}]'
...

The max version is 4.9 and the cluster's next version is 4.9.0-xxx. Since 4.9.0-xxx < 4.9.0, it should NOT block the 4.8 upgrade.
[cloud-user@preserve-olm-env jian]$ oc get co operator-lifecycle-manager -o=jsonpath={.status.conditions[?(@.type==\"Upgradeable\")].status}
True
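
The comparison in step 5 relies on semantic-versioning precedence, where a pre-release such as 4.9.0-xxx sorts before the release 4.9.0. A minimal Python sketch of that ordering follows. Two assumptions apply: a missing patch component is treated as 0 (so "4.9" equals "4.9.0"), and the order between two different pre-releases of the same version is not modelled. This is not OLM's parser.

```python
import re
from typing import Tuple

_PATTERN = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:-(.+))?$")


def precedence_key(version: str) -> Tuple[int, int, int, int]:
    """Sort key: (major, minor, patch, 1 for a release or 0 for a pre-release)."""
    match = _PATTERN.match(version.strip())
    if match is None:
        raise ValueError(f"cannot parse version {version!r}")
    major, minor, patch, prerelease = match.groups()
    # A pre-release ranks below the release with the same major.minor.patch.
    return (int(major), int(minor), int(patch or 0), 0 if prerelease else 1)


nightly = "4.9.0-0.nightly-2021-07-27-125952"
print(precedence_key(nightly) < precedence_key("4.9"))
```

With this key, a nightly 4.9 build compares as lower than a max version of 4.9, which matches the reasoning above for why the upgrade is not blocked.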

6, Uninstall this v0.9.4 etcd operator, and check the status of the Upgradeable.

[cloud-user@preserve-olm-env jian]$  oc get co operator-lifecycle-manager -o=jsonpath={.status.conditions[?(@.type==\"Upgradeable\")].status}
True

7, Subscribe to etcd-operator v0.9.2-cluster with Manual approval, as follows, and approve it.
[cloud-user@preserve-olm-env jian]$ oc get ip
NAME            CSV                               APPROVAL   APPROVED
install-2gpng   etcdoperator.v0.9.4-clusterwide   Manual     false
install-4pmls   etcdoperator.v0.9.2-clusterwide   Manual     true
[cloud-user@preserve-olm-env jian]$ oc get csv
NAME                              DISPLAY   VERSION             REPLACES   PHASE
etcdoperator.v0.9.2-clusterwide   etcd      0.9.2-clusterwide              Succeeded

8, Check the status of the Upgradeable condition. Looks good.
[cloud-user@preserve-olm-env jian]$ oc get csv -o yaml|grep "maxOpenShiftVersion"
      olm.properties: '[{"type": "olm.maxOpenShiftVersion", "value": "4.10.0"}]'
[cloud-user@preserve-olm-env jian]$ oc get co operator-lifecycle-manager -o=jsonpath={.status.conditions[?(@.type==\"Upgradeable\")].status}
True

9, Approve the upgrade to v0.9.4-cluster, and then check the status of the Upgradeable condition.
[cloud-user@preserve-olm-env jian]$ oc get csv -o yaml|grep "maxOpenShiftVersion"
      olm.properties: '[{"type": "olm.maxOpenShiftVersion", "value": "4.11.0"},{"type":
        "olm.maxOpenShiftVersion", "value": "4.10.0"}]'

Although both 4.11.0 and 4.10.0 are greater than 4.9.0-xxx, there are two values for maxOpenShiftVersion, which is invalid, so it should block the cluster upgrade. Looks good.
[cloud-user@preserve-olm-env jian]$  oc get co operator-lifecycle-manager -o=jsonpath={.status.conditions[?(@.type==\"Upgradeable\")].status}
False

10, Uninstall this v0.9.4-cluster etcd operator, and check the status of the Upgradeable condition. Its value should be `True`, i.e. it should not block the cluster upgrade. Looks good.
[cloud-user@preserve-olm-env jian]$ oc get co operator-lifecycle-manager -o=jsonpath={.status.conditions[?(@.type==\"Upgradeable\")].status}
True
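
The jsonpath checks used in these steps can also be scripted against the JSON form of the resource (for example, the output of `oc get co operator-lifecycle-manager -o json`). The Python sketch below shows the lookup; `condition_status` is a hypothetical helper, and the sample document is a trimmed-down illustration, not real cluster output.

```python
import json
from typing import Optional


def condition_status(cluster_operator: dict, condition_type: str) -> Optional[str]:
    """Return the status of the named condition, or None if it is absent."""
    conditions = cluster_operator.get("status", {}).get("conditions", [])
    for condition in conditions:
        if condition.get("type") == condition_type:
            return condition.get("status")
    return None


# Hypothetical, trimmed-down ClusterOperator document for illustration only.
sample = json.loads("""
{"status": {"conditions": [
  {"type": "Available", "status": "True"},
  {"type": "Upgradeable", "status": "False", "message": "example message"}
]}}
""")
print(condition_status(sample, "Upgradeable"))
```

Returning None for a missing condition keeps "condition absent" distinct from an explicit "False", which matters when scripting a check that should itself fail closed.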

verify it.

Comment 7 errata-xmlrpc 2021-08-16 18:32:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.5 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3121