Bug 1873030 - Subscriptions without any candidate operators should cause resolution to fail
Summary: Subscriptions without any candidate operators should cause resolution to fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Ben Luddy
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-08-27 08:12 UTC by Jian Zhang
Modified: 2021-02-24 15:17 UTC (History)

Fixed In Version:
Doc Type: Release Note
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:16:22 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github operator-framework operator-lifecycle-manager pull 1766 0 None closed Bug 1873030: Make a subscription without at least one candidate fail resolution. 2021-02-18 17:29:15 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:17:05 UTC

Internal Links: 1882791

Description Jian Zhang 2020-08-27 08:12:06 UTC
Description of problem:
When subscribing to an operator fails, no status indicates the failure. It is difficult for the customer to find out what went wrong. For example, when a nonexistent channel is used in the subscription, the status of the subscription is as follows:

[root@preserve-olm-env data]# oc get sub ocs-subscription  -n openshift-storage -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  creationTimestamp: "2020-08-27T06:37:02Z"
  generation: 1
  labels:
    operators.coreos.com/ocs-operator.openshift-storage: ""
 ...
  name: ocs-subscription
  namespace: openshift-storage
  resourceVersion: "783367"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-storage/subscriptions/ocs-subscription
  uid: eacb8269-0d2d-4d34-8aa4-6626d9e2bade
spec:
  channel: alpha
  config:
    resources: {}
  name: ocs-operator
  source: ocs-catalogsource
  sourceNamespace: openshift-marketplace
status:
  catalogHealth:
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: certified-operators
      namespace: openshift-marketplace
      resourceVersion: "776315"
      uid: 41962628-82c3-40c2-9818-a786a684dd17
    healthy: true
    lastUpdated: "2020-08-27T06:37:03Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: community-operators
      namespace: openshift-marketplace
      resourceVersion: "776312"
      uid: 9d1b0634-96d7-4c23-bece-e1cda3b18758
    healthy: true
    lastUpdated: "2020-08-27T06:37:03Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: ocs-catalogsource
      namespace: openshift-marketplace
      resourceVersion: "776329"
      uid: 3db73f20-28f8-4d19-8a51-3af80788f2d2
    healthy: true
    lastUpdated: "2020-08-27T06:37:03Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: qe-app-registry
      namespace: openshift-marketplace
      resourceVersion: "776349"
      uid: 9ce979eb-1581-436d-a90e-75c1236e2448
    healthy: true
    lastUpdated: "2020-08-27T06:37:03Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: redhat-marketplace
      namespace: openshift-marketplace
      resourceVersion: "776332"
      uid: 51d5a581-e4e4-4250-92c6-7ad821a40131
    healthy: true
    lastUpdated: "2020-08-27T06:37:03Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: redhat-operators
      namespace: openshift-marketplace
      resourceVersion: "776357"
      uid: ccd2cfdc-7be6-4d87-bf52-299735d365a0
    healthy: true
    lastUpdated: "2020-08-27T06:37:03Z"
  conditions:
  - lastTransitionTime: "2020-08-27T06:37:03Z"
    message: all available catalogsources are healthy
    reason: AllCatalogSourcesHealthy
    status: "False"
    type: CatalogSourcesUnhealthy
  lastUpdated: "2020-08-27T06:37:03Z"

Version-Release number of selected component (if applicable):
[root@preserve-olm-env data]# oc -n openshift-operator-lifecycle-manager exec catalog-operator-69984f947f-zsx92 -- olm --version
OLM version: 0.16.0
git commit: c3852d57c86707deb80c042c2155ad82c2d9628f
Cluster version is 4.6.0-0.nightly-2020-08-26-032807

How reproducible:
always

Steps to Reproduce:
1. Install OCP 4.6.
2. Subscribe to an operator using a channel that does not exist. For example:
---
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ocs-catalogsource
  namespace: openshift-marketplace
spec:
  displayName: Openshift Container Storage
  icon:
    base64data: ""
    mediatype: ""
  image: quay.io/duanwei33/ocs-olm-operator:4.5.0-2
  publisher: Red Hat
  sourceType: grpc

---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    openshift.io/cluster-monitoring: "true"
  name: openshift-storage
spec: {}
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-storage-operatorgroup
  namespace: openshift-storage
spec:
  targetNamespaces:
  - openshift-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ocs-subscription
  namespace: openshift-storage
spec:
  channel: alpha
  config:
    resources: {}
  name: ocs-operator
  source: ocs-catalogsource
  sourceNamespace: openshift-marketplace


Actual results:
No InstallPlan is generated; only the Subscription exists.
[root@preserve-olm-env data]# oc get ip -n openshift-storage
No resources found in openshift-storage namespace.

Also, there is no error or failure status in the subscription. That is very confusing for customers; they cannot tell what went wrong. In addition, the OLM logs only show the message "the object has been modified; please apply your changes to the latest version and try again", which is too generic to locate the root cause.


Expected results:
A failure message or status should be added to the subscription. For the above example, a message such as "cannot find the alpha channel" should be added to the subscription status.

Additional info:
The overly generic message:
time="2020-08-27T06:37:02Z" level=warning msg="an error was encountered during reconciliation" error="Operation cannot be fulfilled on subscriptions.operators.coreos.com \"ocs-subscription\": the object has been modified; please apply your changes to the latest version and try again" event=update reconciling="*v1alpha1.Subscription" selflink=/apis/operators.coreos.com/v1alpha1/namespaces/openshift-storage/subscriptions/ocs-subscription
E0827 06:37:02.732847       1 queueinformer_operator.go:290] sync {"update" "openshift-storage/ocs-subscription"} failed: Operation cannot be fulfilled on subscriptions.operators.coreos.com "ocs-subscription": the object has been modified; please apply your changes to the latest version and try again

Comment 2 Ben Luddy 2020-09-17 18:00:06 UTC
There is no indication of failure in this particular example because it is being treated as a success. I'm editing this bug's summary to reflect what I see to be the primary issue: this should cause resolution to fail with an error.

Once this is fixed, a "resolution failed" event should be created in this scenario.

There are plans to improve the UX around communicating status/failures to users in nicer ways than the existing event (the closest thing I can find to track these improvements is https://issues.redhat.com/browse/OLM-1739), but please do open RFEs with specific suggestions (such as additions to Subscription status).

Comment 3 Ben Luddy 2020-09-21 23:54:14 UTC
With the latest change, I can create a subscription to a package that does not exist, like this:

$ cat << EOF | kubectl apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: test-subscription
  namespace: test-namespace
spec:
  name: does-not-exist
  source: test-catalogsource
  sourceNamespace: test-namespace
EOF
...

$ kubectl get -n test-namespace event
3m4s        Warning   ResolutionFailed          namespace/test-namespace         constraints not satisfiable: does-not-exist has a dependency without any candidates to satisfy it, does-not-exist is mandatory

As I mentioned above, there are more improvements to make to the experience, but at least now the resolver considers this case to be a failure instead of silently doing nothing.
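
The behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not OLM's resolver (which works on solver constraints rather than a simple loop): the `Subscription` class, the `candidates` and `resolve` functions, the dictionary-shaped catalog, and the `stable-4.5` channel name are all hypothetical and exist only for illustration. The point it shows is that a subscription with zero candidates raises an error instead of being skipped.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Subscription:
    # Hypothetical, simplified stand-in for a Subscription spec.
    package: str
    channel: Optional[str] = None


class ResolutionFailed(Exception):
    """Raised when a subscription has no candidate operators."""


# Hypothetical catalog shape: package -> channel -> list of bundle names.
Catalog = Dict[str, Dict[str, List[str]]]


def candidates(catalog: Catalog, sub: Subscription) -> List[str]:
    """Return the bundles that could satisfy the subscription."""
    channels = catalog.get(sub.package, {})
    if sub.channel is None:
        # No channel requested: any bundle of the package is a candidate.
        return [bundle for bundles in channels.values() for bundle in bundles]
    return list(channels.get(sub.channel, []))


def resolve(catalog: Catalog, subs: List[Subscription]) -> Dict[str, str]:
    """Pick one bundle per subscription; fail if any has no candidates."""
    chosen = {}
    for sub in subs:
        found = candidates(catalog, sub)
        if not found:
            # Before the fix, the equivalent case was silently ignored.
            raise ResolutionFailed(
                f"constraints not satisfiable: {sub.package} has a dependency "
                f"without any candidates to satisfy it, {sub.package} is mandatory"
            )
        chosen[sub.package] = found[0]
    return chosen


# Hypothetical catalog contents; the real channel names are not given in this bug.
catalog = {"ocs-operator": {"stable-4.5": ["ocs-operator.v4.5.0"]}}

try:
    resolve(catalog, [Subscription("ocs-operator", "alpha")])
except ResolutionFailed as err:
    print(err)
```

The error text mirrors the event message shown above; a missing package and a missing channel both end up as an empty candidate list, which is why both produce the same failure.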

Comment 5 Jian Zhang 2020-09-24 09:03:33 UTC
Cluster version is 4.6.0-0.nightly-2020-09-23-022756
mac:~ jianzhang$ oc -n openshift-operator-lifecycle-manager exec catalog-operator-85dc479b4d-468m9 -- olm --version
OLM version: 0.16.1
git commit: d0746139120f09ceaf7b18d6429751e6eb2c98a5

Sorry, I couldn't find this warning event. The reproduction steps are the same as in comment 0.

mac:~ jianzhang$ oc get catalogsource
NAME                  DISPLAY                       TYPE   PUBLISHER      AGE
certified-operators   Certified Operators           grpc   Red Hat        5h29m
community-operators   Community Operators           grpc   Red Hat        5h29m
ocs-catalogsource     Openshift Container Storage   grpc   Red Hat        14m
qe-app-registry       Production Operators          grpc   OpenShift QE   5h11m
redhat-marketplace    Red Hat Marketplace           grpc   Red Hat        5h29m
redhat-operators      Red Hat Operators             grpc   Red Hat        5h29m
mac:~ jianzhang$ oc get packagemanifest|grep Storage
ocs-operator                                Openshift Container Storage   14m


mac:~ jianzhang$ oc describe  sub ocs-subscription -n openshift-storage
Name:         ocs-subscription
Namespace:    openshift-storage
Labels:       operators.coreos.com/ocs-operator.openshift-storage=
Annotations:  <none>
API Version:  operators.coreos.com/v1alpha1
Kind:         Subscription
Metadata:
...
Spec:
  Channel:  alpha
  Config:
    Resources:
  Name:              ocs-operator
  Source:            ocs-catalogsource
  Source Namespace:  openshift-marketplace
Status:
  Catalog Health:
...
Events:                    <none>

mac:~ jianzhang$ oc get event -n openshift-storage
No resources found in openshift-storage namespace

mac:~ jianzhang$ oc version
Client Version: 4.6.0-0.nightly-2020-09-24-030538
Server Version: 4.6.0-0.nightly-2020-09-23-022756
Kubernetes Version: v1.19.0+8a39924

Comment 6 Marc O'Brien 2020-09-24 18:07:20 UTC
Will document this in a release note to state that we are aware of the issue and will fix it for 4.7.

Comment 7 Ben Luddy 2020-09-24 18:18:20 UTC
I'm sorry, there was a mistake in my last comment. I ran "sed" over the shell output and accidentally made it appear as though the event would appear in the same namespace as the subscription. The events are created in the "default" namespace. That is something that we do want to change, but it is a separate issue.

To make sure, I ran the same steps you used on a 4.6 cluster:

$ kubectl get -n default event
14s         Warning   ResolutionFailed                             namespace/openshift-storage                     constraints not satisfiable: ocs-operator has a dependency without any candidates to satisfy it, ocs-operator is mandatory

Comment 8 Ben Luddy 2020-09-24 19:33:04 UTC
Documentation should instruct users to look for events in the default namespace for dependency resolution failure information.
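
As a rough sketch of what such a lookup could look like in a script: the helper below filters the output of `oc get event -n default -o json` for `ResolutionFailed` events that refer to a given namespace. The function name `resolution_failures` is hypothetical, and the sample data is hand-written and trimmed. It assumes the events reference the namespace through `involvedObject` with kind `Namespace`, which is inferred from the `namespace/openshift-storage` object shown in the event listing, not confirmed against the OLM source.

```python
import json


def resolution_failures(event_list, namespace):
    """Return messages of ResolutionFailed events whose involved object
    is the given Namespace. `event_list` is the parsed JSON of an event
    list (for example from `oc get event -n default -o json`)."""
    messages = []
    for event in event_list.get("items", []):
        involved = event.get("involvedObject", {})
        if (
            event.get("reason") == "ResolutionFailed"
            and involved.get("kind") == "Namespace"
            and involved.get("name") == namespace
        ):
            messages.append(event.get("message", ""))
    return messages


# Hand-written, trimmed sample standing in for real command output.
sample = json.loads("""
{"items": [
  {"reason": "ResolutionFailed", "type": "Warning",
   "involvedObject": {"kind": "Namespace", "name": "openshift-storage"},
   "message": "constraints not satisfiable: ocs-operator has a dependency without any candidates to satisfy it, ocs-operator is mandatory"},
  {"reason": "Unhealthy", "type": "Warning",
   "involvedObject": {"kind": "Pod", "name": "iscsi-target"},
   "message": "Readiness probe failed: command timed out"}
]}
""")

for message in resolution_failures(sample, "openshift-storage"):
    print(message)
```

Against a live cluster, the same function could be fed the parsed output of the `oc` command instead of the sample; that wiring is left out here so the sketch runs without cluster access.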

Comment 9 Jian Zhang 2020-09-25 02:41:57 UTC
Hi Ben,

Thanks for the update! Yes, I can see the event in the `default` namespace, as follows:

mac:~ jianzhang$ oc get event -n default
LAST SEEN   TYPE      REASON             OBJECT                        MESSAGE
6m22s       Warning   ResolutionFailed   namespace/openshift-storage   constraints not satisfiable: ocs-operator has a dependency without any candidates to satisfy it, ocs-operator is mandatory
6m22s       Warning   ResolutionFailed   namespace/openshift-storage   constraints not satisfiable: ocs-operator is mandatory, ocs-operator has a dependency without any candidates to satisfy it

But this operator was installed in the `openshift-storage` namespace, so why do we emit the warning event in the `default` namespace?
Besides, the `Events` field of the subscription is `<none>`. I think we should at least display the warning event there.
I am changing the status to ASSIGNED.

mac:~ jianzhang$ oc describe  sub ocs-subscription -n openshift-storage
...
Spec:
  Channel:  alpha
  Config:
    Resources:
  Name:              ocs-operator
  Source:            ocs-catalogsource
  Source Namespace:  openshift-marketplace
Status:
  Catalog Health:
...
Events:                    <none>

Comment 11 Ben Luddy 2020-09-25 14:54:58 UTC
Jian,

These events are not new, and they have been created in the default namespace since they were introduced.

The bug that was fixed is that a subscription that does not have any operators to satisfy it is an error. Before this fix, these subscriptions were ignored instead of being an error.

Changing the namespace that events are created in is _not_ related to this issue and is not a trivial change to make as part of making subscriptions without any available operators an error.

Comment 12 Ben Luddy 2020-09-25 19:11:04 UTC
Opened https://bugzilla.redhat.com/show_bug.cgi?id=1882791 to track the UX issue separately from the failure mode.

Comment 14 Jian Zhang 2020-10-10 03:11:03 UTC
Hi Ben,

Thanks for your information! 

> The bug that was fixed is that a subscription that does not have any operators to satisfy it is an error. Before this fix, these subscriptions were ignored instead of being an error.

Yes, the purpose of reporting this bug is to let the end user clearly know what went wrong. To improve the user experience, we added the event report. That's good. But customers will rarely think to look for the related events in the `default` namespace.

[root@preserve-olm-env data]# oc get event -n default
LAST SEEN   TYPE      REASON             OBJECT                        MESSAGE
14m         Warning   Unhealthy          pod/iscsi-target              Readiness probe failed: command timed out
17s         Warning   ResolutionFailed   namespace/openshift-storage   constraints not satisfiable: ocs-operator has a dependency without any candidates to satisfy it, ocs-operator is mandatory
18s         Warning   ResolutionFailed   namespace/openshift-storage   constraints not satisfiable: ocs-operator is mandatory, ocs-operator has a dependency without any candidates to satisfy it


> Changing the namespace that events are created in is _not_ related to this issue and is not a trivial change to make as part of making subscriptions without any available operators an error.

Yes, I understand. So, is it possible to add this event to the Subscription status? If so, it would be easy for the customer to find.
For now, the value of the Events field is still `<none>`.

[root@preserve-olm-env data]# oc describe sub ocs-subscription
Name:         ocs-subscription
Namespace:    openshift-storage
Labels:       operators.coreos.com/ocs-operator.openshift-storage=
Annotations:  <none>
API Version:  operators.coreos.com/v1alpha1
Kind:         Subscription
Metadata:
  Creation Timestamp:  2020-10-10T03:01:13Z
  Generation:          1
...
  Conditions:
    Last Transition Time:  2020-10-10T03:01:14Z
    Message:               all available catalogsources are healthy
    Reason:                AllCatalogSourcesHealthy
    Status:                False
    Type:                  CatalogSourcesUnhealthy
  Last Updated:            2020-10-10T03:01:14Z
Events:                    <none>

I am changing the status to ASSIGNED and setting the Target Release to 4.7, since the 4.6 GA date is very close.

Comment 16 Jian Zhang 2020-11-12 10:08:13 UTC
Hi Ben,

> They existed before this was reported and have not changed at all.

Thanks! I guess I need to create an RFE for this. It would be better for the customer to see the event when checking the subscription.
Besides, I am changing the Status to VERIFIED since one problem has been fixed.

Comment 19 errata-xmlrpc 2021-02-24 15:16:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

