Bug 1873030
Summary: | Subscriptions without any candidate operators should cause resolution to fail | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Jian Zhang <jiazha> |
Component: | OLM | Assignee: | Ben Luddy <bluddy> |
OLM sub component: | OLM | QA Contact: | Jian Zhang <jiazha> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | dsover, krizza, marobrie |
Version: | 4.6 | Keywords: | Reopened, UpcomingSprint |
Target Milestone: | --- | ||
Target Release: | 4.7.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Release Note | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-02-24 15:16:22 UTC | Type: | Bug |
Description
Jian Zhang
2020-08-27 08:12:06 UTC
There is no indication of failure in this particular example because it is being treated as a success. I'm editing this bug's summary to reflect what I see to be the primary issue: this should cause resolution to fail with an error. Once this is fixed, a "resolution failed" event should be created in this scenario.

There are plans to improve the UX around communicating status/failures to users in nicer ways than the existing event (the closest thing I can find to track these improvements is https://issues.redhat.com/browse/OLM-1739), but please do open RFEs with specific suggestions (such as additions to Subscription status).

With the latest change, I can create a subscription to a package that does not exist, like this:

    $ cat << EOF | kubectl apply -f -
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: test-subscription
      namespace: test-namespace
    spec:
      name: does-not-exist
      source: test-catalogsource
      sourceNamespace: test-namespace
    EOF
    ...
    $ kubectl get -n test-namespace event
    3m4s   Warning   ResolutionFailed   namespace/test-namespace   constraints not satisfiable: does-not-exist has a dependency without any candidates to satisfy it, does-not-exist is mandatory

As I mentioned above, there are more improvements to make to the experience, but at least now the resolver considers this case to be a failure instead of silently doing nothing.

Cluster version is 4.6.0-0.nightly-2020-09-23-022756

    mac:~ jianzhang$ oc -n openshift-operator-lifecycle-manager exec catalog-operator-85dc479b4d-468m9 -- olm --version
    OLM version: 0.16.1
    git commit: d0746139120f09ceaf7b18d6429751e6eb2c98a5

Sorry, I couldn't find this warning event. The reproduction steps are the same as in comment 0.
    mac:~ jianzhang$ oc get catalogsource
    NAME                  DISPLAY                       TYPE   PUBLISHER      AGE
    certified-operators   Certified Operators           grpc   Red Hat        5h29m
    community-operators   Community Operators           grpc   Red Hat        5h29m
    ocs-catalogsource     Openshift Container Storage   grpc   Red Hat        14m
    qe-app-registry       Production Operators          grpc   OpenShift QE   5h11m
    redhat-marketplace    Red Hat Marketplace           grpc   Red Hat        5h29m
    redhat-operators      Red Hat Operators             grpc   Red Hat        5h29m

    mac:~ jianzhang$ oc get packagemanifest|grep Storage
    ocs-operator   Openshift Container Storage   14m

    mac:~ jianzhang$ oc describe sub ocs-subscription -n openshift-storage
    Name:         ocs-subscription
    Namespace:    openshift-storage
    Labels:       operators.coreos.com/ocs-operator.openshift-storage=
    Annotations:  <none>
    API Version:  operators.coreos.com/v1alpha1
    Kind:         Subscription
    Metadata:
      ...
    Spec:
      Channel:  alpha
      Config:
        Resources:
      Name:              ocs-operator
      Source:            ocs-catalogsource
      Source Namespace:  openshift-marketplace
    Status:
      Catalog Health:
      ...
    Events:  <none>

    mac:~ jianzhang$ oc get event -n openshift-storage
    No resources found in openshift-storage namespace

    mac:~ jianzhang$ oc version
    Client Version: 4.6.0-0.nightly-2020-09-24-030538
    Server Version: 4.6.0-0.nightly-2020-09-23-022756
    Kubernetes Version: v1.19.0+8a39924

We will document this in a release note to make clear that we are aware of the issue and will fix it for 4.7.

I'm sorry, there was a mistake in my last comment. I ran "sed" over the shell output and accidentally made it appear as though the event would appear in the same namespace as the subscription. The events are created in the "default" namespace. That is something that we do want to change, but it is a separate issue.
To make sure, I ran the same steps you used on a 4.6 cluster:

    $ kubectl get -n default event
    14s   Warning   ResolutionFailed   namespace/openshift-storage   constraints not satisfiable: ocs-operator has a dependency without any candidates to satisfy it, ocs-operator is mandatory

Documentation should instruct users to look for events in the default namespace for dependency resolution failure information.

Hi Ben,

Thanks for your updates! Yes, I can see that in the `default` namespace, as follows:

    mac:~ jianzhang$ oc get event -n default
    LAST SEEN   TYPE      REASON             OBJECT                        MESSAGE
    6m22s       Warning   ResolutionFailed   namespace/openshift-storage   constraints not satisfiable: ocs-operator has a dependency without any candidates to satisfy it, ocs-operator is mandatory
    6m22s       Warning   ResolutionFailed   namespace/openshift-storage   constraints not satisfiable: ocs-operator is mandatory, ocs-operator has a dependency without any candidates to satisfy it

But this operator was installed in the `openshift-storage` namespace, so why do we emit the warning event in the `default` namespace? Besides, the value of the subscription's `Events` field is `<none>`. I think we should at least display the warning event there. I am changing the status to ASSIGNED.

    mac:~ jianzhang$ oc describe sub ocs-subscription -n openshift-storage
    ...
    Spec:
      Channel:  alpha
      Config:
        Resources:
      Name:              ocs-operator
      Source:            ocs-catalogsource
      Source Namespace:  openshift-marketplace
    Status:
      Catalog Health:
      ...
    Events:  <none>

Jian,

These events are not new, and they have been created in the default namespace since they were introduced. The bug that was fixed is that a subscription that does not have any operators to satisfy it is an error. Before this fix, these subscriptions were ignored instead of being an error.

Changing the namespace that events are created in is _not_ related to this issue and is not a trivial change to make as part of making subscriptions without any available operators an error.
Opened https://bugzilla.redhat.com/show_bug.cgi?id=1882791 to track the UX issue separately from the failure mode.

Hi Ben,

Thanks for your information!

> The bug that was fixed is that a subscription that does not have any operators to satisfy it is an error. Before this fix, these subscriptions were ignored instead of being an error.

Yes, the purpose of reporting this bug is to let the end user know clearly what went wrong. To improve the user experience, we added the event report. That's good. But customers will rarely think to look for the related events in the `default` namespace.

    [root@preserve-olm-env data]# oc get event -n default
    LAST SEEN   TYPE      REASON             OBJECT                        MESSAGE
    14m         Warning   Unhealthy          pod/iscsi-target              Readiness probe failed: command timed out
    17s         Warning   ResolutionFailed   namespace/openshift-storage   constraints not satisfiable: ocs-operator has a dependency without any candidates to satisfy it, ocs-operator is mandatory
    18s         Warning   ResolutionFailed   namespace/openshift-storage   constraints not satisfiable: ocs-operator is mandatory, ocs-operator has a dependency without any candidates to satisfy it

> Changing the namespace that events are created in is _not_ related to this issue and is not a trivial change to make as part of making subscriptions without any available operators an error.

Yes, I understand. So, is it possible to add this event to the Subscription status? If so, it would be easy for the customer to find. But for now, the value of the `Events` field is still `<none>`.

    [root@preserve-olm-env data]# oc describe sub ocs-subscription
    Name:         ocs-subscription
    Namespace:    openshift-storage
    Labels:       operators.coreos.com/ocs-operator.openshift-storage=
    Annotations:  <none>
    API Version:  operators.coreos.com/v1alpha1
    Kind:         Subscription
    Metadata:
      Creation Timestamp:  2020-10-10T03:01:13Z
      Generation:          1
    ...
      Conditions:
        Last Transition Time:  2020-10-10T03:01:14Z
        Message:               all available catalogsources are healthy
        Reason:                AllCatalogSourcesHealthy
        Status:                False
        Type:                  CatalogSourcesUnhealthy
      Last Updated:            2020-10-10T03:01:14Z
    Events:  <none>

I am changing the status to ASSIGNED and moving the Target Release to 4.7, since the 4.6 GA schedule is tight.

Hi Ben,
> They existed before this was reported and have not changed at all.
Thanks! I guess I need to create an RFE for this. It's better for the customer to get the event when checking the subscription.
I am also changing the Status to VERIFIED, since one problem has been fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633
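
As discussed in the comments above, resolution failures for a subscription currently surface as `ResolutionFailed` events in the `default` namespace, with the subscription's namespace shown in the OBJECT column. A minimal sketch of a filter for that output follows. The function name `resolution_failures_for` is hypothetical and is not part of `oc` or `kubectl`; it assumes the default column layout seen in the listings in this report.

```shell
#!/bin/sh
# Hypothetical helper (the name is not part of oc or kubectl).
# Reads `oc get event -n default` output on stdin and prints only the
# ResolutionFailed rows whose OBJECT column is namespace/<name>.
# Assumes the default column order: LAST SEEN, TYPE, REASON, OBJECT, MESSAGE.
resolution_failures_for() {
  # Data rows have a one-word LAST SEEN value, so REASON is field 3 and
  # OBJECT is field 4. The header row ("LAST SEEN ...") never matches.
  awk -v obj="namespace/$1" '$3 == "ResolutionFailed" && $4 == obj'
}

# Example invocation against a live cluster (not run here):
#   oc get event -n default | resolution_failures_for openshift-storage
```

Because the filter only reads text on stdin, it can also be applied to a saved event listing. It depends on the plain-text column layout, so it may need adjusting if a different output format is requested.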