Bug 1696074

Summary: [Marketplace] the "Operator Hub" UI often crashes
Product: OpenShift Container Platform Reporter: Jian Zhang <jiazha>
Component: OLM Assignee: Evan Cordell <ecordell>
OLM sub component: OperatorHub QA Contact: Fan Jia <jfan>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: amerdler, anli, aravindh, bandrade, chezhang, dyan, jfan, mifiedle, scolange, scuppett, spadgett, zitang
Version: 4.1.0 Keywords: BetaBlocker
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-04 10:47:03 UTC Type: Bug
Bug Depends On: 1700504    
Bug Blocks:    

Description Jian Zhang 2019-04-04 06:39:11 UTC
Description of problem:
The "Operator Hub" section is lost and the "Operator Management" section displays "Oh no! Something went wrong." on the UI.

Version-Release number of selected component (if applicable):
OLM version:  io.openshift.build.commit.id=9ba3512c5406b62179968e2432b284e9a30c321e
Marketplace version:
 io.openshift.build.commit.id=e274d6b40505e977e12061becf27218f5eb717fb

How reproducible:
Often; I hit it three times today.

Steps to Reproduce:
1. Install OCP 4.0 with payload: 4.0.0-0.nightly-2019-04-02-133735
2. Log in to the cluster as the kubeadmin user in the web console.
3. Go to "Catalog" -> "OperatorHub" and install the "federation" operator.

Actual results:
The "Operator Hub" section is lost and the "Operator Management" section displays "Oh no! Something went wrong." on the UI.

TypeError
Description:
Cannot read property 'phase' of undefined
Component Trace:
    in t
    in div
    in div
    in SubscriptionDetails
    in Unknown
    in e
    in e
    in div
    in k
    in StatusBox
    in div
    in div
    in t
    in div
    in Unknown
    in Connect(Component)
    in t
    in Connect(t)
    in Unknown
    in t
    in DetailsPage
    in SubscriptionDetailsPage
    in t
    in Unknown
    in ResourceDetailsPage
    in e
    in e
    in div
    in div
    in section
    in h
    in main
    in div
    in e
    in t
    in e
    in e
    in o

Stack Trace:
TypeError: Cannot read property 'phase' of undefined
    at https://console-openshift-console.apps.jian-444.qe.devcluster.openshift.com/static/main-chunk-f59d3d04f9d24bf2d019.min.js:1:235327
    at t.render (https://console-openshift-console.apps.jian-444.qe.devcluster.openshift.com/static/main-chunk-f59d3d04f9d24bf2d019.min.js:1:235513)
    at h (https://console-openshift-console.apps.jian-444.qe.devcluster.openshift.com/static/vendors~main-chunk-19ad6a51d52d713866be.min.js:107:42945)
    at beginWork (https://console-openshift-console.apps.jian-444.qe.devcluster.openshift.com/static/vendors~main-chunk-19ad6a51d52d713866be.min.js:107:50982)
    at o (https://console-openshift-console.apps.jian-444.qe.devcluster.openshift.com/static/vendors~main-chunk-19ad6a51d52d713866be.min.js:107:54823)
    at a (https://console-openshift-console.apps.jian-444.qe.devcluster.openshift.com/static/vendors~main-chunk-19ad6a51d52d713866be.min.js:107:55102)
    at x (https://console-openshift-console.apps.jian-444.qe.devcluster.openshift.com/static/vendors~main-chunk-19ad6a51d52d713866be.min.js:107:58045)
    at _ (https://console-openshift-console.apps.jian-444.qe.devcluster.openshift.com/static/vendors~main-chunk-19ad6a51d52d713866be.min.js:107:57588)
    at b (https://console-openshift-console.apps.jian-444.qe.devcluster.openshift.com/static/vendors~main-chunk-19ad6a51d52d713866be.min.js:107:57425)
    at m (https://console-openshift-console.apps.jian-444.qe.devcluster.openshift.com/static/vendors~main-chunk-19ad6a51d52d713866be.min.js:107:56804)

Expected results:
The "Operator Hub" works well.

Additional info:
The URL: https://console-openshift-console.apps.jian-444.qe.devcluster.openshift.com/k8s/ns/default/operators.coreos.com~v1alpha1~Subscription/federation
The marketplace pods work well; no errors were found. Refreshing the page makes it work again.

Comment 1 Fan Jia 2019-04-04 09:29:26 UTC
Does the "federation" operator come from a custom app registry, and is the operator missing the necessary spec? For now, please validate what you are pushing to Quay before using it in your tests. Please follow these guides when constructing your custom CSVs: https://github.com/operator-framework/operator-lifecycle-manager/blob/master/Documentation/design/building-your-csv.md and https://github.com/operator-framework/community-operators/blob/master/docs/required-fields.md. You should also follow the guidelines here (https://github.com/operator-framework/community-operators/blob/master/docs/testing-operators.md#testing-operators) for pushing to Quay.

Comment 2 Jian Zhang 2019-04-04 09:45:52 UTC
We have hit this issue many times in different clusters. Increasing severity.
mac:books jianzhang$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.nightly-2019-04-03-202419   True        False         10h       Cluster version is 4.0.0-0.nightly-2019-04-03-202419
[jzhang@dhcp-140-18 444]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-04-02-133735   True        False         3h5m    Cluster version is 4.0.0-0.nightly-2019-04-02-133735

Comment 3 Jian Zhang 2019-04-04 09:51:55 UTC
Jia,

> Does the "federation" operator come from a custom app registry, and is the operator missing the necessary spec?

No, I don't think this is an issue with a specific operator. Almost all operator installations encounter this issue, such as Couchbase, AMQ Streams, and etcd.

Comment 4 Aravindh Puthiyaparambil 2019-04-04 13:12:38 UTC
@jian Zhang, please confirm that you are installing the operators from the OperatorSources that are pre-installed and not from a custom OperatorSource that you created.

Comment 5 Jian Zhang 2019-04-07 01:29:35 UTC
@Aravindh

I'm sure I used the pre-installed OperatorSources.

Comment 7 Alec Merdler 2019-04-08 15:58:51 UTC
The issue occurs because the associated `InstallPlan` for the `Subscription` has not been created yet, which causes an NPE (https://github.com/openshift/console/blob/master/frontend/public/components/operator-lifecycle-manager/subscription.tsx#L167). I will add more defensive code here.
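
A minimal sketch of the kind of defensive access described above, not the actual change made in the console code base. The `InstallPlan` shape is heavily simplified, and the helper name `installPlanPhase` and the `'Unknown'` placeholder are hypothetical, introduced only for illustration. It shows how optional chaining avoids the `Cannot read property 'phase' of undefined` error when the `InstallPlan` or its `status` does not exist yet:

```typescript
// Hypothetical, simplified shape; the real console types carry many more fields.
type InstallPlan = {
  metadata: { name: string };
  status?: { phase?: string };
};

// Returns the phase to display, or a placeholder while the InstallPlan
// (or its status) has not been created yet. An unguarded
// `installPlan.status.phase` would throw a TypeError in that window.
const installPlanPhase = (installPlan?: InstallPlan, fallback = 'Unknown'): string =>
  installPlan?.status?.phase ?? fallback;

// The Subscription exists, but no InstallPlan has been created for it yet.
const beforeCreation = installPlanPhase(undefined);

// The InstallPlan exists and reports a phase.
const afterCreation = installPlanPhase({
  metadata: { name: 'install-example' },
  status: { phase: 'Complete' },
});
```

With a guard like this, the details page could render a placeholder value until the watch delivers the `InstallPlan`, instead of crashing the whole view.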

Comment 10 Anping Li 2019-04-16 02:12:34 UTC
I have to deploy two operators, elasticsearch-operator and the cluster-logging operator, for a cluster-logging deployment. It took me 5 minutes because the OperatorHub disappears frequently, and I had to try again and again. I hope this bug can be fixed soon.

Comment 12 Alec Merdler 2019-04-16 16:36:18 UTC
Fixed in https://github.com/openshift/console/pull/1439

Comment 16 Jian Zhang 2019-04-19 06:40:27 UTC
It works well. I didn't encounter this issue after running the cluster for hours. LGTM, verifying it.

Cluster version is 4.1.0-0.nightly-2019-04-18-210657
Console version:
image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d31eb5a32a2bc292e2096a6f97d80c0c5a5d3de93bb028670ee2a253edb99e50
             io.openshift.build.commit.id=8a2df3e00d50c88f630e329159e6bcd98f6b2767
             io.openshift.build.commit.url=https://github.com/openshift/console/commit/8a2df3e00d50c88f630e329159e6bcd98f6b2767
             io.openshift.build.source-location=https://github.com/openshift/console
OLM version info:
               io.openshift.build.commit.id=c718ec855bb26a111d66ba2ba193d30e54f7feb1

Comment 17 Salvatore Colangelo 2019-04-29 12:42:44 UTC
It seems the problem is now present after cancel.

This link often shows a blank page: https://console-openshift-console.apps.gpei-0429.qe1.devcluster.openshift.com/operatormanagement/ns/default/catalogsources

oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-04-28-064010   True        False         174m    Cluster version is 4.1.0-0.nightly-2019-04-28-064010

Comment 18 Samuel Padgett 2019-04-29 12:49:27 UTC
scolange - What do you mean by "after cancel"? Can you paste the stack you're seeing from the browser JavaScript console? This might be a different problem.

Comment 19 Salvatore Colangelo 2019-04-29 12:57:28 UTC
I cancelled a CSV from a Operators.

https://console-openshift-console.apps.gpei-0429.qe1.devcluster.openshift.com/operatormanagement/ns/scolange/catalogsources

Below is the console stack:

websocket error: /api/kubernetes/apis/operators.coreos.com/v1alpha1/namespaces/scolange/catalogsources?watch=true&resourceVersion=566192
operator-group.tsx:51 websocket closed: /api/kubernetes/apis/operators.coreos.com/v1alpha1/namespaces/scolange/catalogsources?watch=true&resourceVersion=566192 CloseEvent {isTrusted: true, wasClean: false, code: 1006, reason: "", type: "close", …}
operator-group.tsx:51 WS closed abnormally - starting polling loop over!
operator-group.tsx:51 destroying websocket: /api/kubernetes/apis/operators.coreos.com/v1alpha1/namespaces/scolange/catalogsources?watch=true&resourceVersion=566192
operator-group.tsx:51 catalogsources---{"ns":"scolange"} timed out - restarting polling
operator-group.tsx:51 destroying websocket: /api/kubernetes/apis/operators.coreos.com/v1alpha1/subscriptions?watch=true&resourceVersion=566193
/operatormanagement/ns/scolange/catalogsources:1 WebSocket connection to 'wss://console-openshift-console.apps.gpei-0429.qe1.devcluster.openshift.com/api/kubernetes/apis/operators.coreos.com/v1alpha1/subscriptions?watch=true&resourceVersion=566193&x-csrf-token=9iYuKrn%2FsmZkZ%2Bf%2B%2Boq67H%2BVl0n6O6lV%2FCCBesuC9ckpbB8nqGKu%2FWcnyhQRzQHiD2fewMHXpqRsYDX2izQHPQ%3D%3D' failed: WebSocket is closed before the connection is established.
operator-group.tsx:51 websocket error: /api/kubernetes/apis/operators.coreos.com/v1alpha1/subscriptions?watch=true&resourceVersion=566193
operator-group.tsx:51 websocket closed: /api/kubernetes/apis/operators.coreos.com/v1alpha1/subscriptions?watch=true&resourceVersion=566193 CloseEvent {isTrusted: true, wasClean: false, code: 1006, reason: "", type: "close", …}
operator-group.tsx:51 WS closed abnormally - starting polling loop over!
operator-group.tsx:51 destroying websocket: /api/kubernetes/apis/operators.coreos.com/v1alpha1/subscriptions?watch=true&resourceVersion=566193
operator-group.tsx:51 subscriptions timed out - restarting polling
operator-group.tsx:51 destroying websocket: /api/kubernetes/apis/operators.coreos.com/v1/operatorgroups?watch=true&resourceVersion=566193
/operatormanagement/ns/scolange/catalogsources:1 WebSocket connection to 'wss://console-openshift-console.apps.gpei-0429.qe1.devcluster.openshift.com/api/kubernetes/apis/operators.coreos.com/v1/operatorgroups?watch=true&resourceVersion=566193&x-csrf-token=9iYuKrn%2FsmZkZ%2Bf%2B%2Boq67H%2BVl0n6O6lV%2FCCBesuC9ckpbB8nqGKu%2FWcnyhQRzQHiD2fewMHXpqRsYDX2izQHPQ%3D%3D' failed: WebSocket is closed before the connection is established.
operator-group.tsx:51 websocket error: /api/kubernetes/apis/operators.coreos.com/v1/operatorgroups?watch=true&resourceVersion=566193
operator-group.tsx:51 websocket closed: /api/kubernetes/apis/operators.coreos.com/v1/operatorgroups?watch=true&resourceVersion=566193 CloseEvent {isTrusted: true, wasClean: false, code: 1006, reason: "", type: "close", …}
operator-group.tsx:51 WS closed abnormally - starting polling loop over!
operator-group.tsx:51 destroying websocket: /api/kubernetes/apis/operators.coreos.com/v1/operatorgroups?watch=true&resourceVersion=566193
operator-group.tsx:51 operatorgroups timed out - restarting polling
react-dom.production.min.js:13 TypeError: Cannot read property 'filter' of undefined
    at operator-group.tsx:51
    at operator-group.tsx:51
    at Array.some (<anonymous>)
    at Row (operator-group.tsx:51)
    at beginWork (react-dom.production.min.js:13)
    at o (react-dom.production.min.js:13)
    at a (react-dom.production.min.js:13)
    at x (react-dom.production.min.js:13)
    at _ (react-dom.production.min.js:13)
    at b (react-dom.production.min.js:13)
yr @ react-dom.production.min.js:13
commitErrorLogging @ react-dom.production.min.js:13
E @ react-dom.production.min.js:13
x @ react-dom.production.min.js:13
_ @ react-dom.production.min.js:13
b @ react-dom.production.min.js:13
m @ react-dom.production.min.js:13
p @ react-dom.production.min.js:13
enqueueSetState @ react-dom.production.min.js:13
C.setState @ react.production.min.js:10
o.onStateChange @ FileSaver.js:182
m @ index.js:8
(anonymous) @ tabbable.js:19
dispatch @ tabbable.js:19
(anonymous) @ operator-group.tsx:51
(anonymous) @ operator-group.tsx:51
(anonymous) @ operator-group.tsx:51
(anonymous) @ operator-group.tsx:51
s @ operator-group.tsx:51
Promise.then (async)
c @ operator-group.tsx:51
(anonymous) @ operator-group.tsx:51
m @ operator-group.tsx:51
c @ operator-group.tsx:51
(anonymous) @ operator-group.tsx:51
(anonymous) @ operator-group.tsx:51
(anonymous) @ operator-group.tsx:51
(anonymous) @ operator-group.tsx:51
m @ operator-group.tsx:51
l @ operator-group.tsx:51
(anonymous) @ operator-group.tsx:51
(anonymous) @ tabbable.js:19
(anonymous) @ tabbable.js:19
(anonymous) @ operator-group.tsx:51
t.start @ operator-group.tsx:51
t.UNSAFE_componentWillMount @ operator-group.tsx:51
mountClassInstance @ react-dom.production.min.js:13
beginWork @ react-dom.production.min.js:13
o @ react-dom.production.min.js:13
a @ react-dom.production.min.js:13
x @ react-dom.production.min.js:13
_ @ react-dom.production.min.js:13
b @ react-dom.production.min.js:13
interactiveUpdates @ react-dom.production.min.js:13
_n @ react-dom.production.min.js:13
operator-group.tsx:51 stopped watching packagemanifests---{"labelSelector":{"matchExpressions":[{"key":"olm-visibility","operator":"DoesNotExist"},{"key":"openshift-marketplace","operator":"DoesNotExist"}]},"ns":"scolange"} before finishing incremental loading with error TypeError: Cannot read property 'filter' of undefined!

Comment 20 Samuel Padgett 2019-04-29 15:51:31 UTC
This looks like a different problem. Please open a new Bugzilla bug.

(In reply to Salvatore Colangelo from comment #19)
> I cancelled a CSV from a Operators.

Sorry, I don't know what this means.

Comment 21 Jian Zhang 2019-04-30 03:34:25 UTC
@Salvatore

I think the issue you raised is a known issue; see bug 1686668.
The root cause is that somebody used custom CSV files, and something is wrong in those files.

@Samuel
That issue leads to a blank page, which is very confusing for other users.
It would be great if we could pop up a warning instead of a blank page. Do we have a plan to do this?

Comment 23 errata-xmlrpc 2019-06-04 10:47:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

Comment 24 Red Hat Bugzilla 2023-09-14 05:26:28 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days