Description of problem:
When an operator installation fails, the UI keeps displaying "Installing" indefinitely. It should display the error information instead. See the screenshot: https://user-images.githubusercontent.com/15416633/89140958-b606a480-d575-11ea-8f30-21cfdcd4328d.png
I am not sure this is the correct component; feel free to move it to the OLM component.

Version-Release number of selected component (if applicable):
Cluster version is 4.6.0-0.nightly-2020-08-02-091622
mac:~ jianzhang$ oc exec catalog-operator-869bbf9969-hsvw6 -- olm --version
OLM version: 0.16.0
git commit: 0984504f2ec07f99014bdc6211fe87525e0b0087

How reproducible:
always

Steps to Reproduce:
1. Install OCP 4.6.
2. Subscribe to the OCS operator (version 4.4.1) in the web console.

Actual results:
The UI displays "Installing" indefinitely, but the installation has in fact failed on the back end:
E0803 02:34:50.343893 1 queueinformer_operator.go:290] sync "openshift-storage" failed: [found duplicate entries for ocs-operator.v4.4.1 in {community-operators openshift-marketplace}, found duplicate entries for ocs-operator.v4.2.3 in {community-operators openshift-marketplace}]
I0803 02:34:50.344014 1 event.go:278] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"", Name:"openshift-storage", UID:"90e24d18-c76a-43af-b27d-70f9218fe171", APIVersion:"v1", ResourceVersion:"75129", FieldPath:""}): type: 'Warning' reason: 'ResolutionFailed' [found duplicate entries for ocs-operator.v4.4.1 in {community-operators openshift-marketplace}, found duplicate entries for ocs-operator.v4.2.3 in {community-operators openshift-marketplace}]

Expected results:
The UI should display the error information or show a failed status.

Additional info:
jiazha - Can you give me some more details on how to reproduce this error? I am seeing that OCS specifically can get into a bad state when you install, uninstall, and then reinstall. Can you get this failure with any other operator? Also, if you delete the namespace openshift-storage after uninstalling, do you get an error next time around? So far I am reliably able to install OCS every time on a clean system, although I can mess things up if I try. I am trying to decide whether this is something that can be fixed on the front-end "Installing" page or whether we need to get OLM/OCS involved.
I spoke to the OLM team and it seems there is no status.state value set when things go terribly wrong. Sadly, my code triggers off that status.state to know when we are installing, when we are finished, and when there is a failure. We will need to change how the installing page activates; so instead of activating immediately, we will need to wait most likely for the install plan to be created. As a side note, the OLM team is aware of this problem and is working on a solution but timing is TBD.
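For illustration, the dependency on status.state described above could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the console's actual code: the function name `install_view_state` and the returned labels are hypothetical, and the state strings and the `installPlanRef` field are what I believe OLM sets on a Subscription, so they should be checked against the OLM API before relying on them.

```python
from typing import Any, Mapping

# Assumed mapping from Subscription status.state to what the page would show.
# The state strings are an assumption about OLM's Subscription API.
STATE_TO_VIEW = {
    "UpgradePending": "installing",
    "UpgradeAvailable": "installing",
    "AtLatestKnown": "installed",
    "UpgradeFailed": "failed",
}


def install_view_state(subscription: Mapping[str, Any]) -> str:
    """Decide what an install page should show for a Subscription object."""
    status = subscription.get("status") or {}
    state = status.get("state")
    has_install_plan = bool(status.get("installPlanRef"))

    if not state and not has_install_plan:
        # OLM has reported nothing yet (or resolution failed silently):
        # do not claim "installing" until an install plan exists.
        return "pending"
    # A state we do not recognise is reported as "unknown" rather than guessed.
    return STATE_TO_VIEW.get(state, "unknown")
```

With this shape, a Subscription whose resolution failed before any state was set would show "pending" instead of "installing", which is the behavior change proposed above.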
We were able to "reproduce" it with the AMQ Streams operator. It has already been migrated to the new bundle image format, and when I last tried it (fc-1 build), AMQ Streams was the only operator in OperatorHub. The installation did not even start properly: the install plan was not created. We discussed this with Shawn Hurley, and according to him it is the same problem.
Since the underlying problem here is that we are not getting a meaningful status from the Subscription, the actual bug lies on the OLM side of the house. I am transferring this bug to that team. I will create a new bug to improve the usability of the console.
Hi Zac,
> Can you give me some more details on how to reproduce this error? I am seeing that OCS specifically can get in a bad state when you install, uninstall, and then reinstall. Can you get this failure with any other operator? Also, if you delete the namespace openshift-storage after uninstalling, do you get an error next time around?
Apologies for the late reply. Yes, OCS now installs correctly on OCP 4.6 because the OCS team fixed that. You can now subscribe to the AMQ Streams operator to reproduce it.
Hi Mohan, I'm not sure why you added the `TestBlocker` label. I have removed it for now. Please feel free to add it back. Thanks!
From the customer's perspective, the `Installing` status is very confusing. I have been asked about this problem many times. It would be better to fix it in 4.6; besides, it is blocking other teams' testing. Raising the priority.
The issue we hit with AMQ Streams is a little different. According to the debugging we did with Lance Galletti, the iib image for 4.6 contains some wrong binaries, which means that no operator from it can be installed on 4.6.
*** Bug 1882791 has been marked as a duplicate of this bug. ***
*** Bug 1884534 has been marked as a duplicate of this bug. ***
Hi Sam, With the release of the Operator API in 4.6, I think this bug is resolvable through some design changes in the way that the console interacts here. Most likely, this error information is available as part of the status object of the InstallPlan that was attempting to execute the installation, which should now be aggregated there. I'm reassigning this to the console. If more in-depth, non-trivial work is needed, we may want to close this and file an RFE instead. But to start, I think a trivial improvement can be made by watching that status on this page, rather than aggregating the combination of CSV+Subscription by watching both objects.
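As a rough sketch of the suggestion above, the page could read a failure message from the InstallPlan status instead of combining CSV and Subscription. The Python below is only an illustration under assumptions: the helper name `install_plan_error` is hypothetical, and the field names (`status.phase` equal to `Failed`, and a `status.conditions` entry of type `Installed` carrying `reason` and `message`) are what I understand the InstallPlan status to look like, not something confirmed in this bug.

```python
from typing import Any, Mapping, Optional


def install_plan_error(install_plan: Mapping[str, Any]) -> Optional[str]:
    """Return a human-readable error for a failed InstallPlan, else None."""
    status = install_plan.get("status") or {}
    if status.get("phase") != "Failed":
        # Not failed (or no status yet): nothing to report.
        return None

    for cond in status.get("conditions") or []:
        # Assumed shape: the "Installed" condition is False on failure.
        if cond.get("type") == "Installed" and cond.get("status") == "False":
            parts = [p for p in (cond.get("reason"), cond.get("message")) if p]
            if parts:
                return ": ".join(parts)

    # Failed phase but no usable condition text.
    return "install plan failed (no condition message available)"
```

A page using this would show the returned string in place of the "Installing" indicator whenever it is not None.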
I tried installing several different operators today, then deleting and reinstalling them, then deleting the namespace and reinstalling again. However, the install worked every time (eventually), which makes this difficult to debug. When this was first assigned to the console team, I could only get OCS to fail; that was specifically fixed and was deemed OCS specific. At this point, I am curious whether anyone else can produce an error that matches the original issue description. I will leave this open for now, but if there is no specific, reproducible way to show this behavior, I will close this as fixed by changes to other areas of the code.
(In reply to Zac Herman from comment #21)
> I tried installing several different operators today and then deleting them
> and reinstalling, then deleting the namespace and then reinstalling.
> However, every time the install worked (eventually) making this difficult to
> debug. When this was first assigned to the console team, I could only get
> OCS to fail and that was specifically fixed and was deemed OCS specific.
> At this point, I am curious if anyone else can create an error specific to
> the original issue description. I will leave this open for now but if there
> is not a specific reproducible way to show this behavior, I will close this
> as being fix with fixes to other areas of the code.

A simple way to reproduce this is to have a typo in the channel name, as described in BZ1884534.
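To make the channel-typo case concrete, here is a small Python sketch of the kind of check that would catch it before the page claims to be installing. It is an assumption-laden illustration, not existing console or OLM code: the helper name `check_channel` is hypothetical, the channel names in the usage note are made up, and it assumes that the Subscription carries `spec.channel` and that the PackageManifest lists its channels under `status.channels[].name`.

```python
from typing import Any, Mapping, Optional


def check_channel(subscription: Mapping[str, Any],
                  package_manifest: Mapping[str, Any]) -> Optional[str]:
    """Return an error string if the subscribed channel is not offered."""
    wanted = (subscription.get("spec") or {}).get("channel")
    if not wanted:
        # No channel requested; assume the package default applies.
        return None

    channels = (package_manifest.get("status") or {}).get("channels") or []
    available = [c.get("name") for c in channels if c.get("name")]
    if wanted in available:
        return None
    return f'channel "{wanted}" not found; available: {", ".join(available) or "none"}'
```

For example, a Subscription asking for a hypothetical channel "stabel" against a package that only offers "stable" would produce an error string rather than an endless "Installing" state.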
This bug has mutated and changed from the original problem and now references other bugs that were closed as a duplicate of this one. I am having a difficult time understanding and reproducing the problem especially given the steps listed above. I am closing this out and ask that a new bug be opened describing what issue is occurring.