Bug 1287725

Summary: [Webadmin] Error message appears when failing to change host cluster compatibility version
Product: [oVirt] ovirt-engine
Reporter: Nikolai Sednev <nsednev>
Component: Frontend.WebAdmin
Assignee: Alexander Wels <awels>
Status: CLOSED WORKSFORME
QA Contact: Pavel Stehlik <pstehlik>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.6.1.2
CC: bugs, ecohen, gklein, nsednev, oourfali
Target Milestone: ovirt-3.6.2
Keywords: Regression
Target Release: ---
Flags: oourfali: ovirt-3.6.z?, rule-engine: blocker?, rule-engine: planning_ack?, nsednev: devel_ack?, rule-engine: testing_ack?
Hardware: x86_64
OS: Linux
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-14 19:45:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: Screenshot from 2015-12-02 16:26:19.png; Flags: none
Description: engine and server logs; Flags: none

Description Nikolai Sednev 2015-12-02 14:30:44 UTC
Description of problem:
While trying to change the host cluster compatibility version on a 3.6 engine from 3.4 to 3.6, the change failed because of https://bugzilla.redhat.com/show_bug.cgi?id=1287136 and the error described below appeared in the web UI:
 
Uncaught exception occurred. Please try reloading the page. Details: (TypeError) __gwt$exception: <skipped>: zab(..



Version-Release number of selected component (if applicable):
Engine:
ovirt-engine-extension-aaa-jdbc-1.0.3-1.el6ev.noarch
ovirt-host-deploy-1.4.1-1.el6ev.noarch
ovirt-vmconsole-1.0.0-1.el6ev.noarch
ovirt-host-deploy-java-1.4.1-1.el6ev.noarch
ovirt-vmconsole-proxy-1.0.0-1.el6ev.noarch
rhevm-3.6.1-0.2.el6.noarch
Linux version 2.6.32-573.8.1.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Fri Sep 25 19:24:22 EDT 2015


How reproducible:
100%, after upgrading 3.4->3.5->3.6.

Steps to Reproduce:
1. Upgrade the engine 3.4->3.5->3.6.
2. Try moving the host cluster from compatibility version 3.4 to 3.6.
3. Receive the error.

Actual results:
Uncaught exception occurred. Please try reloading the page. Details: (TypeError) __gwt$exception: <skipped>: zab(..

Expected results:
Error should not happen.

Additional info:
screenshot and logs attached.

Comment 1 Nikolai Sednev 2015-12-02 14:32:31 UTC
Created attachment 1101512 [details]
Screenshot from 2015-12-02 16:26:19.png

Comment 2 Nikolai Sednev 2015-12-02 14:40:14 UTC
Created attachment 1101516 [details]
engine and server logs

Comment 3 Nikolai Sednev 2015-12-02 15:37:13 UTC
This also happens on clean 3.6.1, so no direct relation to upgrade.

Comment 4 Red Hat Bugzilla Rules Engine 2015-12-03 05:59:11 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 5 Yaniv Kaul 2015-12-03 13:20:19 UTC
Why is this a regression? Did you get to the same scenario, after hitting bug 1287136, in previous releases?

Reducing to medium severity.

Comment 6 Oved Ourfali 2015-12-03 14:55:33 UTC
(In reply to Yaniv Kaul from comment #5)
> Why is this a regression? Did you get to the same scenario, after hitting
> bug 1287136, in previous releases?
> 
> Reducing to medium severity.

As far as I know, recently the UX guys added an ability to see UX errors in the UI. So, even if things work properly, and there is some UI exception, it will show there.

It is important. However, until we clear all of those, I think we might want to turn this feature off for customers and use it only here in QE and DEV.

Einav - thoughts? Perhaps I'm wrong, but I've noticed that only recently.

Comment 7 Alexander Wels 2015-12-03 15:47:45 UTC
@Oved,

Yes, it's a new addition to 3.6. All it does is make an uncaught exception visible in the UI, to alert the user that something is wrong. Before, it would just break and the user would be none the wiser. The exception would be in the browser console, but since almost no one looks there, it was never reported.

This encourages people to report so we can investigate and fix whatever issue caused the uncaught exception. We even provide a mapping file so we can look up what the obfuscated variables/classes are in the exception.
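
As a rough illustration of the lookup described above, the sketch below resolves an obfuscated name such as zab against a symbol map. It is a minimal sketch in Python, not the project's tooling: the column order (jsName, jsniIdent, className, memberName, sourceUri, sourceLine, fragmentNumber) is an assumption, and every class, method, and file name in the sample data is hypothetical.

```python
import csv
import io

# Hypothetical symbol map excerpt. The column order is an assumption and
# the class, method, and file names are made up for illustration.
SAMPLE_SYMBOL_MAP = """\
# jsName, jsniIdent, className, memberName, sourceUri, sourceLine, fragmentNumber
zab,org.example.ui.ClusterModel::onSave(),org.example.ui.ClusterModel,onSave,org/example/ui/ClusterModel.java,120,0
Yab,org.example.ui.HostListModel::refresh(),org.example.ui.HostListModel,refresh,org/example/ui/HostListModel.java,48,0
"""


def load_symbol_map(text):
    """Parse symbol map text into a dict keyed by obfuscated JS name."""
    symbols = {}
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines, comment/header lines, and short rows.
        if not row or row[0].startswith("#") or len(row) < 6:
            continue
        js_name, _jsni, class_name, member, source_uri, source_line = row[:6]
        symbols[js_name] = {
            "class_name": class_name,
            "member": member,
            "source_uri": source_uri,
            "source_line": int(source_line),
        }
    return symbols


def describe(symbols, js_name):
    """Return a readable source location for an obfuscated name."""
    entry = symbols.get(js_name)
    if entry is None:
        return f"{js_name} (not found in symbol map)"
    return (
        f"{entry['class_name']}.{entry['member']} "
        f"({entry['source_uri']}:{entry['source_line']})"
    )


symbols = load_symbol_map(SAMPLE_SYMBOL_MAP)
print(describe(symbols, "zab"))
```

A name that is missing from the map is reported as not found rather than guessed, since a wrong mapping would point the investigation at the wrong code.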

I would have to look at the patch, but I don't think we can turn it off selectively for customers vs qa.

@Nikolai,
Can you give me access to the system so I can see the reproduction steps in action? This will help me figure out what is going on.

Comment 8 Einav Cohen 2015-12-03 16:05:51 UTC
+1 to Alexander's comment. 

for everyone (dev, QA, users, customers) it is better to display these alerts:
- it encourages the user to report the problem.
- it suggests a temporary solution ("Please try reloading the page") to work around any unexpected behavior caused by JavaScript that stopped working.

disabling these alerts would obviously mean potential unexpected behavior caused by JavaScript that stopped working (e.g. infinite loading animations, as originally reported in bug 1284584), which IMO is a worse choice than keeping these alerts.

Comment 10 Nikolai Sednev 2015-12-03 16:16:22 UTC
(In reply to Einav Cohen from comment #8)
> +1 to Alexander's comment. 
> 
> for everyone (dev, QA, users, customers) it is better to display these
> alerts:
> - it encourages the user to report the problem.
> - it suggests a temporary solution ("Please try reloading the page") to
> work around any unexpected behavior caused by JavaScript that stopped
> working.
>
> disabling these alerts would obviously mean potential unexpected behavior
> caused by JavaScript that stopped working (e.g. infinite loading
> animations, as originally reported in bug 1284584), which IMO is a worse
> choice than keeping these alerts.

I don't disagree with any of this. I'm just reporting it as a bug because it should not happen, and we need to find the root cause. I attached all the logs I thought you'd need and gave you access to the environment.

Comment 11 Oved Ourfali 2015-12-03 16:29:26 UTC
Indeed a great addition. 

I didn't mean to say that we should remove that. However, we should make sure to cover every issue we see first, as some of those might have no impact but might be a bit scary for users.

So, if we feel that we're stable with regards to that, then it should indeed be operating everywhere.

I happened to see that today in the same dialog, and everything in the dialog worked properly, though that might not be the common case.

Comment 13 Nikolai Sednev 2015-12-06 13:49:20 UTC
I tried to reproduce this issue by following these steps and failed; the issue is not always reproducible. Is there anything useful in the logs?

Comment 14 Alexander Wels 2015-12-07 15:17:37 UTC
Unfortunately the logs are all on the backend side, while the actual problem is on the frontend side, which has little to no information to tell us what is wrong. We just implemented some functionality to give us some indication that there is a problem in the frontend (the popup with the uncaught exception message).

Most likely some unusual condition occurs, which in the end causes a null pointer exception within the UI code. It's just a matter of finding the right steps to reproduce it.

My main question is: are the steps I did above what you did to produce the original problem, or did you do something else? From the original bug report it was not entirely clear to me what the steps to reproduce the problem are. So I took what was there and tried to make some easy-to-follow steps from them, but was unable to reproduce at that point.

TL;DR
I am trying to find some reproducer steps. Obviously the ones I tried didn't work. Did I follow the same ones you did?

Comment 15 Nikolai Sednev 2015-12-07 17:40:23 UTC
(In reply to Alexander Wels from comment #14)
> Unfortunately the logs are all on the backend side, while the actual
> problem is on the frontend side, which has little to no information to tell
> us what is wrong. We just implemented some functionality to give us some
> indication that there is a problem in the frontend (the popup with the
> uncaught exception message).
>
> Most likely some unusual condition occurs, which in the end causes a null
> pointer exception within the UI code. It's just a matter of finding the
> right steps to reproduce it.
>
> My main question is: are the steps I did above what you did to produce the
> original problem, or did you do something else? From the original bug
> report it was not entirely clear to me what the steps to reproduce the
> problem are. So I took what was there and tried to make some easy-to-follow
> steps from them, but was unable to reproduce at that point.
>
> TL;DR
> I am trying to find some reproducer steps. Obviously the ones I tried
> didn't work. Did I follow the same ones you did?
Yes, you followed exactly the same steps, but again, I too was unable to reproduce this issue when following the same steps a few days ago.
I did exactly the same steps as you did, but unfortunately did not always see this web UI problem. Now, for example, I see this issue while creating a template from a VM and then creating VMs from the template. The issue reproduces occasionally, and a few other flows also lead to it: some QA engineers saw the same error while adding or deleting storage domains or trying to edit VMs.

Comment 16 Alexander Wels 2015-12-10 18:59:16 UTC
Okay, so when you see that message in different places, the problem is somewhere in whatever flow you are working with. When you hover over the exception message, it will show you a stack trace of obfuscated code. You will also see that stack trace in the browser console.

The important part for us is that stack trace. Together with the symbolMap package, we can piece together which part of the source code is causing the exception.

So when you say some QE engineers saw that exception while adding/deleting storage domains, those are actually two separate problems in the code and should be reported as two separate bugs. The same goes for issues creating VMs from templates.

So please, when you see that exception, check the browser console and copy and paste the stack trace into a new bug. Also give detailed reproduction steps, for when we ask for access to the system so we can reproduce.
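
When filing the new bug, the frames of a pasted trace can be pulled apart before the obfuscated names are looked up. The following is a minimal Python sketch, assuming the two common console layouts ("at name (url:line:col)" in Chrome-style traces and "name@url:line:col" in Firefox-style traces). The host, file, and function names in the example are hypothetical, and other browsers or GWT versions may format frames differently.

```python
import re

# Chrome-style frame: "    at zab (https://host/app.js:1234:56)"
# The function name is optional, since anonymous frames omit it.
CHROME_FRAME = re.compile(
    r"^\s*at\s+(?:(?P<name>\S+)\s+\()?(?P<url>.+?):(?P<line>\d+):(?P<col>\d+)\)?\s*$"
)
# Firefox-style frame: "zab@https://host/app.js:1234:56"
FIREFOX_FRAME = re.compile(
    r"^\s*(?P<name>[^@\s]*)@(?P<url>.+?):(?P<line>\d+):(?P<col>\d+)\s*$"
)


def parse_frames(trace):
    """Return (name, url, line, column) tuples for each recognised frame.

    Lines that match neither layout (such as the error message itself)
    are skipped. Anonymous frames get the name None.
    """
    frames = []
    for text in trace.splitlines():
        match = CHROME_FRAME.match(text) or FIREFOX_FRAME.match(text)
        if match is None:
            continue
        name = match.group("name") or None
        frames.append(
            (name, match.group("url"), int(match.group("line")), int(match.group("col")))
        )
    return frames


# Hypothetical trace, as it might be copied from a browser console.
EXAMPLE_TRACE = """\
TypeError: a is null
    at zab (https://engine.example/webadmin/app.js:1234:56)
    at https://engine.example/webadmin/app.js:2000:7
Xyz@https://engine.example/webadmin/app.js:3:9
"""

for frame in parse_frames(EXAMPLE_TRACE):
    print(frame)
```

The names collected this way (zab and Xyz in the example) are the ones to look up in the symbol map; the line and column numbers refer to the compiled JavaScript, not to the Java source.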

Comment 17 Nikolai Sednev 2015-12-13 11:52:43 UTC
(In reply to Alexander Wels from comment #16)
> Okay, so when you see that message in different places, the problem is
> somewhere in whatever flow you are working with. When you hover over the
> exception message, it will show you a stack trace of obfuscated code. You
> will also see that stack trace in the browser console.
>
> The important part for us is that stack trace. Together with the symbolMap
> package, we can piece together which part of the source code is causing the
> exception.
>
> So when you say some QE engineers saw that exception while adding/deleting
> storage domains, those are actually two separate problems in the code and
> should be reported as two separate bugs. The same goes for issues creating
> VMs from templates.
>
> So please, when you see that exception, check the browser console and copy
> and paste the stack trace into a new bug. Also give detailed reproduction
> steps, for when we ask for access to the system so we can reproduce.

The reproduction steps were fine. The issue no longer reproduces with them, although it did reproduce once using exactly the same steps.

If there is nothing else the attached logs can provide you with, please close this bug as missing information or works for me.

Comment 18 Einav Cohen 2015-12-14 19:45:21 UTC
(In reply to Nikolai Sednev from comment #17)
> If there is nothing else the attached logs can provide you with, please
> close this bug as missing information or works for me.

closing on works-for-me; please feel free to re-open if you have the relevant client (JavaScript) logs (specifically, we are looking for the full JavaScript exception stack trace).

Comment 20 Red Hat Bugzilla 2023-09-14 03:14:13 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days