Bug 1123396

Summary: Admin Portal: Unresponsive script leading to Virtual Machines not being displayed any more
Product: Red Hat Enterprise Virtualization Manager Reporter: Martin Tessun <mtessun>
Component: ovirt-engine-webadmin-portal Assignee: Lior Vernia <lvernia>
Status: CLOSED ERRATA QA Contact: Michael Burman <mburman>
Severity: high Docs Contact:
Priority: medium    
Version: 3.3.0 CC: bazulay, ecohen, gklein, gshereme, iheim, jentrena, lvernia, mburman, mtessun, myakove, nyechiel, oourfali, rbalakri, Rhev-m-bugs, yeylon
Target Milestone: ---   
Target Release: 3.5.0   
Hardware: All   
OS: Linux   
Whiteboard: network
Fixed In Version: org.ovirt.engine-root-3.5.0-21 Doc Type: Bug Fix
Doc Text:
Previously, infrastructural GUID computation for certain entities was highly inefficient. When many virtual machines had to be displayed in the specified sub-tab, this inefficient computation became visible as the browser would wait on it to display the virtual machines. This caused general sluggishness in the browser, and sometimes triggered an "unresponsive script" error message. Now, the GUID computation has been optimized so that the tab data is loaded as fast as other tabs with comparable data sets (the Virtual Machines main tab, for example).
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-02-11 18:06:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Network RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Attachments:
Description Flags
sscreencast: nice and fast with UUID generation disabled none
patch -- disable UUID generation (test only!) none
screencast: nice and fast with UUID generation disabled none

Description Martin Tessun 2014-07-25 14:10:38 UTC
Description of problem:
Select the following Tabs in the Admin Portal:
1. Network Tab
2. Select the first network
3. Select "Virtual Machines" in the "sub Tab" (at the bottom)
4. Go through the different Networks by clicking them
5. After a while the browser shows an "unresponsive script" message
6. Click "Stop Script" (as "Continue" just shows the message again and again)
7. Select e.g. the "Hosts" Subtab (at the bottom)
8. Go back to the "Virtual Machines" Subtab (at the bottom)


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Network Tab
2. Select the first network
3. Select "Virtual Machines" in the "sub Tab" (at the bottom)
4. Go through the different Networks by clicking them
5. After a while the browser shows an "unresponsive script" message
6. Click "Stop Script" (as "Continue" just shows the message again and again)
7. Select e.g. the "Hosts" Subtab (at the bottom)
8. Go back to the "Virtual Machines" Subtab (at the bottom)

Actual results:
The Virtual Machines are never shown again, just the three grey dots keep blinking there.

Expected results:
The "unresponsive script" should not pop up and the portal should continue working as expected (showing the Virtual Machines again).

Additional info:
A restart of the browser solves the problem. Just logging out of the portal and logging in again does not help; the "unresponsive script" message then already appears during login.
Shift-Reload does help, but the browser stays a lot slower and less responsive than before.
So this might be related to BZ #1098598 somehow.

Comment 3 Martin Tessun 2014-07-25 14:45:41 UTC
Sorry for the copy-paste mistakes.
Once again therefore:

Description of problem:
Under some conditions the "Virtual Machine" Subtab in the "Network" tab does not show any content, but continues to show the "three grey dots".

Version-Release number of selected component (if applicable):
RHEV-M 3.3.4
Firefox 24.4.0 (CSB)

How reproducible:
Always

Steps to Reproduce:
1. Network Tab
2. Select the first network
3. Select "Virtual Machines" in the "sub Tab" (at the bottom)
4. Go through the different Networks by clicking them
5. After a while the browser shows an "unresponsive script" message
6. Click "Stop Script" (as "Continue" just shows the message again and again)
7. Select e.g. the "Hosts" Subtab (at the bottom)
8. Go back to the "Virtual Machines" Subtab (at the bottom)

Actual results:
The Virtual Machines are never shown again, just the three grey dots keep blinking there.

Expected results:
The "unresponsive script" should not pop up and the portal should continue working as expected (showing the Virtual Machines again).

Additional info:
A restart of the browser solves the problem. Just logging out of the portal and logging in again does not help; the "unresponsive script" message then already appears during login.
Shift-Reload does help, but the browser stays a lot slower and less responsive than before.
So this might be related to BZ #1098598 somehow.

Comment 4 Oved Ourfali 2014-07-27 05:06:33 UTC
The flow starts with network dialogs, so putting on network team for now to examine this issue.

Comment 5 Lior Vernia 2014-07-29 14:01:39 UTC
I will be very surprised if this turns out to be network-specific.

This reminds me of Bug 906394. I can't seem to reproduce this on my deployments (3.4/3.5), which strengthens this suspicion (as this was fixed in 3.4).

Comment 27 Lior Vernia 2014-10-22 14:50:15 UTC
Just updating that this still exists on the master branch as well - on Firefox 22 the browser becomes sluggish, on Firefox 32 I get the non-responsive script message.

Will try to pin-point the issue, fix it on master and then see where it's convenient enough to backport; not promising anything, 3.5 would be a good candidate if I can fix this soon. Then we'll see about older versions.

Comment 28 Lior Vernia 2014-10-28 12:56:15 UTC
So I haven't been able to find the issue causing this yet, but I've all but ruled out the following theories:

1. The problem doesn't seem to be in NetworkVmListModel, which seems very similar to, say, VmListModel (which also works on pretty much the same size of input). I thought it might be the sorting, which in the case of VmListModel is performed in the backend, but commenting out the sorting code in NetworkVmListModel didn't improve the situation.

2. It also doesn't seem to be in the backend query. GetVmsAndNetworkInterfacesByNetworkIdQuery seemed to be sub-optimal as it performs some artificial "joins" via Java, but simplifying the query to do nothing more than fetching the VM data and putting mock interface data in the pairs didn't significantly improve performance.

3. The last point also ruled out another theory I had, that the multiple joins producing vm_interface_view were inefficient (though all columns on which joins are performed are indexed).

This leads me to believe that perhaps there's some peculiar GWT issue, unique to the compilation of this subtab's code to JavaScript. I'm not sure what's special about this subtab - it might have the biggest data set among the subtab models backed by pairs of entities (rather than pure entities).

I think it might be necessary to debug the JavaScript code itself, or perform some profiling to see where time is "wasted".

Comment 29 Lior Vernia 2014-10-28 12:57:39 UTC
Assigning back to ux where more competent frontend developers might be able to deal with it :)

Comment 32 Greg Sheremeta 2014-11-01 13:40:44 UTC
The slowness is being caused by UUID (Guid class) generation for the PairQueryable's. If I comment out the generation and just use a counter, it's nice and responsive. (see attached video)

In PairQueryable.java:

+    static private int counter = 1;
    public Object getQueryableId() {
+        return counter++;
-        return getMemberId(getFirst()) + '.' + getMemberId(getSecond());
    }

It's unclear to me why calling getMemberId causes UUID generation. Perhaps that's the bug.
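
For illustration only, here is a minimal sketch of one general way to keep a composite ID cheap: build the "first.second" string once and reuse it on later calls. This is not the real PairQueryable code and not necessarily the fix that was eventually made; CachedIdPair, its fields and the computations counter are hypothetical names, and the member IDs are assumed to be plain strings already at hand.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a pair-backed list item; NOT the real PairQueryable. */
final class CachedIdPair {
    // Counts how often the composite ID is actually built (illustration only).
    static final AtomicInteger computations = new AtomicInteger();

    private final String firstId;   // assumed: ID of the first member, already a string
    private final String secondId;  // assumed: ID of the second member, already a string
    private String cachedId;        // built lazily on first use, then reused

    CachedIdPair(String firstId, String secondId) {
        this.firstId = firstId;
        this.secondId = secondId;
    }

    /** Returns "first.second", building the string only on the first call. */
    String getQueryableId() {
        if (cachedId == null) {
            computations.incrementAndGet();
            cachedId = firstId + '.' + secondId;
        }
        return cachedId;
    }
}
```

With something like this, a table that asks each row for its ID many times during rendering would pay the construction cost once per row rather than once per call. Whether that matches the actual hot spot here would still need profiling.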

Assigning back to Lior, since I don't know this code. Patch attached so you can replicate.

Comment 33 Greg Sheremeta 2014-11-01 13:41:42 UTC
Created attachment 952744 [details]
sscreencast: nice and fast with UUID generation disabled

Comment 34 Greg Sheremeta 2014-11-01 13:42:27 UTC
Created attachment 952745 [details]
patch -- disable UUID generation (test only!)

Comment 35 Greg Sheremeta 2014-11-01 14:10:00 UTC
Created attachment 952746 [details]
screencast: nice and fast with UUID generation disabled

Comment 37 Lior Vernia 2014-11-13 14:06:34 UTC
Greg has definitely pointed me in the right direction. The fix will be very infrastructural so needs to be properly tested, but the code changes won't be big. I think we can make it for 3.5, and I'm optimistic about backporting to 3.4. Updates to come soon...

Comment 38 Michael Burman 2014-11-24 08:46:36 UTC
Scale team should verify this bug, thank you.

Comment 39 Michael Burman 2014-11-24 10:30:14 UTC
Verified on - 3.5.0-0.21.el6ev
With 2 VM pools, each pool with 200 VMs (no disks, one NIC per VM).

Comment 41 errata-xmlrpc 2015-02-11 18:06:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0158.html