Bug 1293920 - [scale] - vms tab under cluster tab running slow performance and crashed the UI
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: Frontend.WebAdmin
Version: 3.6.1.1
Hardware: x86_64 Linux
Priority: high, Severity: high
Target Milestone: ovirt-4.1.0-alpha
Target Release: 4.1.0
Assigned To: Greg Sheremeta
QA Contact: Eldad Marciano
Depends On: 1368101
Blocks: 1388462
Reported: 2015-12-23 09:09 EST by Eldad Marciano
Modified: 2017-02-01 09:49 EST
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-02-01 09:49:55 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-4.1+
rule-engine: planning_ack+
oourfali: devel_ack+
eberman: testing_ack+


Attachments
server.log (562 bytes, text/plain)
2016-08-02 08:56 EDT, guy chen
engine log (8.92 MB, text/plain)
2016-08-02 08:58 EDT, guy chen
engine log after shutting down hosts and VMS (48.37 KB, application/zip)
2016-08-24 10:25 EDT, guy chen

Description Eldad Marciano 2015-12-23 09:09:44 EST
Description of problem:
When browsing to rhevm: click on a specific DC (in the tree) > click on the Clusters tab > select a loaded cluster (3K VMs) > click on the VMs sub-tab.


It is not clear whether this is caused by a bad DB query or by a UI limitation.

Version-Release number of selected component (if applicable):
rhevm 3.6.1.1

How reproducible:
100%

Steps to Reproduce:
1. Described above.


Actual results:
Slow performance, and the UI crashed.

Expected results:
Good performance and a stable UI.

Additional info:
Comment 1 Oved Ourfali 2016-01-08 03:03:07 EST
Eldad, any logs? Jprofiler data? DB traces?
Comment 2 Oved Ourfali 2016-02-18 08:57:48 EST
Targeting to 4.0, assuming we'll get more feedback.
If not, the bug will be closed.
Comment 3 Eldad Marciano 2016-03-30 11:03:08 EDT
It seems this query is running slowly:
SELECT * FROM ((SELECT distinct vms.* FROM  vms   WHERE  vms.vds_group_name LIKE 'fake\\_cluster\\_2' )  ORDER BY vm_name ASC ) as T1 OFFSET (1 -1) LIMIT 2147483647

~14 sec. The UI crash did not reproduce.

I'll keep investigating it.
Comment 4 Eldad Marciano 2016-03-30 12:04:43 EDT
(In reply to Eldad Marciano from comment #3)
> seems like this query running slow:
> SELECT * FROM ((SELECT distinct vms.* FROM  vms   WHERE  vms.vds_group_name
> LIKE 'fake\\_cluster\\_2' )  ORDER BY vm_name ASC ) as T1 OFFSET (1 -1)
> LIMIT 2147483647
> 
> ~14 sec. UI crash didnt reproduced.
> 
> i'll keep investigate it.


On deeper investigation, the query runs reasonably well (~2 sec).
The query's EXPLAIN output also shows:
"Total runtime: 1423.638 ms"

It might be related to the UI, since there is no limit on the number of objects loaded into this tab, unlike the original VMs tab, which has a 100-object limit.

When I tried to scroll down as far as I could, the UI froze at some point.
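To make the "no limit" point concrete: in the logged query, `OFFSET (1 -1)` is 0 and 2147483647 is 2^31 - 1 (Java's `Integer.MAX_VALUE`), so the sub-tab effectively requests every VM in the cluster at once, whereas a paged request is bounded like the main tab's 100 objects. A minimal sketch, assuming an in-memory SQLite table as a stand-in for the engine's PostgreSQL `vms` view; the two-column schema and the `make_db`, `fetch_all` and `fetch_page` helpers are hypothetical, not engine code. Note SQLite requires the `LIMIT ... OFFSET ...` order, while PostgreSQL also accepts the `OFFSET ... LIMIT ...` form seen in the log.

```python
import sqlite3

# Assumption: SQLite stands in for the engine's PostgreSQL "vms" view, and only
# the two columns used by the logged query are modeled.
PAGE_SIZE = 100  # mirrors the 100-object limit of the main VMs tab


def make_db(vm_count: int) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vms (vm_name TEXT, vds_group_name TEXT)")
    conn.executemany(
        "INSERT INTO vms VALUES (?, ?)",
        ((f"vm_{i:05d}", "fake_cluster_2") for i in range(vm_count)),
    )
    return conn


def fetch_all(conn: sqlite3.Connection, cluster: str) -> list:
    # Equivalent of the logged OFFSET (1 -1) LIMIT 2147483647: every row comes back.
    return conn.execute(
        "SELECT vm_name FROM vms WHERE vds_group_name = ? "
        "ORDER BY vm_name LIMIT ? OFFSET ?",
        (cluster, 2**31 - 1, 0),
    ).fetchall()


def fetch_page(conn: sqlite3.Connection, cluster: str, page: int) -> list:
    # page is 1-based, matching the "(1 -1)" offset form in the logged query.
    return conn.execute(
        "SELECT vm_name FROM vms WHERE vds_group_name = ? "
        "ORDER BY vm_name LIMIT ? OFFSET ?",
        (cluster, PAGE_SIZE, (page - 1) * PAGE_SIZE),
    ).fetchall()


if __name__ == "__main__":
    db = make_db(3000)
    print(len(fetch_all(db, "fake_cluster_2")))      # 3000
    print(len(fetch_page(db, "fake_cluster_2", 1)))  # 100
    print(fetch_page(db, "fake_cluster_2", 2)[0][0])  # vm_00100
```

With 3K VMs, the unbounded form hands the grid all 3000 rows, while each paged call returns at most 100.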
Comment 5 Greg Sheremeta 2016-03-31 12:04:31 EDT
> It might be related to the UI, since there is no limit on the number of objects loaded into this tab, unlike the original VMs tab, which has a 100-object limit.

That would be my guess. It seems like we need sub-tab paging.

Also possibly just a duplicate of Bug 1294678. Let's test this problem with Bug 1294678's patches applied.
Comment 6 Greg Sheremeta 2016-04-21 20:51:23 EDT
@Eldad, can you please re-test this on 3.6.5 and report back?
Comment 7 Sandro Bonazzola 2016-05-02 06:03:03 EDT
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.
Comment 8 Yaniv Lavi (Dary) 2016-05-23 09:18:15 EDT
oVirt 4.0 beta has been released, moving to RC milestone.
Comment 9 Yaniv Lavi (Dary) 2016-05-23 09:22:18 EDT
oVirt 4.0 beta has been released, moving to RC milestone.
Comment 10 Oved Ourfali 2016-05-25 10:12:18 EDT
Moving to 4.1, hoping that it is fixed now with the latest changes.
Comment 11 Eldad Marciano 2016-07-31 09:00:25 EDT
(In reply to Greg Sheremeta from comment #6)
> @Eldad, can you please re-test this on 3.6.5 and report back?

It reproduces on 3.6.8, and we are also seeing it when browsing to the landing page.
Comment 12 Eldad Marciano 2016-07-31 09:28:01 EDT
On both Firefox and Chrome.
Comment 13 Oved Ourfali 2016-07-31 13:18:43 EDT
It sounds odd that the landing page is related. 
Can you specify exactly what you're doing? 
Logs as well?
Comment 14 guy chen 2016-08-02 08:56 EDT
Created attachment 1186814 [details]
server.log
Comment 15 guy chen 2016-08-02 08:58 EDT
Created attachment 1186815 [details]
engine log
Comment 16 guy chen 2016-08-02 08:59:26 EDT
It happens on several pages: sometimes on the landing page, sometimes on the Hosts page, and on the Virtual Machines page.
The symptom is that the page times out and the browser shows the message "something went wrong while displaying this webpage".
The scenario is simply opening the page.
Comment 17 Yaniv Kaul 2016-08-22 06:11:14 EDT
(In reply to guy chen from comment #15)
> Created attachment 1186815 [details]
> engine log

Please compress large logs.
Comment 18 Yaniv Kaul 2016-08-22 06:22:36 EDT
(In reply to guy chen from comment #15)
> Created attachment 1186815 [details]
> engine log

This logs shows a very ill system:
[ykaul@ykaul Downloads]$ egrep -c "Failed to fetch vms info for host|VDSGenericException" attachment.cgi.txt 
18594
[ykaul@ykaul Downloads]$ cat attachment.cgi.txt |wc -l
35765

So about half of the log consists of failures to fetch VM data from the (fake) hosts.

So this host is really just trying to connect to hosts and fetch data; it's far too busy in that area. Does it happen on a reasonable host that is alive and running?
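For readers reproducing this check without `egrep`, a minimal Python sketch of the same count follows; the `error_share` helper is hypothetical, and the pattern is copied from the command above. With the figures reported, 18594 / 35765 is about 0.52, i.e. roughly half of all lines.

```python
import re
import sys

# Same alternation as the egrep pattern used in the comment above.
PATTERN = re.compile(r"Failed to fetch vms info for host|VDSGenericException")


def error_share(path: str) -> tuple[int, int, float]:
    """Return (matching lines, total lines, matching fraction) for a log file.

    Like `egrep -c`, each matching line counts once. The total can differ from
    `wc -l` by one if the file does not end with a newline.
    """
    matched = total = 0
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            total += 1
            if PATTERN.search(line):
                matched += 1
    return matched, total, (matched / total if total else 0.0)


if __name__ == "__main__" and len(sys.argv) > 1:
    m, t, share = error_share(sys.argv[1])
    print(f"{m} of {t} lines ({share:.0%}) match")
```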
Comment 19 guy chen 2016-08-24 10:25 EDT
Created attachment 1193661 [details]
engine log after shutting down hosts and VMS
Comment 20 Yaniv Kaul 2016-08-24 10:29:13 EDT
(In reply to guy chen from comment #19)
> Created attachment 1193661 [details]
> engine log after shutting down hosts and VMS

This is even worse than before. It might be a different bug: erratic behavior when you have so many hosts in an ill state. Either way, this log is useless for this bug.
Comment 21 guy chen 2016-08-24 10:34:49 EDT
We don't have another environment to test on right now, but we tested the following scenario to check whether these errors are the root cause.
I shut down all hosts and VMs and restarted the server.
The errors matching "Failed to fetch vms info for host|VDSGenericException" no longer appear.
There are some authentication errors, but only a small number:
[root@bkr-hv05 ~]#  egrep -c ERROR /var/log/ovirt-engine/server.log
20
The UI keeps freezing after shutting down the hosts and VMs.
The server log is attached.
Comment 23 Oved Ourfali 2016-09-29 05:29:00 EDT
Hopefully fixing the leak will resolve this as well.
Adding a dependency.
Comment 24 Alexander Wels 2016-09-29 08:27:55 EDT
Note that the memory leak fixes we are doing ONLY apply to popup dialogs, not to main tabs and sub tabs. Those are all singletons and thus will not be destroyed. So I highly doubt that the memory leak fixes we are doing will solve whatever this problem is.
Comment 25 Greg Sheremeta 2016-11-29 19:23:52 EST
(In reply to Alexander Wels from comment #24)
> Note that the memory leak fixes we are doing ONLY apply to popup dialogs,
> not to main tabs and sub tabs. Those are all singletons and thus will not be
> destroyed. So I highly doubt that the memory leak fixes we are doing will
> solve whatever this problem is.

Actually, we did memory leak fixes for tooltips, and we've also switched over to gwt-rpc. There is a good chance this is much improved now.

Moving to MODIFIED. Please test on a healthy scale system and fail this if it's still an issue. The UX team's scale results show pretty good performance in 4.1 master and 4.0.6.
Comment 26 Sandro Bonazzola 2016-12-12 08:56:15 EST
The fix for this issue should be included in oVirt 4.1.0 beta 1 released on December 1st. If not included please move back to modified.
Comment 27 Eldad Marciano 2017-01-09 09:53:23 EST
The user interface no longer crashes, but when scrolling down in the VMs tab, the UI is very slow (especially the scrolling).

Greg, please advise
Comment 28 Oved Ourfali 2017-01-09 10:17:39 EST
What browser are you using?
Comment 29 Eldad Marciano 2017-01-09 11:16:33 EST
(In reply to Oved Ourfali from comment #28)
> What browser are you using?

I verified for Chrome and Firefox.

Chrome version:
Version 55.0.2883.95 (64-bit)

Firefox version:
45.3.0

ovirt-engine:
ovirt-engine-4.1.0-0.3.beta2.el7.noarch
Comment 30 Oved Ourfali 2017-01-09 14:19:05 EST
So from a UX perspective, the issue is fixed and verified. You can consider opening a virt bug on improving the DB query, or the backend query called in this use case, to pass less data or to query only a partial list of VMs. This sub-tab doesn't seem that valuable for a large number of VMs anyway... I'd consider removing it entirely.
Comment 31 Greg Sheremeta 2017-01-09 14:52:46 EST
(In reply to Oved Ourfali from comment #30)
> So from UX perspective the issue is fixed and verified. You can consider
> opening a virt bug on improving the DB query, or the backend query to call
> in this use case to pass less data, or to query only for a partial list of
> vms. This sub tab doesn't seem that valuable for a large number of vms
> anyway.... I'd consider removing it entirely.

I'd suggest the sub-tab implement paging. If we wanted to keep it, I think that's the only thing that would fix the issue.
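A minimal sketch of the client-side state such sub-tab paging might keep, assuming the backend accepts an offset and a page size as in the paged query shape discussed earlier in this bug. `SubTabPager` and its methods are hypothetical illustrations, not part of the oVirt frontend.

```python
from dataclasses import dataclass


@dataclass
class SubTabPager:
    """Hypothetical paging state for a sub-tab (not the oVirt frontend API)."""

    page_size: int = 100  # assumption: same 100-row limit as the main VMs tab
    page: int = 1         # 1-based page index

    def offset(self) -> int:
        # Row offset to request from the backend for the current page.
        return (self.page - 1) * self.page_size

    def next_page(self, rows_on_current_page: int) -> bool:
        # A short page means there is nothing further to fetch. A full page
        # only suggests more rows may exist; the next fetch may come back empty.
        if rows_on_current_page < self.page_size:
            return False
        self.page += 1
        return True

    def prev_page(self) -> bool:
        # Stay on page 1 rather than moving to a zero or negative page.
        if self.page == 1:
            return False
        self.page -= 1
        return True
```

The design choice is that the grid never holds more than `page_size` rows, regardless of how many VMs the cluster has.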
Comment 32 Eldad Marciano 2017-01-12 05:55:57 EST
(In reply to Oved Ourfali from comment #30)
> So from UX perspective the issue is fixed and verified. You can consider
> opening a virt bug on improving the DB query, or the backend query to call
> in this use case to pass less data, or to query only for a partial list of
> vms. This sub tab doesn't seem that valuable for a large number of vms
> anyway.... I'd consider removing it entirely.

So, moving to VERIFIED.
Comment 33 Eldad Marciano 2017-01-12 05:56:25 EST
(In reply to Greg Sheremeta from comment #31)
> I'd suggest the sub-tab implement paging. If we wanted to keep it, I think
> that's the only thing that would fix the issue.

We'll open a new bug for it later on.
