Bug 1293920

Summary:

[scale] - vms tab under cluster tab running slow performance and crashed the UI

Product:

[oVirt] ovirt-engine

Reporter:

Eldad Marciano <emarcian>

Component:

Frontend.WebAdmin

Assignee:

Greg Sheremeta <gshereme>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Eldad Marciano <emarcian>

Severity:

high

Docs Contact:

Priority:

high

Version:

3.6.1.1

CC:

awels, bugs, eberman, emarcian, gshereme, guchen, michal.skrivanek, mperina, oourfali

Target Milestone:

ovirt-4.1.0-alpha

Flags:

rule-engine: ovirt-4.1+
rule-engine: planning_ack+
oourfali: devel_ack+
eberman: testing_ack+

Target Release:

4.1.0

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2017-02-01 14:49:55 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

Infra

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

1368101

Bug Blocks:

1388462

Attachments:

Description	Flags
server.log	none
engine log	none
engine log after shutting down hosts and VMS	none

Description Eldad Marciano 2015-12-23 14:09:44 UTC

Description of problem:
When browsing rhevm: click a specific DC (in the tree) > click the Clusters tab > choose a heavily loaded cluster (3K VMs) > click the VMs tab.


It is not clear whether this is caused by a bad DB query or a UI limitation.

Version-Release number of selected component (if applicable):
rhevm 3.6.1.1

How reproducible:
100%

Steps to Reproduce:
1. As described above.


Actual results:
Slow performance, and the UI crashed.

Expected results:
good performance, stable UI.

Additional info:

Comment 1 Oved Ourfali 2016-01-08 08:03:07 UTC

Eldad, any logs? Jprofiler data? DB traces?

Comment 2 Oved Ourfali 2016-02-18 13:57:48 UTC

Targeting to 4.0, assuming we'll get more feedback.
If not, the bug will be closed.

Comment 3 Eldad Marciano 2016-03-30 15:03:08 UTC

It seems this query is running slowly:
SELECT * FROM ((SELECT distinct vms.* FROM  vms   WHERE  vms.vds_group_name LIKE 'fake\\_cluster\\_2' )  ORDER BY vm_name ASC ) as T1 OFFSET (1 -1) LIMIT 2147483647

~14 sec. The UI crash did not reproduce.

I'll keep investigating.

Comment 4 Eldad Marciano 2016-03-30 16:04:43 UTC

(In reply to Eldad Marciano from comment #3)
> seems like this query running slow:
> SELECT * FROM ((SELECT distinct vms.* FROM  vms   WHERE  vms.vds_group_name
> LIKE 'fake\\_cluster\\_2' )  ORDER BY vm_name ASC ) as T1 OFFSET (1 -1)
> LIMIT 2147483647
> 
> ~14 sec. UI crash didnt reproduced.
> 
> i'll keep investigate it.


On deeper investigation, the query runs reasonably well: ~2 sec.
The query's EXPLAIN output also shows:
"Total runtime: 1423.638 ms"

It might be related to the UI, since there is no limit on the number of objects loaded into this tab, unlike the main VMs tab, which has a 100-object limit.

When I scrolled down as far as I could, at some point the UI froze.
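
For illustration only, here is a minimal sketch of the arithmetic behind a paged version of this listing, assuming a page size of 100 to match the limit mentioned above. The function names `page_bounds` and `page_count` are hypothetical and are not part of ovirt-engine. Note that the logged query's `OFFSET (1 -1) LIMIT 2147483647` amounts to offset 0 with the largest 32-bit signed integer as the limit, i.e. effectively no limit.

```python
# Hypothetical helpers, not ovirt-engine code: they only show how a
# 1-based page number maps to the OFFSET/LIMIT pair of a paged query.

INT_MAX = 2**31 - 1  # 2147483647, the LIMIT value seen in the logged query


def page_bounds(page, page_size=100):
    """Return (offset, limit) for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return (page - 1) * page_size, page_size


def page_count(total_rows, page_size=100):
    """Return how many pages are needed to show total_rows rows."""
    if total_rows < 0 or page_size < 1:
        raise ValueError("total_rows must be >= 0 and page_size >= 1")
    return (total_rows + page_size - 1) // page_size  # ceiling division


if __name__ == "__main__":
    print(page_bounds(1))    # first page: offset 0, like OFFSET (1 -1)
    print(page_count(3000))  # pages needed for a 3K-VM cluster
```

With these assumptions, a 3K-VM cluster would be fetched 100 rows at a time over 30 pages instead of in one unbounded result set.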

Comment 5 Greg Sheremeta 2016-03-31 16:04:31 UTC

> it might be related to the UI since there is no limitation for the objects that loads into this tab, unlike the original vms tab which has 100 object limitation.

That would be my guess. It seems like we need sub-tab paging.

Also possibly just a duplicate of Bug 1294678. Let's test this problem with Bug 1294678's patches applied.

Comment 6 Greg Sheremeta 2016-04-22 00:51:23 UTC

@Eldad, can you please re-test this on 3.6.5 and report back?

Comment 7 Sandro Bonazzola 2016-05-02 10:03:03 UTC

Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.

Comment 8 Yaniv Lavi 2016-05-23 13:18:15 UTC

oVirt 4.0 beta has been released, moving to RC milestone.

Comment 9 Yaniv Lavi 2016-05-23 13:22:18 UTC

oVirt 4.0 beta has been released, moving to RC milestone.

Comment 10 Oved Ourfali 2016-05-25 14:12:18 UTC

Moving to 4.1, hoping that it is fixed now with the latest changes.

Comment 11 Eldad Marciano 2016-07-31 13:00:25 UTC

(In reply to Greg Sheremeta from comment #6)
> @Eldad, can you please re-test this on 3.6.5 and report back?

It reproduces on 3.6.8, and we are hitting it when browsing to the landing page as well.

Comment 12 Eldad Marciano 2016-07-31 13:28:01 UTC

On both Firefox and Chrome.

Comment 13 Oved Ourfali 2016-07-31 17:18:43 UTC

Sounds weird that landing page is related. 
Can you specify exactly what you're doing? 
Logs as well?

Comment 14 guy chen 2016-08-02 12:56:39 UTC

Created attachment 1186814
server.log

Comment 15 guy chen 2016-08-02 12:58:28 UTC

Created attachment 1186815
engine log

Comment 16 guy chen 2016-08-02 12:59:26 UTC

It happens on several pages: sometimes on the landing page, sometimes on the Hosts page, and on the Virtual Machines page.
The symptom is that the page times out and the browser shows the message "something went wrong while displaying this webpage".
The scenario is simply entering the page.

Comment 17 Yaniv Kaul 2016-08-22 10:11:14 UTC

(In reply to guy chen from comment #15)
> Created attachment 1186815 [details]
> engine log

Please compress large logs.

Comment 18 Yaniv Kaul 2016-08-22 10:22:36 UTC

(In reply to guy chen from comment #15)
> Created attachment 1186815 [details]
> engine log

This log shows a very ill system:
[ykaul@ykaul Downloads]$ egrep -c "Failed to fetch vms info for host|VDSGenericException" attachment.cgi.txt 
18594
[ykaul@ykaul Downloads]$ cat attachment.cgi.txt |wc -l
35765

So about half of the log is about the inability to fetch VM data from the (fake) hosts.

So really this host just tries to connect to hosts and get data. It's way too busy in that area. Does it happen on a reasonable host, alive and running?
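
As a sketch of the same check in one pass, the following counts matching lines and total lines together and returns the ratio. It assumes the log is read line by line, as `egrep -c` and `wc -l` do; the function name `matching_fraction` is hypothetical.

```python
import re

# Same alternation as the egrep call in this comment.
PATTERN = re.compile(r"Failed to fetch vms info for host|VDSGenericException")


def matching_fraction(lines, pattern=PATTERN):
    """Return (matched, total, fraction) for an iterable of log lines."""
    matched = 0
    total = 0
    for line in lines:
        total += 1
        if pattern.search(line):
            matched += 1
    fraction = matched / total if total else 0.0
    return matched, total, fraction


if __name__ == "__main__":
    # The file name is the one used in the shell session in this comment.
    with open("attachment.cgi.txt", errors="replace") as log:
        print(matching_fraction(log))
```

For the counts reported here, 18594 of 35765 lines, the fraction is roughly 0.52.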

Comment 19 guy chen 2016-08-24 14:25:49 UTC

Created attachment 1193661
engine log after shutting down hosts and VMS

Comment 20 Yaniv Kaul 2016-08-24 14:29:13 UTC

(In reply to guy chen from comment #19)
> Created attachment 1193661 [details]
> engine log after shutting down hosts and VMS

This is even worse than before. It might be a different bug - erratic behavior when you have so many hosts in ill conditions, but this log is useless for this bug.

Comment 21 guy chen 2016-08-24 14:34:49 UTC

We don't have another environment to test on right now, but I tested the following scenario to check whether these errors are the root cause.
I shut down all hosts and VMs and restarted the server.
The error "Failed to fetch vms info for host|VDSGenericException" does not reproduce.
We do have some authentication errors, but only a small number:
[root@bkr-hv05 ~]#  egrep -c ERROR /var/log/ovirt-engine/server.log
20
The UI keeps freezing after shutting down the hosts and VMs.
The server log is attached.

Comment 23 Oved Ourfali 2016-09-29 09:29:00 UTC

Hopefully fixing the leak will resolve this as well.
Adding a dependency.

Comment 24 Alexander Wels 2016-09-29 12:27:55 UTC

Note that the memory leak fixes we are doing ONLY apply to popup dialogs, not to main tabs and sub tabs. Those are all singletons and thus will not be destroyed. So I highly doubt that the memory leak fixes we are doing will solve whatever this problem is.

Comment 25 Greg Sheremeta 2016-11-30 00:23:52 UTC

(In reply to Alexander Wels from comment #24)
> Note that the memory leak fixes we are doing ONLY apply to popup dialogs,
> not to main tabs and sub tabs. Those are all singletons and thus will not be
> destroyed. So I highly doubt that the memory leak fixes we are doing will
> solve whatever this problem is.

Actually, we did memory leak fixes for tooltips, and we've also switched over to gwt-rpc. There is a good chance this is much improved now.

Moving to MODIFIED. Please test a healthy scale system and fail this if it's still an issue. UX team's scale results show pretty good performance in 4.1 master and 4.0.6.

Comment 26 Sandro Bonazzola 2016-12-12 13:56:15 UTC

The fix for this issue should be included in oVirt 4.1.0 beta 1 released on December 1st. If not included please move back to modified.

Comment 27 Eldad Marciano 2017-01-09 14:53:23 UTC

For now, the user interface is not crashing, but when scrolling down in the VMs tab, the UI is very slow (especially the scrolling).

Greg, please advise

Comment 28 Oved Ourfali 2017-01-09 15:17:39 UTC

What browser are you using?

Comment 29 Eldad Marciano 2017-01-09 16:16:33 UTC

(In reply to Oved Ourfali from comment #28)
> What browser are you using?

I verified for Chrome and Firefox.

chrome version:
Version 55.0.2883.95 (64-bit)

Firefox version:
45.3.0

ovirt-engine:
ovirt-engine-4.1.0-0.3.beta2.el7.noarch

Comment 30 Oved Ourfali 2017-01-09 19:19:05 UTC

So from a UX perspective the issue is fixed and verified. You can consider opening a virt bug on improving the DB query, or changing the backend query called in this use case to pass less data, or querying only for a partial list of VMs. This sub-tab doesn't seem that valuable for a large number of VMs anyway... I'd consider removing it entirely.

Comment 31 Greg Sheremeta 2017-01-09 19:52:46 UTC

(In reply to Oved Ourfali from comment #30)
> So from UX perspective the issue is fixed and verified. You can consider
> opening a virt bug on improving the DB query, or the backend query to call
> in this use case to pass lass data, or to query only for a partial list of
> vms. This sub tab doesn't seem that valuable for a large number of vms
> anyway.... I'd consider removing it entirely.

I'd suggest the sub-tab implement paging. If we wanted to keep it, I think that's the only thing that would fix the issue.

Comment 32 Eldad Marciano 2017-01-12 10:55:57 UTC

(In reply to Oved Ourfali from comment #30)
> So from UX perspective the issue is fixed and verified. You can consider
> opening a virt bug on improving the DB query, or the backend query to call
> in this use case to pass lass data, or to query only for a partial list of
> vms. This sub tab doesn't seem that valuable for a large number of vms
> anyway.... I'd consider removing it entirely.

so moving to verified.

Comment 33 Eldad Marciano 2017-01-12 10:56:25 UTC

(In reply to Greg Sheremeta from comment #31)
> (In reply to Oved Ourfali from comment #30)
> > So from UX perspective the issue is fixed and verified. You can consider
> > opening a virt bug on improving the DB query, or the backend query to call
> > in this use case to pass lass data, or to query only for a partial list of
> > vms. This sub tab doesn't seem that valuable for a large number of vms
> > anyway.... I'd consider removing it entirely.
> 
> I'd suggest the sub-tab implement paging. If we wanted to keep it, I think
> that's the only thing that would fix the issue.

we'll open a new bug for it later on.