Description of problem:
When browsing to rhevm > click on a specific DC (in the tree) > click on the Clusters tab > click/choose a loaded cluster (3K VMs) > click on the VMs sub-tab. It is not clear whether this is caused by a bad DB query or a UI limitation.

Version-Release number of selected component (if applicable):
rhevm 3.6.1.1

How reproducible:
100%

Steps to Reproduce:
1. Described above.

Actual results:
Slow performance and the UI crashed.

Expected results:
Good performance, stable UI.

Additional info:
Eldad, any logs? JProfiler data? DB traces?
Targeting 4.0, assuming we get more feedback. If not, the bug will be closed.
This query seems to be running slowly:
SELECT * FROM ((SELECT distinct vms.* FROM vms WHERE vms.vds_group_name LIKE 'fake\\_cluster\\_2' ) ORDER BY vm_name ASC ) as T1 OFFSET (1 -1) LIMIT 2147483647
~14 sec. The UI crash didn't reproduce. I'll keep investigating it.
(In reply to Eldad Marciano from comment #3) > seems like this query running slow: > SELECT * FROM ((SELECT distinct vms.* FROM vms WHERE vms.vds_group_name > LIKE 'fake\\_cluster\\_2' ) ORDER BY vm_name ASC ) as T1 OFFSET (1 -1) > LIMIT 2147483647 > > ~14 sec. UI crash didnt reproduced. > > i'll keep investigate it.
On deeper investigation, the query runs pretty well, ~2 sec. The query's EXPLAIN output also shows "Total runtime: 1423.638 ms". It might be related to the UI, since there is no limit on the number of objects loaded into this tab, unlike the main VMs tab, which has a 100-object limit. When I tried to scroll down as far as I could, the UI froze at some point.
> it might be related to the UI since there is no limitation for the objects that loads into this tab, unlike the original vms tab which has 100 object limitation.
That would be my guess. It seems like we need sub-tab paging. This may also just be a duplicate of Bug 1294678. Let's test this problem with Bug 1294678's patches applied.
@Eldad, can you please re-test this on 3.6.5 and report back?
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.
oVirt 4.0 beta has been released, moving to RC milestone.
Moving to 4.1, hoping that it is fixed now with the latest changes.
(In reply to Greg Sheremeta from comment #6) > @Eldad, can you please re-test this on 3.6.5 and report back?
It reproduces on top of 3.6.8, and we are facing it when browsing to the landing page as well.
Both on Firefox and Chrome.
It sounds weird that the landing page is related. Can you specify exactly what you're doing? Logs as well?
Created attachment 1186814 [details] server.log
Created attachment 1186815 [details] engine log
It happens on several pages: sometimes on the landing page, sometimes on the Hosts page, and on the Virtual Machines page. The symptom is that the page times out and the browser shows the message "something went wrong while displaying this webpage". The scenario is simply entering the page.
(In reply to guy chen from comment #15) > Created attachment 1186815 [details] > engine log Please compress large logs.
(In reply to guy chen from comment #15) > Created attachment 1186815 [details] > engine log
This log shows a very ill system:
[ykaul@ykaul Downloads]$ egrep -c "Failed to fetch vms info for host|VDSGenericException" attachment.cgi.txt
18594
[ykaul@ykaul Downloads]$ cat attachment.cgi.txt |wc -l
35765
So about half of the log is about the inability to fetch VM data from the (fake) hosts. This host is really just trying to connect to hosts and get data; it's far too busy in that area. Does it happen on a reasonable host, alive and running?
Created attachment 1193661 [details] engine log after shutting down hosts and VMS
(In reply to guy chen from comment #19) > Created attachment 1193661 [details] > engine log after shutting down hosts and VMS
This is even worse than before. It might be a different bug (erratic behavior when you have so many hosts in ill condition), but this log is useless for this bug.
We don't have another environment to test on right now, but we tested the following scenario to check whether these errors are the root cause. I shut down all hosts and VMs and restarted the server. The "Failed to fetch vms info for host|VDSGenericException" errors do not reproduce. We do have some authentication errors, but only a small number:
[root@bkr-hv05 ~]# egrep -c ERROR /var/log/ovirt-engine/server.log
20
The UI keeps freezing after shutting down the hosts and VMs. The server log is attached.
Hopefully fixing the leak will resolve this as well. Adding a dependency.
Note that the memory leak fixes we are doing ONLY apply to popup dialogs, not to main tabs and sub tabs. Those are all singletons and thus will not be destroyed. So I highly doubt that the memory leak fixes we are doing will solve whatever this problem is.
(In reply to Alexander Wels from comment #24) > Note that the memory leak fixes we are doing ONLY apply to popup dialogs, > not to main tabs and sub tabs. Those are all singletons and thus will not be > destroyed. So I highly doubt that the memory leak fixes we are doing will > solve whatever this problem is.
Actually, we did make memory leak fixes for tooltips, and we've also switched over to gwt-rpc. There is a good chance this is much improved now. Moving to MODIFIED. Please test on a healthy scale system and fail this if it's still an issue. The UX team's scale results show pretty good performance in 4.1 master and 4.0.6.
The fix for this issue should be included in oVirt 4.1.0 beta 1, released on December 1st. If it is not included, please move back to MODIFIED.
For now, the UI is not crashing, but when scrolling down in the VMs tab, the UI is very slow (especially the scrolling). Greg, please advise.
What browser are you using?
(In reply to Oved Ourfali from comment #28) > What browser are you using?
I verified on Chrome and Firefox.
Chrome version: 55.0.2883.95 (64-bit)
Firefox version: 45.3.0
ovirt-engine: ovirt-engine-4.1.0-0.3.beta2.el7.noarch
So from a UX perspective, the issue is fixed and verified. You can consider opening a virt bug on improving the DB query, or the backend query called in this use case, so that it passes less data or queries only a partial list of VMs. This sub-tab doesn't seem that valuable for a large number of VMs anyway... I'd consider removing it entirely.
(In reply to Oved Ourfali from comment #30) > So from UX perspective the issue is fixed and verified. You can consider > opening a virt bug on improving the DB query, or the backend query to call > in this use case to pass lass data, or to query only for a partial list of > vms. This sub tab doesn't seem that valuable for a large number of vms > anyway.... I'd consider removing it entirely. I'd suggest the sub-tab implement paging. If we wanted to keep it, I think that's the only thing that would fix the issue.
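A minimal sketch of what sub-tab paging could look like at the query level, assuming the fix is to replace the effectively unbounded LIMIT 2147483647 (2^31 - 1) in the query from comment #3 with a fixed page size. It uses Python's sqlite3 module as a stand-in for the engine's PostgreSQL database; the two-column vms table, the make_demo_db and fetch_vm_page helpers, and the page size of 100 (borrowed from the main VMs tab's limit) are all hypothetical, not engine code. The escaped LIKE 'fake\\_cluster\\_2' pattern is treated here as an exact match on the cluster name.

```python
import sqlite3

# Hypothetical page size, mirroring the 100-object limit of the main VMs tab.
PAGE_SIZE = 100


def make_demo_db(cluster: str, vm_count: int) -> sqlite3.Connection:
    """Build a hypothetical in-memory 'vms' table with vm_count VMs in one cluster."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vms (vm_name TEXT, vds_group_name TEXT)")
    # Zero-padded names so that ORDER BY vm_name matches numeric order.
    conn.executemany(
        "INSERT INTO vms VALUES (?, ?)",
        [(f"vm_{i:05d}", cluster) for i in range(vm_count)],
    )
    return conn


def fetch_vm_page(conn: sqlite3.Connection, cluster: str, page: int,
                  page_size: int = PAGE_SIZE) -> list:
    """Return one 1-based page of VM names instead of LIMIT 2147483647."""
    offset = (page - 1) * page_size
    rows = conn.execute(
        "SELECT vm_name FROM vms WHERE vds_group_name = ? "
        "ORDER BY vm_name ASC LIMIT ? OFFSET ?",
        (cluster, page_size, offset),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = make_demo_db("fake_cluster_2", 3000)
    first = fetch_vm_page(conn, "fake_cluster_2", 1)
    last = fetch_vm_page(conn, "fake_cluster_2", 30)
    print(len(first), first[0], last[-1])
```

With 3000 VMs in one cluster, the UI would then load 100 rows per request (30 pages) rather than all 3000 at once. OFFSET paging is used here only because the existing query already has the OFFSET/LIMIT shape; for deep pages, keyset paging (WHERE vm_name > last_seen_name) avoids scanning the skipped rows.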
(In reply to Oved Ourfali from comment #30) > So from UX perspective the issue is fixed and verified. You can consider > opening a virt bug on improving the DB query, or the backend query to call > in this use case to pass lass data, or to query only for a partial list of > vms. This sub tab doesn't seem that valuable for a large number of vms > anyway.... I'd consider removing it entirely.
So, moving to VERIFIED.
(In reply to Greg Sheremeta from comment #31) > (In reply to Oved Ourfali from comment #30) > > So from UX perspective the issue is fixed and verified. You can consider > > opening a virt bug on improving the DB query, or the backend query to call > > in this use case to pass lass data, or to query only for a partial list of > > vms. This sub tab doesn't seem that valuable for a large number of vms > > anyway.... I'd consider removing it entirely. > > I'd suggest the sub-tab implement paging. If we wanted to keep it, I think > that's the only thing that would fix the issue.
We'll open a new bug for it later on.