Bug 1301545

Summary: [scale] - loading XHR objects taking too long on landing page on WAN, with many objects (525 hosts, 11071 VMs)
Product: [oVirt] ovirt-engine
Reporter: Eldad Marciano <emarcian>
Component: Frontend.WebAdmin
Assignee: Greg Sheremeta <gshereme>
Status: CLOSED CURRENTRELEASE
QA Contact: eberman
Severity: medium
Priority: unspecified
Version: 3.6.2
CC: bugs, dagur, eberman, vszocs, ykaul
Target Milestone: ovirt-4.2.0
Keywords: Performance
Target Release: 4.2.0
Flags: rule-engine: ovirt-4.2+
       rule-engine: planning_ack+
       rule-engine: devel_ack+
       eberman: testing_ack+
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2018-01-12 12:56:39 UTC
Type: Bug
oVirt Team: UX

Description Eldad Marciano 2016-01-25 11:17:25 UTC
Description of problem:
Very long response time when browsing to the rhevm-webadmin page.
The landing page takes too long to load.
Most of the time is spent on XHR objects.

see the following link
http://pastebin.test.redhat.com/343246

Moreover, it seems that some XHR (GWT) queries repeat themselves.

It takes roughly 10 minutes to load the page completely.


Engine profiling snapshots: in progress.

Version-Release number of selected component (if applicable):
rhevm 3.6.2

How reproducible:
100%

Steps to Reproduce:
1. Log in to a heavily loaded (high-scale) rhevm instance.


Actual results:
Very long response time for the landing page.

Expected results:
A fast response time for the landing page.

Additional info:

Comment 1 Yaniv Kaul 2016-01-25 11:25:20 UTC
Eldad,
- Does this happen all the time? I assume only in a high-scale environment, right?
- How many objects do you have in the system?
- Which browser is used? Did you try different browsers?
- Is the engine and/or the client maxed out, CPU- or memory-wise?
- Are the engine and the UI on the same LAN (or at least the same site)?

Comment 2 Eldad Marciano 2016-01-25 15:44:57 UTC
(In reply to Yaniv Kaul from comment #1)
> Eldad,
> - Does this happen all the time? I assume only in a high-scale environment,
> right?
Yes, especially at high scale.
> - How many objects do you have in the system?
Data centers: 2, clusters: 3, storage domains: 13, hosts: 525, VMs: 11071.
> - Which browser is used? Did you try different browsers?
Firefox and Chrome. Firefox works much better, but it is still slow, as I mentioned above.
> - Is the engine and/or the client maxed out, CPU- or memory-wise?
The engine is fairly idle, but the client machine running the browser is under high load.
> - Are the engine and the UI on the same LAN (or at least the same site)?
No, they are connected over a WAN; we need to compare this with a LAN.

Comment 4 Vojtech Szocs 2016-01-29 15:03:31 UTC
Hi Eldad,

> see the following link
> http://pastebin.test.redhat.com/343246

Those are GWT RPC request payloads. Each request specifies the remote interface (GenericApiGWTService), the specific method of that interface (runMultipleQueries, etc.), and any method parameters.

As you wrote, GWT RPC requests are issued via XMLHttpRequest (XHR). Obviously, HTTP over a WAN has a dramatic impact on response times. When a GWT RPC response arrives on the client, it is evaluated, which takes additional time (bigger response payloads take longer to process). So the response times you've recorded are essentially a combination of high network latency and large response payloads (lots of VMs, etc.).
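The breakdown above can be put into a back-of-the-envelope model. This is a minimal sketch, assuming each GWT RPC call costs one network round trip, plus transfer time at a fixed bandwidth, plus client-side evaluation time that grows linearly with payload size. The function `rpc_call_seconds` and every number below are hypothetical and illustrative, not measurements from this bug.

```python
def rpc_call_seconds(rtt_s: float, payload_bytes: int,
                     bandwidth_bytes_per_s: float,
                     eval_s_per_mb: float) -> float:
    """Hypothetical estimate of one GWT RPC call as seen by the browser.

    Assumptions (illustrative only): one network round trip per call,
    transfer time limited by a fixed bandwidth, and client-side
    evaluation cost proportional to the response payload size.
    """
    network_s = rtt_s                                     # request out, first byte back
    transfer_s = payload_bytes / bandwidth_bytes_per_s    # response body over the wire
    eval_s = (payload_bytes / 1_000_000) * eval_s_per_mb  # client-side evaluation
    return network_s + transfer_s + eval_s


# Assumed link: 100 ms RTT, 10 MB/s; assumed evaluation cost: 0.5 s per MB.
small = rpc_call_seconds(0.1, 20_000, 10_000_000, 0.5)     # 20 kB response
large = rpc_call_seconds(0.1, 2_000_000, 10_000_000, 0.5)  # 2 MB response
print(f"small: {small:.3f} s, large: {large:.3f} s")
```

With these assumed values the small call is dominated by the round trip, while the large call is dominated by client-side evaluation, which is why both WAN latency and payload size show up in the recorded times.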

Unfortunately, we can't do much about GWT RPC response processing, since that code is part of the GWT SDK itself. We could attempt to "hack" it, but I would strongly object to that because of the risk involved.

What we can do is reduce and optimize the client's requests in environments that combine WAN and scale. For example, WebAdmin could fetch and display a limited amount of data, which should make it more responsive.
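To illustrate the "limited amount of data" idea, here is a minimal sketch of paged fetching, assuming a grid that requests one fixed-size page at a time instead of the full VM list. `fetch_page` and the page size of 100 are hypothetical; this is not how WebAdmin or the engine actually implement queries.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def fetch_page(rows: Sequence[T], page: int, page_size: int) -> List[T]:
    """Return the 1-based `page` of `rows`, with at most `page_size` items.

    Hypothetical stand-in for a server-side query that limits its result,
    so the client only receives and renders one page per request.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return list(rows[start:start + page_size])


vms = [f"vm-{i}" for i in range(11071)]  # VM count taken from comment 2
page_size = 100                          # assumed page size
pages = math.ceil(len(vms) / page_size)
first = fetch_page(vms, 1, page_size)
last = fetch_page(vms, pages, page_size)
print(pages, len(first), len(last))  # 111 100 71
```

Under this assumption the client processes at most 100 rows per response instead of 11071, which bounds both transfer and evaluation cost regardless of how many VMs exist.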

> Moreover, it seems that some XHR (GWT) queries repeat themselves.

In WebAdmin, each main tab grid automatically refreshes its data. This is by design, as we currently have no mechanism to push data from the server to the client.
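Since refreshing is poll-based, the steady request rate scales with the number of auto-refreshing grids and inversely with the refresh interval. A minimal sketch, assuming one RPC per grid per refresh tick and fixed-rate ticks that do not wait for the previous response; the function names and numbers are hypothetical, not WebAdmin's actual refresh logic or defaults.

```python
def requests_per_minute(grids: int, refresh_interval_s: float) -> float:
    """Polling RPCs per minute, assuming one RPC per grid per refresh tick."""
    if refresh_interval_s <= 0:
        raise ValueError("refresh_interval_s must be positive")
    return grids * 60.0 / refresh_interval_s


def overlaps_next_tick(call_s: float, refresh_interval_s: float) -> bool:
    """True if a refresh call lasts at least one full interval, so the next
    fixed-rate tick fires before the previous response has been handled."""
    return call_s >= refresh_interval_s


# Hypothetical: 3 auto-refreshing grids.
print(requests_per_minute(3, 5))   # 36.0
print(requests_per_minute(3, 60))  # 3.0
# Hypothetical slow WAN call (8 s) against a 5 s interval.
print(overlaps_next_tick(8.0, 5.0))  # True
```

Over a high-latency link a single refresh can take longer than the interval; in that situation, lengthening the interval reduces both server load and client-side evaluation work.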

> It takes roughly 10 minutes to load the page completely.

Many factors contribute to this number:

* specific browser (rendering engine + JS engine performance)
* network latency (LAN vs WAN etc) to fetch remote resources
  - including any HTML/JS/CSS/etc fetched beyond initial load
  - including any GWT RPC requests
* amount of initial data to display (GWT RPC response processing)

Some of these we can't influence, but we can optimize the following:

1. Reduce the amount of data displayed in high-latency / high-scale environments.
   - UI grids showing fewer data rows (to be made configurable)
2. Reduce the number of necessary HTTP round trips.
   - Don't refresh UI grids too often (already configurable)
   - Inline the GWT selector script in the initial page
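Point 2 is about round trips rather than bytes. A minimal latency-only sketch, assuming each HTTP request costs one round trip, the browser keeps at most `concurrency` requests in flight, and a batched call (in the spirit of runMultipleQueries) carries all queries in one request. `latency_cost_s` and the numbers are hypothetical; payload size and server time are ignored.

```python
import math


def latency_cost_s(num_queries: int, rtt_s: float,
                   concurrency: int = 1, batched: bool = False) -> float:
    """Round-trip cost of issuing `num_queries` queries (illustrative model).

    Requests are sent in "waves" of at most `concurrency` parallel requests;
    each wave costs one round trip. Batching sends all queries in one request.
    """
    if num_queries <= 0:
        return 0.0
    if concurrency < 1:
        raise ValueError("concurrency must be >= 1")
    requests = 1 if batched else num_queries
    waves = math.ceil(requests / concurrency)
    return waves * rtt_s


rtt = 0.15  # assumed WAN round-trip time in seconds
print(f"{latency_cost_s(12, rtt):.2f}")                 # 1.80: 12 sequential requests
print(f"{latency_cost_s(12, rtt, concurrency=6):.2f}")  # 0.30: 2 waves of 6
print(f"{latency_cost_s(12, rtt, batched=True):.2f}")   # 0.15: one batched request
```

Under these assumptions every wave saved is a full round trip saved, so cutting requests pays off far more on a WAN than on a LAN; inlining the selector script likewise removes one separate fetch during startup.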

What are your thoughts?

Comment 13 Greg Sheremeta 2017-12-30 13:29:08 UTC
@Eldad: please retest and either verify or fail this bug, or close it as CURRENTRELEASE.

Comment 15 Yaniv Kaul 2017-12-31 10:08:19 UTC
Please see comment #2: it was reported on a WAN, not a LAN. Did you test it with latency or bandwidth constraints? (I personally believe latency, not bandwidth, is the main factor here!)
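The latency-versus-bandwidth question can be checked with a rough sensitivity calculation. A minimal sketch, assuming a page that issues a fixed number of equally sized calls one after another and ignoring client-side processing; `page_load_s` and all numbers are hypothetical.

```python
def page_load_s(requests: int, payload_bytes: int, rtt_s: float,
                bandwidth_bytes_per_s: float) -> float:
    """Network-only load time for `requests` sequential calls of equal size.

    Illustrative model: each call costs one round trip plus its transfer time.
    """
    return requests * (rtt_s + payload_bytes / bandwidth_bytes_per_s)


# Hypothetical page: 40 sequential calls of 10 kB each over a 1 MB/s link.
base = page_load_s(40, 10_000, rtt_s=0.1, bandwidth_bytes_per_s=1_000_000)
double_rtt = page_load_s(40, 10_000, rtt_s=0.2, bandwidth_bytes_per_s=1_000_000)
half_bw = page_load_s(40, 10_000, rtt_s=0.1, bandwidth_bytes_per_s=500_000)
print(f"base {base:.1f} s, 2x RTT {double_rtt:.1f} s, 0.5x bandwidth {half_bw:.1f} s")
```

With many small responses, doubling the round-trip time nearly doubles the load time, while halving bandwidth adds comparatively little; with a few very large responses the balance shifts toward bandwidth.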

Comment 16 Daniel Gur 2018-04-25 09:23:54 UTC
Removing Need Info as this bug is already closed.