Bug 1141515 - [scale] engine webadmin response time delayed 5 min due to '/ovirt-engine/webadmin/<GUID>.cache.html'
Summary: [scale] engine webadmin response time delayed 5 min due to '/ovirt-engine/web...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.0
Hardware: x86_64
OS: Linux
urgent
urgent
Target Milestone: ---
: 3.5.0
Assignee: Alexander Wels
QA Contact: Eldad Marciano
URL:
Whiteboard: ux
Depends On:
Blocks: rhev35rcblocker rhev35gablocker
TreeView+ depends on / blocked
 
Reported: 2014-09-14 09:52 UTC by Eldad Marciano
Modified: 2015-01-18 08:38 UTC (History)
14 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-12-09 18:07:05 UTC
oVirt Team: ---


Attachments (Terms of Use)
chrome breakdown snapshot (152.28 KB, image/png)
2014-09-23 08:47 UTC, Eldad Marciano
no flags Details

Description Eldad Marciano 2014-09-14 09:52:21 UTC
Description of problem:
In trying to log-in to engine (UI)
i have experiencing long response time delay.
by first analyze '/ovirt-engine/webadmin/<GUID>.cache.html' is the most heavy element (5 min) for receiving.

it might be related to scale setup(500 hosts \ 10K vms).
but important to say in scale setup of 3.4 this this does not happens.

cache.html size \ content = 4.3MB

Version-Release number of selected component (if applicable):
3.5

How reproducible:
100%

Steps to Reproduce:
1. 

Actual results:
cache.html latency regression. 

Expected results:
similar to 3.4 (2.29 sec)

Additional info:
 attached web breakdown screenshot

Comment 1 Alexander Wels 2014-09-18 13:04:04 UTC
Hi,

I am having a little trouble understanding the issue, so let me ask some questions:

1. If you go to http://<engine-host>/ovirt-engine that is fast? (<1 second)
2. If you go to http://<engine-host>/ovirt-engine/webadmin the login page shows up fast? (<1 second)
3. If you log into the webadmin it takes 5 minutes for the UI to show up or for the grid to be populated? From your description it seems like it takes a long time for the UI to show up, but I want to make sure.
4. Does it not take a long time if you don't have any hosts or VMs? What about only a few 2-5 hosts and < 100 vms?
5. Can you see any errors or warnings in the engine.log or server.log?

Comment 2 Einav Cohen 2014-09-18 16:44:41 UTC
Eldad, in addition to Alexander's questions in Comment #1:

you mentioned in the description:

> Additional info:
>  attached web breakdown screenshot

can you please attach it?

thanks.

Comment 3 Vojtech Szocs 2014-09-19 12:13:17 UTC
<md5>.cache.html is GWT application permutation file for given browser and locale, containing all application code (JavaScript) required for the initial download and display (for given browser and locale).

These permutation files should be cached by browser upon first download.

Please clear your browser cache, open Firefox / developer tools / network monitor and reload WebAdmin page:
* look for HTTP request to <md5>.cache.html and inspect timings of this request
* ensure this HTTP request contains appropriate Cache-Control (big max age) + Expires (now + 1 year) response headers

Now, reload WebAdmin page again:
* look for HTTP request to webadmin.nocache.js -- it shuld be 304 (Not Modified)
* because webadmin.nocache.js is 304, <md5>.cache.html should not appear on request list

Comment 4 Eldad Marciano 2014-09-23 08:47:01 UTC
Created attachment 940319 [details]
chrome breakdown snapshot

Comment 5 Vojtech Szocs 2014-09-24 11:41:44 UTC
It's also likely that WebAdmin UI is slow simply due to having to process big amounts of entity data (lots of VMs and Hosts).

(GWT RPC serialization performance degrades with big amounts of data transferred between client and server.)

As Alex wrote above, please try comparing user experience in WebAdmin using scale setup vs. WebAdmin using few VMs / Hosts.

Comment 6 Vojtech Szocs 2014-09-24 11:52:08 UTC
Hm, looking at "Chrome breakdown snapshot" screenshot, it takes 5 min to fetch GWT permutation file (cache.html) from Engine application server.

Can you please compare this with fetching this GWT permutation file (cache.html) directly, i.e. via browser tab or wget?

Can you please make REST API call for comparison (Engine responsiveness), i.e. /ovirt-engine/api ?

Comment 7 Einav Cohen 2014-09-24 13:13:40 UTC
(In reply to Eldad Marciano from comment #4)
> Created attachment 940319 [details]
> chrome breakdown snapshot

Thanks, Eldad. 
In order to better understand the root cause of the problem, can you please also answer Alexander's questions in Comment #1?

Comment 10 Alexander Wels 2014-10-03 16:45:23 UTC
Eldad,

Can you please answer the specific questions in comment #1, or ping me on IRC, so we can see if we can work out what is going on.

Thanks,
Alexander

Comment 11 Eldad Marciano 2014-10-05 14:54:38 UTC
Alexander,
the number i have mention at the bug description related to WAN SSL authentication.
i have rebuild my setup using no SSL and i can see extremely faster results
i'll rebuild it using SSL in order to reproduced the problem. 

see the following results:

1. http://<engine-host>/ovirt-engine = 12 requests in 328 ms.
2. http://<engine-host>/ovirt-engine/webadmin = 22 requests in 2.9 seconds
3. 
yep the UI is delayed but and the object which delayed is the cahch.html ( see the attached snapshot) 

reproduced on SSL.

4. not sure yet i'll build another lab in order to test that.
5. nope, but once im logged in, there is 500 response for this:
 /ovirt-engine/services/reports-interface-proxy?command=status

the response is:
<html><head><title>JBoss Web/7.4.8.Final-redhat-4 - JBWEB000064: Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>JBWEB000065: HTTP Status 500 - java.net.MalformedURLException: no protocol: /ovirt/reports-interface</h1><HR size="1" noshade="noshade"><p><b>JBWEB000309: type</b> JBWEB000066: Exception report</p><p><b>JBWEB000068: message</b> <u>java.net.MalformedURLException: no protocol: /ovirt/reports-interface</u></p><p><b>JBWEB000069: description</b> <u>JBWEB000145: The server encountered an internal error that prevented it from fulfilling this request.</u></p><p><b>JBWEB000070: exception</b> <pre>javax.servlet.ServletException: java.net.MalformedURLException: no protocol: /ovirt/reports-interface
	org.ovirt.engine.core.uutils.servlet.ProxyServletBase.doGet(ProxyServletBase.java:194)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
</pre></p><p><b>JBWEB000071: root cause</b> <pre>java.net.MalformedURLException: no protocol: /ovirt/reports-interface
	java.net.URL.&lt;init&gt;(URL.java:585)
	java.net.URL.&lt;init&gt;(URL.java:482)
	java.net.URL.&lt;init&gt;(URL.java:431)
	org.ovirt.engine.core.uutils.servlet.ProxyServletBase.mergeQuery(ProxyServletBase.java:148)
	org.ovirt.engine.core.uutils.servlet.ProxyServletBase.doGet(ProxyServletBase.java:192)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
</pre></p><p><b>JBWEB000072: note</b> <u>JBWEB000073: The full stack trace of the root cause is available in the JBoss Web/7.4.8.Final-redhat-4 logs.</u></p><HR size="1" noshade="noshade"><h3>JBoss Web/7.4.8.Final-redhat-4</h3></body></html>

Comment 12 Alexander Wels 2014-10-06 14:52:28 UTC
(In reply to Eldad Marciano from comment #11)

> i have rebuild my setup using no SSL and i can see extremely faster results
> i'll rebuild it using SSL in order to reproduced the problem. 

This right here tells me you had an issue with your configuration. The application is not aware of if you have ssl or not, it simply doesn't care. If turning off SSL makes it fast, and turning it on makes it slow, it is almost certainly a configuration issue, and the application can't do anything to solve this. If I would have to guess is that your local machine can't resolve its own FQDN. I have see one other instance where the application was super slow for some reason, and once the machine was able to resolve its own FQDN the slowness disappeared.

@Einav,
At this point I think we can close this issue as I believe it has nothing to do with the code.

> 
> see the following results:
> 
> 1. http://<engine-host>/ovirt-engine = 12 requests in 328 ms.
> 2. http://<engine-host>/ovirt-engine/webadmin = 22 requests in 2.9 seconds
> 3. 
> yep the UI is delayed but and the object which delayed is the cahch.html (
> see the attached snapshot) 
> 
> reproduced on SSL.
> 
> 4. not sure yet i'll build another lab in order to test that.

The error below is an unrelated issue with the reporting.

> 5. nope, but once im logged in, there is 500 response for this:
>  /ovirt-engine/services/reports-interface-proxy?command=status
> 
> the response is:
> <html><head><title>JBoss Web/7.4.8.Final-redhat-4 - JBWEB000064: Error
> report</title><style><!--H1
> {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;
> font-size:22px;} H2
> {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;
> font-size:16px;} H3
> {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;
> font-size:14px;} BODY
> {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B
> {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;}
> P
> {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:
> 12px;}A {color : black;}A.name {color : black;}HR {color :
> #525D76;}--></style> </head><body><h1>JBWEB000065: HTTP Status 500 -
> java.net.MalformedURLException: no protocol:
> /ovirt/reports-interface</h1><HR size="1"
> noshade="noshade"><p><b>JBWEB000309: type</b> JBWEB000066: Exception
> report</p><p><b>JBWEB000068: message</b> <u>java.net.MalformedURLException:
> no protocol: /ovirt/reports-interface</u></p><p><b>JBWEB000069:
> description</b> <u>JBWEB000145: The server encountered an internal error
> that prevented it from fulfilling this request.</u></p><p><b>JBWEB000070:
> exception</b> <pre>javax.servlet.ServletException:
> java.net.MalformedURLException: no protocol: /ovirt/reports-interface
> 	org.ovirt.engine.core.uutils.servlet.ProxyServletBase.
> doGet(ProxyServletBase.java:194)
> 	javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
> 	javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
> </pre></p><p><b>JBWEB000071: root cause</b>
> <pre>java.net.MalformedURLException: no protocol: /ovirt/reports-interface
> 	java.net.URL.&lt;init&gt;(URL.java:585)
> 	java.net.URL.&lt;init&gt;(URL.java:482)
> 	java.net.URL.&lt;init&gt;(URL.java:431)
> 	org.ovirt.engine.core.uutils.servlet.ProxyServletBase.
> mergeQuery(ProxyServletBase.java:148)
> 	org.ovirt.engine.core.uutils.servlet.ProxyServletBase.
> doGet(ProxyServletBase.java:192)
> 	javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
> 	javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
> </pre></p><p><b>JBWEB000072: note</b> <u>JBWEB000073: The full stack trace
> of the root cause is available in the JBoss Web/7.4.8.Final-redhat-4
> logs.</u></p><HR size="1" noshade="noshade"><h3>JBoss
> Web/7.4.8.Final-redhat-4</h3></body></html>

Comment 13 Einav Cohen 2014-10-06 15:32:46 UTC
Thanks, Alexander. 
I am closing this issue, as it has nothing to do with the GUI code. 

@Eldad: if you still think that this issue should be addressed, please re-open this BZ / file a new BZ with either 'integration' or 'infra' in the Whiteboard and 'distribution' as the component. thanks.

Comment 14 Eldad Marciano 2014-12-08 10:48:20 UTC
this bug reproduced sporadically on VT13.1
<cacheid>.cache.html running over 2 min.

no sure what the root cause, further investigation required.
i have taken 3 thread dump samples (every 5 sec) while the cache html getting stacked and i found this thread running over 1 min.

"ajp-/127.0.0.1:8702-7" daemon prio=10 tid=0x00007f216c009800 nid=0x2f3d runnable [0x00007f236e02b000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
	at org.apache.coyote.ajp.AjpProcessor$SocketOutputBuffer.doWrite(AjpProcessor.java:1330)
	at org.apache.coyote.Response.doWrite(Response.java:594)
	at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:421)
	at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:450)
	at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:351)
	at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:449)
	at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:438)
	at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:90)
	at org.apache.catalina.servlets.DefaultServlet.copyRange(DefaultServlet.java:2123)
	at org.apache.catalina.servlets.DefaultServlet.copy(DefaultServlet.java:1866)
	at org.apache.catalina.servlets.DefaultServlet.serveResource(DefaultServlet.java:829)
	at org.apache.catalina.servlets.DefaultServlet.doGet(DefaultServlet.java:363)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:295)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
	at org.ovirt.engine.core.utils.servlet.HeaderFilter.doFilter(HeaderFilter.java:94)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
	at org.ovirt.engine.ui.frontend.server.gwt.GwtCachingFilter.doFilter(GwtCachingFilter.java:132)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
	at org.ovirt.engine.core.branding.BrandingFilter.doFilter(BrandingFilter.java:72)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
	at org.ovirt.engine.core.utils.servlet.LocaleFilter.doFilter(LocaleFilter.java:64)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
	at org.ovirt.engine.core.aaa.filters.SessionMgmtFilter.doFilter(SessionMgmtFilter.java:31)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
	at org.ovirt.engine.core.aaa.filters.LoginFilter.doFilter(LoginFilter.java:74)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
	at org.ovirt.engine.core.aaa.filters.NegotiationFilter.doFilter(NegotiationFilter.java:132)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
	at org.ovirt.engine.core.aaa.filters.BasicAuthenticationFilter.doFilter(BasicAuthenticationFilter.java:75)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
	at org.ovirt.engine.core.aaa.filters.SessionValidationFilter.doFilter(SessionValidationFilter.java:73)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:231)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:149)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:512)
	at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:169)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:145)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:97)
	at org.jboss.web.rewrite.RewriteValve.invoke(RewriteValve.java:466)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:102)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:344)
	at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:490)
	at org.apache.coyote.ajp.AjpProtocol$AjpConnectionHandler.process(AjpProtocol.java:420)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:926)
	at java.lang.Thread.run(Thread.java:745)

Comment 15 Einav Cohen 2014-12-08 12:49:58 UTC
@Alexander: can you please contact Eldad and take a look at it?

Comment 16 Alexander Wels 2014-12-08 14:16:34 UTC
The answer will be the same, this is a configuration issue, it has nothing to do with the UI code. In fact the UI code hasn't run yet, since it hasn't been send to the browser yet. The stack traces in comment 14 indicate the server is in the process of sending the data to the browser.

So this means one of the following is happening:
1. The client is overloaded and not acecpting the data.
2. The network is not passing the data for some reason.
3. The server is overloaded and/or not sending the information in a timely fashion. I feeling is that this one is the case. This can be caused by a variety of reasons all outside of the ability for the UI to influence. For instance:
  a. If this is a VM and the host is overloaded.
  b. If this is a VM and doesn't have enough memory it might be swapping causing the bits to send slowly.

Again I don't think this has anything to do with the code and it is a configuration issue. I will see if I can contact Eldad to get access to his infra to check.

Comment 17 Einav Cohen 2014-12-08 18:04:57 UTC
Thanks, Alexander; please report back here (and clear 'needinfo') once you have final conclusions after checking Eldad's env.

Comment 18 Alexander Wels 2014-12-08 18:53:10 UTC
So my assessment in comment #16 is wrong. The engine machine is bare metal, it is reporting 24 processors and 64G of memory. The load on the machine is average between 0.1 and 0.2. The only process doing anything is java (engine), and its using 10-20% cpu most of the time.

The only strange thing about that machine is that the entropy is super low, it averages < 200 for the time I watched it. On my laptop it averages > 1500. I can make it happen fairly consistently, but not all the time.

This engine appears to be part of the scale project and contains 500 fake hosts (all of which are in maintaince). I thought it might be the number of file descriptors open, but that was fairly consistent at ~750. While the ulimit was set to unlimited.

In short, the engine machine is pretty much idle. I can create it on my machine which is also pretty much idle, so that eliminates 1 and 3 from the equation, so it must be an issue with the network.

Comment 19 Alexander Wels 2014-12-08 20:21:23 UTC
One more update on my testing. I disabled the automatic redirect to https to see the difference between http and https (I expect https to be a lot slower, but still acceptable). The difference is staggering, the https version sits there for minutes while the http version loads within seconds.

I tried this with some different browsers, and in FF the https version was slightly slower than the http version but not much. In Chrome the different was huge. So I imported the ca.crt from http://<host>/ca.crt into Chrome and restarted Chrome. Now Chrome is about as fast as Firefox.

@Eldad, can you try importing the ca.crt into your browser and restarting the browser and let me know if this still happens? If not then I think we have found the culprit.

Comment 20 Eldad Marciano 2014-12-09 14:06:44 UTC
I tried to add the crt file to chrome but still facing 2 min delay for cache.html

when i use FF i have faced extremely faster results.

Comment 21 Alexander Wels 2014-12-09 18:07:05 UTC
After doing more tests, it seems Eldad has some network issues that are causing this issue. With his agreement, I am closing not a bug. Re-open if you disagree.


Note You need to log in before you can comment on or make changes to this bug.