Bug 833799
Summary: | [RFE] portals: use HTTP keep-alive to minimize latencies | ||
---|---|---|---|
Product: | [Retired] oVirt | Reporter: | David Jaša <djasa> |
Component: | ovirt-engine-installer | Assignee: | Sandro Bonazzola <sbonazzo> |
Status: | CLOSED WONTFIX | QA Contact: | |
Severity: | low | Docs Contact: | |
Priority: | medium | ||
Version: | unspecified | CC: | acathrow, alonbl, djasa, ecohen, iheim, juan.hernandez, mgoldboi, pstehlik, tdosek, vszocs, yeylon |
Target Milestone: | --- | Keywords: | FutureFeature, Triaged |
Target Release: | 3.4.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | integration | ||
Fixed In Version: | | Doc Type: | Enhancement
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2014-01-06 08:13:41 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
David Jaša
2012-06-20 11:37:39 UTC
This RFE probably needs a little clarification. AFAIU HTTP Keep-Alive, it works by not dropping the TCP connection after a response finishes; the HTTP layer on top of it shouldn't need to do anything more than set a header. So the workflow on any refresh is currently:

1. portal makes a request
2. browser asks for server IP, gets it from DNS server
3. browser establishes TCP/TLS connection (~ 5 roundtrips)
4. browser does HTTP request
5. browser gets HTTP response
6. connection is dropped
7. portal makes another request -> goto 2.

The target state is:

1. portal makes first request
2. browser asks for server IP, gets it from DNS server
3. browser establishes TCP/TLS connection (~ 5 roundtrips)
4. browser does HTTP request
5. browser gets HTTP response
// so far the same, now comes the difference
6. connection is kept
7. portal makes another request
8. browser sends the request over the existing connection

Stuff like HTTP push is yet another story for yet another RFE, in my opinion.

vojtech - doesn't the browser use HTTP 1.1 by default, doing this out of the box?

According to [1], all HTTP/1.1 connections are considered persistent by default, which means both HTTP/1.1 servers & clients should use Keep-Alive semantics for incoming & outgoing requests. For example, the Apache web server has Keep-Alive behavior turned on by default, with Keep-Alive-Timeout=5sec [2]. The Internet Explorer browser, being an HTTP/1.1 client, honors default Keep-Alive behavior, using Keep-Alive-Timeout=60sec. In this example, the TCP connection will be reused for HTTP request/response processing for a maximum of 5 seconds -> min(5sec,60sec). In the case of Engine, assuming JBoss AS honors default Keep-Alive behavior as an HTTP/1.1 server, it should be as simple as increasing the Keep-Alive-Timeout value on the server.
In general, a high Keep-Alive-Timeout is not recommended for servers with many concurrent users (different origins) as it consumes more RAM, but I don't think this is the case for Engine, as there are only a few well-defined entry points like the REST API, web GUI, etc.

To summarize: we should find a way to increase Keep-Alive-Timeout in JBoss AS (Engine), assuming web clients (Firefox, IE, etc.) already use high-enough Keep-Alive-Timeout values, e.g. IE default 60sec, Firefox default 115sec [1]. So this BZ is more about a JBoss AS configuration change than a UI code change.

[1] http://en.wikipedia.org/wiki/HTTP_persistent_connection
[2] http://abdussamad.com/archives/169-Apache-optimization:-KeepAlive-On-or-Off.html

> TCP connection will be reused for HTTP request/response processing for a maximum of 5 seconds -> min(5sec,60sec).
Sorry, the above should be:
TCP connection will be reused for HTTP request/response processing, and dropped after 5 seconds of request/response inactivity -> min(5sec,60sec).
So with a high-enough Keep-Alive-Timeout on both client & server (higher than the default data-refresh polling interval), the TCP connection should always be reused.
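The persistent-connection mechanism discussed above can be sketched with Python's standard library (illustrative only; a throwaway local server stands in for Apache/JBoss, and none of this is oVirt code):

```python
# Sketch: HTTP/1.1 keep-alive means the second request travels over the
# same TCP socket as the first, as long as neither side has timed out.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 implies keep-alive by default

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/")
conn.getresponse().read()
first_socket = conn.sock           # remember the underlying TCP socket

conn.request("GET", "/")           # second request, same connection object
conn.getresponse().read()
reused = conn.sock is first_socket # True if the TCP connection was reused

print(reused)
server.shutdown()
```

With keep-alive disabled on the server (`Connection: close`), the client would have to open a fresh socket for the second request, which is exactly the redundant TCP/TLS handshake round-trip this RFE wants to avoid.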
In JBoss AS 7 the keep-alive is enabled by default, its value is 30 seconds, and as far as I know there is no way to change it. However, most of our requests come via the Apache web server. There it is also enabled by default, and if we want to change it, it is a matter of adding something like this to /etc/httpd/conf/httpd.conf: KeepAliveTimeout 60. We already do some modifications to that httpd.conf file from engine-setup, so I guess we can do this additional one as well.

Thanks Juan, indeed there seems to be no way to configure Keep-Alive-Timeout for web/connector in JBoss AS 7. 30 seconds should be good enough for most UI data refresh rates; we can use "KeepAliveTimeout 30" in Apache, no need for a higher value.

do we have this by default for 3.3 (in which we force apache frontend)?

To have this in 3.3 we need to modify the setup tool so that it changes the apache configuration, same as we do with SSL, for example. I assume this did not happen for 3.3.

if not, current keepalive is the default which is 5 seconds. worth a short discussion on engine-devel? sounds like a simple patch?

Hi,

Note from apache:
"""
The number of seconds Apache will wait for a subsequent request before closing the connection. Once a request has been received, the timeout value specified by the Timeout directive applies.

Setting KeepAliveTimeout to a high value may cause performance problems in heavily loaded servers. The higher the timeout, the more server processes will be kept occupied waiting on connections with idle clients.
"""

I think that 5 seconds for session reuse is good enough as we are already polling the server for statuses. So if we have 2 concurrent sessions per browser, most likely these 2 sessions will be reused at least once in 5 seconds. Do we actually need to change anything?

Thanks,

(In reply to David Jaša from comment #1)
> This RFE probably needs a little clarification.
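The httpd.conf change Juan describes would look roughly like this (a sketch; KeepAlive itself is usually on by default, and the timeout value - 30 vs 60 seconds - was still under discussion in this thread):

```apache
# /etc/httpd/conf/httpd.conf (fragment)
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 60
```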
> AFAIU HTTP Keep-Alive, it works by not dropping TCP connection after a
> response finishes but the HTTP on top of it shouldn't need to do anything
> more than set a header. So the workflow on any refreshes is:
>
> 1. portal makes a request
> 2. browser asks for server IP, gets it from DNS server

this is cached.

> 3. browser established TCP/TLS connection (~ 5 roundtrips)

this is cached based on tls cookie; even if the socket is disconnected, the tls session is 'logically' kept.

> 4. browser does HTTP request
> 5. browser gets HTTP response
> 6. connection is dropped

not in http-1.1, for a duration of 5 seconds per default.

> 7. portal makes another request -> goto 2.

the designated step (dns, tls, http) is retrieved from cache if available. we are not a unique application in this regard. keeping timeouts longer does not imply that the entire solution will react better.

> we are not unique application in this regard.

True, but we should be able to configure Apache per individual application's needs, based on the application's URL scheme. Using the same Apache config values for all applications isn't optimal anyway.

> keeping timeouts longer does not imply that the entire solution will react better.

Alon, the idea was to reuse the existing network connection with the HTTP client - for example, WebAdmin requesting fresh data every X seconds. As you quoted Apache docs:

> Setting KeepAliveTimeout to a high value may cause performance problems in heavily loaded servers. The higher the timeout, the more server processes will be kept occupied waiting on connections with idle clients.

My understanding is that "heavily loaded" typically means "lots of concurrent clients" (multi-tenancy). For WebAdmin, this isn't the case and therefore HTTP keep-alive should be increased to deal with increased request frequency. For UserPortal, this might be the case and HTTP keep-alive should probably be left at its default value.

(In reply to vszocs from comment #12)
> > we are not unique application in this regard.
> True, but we should be able to configure Apache per individual application's
> needs, based on application's URL scheme. Using same Apache config values
> for all applications isn't optimal anyway.

This does not apply to a specific directory:

Context: server config, virtual host

> > keeping timeouts longer does not imply that the entire solution will react better.
>
> Alon, the idea was to reuse existing network connection with HTTP client -
> for example, WebAdmin requesting fresh data each X seconds.

As I wrote, there should be no need to do so, as reuse of sessions already exists if we poll the engine for status.

> My understanding is that "heavily loaded" typically means "lots of
> concurrent clients" (multi-tenancy). For WebAdmin, this isn't the case and
> therefore HTTP keep-alive should be increased to deal with increased request
> frequency. For UserPortal, this might be the case and HTTP keep-alive should
> be probably left to its default value.

Why isn't this the case? Once again, this is server-wide configuration. And as I explained in comment#10, comment#11, it should not actually be required.

(In reply to Alon Bar-Lev from comment #13)
> > True, but we should be able to configure Apache per individual application's
> > needs, based on application's URL scheme. Using same Apache config values
> > for all applications isn't optimal anyway.
>
> This does not apply to specific directory:
>
> Context: server config, virtual host

You're right. We could use <Directory> inside <VirtualHost> but that's not feasible, I guess.

> > Alon, the idea was to reuse existing network connection with HTTP client -
> > for example, WebAdmin requesting fresh data each X seconds.
>
> As I wrote there should be no need to do so, as reuse of sessions already
> exist if we pull engine for status.

I assume by "sessions" you mean "network connections"? i.e. this BZ has nothing to do with "session" in the traditional (server-side) sense.
WebAdmin has a default (main tab) grid refresh rate of 5 seconds, which can be modified by the user to [10,20,30,60] seconds.

For Apache, the KeepAliveTimeout default value is 5 seconds. For JBoss AS, the analogous KeepAliveTimeout option has a default value of 30 seconds. So in practice, if the client sends another request within a min(5,30) = 5 second time window, the network connection will be reused. Otherwise, the network connection will be dropped. (Note that each browser has its own analogous KeepAliveTimeout option, too.) If a WebAdmin user increases the grid refresh rate above 5 seconds, each GWT RPC request will spawn a new network connection in Apache.

> > My understanding is that "heavily loaded" typically means "lots of
> > concurrent clients" (multi-tenancy). For WebAdmin, this isn't the case and
> > therefore HTTP keep-alive should be increased to deal with increased request
> > frequency. For UserPortal, this might be the case and HTTP keep-alive should
> > be probably left to its default value.
>
> Why isn't this is the case?

Because WebAdmin is conceptually designed as an admin web interface; in practice there won't be as many concurrent users as compared to UserPortal.

> Once again, this is server wide configuration.

Yes.

> And as I explained in comment#10, comment#11 it should not be actually required.

I assume you refer to the caching aspect. Maybe David can respond on this; I'm not too familiar with HTTP request processing internals.

(In reply to vszocs from comment #14)
> I assume by "sessions" you mean "network connections"? i.e. this BZ has
> nothing to do with "session" in a traditional (server-side) sense.
>
> WebAdmin has default (main tab) grid refresh rate 5 seconds, which can be
> modified by user to [10,20,30,60] seconds.
>
> For Apache, KeepAliveTimeout default value is 5 seconds. For JBoss AS,
> analogous KeepAliveTimeout option has default value 30 seconds.
> So in practice, if client sends another request in min(5,30) = 5 second time
> window, network connection will be reused. Otherwise, network connection
> will be dropped. (Note that each browser has its own analogous
> KeepAliveTimeout option, too.)

If the client sends *ANY* request - so if the refresh timeout is 5 seconds, sessions will be reused. Statistically, even with a 10 second refresh the session will be reused, as there are other requests. For any longer value, the penalty of re-establishing the session is lower than keeping it around.

> If WebAdmin user increases grid refresh rate above 5 seconds, each GWT RPC
> request will spawn new network connection in Apache.

This is valid; I do not see the problem. The problem with session re-establishment is in short cycles.

> Statistically even in 10 seconds refresh session will be reused as there are other requests.

I'm not familiar with Apache, but I think it implements a connection queue [1], alias ListenBackLog - processing max. 1 active connection & queueing subsequent connections for future processing into ListenBackLog.

[1] http://httpd.apache.org/docs/2.2/mod/mpm_common.html#listenbacklog

However, I'd say KeepAliveTimeout doesn't take "connection waiting for processing" delay into account, so I'd say that anything more than 5 seconds will simply close the given (persistent) connection.

> Any longer value, the penalty of re-establish session is lower than keeping it around.

I'd say this depends on network traffic; there is no silver bullet, it's always a compromise that reflects expected server load.

> This is valid, I do not see the problem, the problem with session re-establish is in short cycles.

I agree, today's 5 second refresh cycle should always keep alive the given connection. Longer refresh cycles will close the given connection and re-establish one again if necessary.

> Longer refresh cycles will close given connection and re-establish one again if necessary
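The timeout interplay described above - an idle persistent connection survives only as long as every hop's keep-alive allows, i.e. the minimum of the timeouts - can be stated as a one-liner. A sketch; the numbers are the defaults quoted in this thread, not measured values:

```python
def effective_keepalive_window(*timeouts_s: float) -> float:
    """An idle persistent connection is closed by whichever peer times out first."""
    return min(timeouts_s)

# Defaults quoted in the thread: IE 60 s, Apache 5 s, JBoss AS 7 30 s.
window = effective_keepalive_window(60, 5, 30)
print(window)  # 5 -- any refresh rate above 5 s opens a new connection per poll
```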
So it comes down to whether we want to address the above use case or not. David, what do you think?
david - do you actually see session drops / re-initiated sessions using tcpdump with a 5 second refresh? with a 10 or 20 second refresh?

It depends on actual activity. My idle system reaches up to 80 seconds without a FIN at a 5 second refresh interval, but sometimes polling is not accurate, so a new session is opened. Multiple sessions are opened while idle ones are closed, as expected. There are no DNS queries when a session is re-established. The TLS session is resumed from cache. Normal web application behavior.

Maybe setting the default to a 4 second interval will make people happier.

> Maybe setting default to 4 second interval will make people happier.
Interesting point, however note that this sacrifices client-side performance, i.e. GWT RPC processing + IE8 + less than 5 sec refresh = slow application.
I originally noticed the issue right because of the DNS queries every n seconds. Those are now gone (FF25 @ Fedora 20), but the issue of not reusing the existing TCP and SSL connection is still present: look into the attachment. It contains a dump of 4 rounds of ongoing communication. In short, it seems that right after the useful data is exchanged, the server issues an SSL Encrypted Alert followed by a server-sent TCP FIN packet - in effect, it looks like HTTP Keepalive is not being used at all. (At least session tickets are used, so one of the redundant roundtrips is saved.)

Now let's assume that the keepalive works as it should and the SSL Alert + TCP FIN are sent at the end of the keepalive interval. What use is then a keepalive time <= refresh rate? If each request takes a rather long time, the connection is likely to persist with an occasional need to re-handshake. If they are rather short (which is the case - the idle time in the dump approaches the full refresh interval), then such a keepalive is only good for ticking the "keepalive" item in some requirements list, but the real-world behaviour is as if no keepalive is present...

TL;DR for Itamar: the actual connection takes a fraction of a second and is closed right away, so yes, I see connection drops even with the shortest available refresh rate.

Please remember that this is a web technology based application; 5 second polling of the server is a very short interval and is good for 1-5 people but has a scale issue in an enterprise-wide solution.

Regardless of this specific discussion, I suggest the opposite: set the default polling interval to 15 seconds, and allow the server to free resources.

If in the future we are to customize the http protocol behavior, we should open our own custom application port at the server and not use apache for status polling.

(In reply to David Jaša from comment #21)
> in effect, it looks like HTTP Keepalive not being used at all.
Indeed:

# grep -riIsn '^[^#]*keep.*alive' /etc/httpd
/etc/httpd/conf.d/ssl.conf:219: nokeepalive ssl-unclean-shutdown \
/etc/httpd/conf.d/ssl.conf.20130820184141:219: nokeepalive ssl-unclean-shutdown \
/etc/httpd/conf/httpd.conf:74:KeepAlive Off
^^^
/etc/httpd/conf/httpd.conf:81:MaxKeepAliveRequests 100
/etc/httpd/conf/httpd.conf:87:KeepAliveTimeout 15
/etc/httpd/conf/httpd.conf:886:BrowserMatch "Mozilla/2" nokeepalive
/etc/httpd/conf/httpd.conf:887:BrowserMatch ".*MSIE [2-5]\..*" nokeepalive downgrade-1.0 force-response-1.0

Note that the default keepalive doesn't help clients with a defocused portal; in my case, the refresh rate then increases to 40 s, so a keepalive north of that value should deliver the most value.

(In reply to Alon Bar-Lev from comment #10)
> ...
> """
> The number of seconds Apache will wait for a subsequent request before
> closing the connection. Once a request has been received, the timeout value
> specified by the Timeout directive applies.
>
> Setting KeepAliveTimeout to a high value may cause performance problems in
> heavily loaded servers. The higher the timeout, the more server processes
> will be kept occupied waiting on connections with idle clients.
> """
> ...

a.k.a. Slowloris, IIRC. Best mitigated by MPM Event (if JBoss or whatever is next in the chain can also process multiple connections per thread, of course).

In response to Alon's comment, I think the whole idea of the GUI polling the server periodically for updates is just inefficient and causes lots of problems. I think that in the long term, we should think about utilizing websocket or similar technology to avoid polling the server entirely, i.e. allow for server -> client data push.

(In reply to David Jaša from comment #24)
> (In reply to David Jaša from comment #21)
> > in effect, it looks like HTTP Keepalive not being used at all.
>
> /etc/httpd/conf/httpd.conf:74:KeepAlive Off

Now I get it!!! you are looking at rhel... for some reason they forced it off... in Fedora and Gentoo I see this is on. Oh...
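Given that grep output, the immediate remedy on such a host would be re-enabling keep-alive in httpd.conf, along these lines (a sketch; the 60 s value is illustrative, chosen to sit above the ~40 s defocused refresh rate mentioned above):

```apache
# /etc/httpd/conf/httpd.conf (fragment)
KeepAlive On
# keep idle connections longer than the portal's slowest refresh interval
KeepAliveTimeout 60
```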
from [1] I see the following... was not updated since fedora-11?!?!

---
commit 3b6e5535f3bbe1b904d4abb51384d4ded6e10c25
Author: dgregor <dgregor>
Date: Wed May 20 20:03:10 2009 +0000

    mass-import of httpd-2.2.11-8.src.rpm from dist-f11
---

A sysadmin can always enable this option to acquire more performance.

[1] http://pkgs.devel.redhat.com/cgit/rpms/httpd/tree/httpd.conf?h=rhel-6.5

(In reply to vszocs from comment #25)
> In response to Alon's comment, I think the whole idea of GUI polling server
> periodically for updates is just inefficient and causes lots of problems. I
> think that in long term, we should think about utilizing websocket or
> similar technology to avoid polling server entirely, i.e. allow for server
> -> client data push.

I always considered websocket to be the ultimate solution, until I had to dig into the protocol for noVNC... websocket breaks the http spec by not allowing a proxy to exist between client and server. So using websocket in an application introduces a limitation. Maybe this can be evaluated as minor in this case, but still... Another issue is that on the server side, a standard web container cannot be used at large scale without using a large amount of resources.

For now, I would like to, at least, see a cache at the jboss side to sample the database once per interval (maybe a different interval for each object type) on behalf of all users, create a status in memory, and return results out of memory. This would be a great improvement to engine load and user interface response time.

(In reply to Alon Bar-Lev from comment #23)
> Please remember that this is web technology based application, 5 second
> pulling of server is very short interval and is good for 1-5 people but has
> scale issue in enterprise wide solution.

5s polling is indeed inefficient, but how do you want to maintain user-friendly behaviour without push notifications? That would need deeper changes, of course.
> Regardless of this specific discussion, I suggest the opposite, set default
> pulling interval to 15 seconds, and allow server to free resources.

I don't have particular numbers, so:

* at what number of concurrent client connections does httpd start choking?
* at what number of concurrent client connections does jboss start choking?
* at what number of concurrent client connections does engine start choking?

etc. IOW, if the rest of the application stack handles extra connections, why should apache create an artificial chokepoint?

> If in future we are to customize the http protocol behavior we should open
> our own custom application port at server and not use apache for status
> pulling.

What is the point of an http proxy when it is not used for all connections? Anyway, the ultimate goal in this respect should be some kind of server-push - but as far as I understand, all of them need to keep an HTTP connection around for long polling, a websocket connection, or such... BTW, based on quick googling, it seems that JBoss 7 already supports some of these.

(In reply to Alon Bar-Lev from comment #26)
> ...
> Now I get it!!! you are looking at rhel... for some reason they forced it
> off... in Fedora and Gentoo I see this is on.

it's so long since reporting that I didn't notice the product the bug was reported against...

> sysadmin can always enable this option to acquire more performance.

IMO, when httpd is configured anyway, it should be configured for the best performance possible.

(In reply to David Jaša from comment #28)
> IMO when the httpd is configured anyway, it should be configured for best
> performance possible.

In 3.3 we tried to detach as much as possible from the past approach of a single application (us) owning the entire server, and start playing nicely with other products installed on the same host. I hope we can continue to seek that goal.

In response to Alon's comment #27:

> the websocket breaks the http spec

Websocket has nothing to do with the HTTP protocol in general.
Quote from Wikipedia: "Its only relationship to HTTP is that its handshake is interpreted by HTTP servers as an Upgrade request." Websocket is not a complement or supplement of HTTP; it's a different protocol on top of TCP.

> I would like to, at least, see a cache at jboss side to sample database once per interval

This is a different kind of optimization, but I agree, it typically proves useful as opposed to doing DB queries each time. AFAIK, we already use Infinispan ("distributed in-memory key/value data grid and cache") in our engine.ear deployment.

(In reply to David Jaša from comment #28)
> 5s polling is indeed inefficient but how do you want to maintain
> user-friendly behaviour without push notifications? That would need deeper
> changes of course.

Exactly, this is what I mean when I say "WebAdmin is not a typical CLI-style request/response application - it's a dynamic web application working with data in real time". The problem is getting notified of changes, assuming you want to pick up these changes in real time.

If you initiate the change via your own browser instance, we already have infra to request updates from the server immediately [1,2].

[1] http://www.ovirt.org/Features/Design/UIRefreshSynchronization
[2] http://gerrit.ovirt.org/#/c/21057/ [merged into master]

If someone else initiates the change (i.e. via a REST API call), our only chance today is polling every X seconds. From a web application perspective, server-to-client data push seems optimal.

> Anyway, the ultimate goal in this respect should be some kind of server-push
> - but as far as I understand, all of them need to keep HTTP connection
> around for long polling, websocket connection or such...
> BTW based on quick googling, it seems that Jboss 7 already supports some of
> these.

The Java EE 7 spec natively supports Websocket, i.e. JBoss can expose a Websocket endpoint that accepts an HTTP request and upgrades it to a Websocket connection.
After the Websocket connection is established, AFAIK there is no HTTP stuff happening anymore.

After discussing Websocket vs. Proxy Servers with Alon, I have to admit that the Websocket spec (the part for initiating the connection via an HTTP Upgrade request) doesn't work nicely with HTTP proxies:

http://www.foxweave.com/websockets-and-http-proxies/
http://www.infoq.com/articles/Web-Sockets-Proxy-Servers

AFAIK it can be worked around (i.e. re-setting the HTTP Upgrade & Connection headers for another network hop), but that sounds hacky to me.

Following up on a mutual agreement of GSS and Engineering, I'm closing this bugzilla. The reason is that the issue limits users really only in the case of a very large number of users working with the portals simultaneously. For now we will handle such situations by following this KBase article: https://access.redhat.com/site/solutions/660883

A more long-term solution is to have a guideline of scalability "tips and tricks", which would sum up all possible methods for enhancing RHEV scalability by changing the configuration of third-party applications that RHEV uses (postgres, apache, jboss, even the kernel via tuned, etc.).