Bug 1571223
Summary: [upstream][v2v] ManageIQ performs slowly over remote site

Product: Red Hat CloudForms Management Engine
Component: Performance
Version: 5.9.0
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Status: CLOSED ERRATA
Target Milestone: GA
Target Release: 5.10.0
Fixed In Version: 5.10.0.22
Whiteboard: v2v
Keywords: Reopened
Reporter: Mor <mkalfon>
Assignee: Martin Hradil <mhradil>
QA Contact: Yadnyawalk Tale <ytale>
CC: bthurber, cbudzilo, cpelland, dagur, dmetzger, hkataria, istein, kbrock, lavenel, mhradil, mlehrer, mpovolny, mshriver, obarenbo, simaishi, smallamp
Doc Type: If docs needed, set a value
Doc Text: This release of Red Hat CloudForms implements a UI optimization which improves performance when using Chrome as the browser to access an appliance.
Type: Bug
Cloudforms Team: CFME Core
Last Closed: 2019-02-07 23:01:43 UTC
Description
Mor
2018-04-24 10:55:11 UTC
Created attachment 1425947 [details]
environment 1 - evm.log
Created attachment 1425949 [details]
environment 2 - evm.log
Closing this ticket, as it is for ManageIQ; BZ tickets are for CloudForms. Please open a GitHub issue instead. In the meantime, I've added this issue to the performance team whiteboard. Removing the needinfo, as this bug is already closed.

Reopening. This is a high-priority V2V-related bug: it is actually slowing down the V2V workflow and our testing efforts. We did not have such issues in CFME 5.9 or other CFME versions. MIQ is usually not tested by QE; we test it only for this V2V effort, which is why we are just seeing it now.

Daniel, I closed this ticket (re-closing) because we do not track ManageIQ issues via Bugzilla. We only track CFME issues/bugs via Bugzilla, which is why I requested the issue be raised on the upstream GitHub for ManageIQ (mkalfon, thanks for doing that). We are actively investigating this issue and will update the GitHub issue going forward. If you reproduce this behavior with a CFME appliance, please file a ticket for that.

Created attachment 1426997 [details]
First evm.log of 2nd environment
Hello Dennis, you are right: in the usual case we do not track ManageIQ issues via Bugzilla. But for the V2V initiative that is currently in progress, the instructions were changed. As v2v is currently only supported by MIQ, QE was specifically instructed to open Bugzilla tickets for such bugs and mark them V2V in the Keywords; I will forward you the mail from Brett Thurber with those instructions. There are many more bugs QE has opened on V2V upstream lately and they are all being handled; this one is no different. Check this filter: https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=POST&bug_status=MODIFIED&bug_status=ON_DEV&bug_status=ON_QA&bug_status=VERIFIED&bug_status=RELEASE_PENDING&bug_status=CLOSED&list_id=8740349&query_format=advanced&short_desc=%5Bupstream%5D%5Bv2v%5D&short_desc_type=allwordssubstr

To avoid further ping-pong of closing and reopening this bug, please contact me with any questions before closing it. Daniel.

Created attachment 1427105 [details]
evm full logs (from clean install)
I have included old logs from this setup; please let me know if you need more. The full set of logs in a compressed archive takes about 60MB, which is more than can be attached here.

A few questions and a request:

1. Can you provide an archived log set from the appliance(s) encountering the issue?
2. Which provider(s) are being managed by the appliance(s)?
3. What is the inventory size of the provider(s) being managed?
4. Has the provider(s) in use changed over the time the performance has been observed to slow down? "Changed" includes switching to different providers or a significant change in the inventory size of the provider(s).
5. Are all pages/menus "slow", or is there a specific set of pages/menus?
6. Can you provide a DB dump of the appliance DB?

1. See the latest attached archive; it contains the log from the first run of MIQ (after the clean install).
2. RHV 4.2 and VMware 6.5.
3. In RHV we manage 6 clusters, 407 hosts, 8 data stores, and 4045 VMs. In VMware, 4 hosts and around 15 VMs.
4. No, the provider(s) did not change. The issue appears right after you run MIQ on a clean install.
5. All of them are slow; it feels like general slowness no matter what you do or enter in the UI (including login).
6. Can you direct me on how to provide that dump?

Regarding the requested logs, I'm looking for the full archive log set (created by the CEE script https://github.com/redhat-cfme-support/cloudforms_4.X_log_collection/blob/master/collect_CFME_archive_script.sh) or similar, which provides a more comprehensive set of logs and view of the appliance.
I use pg_dump to create a DB dump:

pg_dump -U root -h localhost -F custom -f DumpFileName vmdb_production

New commit detected on ManageIQ/manageiq-appliance/master: https://github.com/ManageIQ/manageiq-appliance/commit/7bd338eea1f2c609ccb0f636a592ea93562ab1c1

commit 7bd338eea1f2c609ccb0f636a592ea93562ab1c1
Author: Keenan Brock <keenan>
AuthorDate: Thu May 3 13:46:33 2018 -0400
Commit: Keenan Brock <keenan>
CommitDate: Thu May 3 13:46:33 2018 -0400

Add cache headers for webpack

We convert ETags to max-age for asset pipeline assets.

ETag:
- the server needs to calculate the checksum
- the client needs to ask the server if it has changed, incurring the cost of an HTTP request but not transferring the assets

max-age:
- the server just sends the current date plus one year
- the assets have a checksum in the filename, so a file will never change
- the client does not need to ask the server if a date has changed: no HTTP request

Chrome seems to mostly ignore cache headers, but Firefox follows the contract more closely. For assets using ETag, Firefox hits the server to check assets often. We were using max-age for the asset pipeline but ETags for webpack assets. Now we are using max-age for both. In addition, we are now setting Cache-Control: public, stating that proxy servers can cache these assets. Changing this value reduces the requests made to the server, especially for Firefox. This makes the most impact for users with a higher-latency connection.

https://bugzilla.redhat.com/show_bug.cgi?id=1571223

etc/httpd/conf.d/manageiq-https-application.conf | 9 +
1 file changed, 9 insertions(+)

The perf_primed_cache.png is the issue. It shows 10 pages being served from cache for me. Also, custom.css: that file is one of the slowest files for me. We need to add a location or something for that file.
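The ETag-versus-max-age trade-off described in the commit above can be sketched in a few lines of Ruby. This is a hypothetical illustration, not the appliance's actual configuration: it only builds the header value the commit describes (in a Rails app the equivalent knob would be something like `config.public_file_server.headers`), with the one-year lifetime matching the "current date plus one year" the commit mentions.

```ruby
# Hedged sketch: build the Cache-Control header described in the commit for
# fingerprinted assets. Because a checksum is part of the filename, the
# content behind a given URL never changes, so a long max-age is safe.
ONE_YEAR = 365 * 24 * 60 * 60 # seconds

def asset_cache_headers
  # "public" allows proxy servers to cache the asset as well;
  # "max-age" lets the client skip the ETag revalidation request entirely.
  { "Cache-Control" => "public, max-age=#{ONE_YEAR}" }
end

puts asset_cache_headers["Cache-Control"]
```

With ETag-based caching the client still pays one round trip per asset for the 304 check; with max-age it makes no request at all until the fingerprinted URL itself changes.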
It will probably only be changed once by the customer, but it is tricky to know when.

The issue is still very relevant on CFME 5.10.0.0.20180613200131_887cc81. Updating the status (if it's not clear yet): I'm browsing the interface from TLV against RDU (a remote site). When I work from within RDU, the performance improves significantly.

I would like to restate the problem: when a client accesses a new server, the client does not have the css/js resources cached, so it downloads all of them. This BZ describes a new server, which falls into the above description, so this will affect any new client, regardless of how long the server has been running. For some reason, it takes the client a long time to determine which resources can be cached.

Most pages require a dozen HTTP requests, and most of those requests result in downloading data. The HTML page (the Rails response) tends to be only 10k, but the extra requests for static resources (mostly css and js files, though png are included) take over 500mb. For local connections, 0.6GB per page is a little slow, but for a remote office this takes a long time to download and is very susceptible to VPN congestion.

Firefox (especially on Fedora) seems to properly respect headers, while Chrome is more lenient. This results in Firefox downloading all the resources over and over again while Chrome downloads fewer resources. Firefox ends up being slow.

The solution to this problem seems to be typical Rails / httpd server optimization:
- reduce the number of assets needed (since we use both webpack and the asset pipeline, we have a bunch of duplicates here)
- change headers to return long-running timeouts instead of using ETag where possible
- custom css (~100 bytes) takes up a large amount of time even for local machines, due to not being able to properly set the header names; finding another way to bundle this with the other css files would gain a lot, but this task is potentially tricky,
the difficulty being the need to educate/walk customers through the bundling process.

In the previous PRs I did change the product to use long page-cache timeouts for webpack files, but apparently it hasn't made enough of a difference. Both throughput and latency will be a factor here, since we are requesting many resources and downloading JavaScript, which is very large. I think figuring out headers would be a bigger payoff than reducing the size of the JavaScript files, although it does look like we have quite a large amount of JavaScript to be downloaded.

Ugh, so sorry, the units in my previous comment were completely wrong.

login page

file | count / mb | requests / cached
-----|------------|------------------
js   | 5 / 10.2   | 5 / 0
css  | 2 / 0.5    | 2 / 0.2kb
png  | 4 / 0.1    | 4 / 0
html | 1 / 4k     | 1 / 4k

ems_infra (empty)

file | count / mb | requests / cached
-----|------------|------------------
js   | 5 / 10.2   | 5 / 0
css  | 2 / 0.5    | 2 / 0.2kb
png  | 4 / 0.1    | 4 / 0
html | 4 / 0.06   | 1 / 4k
xhr  | 3 / 0.004  | 3 / 4k

Note: the Apache logs seem to suggest some of these files are not downloaded.

QUESTION: Mor, please confirm that this is still an issue with an upstream master appliance.

I will defer to MartinH as to our current efforts. Other than that, if we want to prioritize some UI optimization for the next release, I'm sure we could come up with some things to research and implement. Martin, please re-read some of the comments above and see if there is something we can do to correct the caching of the assets in the various browsers. Thx, Dan

This issue is still relevant on upstream master.20180619230249_84e9fa9. The JS asset (../assets/application-82b76a4b24285414b6b3e9bbc645af16dc593370c8ec31a007c74d410dd70c8a.js), ~7.5MB in size, is downloaded every time I switch between menu items. It takes ~13 sec to complete over a good VPN tunnel connection (bandwidth and latency).
I haven't checked locally from TLV, but I assume this asset still takes the longest to download when switching menu items. So if we can define caching for this asset, we will gain a significant improvement.

Mor... are you sure your cache is not disabled when looking at the network tab? Or are any non-ManageIQ proxies involved?

I can confirm I'm seeing the same JS asset being downloaded and taking 7.5 MB. (The same asset hash, but the appliance version is 5.10.0.1.20180619163011_900fdc4 in my case.) When going to any other menu item, that same asset gets served from the memory cache in Chrome. The same happens in Firefox: the file gets served from cache the first time I change menu items and on any subsequent try. That's Firefox 60 and Chromium 67. So, from what I can tell, this is already fixed. Attaching screenshots; maybe somebody can tell me what I'm doing wrong.

Created attachment 1453716 [details]
works in chrome 67
Created attachment 1453717 [details]
works in firefox 60
5.10 is different from what we run; we run on MIQ master. I'm running Firefox 60, and I still see the JS being downloaded. Can you please try to log in to the server and check? (See comment #35 for the hostname.)

OK, tried on 10.12.69.26. Please note that that machine's version is 5.9.3.2.20180619200710_4f909bc, which does not match the "master" claim.

Observing the same result in both Chrome and Firefox; attached screenshots for both. (The only thing I notice: Firefox outputs [full size / transferred size], so 10.44 MB / 627.18 KB still looks like a lot, until you only read the second number.)

Created attachment 1454327 [details]
screenshots for 10.12.69.26
(In reply to Martin Hradil from comment #45)
> OK, tried on 10.12.69.26.
>
> Please note that that machine's version is 5.9.3.2.20180619200710_4f909bc
> which does not match the "master" claim.
>
> ---
>
> Observing the same result in both chrome and firefox, attached screenshots
> for both.
> (The only thing I notice, firefox outputs [full size / transfered size], so
> 10.44 MB / 627.18 KB still looks like a lot, until you only read the second
> number.)

Sorry, the server name was wrong. This is the correct FQDN: https://manageiq.rhev.openstack.engineering.redhat.com/ Please give it a try.

OK, I tried on that machine, and I'm still observing that the second request comes from the cache. But: having waited a while after that and then trying yet another menu item, I can indeed see that the file was downloaded again. I'm seeing this only in Firefox, not Chrome. So yes, I can see there's a bug somewhere. But IMO the cache headers are correct, and Firefox is simply choosing to aggressively re-download. (Cache-Control comes straight from the MDN guide, and Expires is supposed to be ignored when max-age is provided.)

If that's so, maybe the solution could be simple: we could add `immutable` to the Cache-Control header, since these files will never change anyway. ...But alas, no, Firefox seems to happily ignore even the immutable flag. I'm sorry; right now, my only solution is: use Chrome.

A different idea: maybe the file doesn't get cached because it's too big. But it looks like current versions of Firefox have the maximum set to 51 megabytes, which should be enough. This can be checked in about:config, under browser.cache.disk.max_entry_size.

I used mitmproxy to look at the information outside of Firefox. I think this is unrelated to the original effort. It looks like the headers are working great in Firefox 60.0.2 (Mac). This is frustrating because everything looks good.
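The `immutable` idea floated above can be shown as a small Ruby helper. The directive itself is real (standardized in RFC 8246), but this function is purely illustrative and is not code from the appliance; as noted above, Firefox did not honor the flag in this case anyway.

```ruby
# Illustrative helper: append the "immutable" directive to an existing
# Cache-Control value, telling conforming browsers not to revalidate the
# asset even on reload (safe here because fingerprinted filenames never
# change). Returns the value unchanged if the directive is already present.
def add_immutable(cache_control)
  directives = cache_control.split(",").map(&:strip)
  return cache_control if directives.include?("immutable")

  "#{cache_control}, immutable"
end

puts add_immutable("public, max-age=31536000")
```

This prints "public, max-age=31536000, immutable"; a conforming browser seeing that header will not issue conditional revalidation requests for the asset at all.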
I'm currently going on the assumption that (a) a cache hit is better than (b) a 304, and that both are better than (c) a full download of a resource. I'm also going on the assumption that we are getting hit by the number of resources rather than the quantity of data downloaded, and that the size of a download is insignificant until it is 50k or larger.

1. https://support.mozilla.org/en-US/questions/1169302 -- you want to turn this off. I've noticed a lot of traffic from my browser for this. It was introduced in Firefox 52.

Thoughts on the server:

1. It would be nice if we could have the menu use the proper URLs (e.g. /cloud_volume vs /cloud_volume/show_list). It is a 140ms delay on a 756ms page (20%). That delay ends up delaying all resources, since we can't determine the other files to download until after the redirect occurs. I'm assuming this resource is slower over a WAN/VPN. I think this fix would be in menu.rb.

2. report_data (177ms, 0.6k) seems like it could be encoded directly into the page. Not sure if it would slow the page itself down, or if this request is even possible from an architectural point of view.

3. haml templates like notification-heading.html and notification-subheading.html (200ms / 0.1k) produce a bunch of 200 requests over XHR. MartinH mentioned the ability to precompile these. I noticed the dashboard downloading at least 4 copies of this file and using another 8 cached copies. From my naive perspective, this sounds like more effort than it is worth, but I wanted to mention it here.

1. Agreed, definitely doable: app/presenters/menu/default_menu.rb

2. True, but this would go directly against the goal of consuming report data from the API, so I'm not sure it is worth it. (That said, the current hybrid approach generates some data twice, so finishing that might speed this up too.)

3.
The only complication there is that we'd need a separate bundle for each language, and we'd likely lose the ability to use Ruby helpers in haml files (as they would get compiled by JavaScript code if we were precompiling). ...But maybe we could start with a fix to the Angular loader, so that it does not try to request a resource multiple times in parallel if a request has already been made.

New commit detected on ManageIQ/manageiq-ui-classic/master: https://github.com/ManageIQ/manageiq-ui-classic/commit/408ac643d4b7fc4c5fc872890d3bb07cf8b47a88

commit 408ac643d4b7fc4c5fc872890d3bb07cf8b47a88
Author: Martin Hradil <mhradil>
AuthorDate: Wed Oct 10 11:56:05 2018 -0400
Commit: Martin Hradil <mhradil>
CommitDate: Wed Oct 10 11:56:05 2018 -0400

Default menu - fix all menu items to use full url

Previously there were a lot of Menu::Item instances using a URL like `/container`. This is OK, but it means a redirect to `/container/show_list` every time that menu item is accessed. Updating to make all menu items use the default redirect URL. This means the only non-external URLs remaining without a method part are `/bottlenecks`, `/graphql_explorer`, `/planning` and `/utilization`; all of these work without a redirect.
https://bugzilla.redhat.com/show_bug.cgi?id=1571223

app/presenters/menu/default_menu.rb | 124 +-
1 file changed, 62 insertions(+), 62 deletions(-)

New commit detected on ManageIQ/manageiq-ui-classic/hammer: https://github.com/ManageIQ/manageiq-ui-classic/commit/80cf760f5b9acd316cc5a054f91e332da10b9a19

commit 80cf760f5b9acd316cc5a054f91e332da10b9a19
Author: Milan Zázrivec <mzazrivec>
AuthorDate: Thu Oct 11 05:10:03 2018 -0400
Commit: Milan Zázrivec <mzazrivec>
CommitDate: Thu Oct 11 05:10:03 2018 -0400

Merge pull request #4752 from himdel/specific-menu

Default menu - fix all menu items to use full url
(cherry picked from commit eafe89a4016ab4379b4a4b4a33620c4920694f93)

https://bugzilla.redhat.com/show_bug.cgi?id=1571223

app/presenters/menu/default_menu.rb | 124 +-
1 file changed, 62 insertions(+), 62 deletions(-)

Created https://github.com/ManageIQ/manageiq-ui-classic/pull/4813 to reduce the number of notification-related HTTP requests from 6 to 2. Marking the PR as fixing this BZ, as I don't think there's any more we can do here, not without a specific problem.
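The menu commits above boil down to making each menu entry carry its full action URL, so the server never has to redirect. The real change lives in app/presenters/menu/default_menu.rb; the helper below is a made-up sketch of the idea, not actual appliance code.

```ruby
# Hypothetical sketch of the redirect-avoidance fix: a bare controller URL
# like "/container" costs an extra HTTP round trip, because the server
# redirects it to "/container/show_list". Emitting the full URL up front
# avoids that redirect on every menu click.
def full_menu_url(url)
  return url if url.end_with?("/show_list")

  "#{url}/show_list"
end

puts full_menu_url("/container")
```

This prints /container/show_list. Per the measurements above, the redirect alone cost ~140ms on a 756ms page, and it also delayed every asset on the page, since the browser cannot discover the page's resources until the redirected response arrives.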
New commit detected on ManageIQ/manageiq-ui-classic/master: https://github.com/ManageIQ/manageiq-ui-classic/commit/97d83f2997d27951582d64c7da14e1bfc3ad9e6b

commit 97d83f2997d27951582d64c7da14e1bfc3ad9e6b
Author: Martin Hradil <mhradil>
AuthorDate: Tue Oct 23 08:04:49 2018 -0400
Commit: Martin Hradil <mhradil>
CommitDate: Tue Oct 23 08:04:49 2018 -0400

notifications - notificationBodyInclude - use render :partial instead of ng-include

Large template, so not inlining, but at least we can drop the extra async request for it.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1571223

app/assets/javascripts/controllers/notifications/notification-drawer.directive.js | 1 -
app/views/layouts/_notifications_drawer.html.haml | 1 -
app/views/static/notification_drawer/_notification-body.html.haml | 32 +
app/views/static/notification_drawer/notification-body.html.haml | 32 -
app/views/static/notification_drawer/notification-drawer.html.haml | 4 +-
5 files changed, 34 insertions(+), 36 deletions(-)

New commit detected on ManageIQ/manageiq-ui-classic/hammer: https://github.com/ManageIQ/manageiq-ui-classic/commit/0cd47efe5127962aed5f24d838b1d107ddbfe82d

commit 0cd47efe5127962aed5f24d838b1d107ddbfe82d
Author: Milan Zázrivec <mzazrivec>
AuthorDate: Wed Oct 24 02:22:13 2018 -0400
Commit: Milan Zázrivec <mzazrivec>
CommitDate: Wed Oct 24 02:22:13 2018 -0400

Merge pull request #4813 from himdel/ng-include

Remove ng-include in notifications
(cherry picked from commit ec520f4f81accba173aee2d045329b18ea20e591)

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1571223

app/assets/javascripts/controllers/notifications/notification-drawer.directive.js | 4 -
app/views/layouts/_notifications_drawer.html.haml | 3 -
app/views/static/notification_drawer/_notification-body.html.haml | 32 +
app/views/static/notification_drawer/notification-body.html.haml | 32 -
app/views/static/notification_drawer/notification-drawer.html.haml | 16 +-
app/views/static/notification_drawer/notification-heading.html.haml | 2 -
app/views/static/notification_drawer/notification-subheading.html.haml | 2 -
7 files changed, 40 insertions(+), 51 deletions(-)

Martin, thank you for all the work done to improve things. Moving the bug to VERIFIED, as I experience remote slowness only with the known Firefox issue. Also adding require_doc_text to document the remote connection limitation.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0212