Bug 1859921
Summary: | GenericApiGWTService causing additional load on engine | ||||||
---|---|---|---|---|---|---|---|
Product: | [oVirt] ovirt-engine | Reporter: | mlehrer | ||||
Component: | General | Assignee: | Hilda Stastna <hstastna> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Pavel Novotny <pnovotny> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 4.4.1.8 | CC: | bugs, dfodor, gdeolive, pnovotny, sgratch | ||||
Target Milestone: | ovirt-4.4.6 | Keywords: | Performance | ||||
Target Release: | --- | Flags: | sgratch: ovirt-4.4? sgratch: planning_ack? pm-rhel: devel_ack+ pm-rhel: testing_ack+ | ||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | ovirt-engine-4.4.6.6 | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2021-05-14 07:28:20 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | UX | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | 1171924 | ||||||
Bug Blocks: | |||||||
Attachments: |
|
Description
mlehrer
2020-07-23 10:12:15 UTC
There are a few options to handle this:

We need to dig into the code in order to understand why there are so many queries with an idle browser. We are pretty sure that no changes were made to the 4.4 frontend UI, so I guess it's not a regression. Also, AFAIK no customer has complained about this.

There might be changes in backend query complexity that influenced the postgres load. This requires investigation anyway.

Was there any filter/search query used for reproducing the issue? Or is it reproduced even when there is no filter used (search field is empty)? If there was a query used then maybe it was a heavy one that made postgres more loaded.

Anyway, we can consider supporting a "dynamic refresh interval" solution: we can gradually increase the refresh interval default to more than 5 secs only in case the browser was idle for a period of time. We don't want to increase it in general, since the user wants an up-to-date and accurate view, and waiting more than 5 secs with an active UI seems too long.

Another solution is to consider increasing the refresh interval default to more than 5 secs only in case a (heavy) search/query is used.

(In reply to Sharon Gratch from comment #1)
> Was there any filter/search query used for reproducing the issue? Or is it
> reproduced even when there is no filter used (search field is empty)? If
> there was a query used then maybe it was a heavy one that made postgres more
> loaded.

@Mordechai, can you please reply to the above? Or maybe just send a screenshot of the browser view? Thanks.

(In reply to Sharon Gratch from comment #1)
> There are a few options to handle this:
> We need to dig into the code in order to understand why there are so many
> queries with an idle browser. We are pretty sure that no changes were made
> to the 4.4 frontend UI, so I guess it's not a regression. Also, AFAIK no
> customer has complained about this.

Not saying this is a regression based on current data.

> There might be changes in backend query complexity that influenced the
> postgres load. This requires investigation anyway.
>
> Was there any filter/search query used for reproducing the issue? Or is it
> reproduced even when there is no filter used (search field is empty)? If

It should reproduce regardless of whether the query filter is empty or in use; the more painful the query, the bigger the impact. When our scale system is back up (should be in a few days) I will provide updated info in this bz.

> there was a query used then maybe it was a heavy one that made postgres more
> loaded.

The total impact of constant querying from GenericApiGWTService is correlated with the cost of the query being run.

> Anyway, we can consider supporting a "dynamic refresh interval" solution:
> we can gradually increase the refresh interval default to more than 5 secs
> only in case the browser was idle for a period of time. We don't want to
> increase it in general, since the user wants an up-to-date and accurate
> view, and waiting more than 5 secs with an active UI seems too long.

Agreed, I would prefer a fix to the issue rather than forcing users to wait longer for UI refreshes.

> Another solution is to consider increasing the refresh interval default to
> more than 5 secs only in case a (heavy) search/query is used.

Once we have the scale lab back up I will compile some traces and ping you offline to give you the full picture, then you'll be able to suggest what works best.
Leaving the needinfo on me until the traces are supplied.

(In reply to mlehrer from comment #3)
> > Another solution is to consider increasing the refresh interval default to
> > more than 5 secs only in case a (heavy) search/query is used.
>
> Once we have the scale lab back up I will compile some traces and ping you
> offline to give you the full picture, then you'll be able to suggest what
> works best.
> Leaving the needinfo on me until the traces are supplied.

Mordechai, any update on this?

Created attachment 1721143 [details]
Collection of trace html reports for idle vm search page
The uploaded attachment contains several (single-page) HTML reports, each zipped individually. Each report corresponds to one 'slow trace' event, and each slow trace event corresponds to the API call listed in the report, in this case:
/ovirt-engine/webadmin/GenericApiGWTService
Open the html report in any browser and note the following:
The Breakdown section shows how much time is spent in HTTP, in JDBC query time, or in getting a connection; the count row shows how many times each was executed.
Click on or expand "Query Stats" to see the list of unique queries that were run, and how long they took, all initiated by this specific instance of the /ovirt-engine/webadmin/GenericApiGWTService API call.
In a larger setup the response times and query durations are far worse than what is shown in these examples, as we have fewer assets loaded here. What's important is to show which queries are being run and how many, and these reports show that.
Lastly, there's a png file which shows an overview of the traces happening while the VM window is open and a search was made.
In this example the search was: 'cluster = L0_Group_0 and host = f0*'
Traces were generated on a system with 3122 VMs and 272 hosts. On a system with 5000 VMs and 500 hosts, the same traces simply take longer as there are more assets.
Please reach out if you have any questions about the reports.
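To make the relationship between per-call query cost and polling frequency concrete, here is a rough back-of-the-envelope sketch. It is not taken from the engine code or from the attached reports: the function name, the number of open sessions, and the per-call JDBC time are all hypothetical placeholders that would need to be replaced with values read from the "Breakdown" section of a real trace.

```python
def db_seconds_per_minute(open_sessions: int,
                          refresh_interval_s: float,
                          query_seconds_per_call: float) -> float:
    """Estimate database time consumed per minute by idle grid refreshes.

    Assumes every open session fires one GenericApiGWTService call per
    refresh interval and that each call costs the same amount of query time.
    """
    if refresh_interval_s <= 0:
        raise ValueError("refresh interval must be positive")
    calls_per_minute = open_sessions * 60.0 / refresh_interval_s
    return calls_per_minute * query_seconds_per_call


if __name__ == "__main__":
    # Hypothetical inputs: 20 idle webadmin tabs, 0.25 s of JDBC time per call.
    for interval in (5, 10, 30):
        load = db_seconds_per_minute(20, interval, 0.25)
        print(f"interval={interval}s -> {load} DB-seconds per minute")
```

Under these assumptions the load scales linearly with the per-call query cost and inversely with the refresh interval, which is why a heavier search makes the constant polling more expensive.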
After discussing this issue, we suggest that as a first-phase solution we start by increasing the default refresh interval from 5 sec to 10 sec for all grid tables, regardless of whether a filter query is used. The user will be able to change the default back to 5 sec manually or via user settings configuration (to be implemented as part of user settings, https://bugzilla.redhat.com/show_bug.cgi?id=1171924).
This is an easy solution that might decrease the load without too much effect on the user experience. We can start with that and check whether the other solutions suggested above are required.

Please note that the fix is as detailed in comment 6: increasing the default refresh interval from 5 sec to 10 sec.

Verified in
ovirt-engine-4.4.6.6-0.10.el8ev.noarch
ovirt-engine-webadmin-portal-4.4.6.6-0.10.el8ev.noarch

The default data table refresh interval is now 10 seconds (changing this value is not permanent, but that issue is handled in a separate task). The GenericApiGWTService API calls reflect this value. I tried all the refresh interval options (5, 10, 20, 30, and 60 seconds), and the API calls were fired at the selected interval.
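For reference, the "dynamic refresh interval" idea raised earlier in this thread (and not part of the fix that was shipped) could look roughly like the sketch below. This is only an illustration under stated assumptions: the function name, the idle threshold, the doubling factor, and the 60 sec cap are all made up for the example and do not come from the webadmin code.

```python
def refresh_interval(idle_seconds: float,
                     base: float = 5.0,
                     idle_threshold: float = 60.0,
                     factor: float = 2.0,
                     max_interval: float = 60.0) -> float:
    """Return the polling interval for a UI that has been idle for idle_seconds.

    While the user is active (idle time below the threshold) the base interval
    is used. For every full idle_threshold period that has elapsed, the
    interval is multiplied by factor, up to max_interval. All defaults here
    are hypothetical.
    """
    if base <= 0 or idle_threshold <= 0 or factor <= 1:
        raise ValueError("base and idle_threshold must be > 0, factor must be > 1")
    interval = base
    elapsed = idle_threshold
    # Grow the interval once per elapsed idle period, stopping at the cap.
    while elapsed <= idle_seconds and interval < max_interval:
        interval = min(interval * factor, max_interval)
        elapsed += idle_threshold
    return interval


if __name__ == "__main__":
    for idle in (0, 59, 60, 120, 180, 240, 3600):
        print(f"idle for {idle}s -> refresh every {refresh_interval(idle)}s")
```

In such a scheme any user activity would reset the idle time to zero, so an active UI keeps the short interval while a forgotten browser tab backs off toward the cap.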