1880698 – Performance of the `QueryBrowser` graphs have degraded since 4.5

Summary: Performance of the `QueryBrowser` graphs have degraded since 4.5

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Andrew Pickering
QA Contact:	hongyan li
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1795401
Depends On:
Blocks:	1804922

Reported:	2020-09-19 03:39 UTC by Andrew Pickering
Modified:	2020-10-27 16:43 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-10-27 16:42:58 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift console pull 6685	None	closed	Bug 1880698: Improve graph render performance	2021-01-06 16:57:39 UTC
Github	openshift console pull 6749	None	closed	Bug 1880698: Query Browser: Improve graph render speed by not using VictoryTooltip	2021-01-06 16:57:36 UTC
Red Hat Product Errata	RHBA-2020:4196	None	None	None	2020-10-27 16:43:15 UTC

Description Andrew Pickering 2020-09-19 03:39:32 UTC

Graphs on the Alerting, Metrics and Dashboards pages for both the Dev and Admin perspectives all use the `QueryBrowser` component to render their graphs. The render time of this component has become slower since the 4.5 release.

Comment 2 hongyan li 2020-09-24 02:28:47 UTC

I tested with the Performance tab of Chrome developer tools. Your fix is in build 4.6.0-0.nightly-2020-09-21-230455.

1. Installed an OCP cluster with payload 4.6.0-0.nightly-2020-09-21-093308, which doesn't include your fix
2. Installed an OCP cluster with payload 4.6.0-0.nightly-2020-09-23-022756, which includes your fix
3. Ran an example query and collected performance data; saw no performance improvement
       sort_desc(sum(sum_over_time(ALERTS{alertstate="firing"}[24h])) by (alertname))
4. Ran a query that returns a large amount of data and collected performance data; filed product bug https://bugzilla.redhat.com/show_bug.cgi?id=1880698
       cluster_quantile:apiserver_request_duration_seconds:histogram_quantile


I collected the following data five times for each environment. I expected the Painting time or the Rendering time to improve, but saw no improvement in any of the times.
38 ms Loading
3204 ms Scripting
422 ms Rendering
24 ms Painting
561 ms System
4622 ms Idle
8871 ms Total
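
The breakdown above may be easier to read as shares of the time the browser was not idle. The following is a minimal sketch in Python that uses only the numbers reported in this comment; the variable and function names are illustrative and do not come from any tool.

```python
# Timings (ms) copied from the Chrome DevTools Performance summary above.
profile_ms = {
    "Loading": 38,
    "Scripting": 3204,
    "Rendering": 422,
    "Painting": 24,
    "System": 561,
    "Idle": 4622,
}

total_ms = sum(profile_ms.values())
busy_ms = total_ms - profile_ms["Idle"]  # time the browser was actually working


def share(part: float, whole: float) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


scripting_share_of_busy = share(profile_ms["Scripting"], busy_ms)
render_paint_share_of_busy = share(
    profile_ms["Rendering"] + profile_ms["Painting"], busy_ms
)

print(f"total: {total_ms} ms, busy: {busy_ms} ms")
print(f"Scripting: {scripting_share_of_busy}% of busy time")
print(f"Rendering + Painting: {render_paint_share_of_busy}% of busy time")
```

In this sample Scripting accounts for most of the non-idle time, so it is probably the number most worth comparing across builds.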

Comment 3 hongyan li 2020-09-24 03:37:08 UTC

Full test result
https://files.slack.com/files-pri/T027F3GAJ-F01BZTTSYJC/image.png

Comment 4 hongyan li 2020-09-24 03:42:31 UTC

The following bug is not caused by the current fix. 
https://bugzilla.redhat.com/show_bug.cgi?id=1880698

I am reopening the bug because my test shows no performance improvement.

Comment 5 hongyan li 2020-09-24 04:46:21 UTC

I will run the performance test again with two clusters that have the same data series.
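
One way to confirm that two clusters return the same number of series is to count the entries in the query response. The sketch below assumes the response body follows the standard Prometheus HTTP API shape (`status`, `data.resultType`, `data.result`); the function name and the sample payload, including its alert names, are hypothetical.

```python
import json


def count_series(response_body: str) -> int:
    """Count the time series in a Prometheus HTTP API query response body."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    # Only vector and matrix results are lists with one entry per series.
    if data["resultType"] not in ("vector", "matrix"):
        raise ValueError(f"unsupported result type: {data['resultType']}")
    return len(data["result"])


# Hypothetical instant-query response containing two series.
sample_body = json.dumps(
    {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"alertname": "ExampleAlertA"}, "value": [1600934400, "3"]},
                {"metric": {"alertname": "ExampleAlertB"}, "value": [1600934400, "1"]},
            ],
        },
    }
)

print(count_series(sample_body))
```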

Comment 6 hongyan li 2020-09-24 07:55:59 UTC

I launched Chrome in incognito mode and collected performance data with Chrome developer tools. The test results are below. They show that performance is improved by the fix for this bug.

Query: sort_desc(sum(sum_over_time(ALERTS{alertstate="firing"}[24h])) by (alertname))

Fix is not in (2 time series)
Run	Scripting (ms)
1	3204
2	3160
3	3007
4	2863
5	2806
Total	15040

Fix is in (2 time series)
Run	Scripting (ms)
1	2044
2	2224
3	2314
4	2359
5	2468
Total	11409

Query: topk(5, cluster_quantile:apiserver_request_duration_seconds:histogram_quantile)

Fix is not in (634 time series)
Run	Scripting (ms)
1	13479
2	4164
3	11556
4	5993
5	5756
Total	40948

Fix is in (742 time series)
Run	Scripting (ms)
1	8457
2	9596
3	3134
4	3856
5	8953
Total	33996
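
To put a number on the improvement, the mean Scripting time of the five runs can be compared between the two builds. This is a minimal sketch that uses only the timings reported above; the dictionary keys and the helper name are illustrative.

```python
from statistics import mean
from typing import Sequence

# Scripting times (ms) copied from the results above.
runs_ms = {
    "ALERTS query": {
        "without fix": [3204, 3160, 3007, 2863, 2806],
        "with fix": [2044, 2224, 2314, 2359, 2468],
    },
    "topk(5, ...) query": {
        "without fix": [13479, 4164, 11556, 5993, 5756],
        "with fix": [8457, 9596, 3134, 3856, 8953],
    },
}


def reduction_pct(before: Sequence[int], after: Sequence[int]) -> float:
    """Percentage drop in mean Scripting time from before to after."""
    return round(100.0 * (mean(before) - mean(after)) / mean(before), 1)


summary = {
    query: reduction_pct(r["without fix"], r["with fix"])
    for query, r in runs_ms.items()
}

for query, pct in summary.items():
    print(f"{query}: mean Scripting time reduced by {pct}%")
```

The figures are only indicative: the topk runs vary widely (4164 ms to 13479 ms without the fix), and the two clusters returned different numbers of series (634 and 742).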

Comment 7 Samuel Padgett 2020-09-25 13:33:15 UTC

We have a follow-on fix that improves performance a bit more. Moving back to ASSIGNED.

Comment 9 hongyan li 2020-09-29 07:15:09 UTC

Tested with payload 4.6.0-0.nightly-2020-09-28-171716

Query: sort_desc(sum(sum_over_time(ALERTS{alertstate="firing"}[24h])) by (alertname)) (5 time series)
Run	Scripting (ms)
1	2062
2	2117
3	2187
4	2251
5	1756
Total	10373

Query: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile (858 time series)
Run	Scripting (ms)
1	2161
2	1550
3	2761
4	2350
5	2587
Total	11409
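
Because each large-query test returned a different number of time series, dividing the mean Scripting time by the series count gives a rougher but fairer comparison. The sketch below uses the figures from comments 6 and 9. It assumes that the 2020-09-28 payload includes the follow-on fix, and it ignores that comment 6 wrapped the query in topk(5, ...) while this comment lists the bare recording rule. The labels and helper name are illustrative.

```python
# Each entry: (label, number of time series returned, five Scripting timings in ms).
# Values are copied from comments 6 and 9; the labels are illustrative.
large_query_runs = [
    ("without fix", 634, [13479, 4164, 11556, 5993, 5756]),
    ("first fix", 742, [8457, 9596, 3134, 3856, 8953]),
    ("follow-on fix", 858, [2161, 1550, 2761, 2350, 2587]),
]


def ms_per_series(series_count: int, timings_ms: list) -> float:
    """Mean Scripting time divided by the number of series, rounded to 2 decimals."""
    mean_ms = sum(timings_ms) / len(timings_ms)
    return round(mean_ms / series_count, 2)


per_series = {
    label: ms_per_series(count, timings)
    for label, count, timings in large_query_runs
}

for label, value in per_series.items():
    print(f"{label}: {value} ms of Scripting per time series")
```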

Comment 10 hongyan li 2020-09-29 07:18:27 UTC

The results show that performance improves greatly when the query returns a large amount of data.

Comment 11 Samuel Padgett 2020-10-01 12:18:18 UTC

*** Bug 1795401 has been marked as a duplicate of this bug. ***

Comment 14 errata-xmlrpc 2020-10-27 16:42:58 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196