2025525 – Viewing the cluster overview page consumes high CPU if the cluster has thousands of pods

Summary: Viewing the cluster overview page consumes high CPU if the cluster has thousands of pods

Keywords:
Status:	CLOSED DEFERRED
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Console Kubevirt Plugin
Sub Component:
Version:	4.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Yaacov Zamir
QA Contact:	Guohua Ouyang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-11-22 12:53 UTC by Guohua Ouyang
Modified:	2023-09-18 04:28 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-03-09 01:09:16 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift console pull 10693	0	None	open	Bug 2025525: overview page consume high CPU	2021-12-19 12:58:45 UTC

Description Guohua Ouyang 2021-11-22 12:53:09 UTC

Description of problem:
Viewing the cluster overview page consumes high CPU if the cluster has thousands of VMs. It looks like the overview page is also loading VM data.

Version-Release number of selected component (if applicable):
OCP 4.9.6

How reproducible:


Steps to Reproduce:
1. Create about 2000 VMs in the cluster.
2. View the cluster overview page.

Actual results:
CPU usage is very high.

Expected results:
CPU usage is normal.

Additional info:

Comment 2 Guohua Ouyang 2021-12-21 02:09:28 UTC

Tested on master today; the overview page still consumes high CPU.

Comment 4 Gilad Lekner 2021-12-26 13:43:46 UTC

@gouyang there's nothing more that can be done from the kubevirt side; VMs are the only resource being watched now.
The VM status is now read from the YAML, so there is no longer any computation on the client side, and the number of watched resources went down from 6 to 1.

If the bug persists, we should move this to core.
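
To illustrate the change described above, here is a minimal sketch of tallying VMs by the status already present in each object, rather than deriving it from related resources. It is written in Python for illustration only and is not the plugin's code; `count_by_printable_status` and the sample data are hypothetical, and the only assumption taken from KubeVirt is that a VirtualMachine exposes its status at `status.printableStatus`.

```python
from collections import Counter


def count_by_printable_status(vms):
    """Tally VMs by status.printableStatus, using 'Unknown' when it is absent."""
    return Counter(
        vm.get("status", {}).get("printableStatus", "Unknown") for vm in vms
    )


# Hypothetical sample data shaped like VirtualMachine objects.
sample_vms = [
    {"metadata": {"name": "vm-a"}, "status": {"printableStatus": "Running"}},
    {"metadata": {"name": "vm-b"}, "status": {"printableStatus": "Stopped"}},
    {"metadata": {"name": "vm-c"}, "status": {"printableStatus": "Running"}},
    {"metadata": {"name": "vm-d"}},  # no status reported yet
]

status_counts = count_by_printable_status(sample_vms)
print(dict(status_counts))
```

With this approach a single watch on VMs is enough to fill an inventory card, which matches the drop from 6 watched resources to 1 mentioned in this comment.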

Comment 6 sgott 2022-01-05 13:31:25 UTC

It's not clear to me why this was moved to Virt Core. Can you please clarify what process is consuming CPU? Client side? Server side?

Comment 7 Guohua Ouyang 2022-01-06 00:46:57 UTC

(In reply to sgott from comment #6)
> It's not clear to me why this was moved to Virt Core. Can you please clarify
> what process is consuming CPU? Client side? Server side?

I moved the bug to Virt Core based on c#4; I may have been wrong.

@Kobi, do you know which component we should move the bug to for further investigation?
It seems most of the queries are related to Prometheus.

Comment 8 Yaacov Zamir 2022-01-06 07:17:24 UTC

(In reply to Guohua Ouyang from comment #7)
> (In reply to sgott from comment #6)
> > It's not clear to me why this was moved to Virt Core. Can you please clarify
> > what process is consuming CPU? Client side? Server side?
> 
> I moved the bug to Virt Core based on c#4; I may have been wrong.
> 
> @Kobi, do you know which component we should move the bug to for further
> investigation?
> It seems most of the queries are related to Prometheus.

AFAIU, the cluster overview page executes queries against the k8s API and Prometheus to show cards related to cluster resources.
In the case of CNV, the queries are defined in the kubevirt plugin; the plugin runs the resource queries and returns the values to the overview pages (cluster + project).

Moving to kubevirt plugin because the query definitions live in the plugin.
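
As a rough illustration of the Prometheus side of this, the sketch below builds the URL for an instant query that asks Prometheus for an aggregate count instead of listing every object. Which queries the overview page actually issues is not stated in this thread; the base URL, the helper name `instant_query_url`, and the `count(kube_pod_info)` expression (a kube-state-metrics metric) are assumptions chosen for the example. Only the `/api/v1/query` endpoint with a `query` parameter is standard Prometheus HTTP API.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build a Prometheus instant-query URL for the given PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical endpoint and query, for illustration only.
url = instant_query_url("https://prometheus.example.com", "count(kube_pod_info)")
print(url)
```

A query like this returns one number regardless of how many pods exist, so its client-side cost should not grow with cluster size the way a watch on every object does.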

Hi Gilad,
I think you worked on using printableStatus instead of a calculated status when building the queries for the overview page.
Do you remember if we had a bug for that? (It may be a duplicate of this one.)

Comment 10 Guohua Ouyang 2022-01-10 07:55:40 UTC

I could also reproduce this issue by creating 600+ pods, so I am moving the bug to Management Console for investigation.
Steps to reproduce:
1. Create 600+ pods in a namespace.
2. Visit the cluster dashboard page.
3. Run "top" in a terminal to monitor CPU usage; it stays over 100% for about 5-10s.
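
Step 1 above can be scripted. The following is a minimal sketch, not the script used for this report: it generates a Kubernetes `List` of minimal Pod manifests as JSON. The pod name prefix, the namespace `overview-load-test`, and the `registry.k8s.io/pause:3.9` image are assumptions picked for the example.

```python
import json


def build_pod_list(count, namespace, image="registry.k8s.io/pause:3.9"):
    """Return a v1 List containing `count` minimal single-container Pod manifests."""
    items = []
    for i in range(count):
        items.append({
            "apiVersion": "v1",
            "kind": "Pod",
            # Zero-padded index keeps names unique and sortable.
            "metadata": {"name": f"load-pod-{i:04d}", "namespace": namespace},
            "spec": {"containers": [{"name": "pause", "image": image}]},
        })
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Hypothetical namespace; it must already exist in the cluster.
    print(json.dumps(build_pod_list(600, "overview-load-test")))
```

Saved as a file (for example `make_pods.py`, a hypothetical name), the output can be piped to the cluster with `python3 make_pods.py | oc apply -f -`.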

Comment 11 Jakub Hadvig 2022-01-12 17:45:33 UTC

Guohua, could you please specify which version is affected? Is it 4.10 or 4.9?
Also, the title of the BZ refers to pods but the description refers to VMs. Which one is it?
I tested running 700 pods on the cluster and visiting the Dashboard page (per https://bugzilla.redhat.com/show_bug.cgi?id=2025525#c10) but haven't seen any major spike in CPU. I'm wondering if that could be HW related, since both @Robb and I are on pretty strong Mac machines. What HW are you using for testing?
-Thanks

Comment 12 Guohua Ouyang 2022-01-13 07:49:36 UTC

(In reply to Jakub Hadvig from comment #11)
> Guohua could you please specify the version that is being affected? Is it
> 4.10 or 4.9?

I can see the issue on both 4.9 and 4.10.

> Also, the title of the BZ refers to pods but the description refers to
> VMs. Which one is it?

I first noticed this issue with VMs. Later I tried with only pods and could still see it, so I rewrote the summary.

> I tested running 700 pods on the cluster and visiting the Dashboard page (per
> https://bugzilla.redhat.com/show_bug.cgi?id=2025525#c10) but haven't seen any
> major spike in CPU. I'm wondering if that could be HW related, since both
> @Robb and I are on pretty strong Mac machines. What HW are you using for testing?

I'm using a Lenovo ThinkPad T470s with an Intel i7-7600U CPU and 16G of memory, running Fedora 35.
I could still see the spike in CPU with a newly deployed cluster (OCP only, no CNV installed).

> -Thanks

Comment 13 Robb Hamilton 2022-01-17 15:52:34 UTC

> I could still see the spike in CPU with a newly deployed cluster(OCP only, no CNV installed).

I'm inclined to say this spike in CPU is expected. When the page first loads, the network requests and rendering occur concurrently and the browser is very busy. I think this is only problematic if the CPU usage never settles once the page is fully loaded and rendered.

@Guohua, does this CPU spike settle and resolve to "normal" usage levels once the page is loaded and rendered?
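
One way to make "settles" checkable is to record CPU readings at a fixed interval while the page loads and test whether the most recent readings all fall below a threshold. The sketch below is only an illustration: `cpu_settles`, the 20% threshold, the 5-sample tail, and both sample series are hypothetical, not measurements from this bug.

```python
def cpu_settles(samples, threshold=20.0, tail=5):
    """Return True if the last `tail` CPU readings (in percent) are all below `threshold`."""
    if len(samples) < tail:
        return False  # not enough data to judge
    return all(s < threshold for s in samples[-tail:])


# Hypothetical one-per-second readings: a load spike that dies down...
spike_then_idle = [110.0, 105.0, 98.0, 60.0, 12.0, 8.0, 7.5, 6.0, 5.0]
# ...versus usage that stays high after the page has rendered.
never_idle = [110.0, 105.0, 98.0, 101.0, 99.0, 104.0, 100.0, 97.0, 102.0]

print(cpu_settles(spike_then_idle))
print(cpu_settles(never_idle))
```

Under this reading, the first series would be the expected page-load behaviour and only the second would indicate a problem.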

Comment 15 Guohua Ouyang 2022-01-24 10:46:32 UTC

I retested this by building a new cluster with 1200+ pods running; the performance of the cluster overview page is not bad.
Moving the bug back to kubevirt plugin; I will try to retest it with a new cluster.

Comment 16 Guohua Ouyang 2022-01-24 10:53:28 UTC

(In reply to Guohua Ouyang from comment #15)
> I retested this by building a new cluster which has 1200+ PODs running, the
> performance of the cluster overview page is not bad.

- CPU usage is not high if no other applications are running.
- CPU usage is high if the Slack app is running.

Comment 20 Shiftzilla 2023-03-09 01:09:16 UTC

OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9032

Comment 21 Red Hat Bugzilla 2023-09-18 04:28:21 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
