Description of problem:
Viewing the cluster overview page consumes high CPU if the cluster has thousands of VMs. It looks like the overview page is also loading VM data.
Version-Release number of selected component (if applicable):
OCP 4.9.6
How reproducible:
Steps to Reproduce:
1. Create about 2000 VMs in the cluster
2. View the cluster overview page
3.
Actual results:
It consumes very high CPU
Expected results:
CPU consumption is normal
Additional info:
Tested on master today; the overview page still consumes high CPU.
@gouyang there's nothing more that can be done from the kubevirt side; the VMs are the only resource being watched now. The VM status is fetched from the YAML, there is no longer any computation on the client side, and the watched resource count went down from 6 resources to 1. If the bug persists, we should move this to core.
It's not clear to me why this was moved to Virt Core. Can you please clarify what process is consuming CPU? Client side? Server side?
(In reply to sgott from comment #6)
> It's not clear to me why this was moved to Virt Core. Can you please clarify
> what process is consuming CPU? Client side? Server side?
I moved the bug to virt core based on c#4; I may be wrong.
@Kobi, do you know which component we should move the bug to for further investigation? It seems most queries are related to prometheus.
(In reply to Guohua Ouyang from comment #7)
> (In reply to sgott from comment #6)
> > It's not clear to me why this was moved to Virt Core. Can you please clarify
> > what process is consuming CPU? Client side? Server side?
>
> I moved the bug to virt core based on c#4, I may be wrong.
>
> @Kobi, do you know which component we should move the bug to for further
> investigation?
> It seems most queries are related to prometheus.
AFAIU, the cluster overview page executes queries against the k8s API and Prometheus to show cards related to cluster resources. In the case of CNV, the queries are defined in the kubevirt plugin, which runs the resource queries and returns the values to the overview pages (cluster + project).
Moving to kubevirt plugin because the query definitions live in the plugin.
Hi Gilad, I think you worked on using printableStatus instead of a calculated status when computing the queries for the overview page. Do you remember if we had a bug for that? (It may be a duplicate of this one.)
I could also reproduce this issue by creating 600+ pods, so I am moving the bug to management console for investigation.
Reproduce steps:
1. Create 600+ pods in a namespace
2. Visit the cluster dashboard page
3. Run "top" in a terminal to monitor the CPU usage; the CPU usage is over 100% for about 5-10s.
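For anyone trying to reproduce step 1, the pods could be generated with a small script along these lines. This is only a sketch under stated assumptions: the namespace name "cpu-repro", the pod name prefix, and the pause image are placeholders chosen for illustration and are not from this report.

```python
import json


def build_pod_list(count, namespace="cpu-repro", image="registry.k8s.io/pause:3.9"):
    """Return a Kubernetes v1 List holding `count` minimal Pod objects.

    The namespace, name prefix, and image are hypothetical placeholders.
    """
    items = []
    for i in range(count):
        items.append({
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {"name": f"repro-pod-{i:04d}", "namespace": namespace},
            "spec": {
                # A single idle container is enough to make the pod show up
                # in the console; it does no real work.
                "containers": [{"name": "pause", "image": image}],
            },
        })
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Print the manifest as JSON so it can be redirected to a file.
    print(json.dumps(build_pod_list(600), indent=2))
```

Assuming the namespace already exists, the output can be saved to a file (for example pods.json) and applied with "oc apply -f pods.json".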
Guohua, could you please specify the version that is affected? Is it 4.10 or 4.9? Also, the name of the BZ refers to "thousand of pods" but the description refers to VMs. Which one is it?
I tested running 700 pods on the cluster and visiting the Dashboard page (per https://bugzilla.redhat.com/show_bug.cgi?id=2025525#c10) but haven't seen any major spike in CPU. I wonder if that could be HW related, since both @Robb and I are on fairly powerful Mac machines. What HW are you using for testing?
-Thanks
(In reply to Jakub Hadvig from comment #11)
> Guohua could you please specify the version that is being affected? Is it
> 4.10 or 4.9?
I could see the issue on both 4.9 and 4.10.
> Also the name of the BZ refers to "thousand of pods" but description to
> VM's. Which one is it?
I first noticed this issue with VMs. Some time later I tried it with only pods and could also see the issue, so I rewrote the summary.
> Tested running 700 pods on the cluster and visiting Dashboard page (per
> https://bugzilla.redhat.com/show_bug.cgi?id=2025525#c10) but haven't seen any
> major spike in CPU. Wondering if that could be HW related, since both me and
> @Robb are on pretty strong MAC machines. What HW are you using for testing?
I'm using a Lenovo ThinkPad T470s with an Intel i7-7600U CPU and 16G of memory, running Fedora 35. I could still see the spike in CPU with a newly deployed cluster (OCP only, no CNV installed).
> -Thanks
> I could still see the spike in CPU with a newly deployed cluster(OCP only, no CNV installed).
I'm inclined to say this spike in CPU is expected. When the page first loads, the network requests and rendering occur concurrently and the browser is very busy. I think this is only problematic if the CPU usage never settles once the page is fully loaded and rendered.
@Guohua, does this CPU spike settle and resolve to "normal" usage levels once the page is loaded and rendered?
I retested this by building a new cluster with 1200+ pods running; the performance of the cluster overview page is not bad. Moving the bug back to kubevirt plugin; I will try to retest it with a new cluster.
(In reply to Guohua Ouyang from comment #15)
> I retested this by building a new cluster which has 1200+ PODs running, the
> performance of the cluster overview page is not bad.
- The CPU consumption is not high if no other applications are running.
- The CPU consumption is high if the Slack app is running.
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira. https://issues.redhat.com/browse/OCPBUGS-9032
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days