Bug 2217243
| Summary: | virt-handler memory and cpu usage are hardcoded and set too low for large scale | ||
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Boaz <bbenshab> |
| Component: | Virtualization | Assignee: | sgott |
| Status: | NEW --- | QA Contact: | Kedar Bidarkar <kbidarka> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.13.1 | CC: | fdeutsch, iholder, jhopper, sradco |
| Target Milestone: | --- | ||
| Target Release: | 4.15.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Boaz
2023-06-25 10:24:58 UTC
For now I'm assuming that the PDB link is a red herring.

Boaz, how much CPU and memory consumption do you see virt-handler using? My take is that we take these values and use them plus a small delta as the defaults for our handler.

Hey @fdeutsch, I'm seeing 330-350 MB memory allocations and 40m-92m CPU used. It's fine to set those values plus headroom as a baseline, but I would also like to have them tunable.

Thanks Boaz. Yes, it would be great if we could identify a way to derive these values from
- static cluster properties (node count, cpu count, network bw, …)
- or target workload properties (expected churn, expected num vms, …)

We probably want to have this in a dedicated epic, just like "following the kubelet" for the API server.

A general note: I think we should treat ExceedsRequestedCPU and ExceedsRequestedMemory completely differently, since they are not of the same severity.

ExceedsRequestedMemory is dangerous, since memory is an incompressible resource. This means that if the node comes under memory pressure, virt-handler might get killed, which is bad.

With ExceedsRequestedCPU things are completely different, since CPU is a compressible resource. If the node comes under CPU pressure, in the worst case virt-handler would be throttled to use only the amount of CPU it requested. If virt-handler needs more CPU than requested (this might only happen for a certain period of time), the node will allow this as long as it has spare CPU resources. This aligns with the Kubernetes (and cgroups) resource management model and is completely valid.

The only reason to think there's a problem with virt-handler's CPU request is if we are sure that it permanently (or at least for long periods of time) requires more CPU to do its job. But I think we need more data to actually be sure of that. We might even consider raising the fire time for KubeVirtComponentExceedsRequestedCPU, which is currently 5 minutes.
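To make the two ideas in this thread concrete, here is a minimal Python sketch, not KubeVirt code. `baseline_requests` derives requests from the observed peaks quoted above (92m CPU, 350 MB memory) plus headroom, and `exceeds_for` mimics an alert that fires only when usage stays above the request for a whole window. The function names, the 20% headroom, and the one-sample-per-minute assumption are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Requests:
    cpu_millicores: int
    memory_mb: int


def _with_headroom(value: int, headroom_percent: int) -> int:
    # Integer ceiling of value * (100 + headroom_percent) / 100,
    # avoiding floating-point rounding surprises.
    return (value * (100 + headroom_percent) + 99) // 100


def baseline_requests(peak_cpu_millicores: int, peak_memory_mb: int,
                      headroom_percent: int = 20) -> Requests:
    """Observed peak usage plus headroom (20% is an assumed default)."""
    if headroom_percent < 0:
        raise ValueError("headroom_percent must be non-negative")
    return Requests(
        cpu_millicores=_with_headroom(peak_cpu_millicores, headroom_percent),
        memory_mb=_with_headroom(peak_memory_mb, headroom_percent),
    )


def exceeds_for(samples: Sequence[int], request: int, window: int) -> bool:
    """True only if the last `window` samples ALL exceed the request.

    With one sample per minute (an assumption), window=5 approximates
    a 5-minute fire time: short bursts above the request do not fire.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(samples) < window:
        return False
    return all(s > request for s in samples[-window:])


if __name__ == "__main__":
    req = baseline_requests(peak_cpu_millicores=92, peak_memory_mb=350)
    print(req)
    # A short burst above the CPU request should not count as sustained.
    print(exceeds_for([60, 150, 70, 80, 90], req.cpu_millicores, window=5))
```

Raising `window` corresponds to lengthening the fire time, which tolerates longer CPU bursts. Memory would arguably warrant a shorter window or a larger headroom, since exceeding a memory request is the more dangerous case.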
The plan is to address the issue described in this bug as part of the Jira epic https://issues.redhat.com/browse/CNV-28746, which is currently targeted for CY24.