| Summary: | Numa sampling causes very high load on the hypervisor. | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Roman Hodain <rhodain> | |
| Component: | vdsm | Assignee: | Martin Polednik <mpoledni> | |
| Status: | CLOSED ERRATA | QA Contact: | Artyom <alukiano> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 3.6.9 | CC: | bazulay, bcholler, dfediuck, gklein, guchen, lsurette, mgoldboi, michal.skrivanek, mkalinin, mpoledni, srevivo, trichard, ycui, ykaul | |
| Target Milestone: | ovirt-4.1.0-alpha | Keywords: | Performance, Triaged, ZStream | |
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: | Previously, NUMA sampling could cause an unnecessarily high load on a complex host. This update increases the sampling interval from 15 seconds to 10 minutes, which is sufficient for the rarely-changing NUMA topology. | Story Points: | --- | |
| Clone Of: | ||||
| : | 1401580, 1401583 | Environment: | ||
| Last Closed: | 2017-04-25 00:41:12 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Bug Depends On: | ||||
| Bug Blocks: | 1401580, 1401583 | |||
MOM has nothing to do with NUMA. Moving to VDSM. There were also some big changes to monitoring in 4.0, so this might just be a matter of backporting. However, there is also the bug (fixed in at least 4.0 and up) about high load caused by disk IO tune queries: https://bugzilla.redhat.com/show_bug.cgi?id=1366556

*** Bug 1398953 has been marked as a duplicate of this bug. ***

msivak, can we consider removing the *VM* NUMA stats entirely? They are for reporting only. The second option is to relax the interval, but if we don't need the stats, I would prefer to just remove them.

It seems this is already removed in the 4.1 engine. But we also need to instruct VDSM to limit the collection frequency (and possibly remove the code).

The code was dropped in 4.1 in bug 1148039, and it is unused in 3.6/4.0 as well. To minimize changes, we can just increase the poll interval from 15s to 1h.

I meant 600s; that was actually tested in a real setup already.

Verified on vdsm-4.19.2-2.el7ev.x86_64
Description of problem:
NUMA sampling causes very high load on the hypervisor. The load on the hypervisor grows over time.

Version-Release number of selected component (if applicable):
vdsm-4.17.35-1.el7ev.noarch

How reproducible:
100% in a specific environment

Steps to Reproduce:
1. See the supervdsm logs.

Actual results:
The load on the hypervisor is very high, and it drops once vdsmd is stopped:

20:03:09 up 65 days, 23 min, 1 user, load average: 42.69, 41.55, 38.18
systemctl stop vdsmd
20:04:04 up 65 days, 24 min, 1 user, load average: 33.70, 39.56, 37.71
20:04:28 up 65 days, 24 min, 1 user, load average: 24.64, 36.98, 36.91
20:04:57 up 65 days, 25 min, 1 user, load average: 16.49, 33.83, 35.86
20:05:35 up 65 days, 25 min, 1 user, load average: 11.20, 30.59, 34.70
20:05:48 up 65 days, 26 min, 1 user, load average: 9.78, 29.33, 34.22

Additional info:
The issue was worked around by setting vm_sample_numa_interval = 600.
NUMA stats are collected 3171 times in one hour for just 14 VMs.
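
As a rough sanity check of the numbers above, the following sketch compares the expected number of NUMA samples per hour for 14 VMs at the 15 s poll interval mentioned in the comments with the 600 s workaround. It assumes one sample per VM per interval, and it assumes the option lives in a `[vars]` section of the VDSM configuration file; only the option name `vm_sample_numa_interval` and the value 600 come from this report. The helper `samples_per_hour` is hypothetical and is not part of VDSM.

```python
import configparser

# Hypothetical configuration fragment. The [vars] section name is an
# assumption; only the option name and value are taken from the report.
CONF_TEXT = """
[vars]
vm_sample_numa_interval = 600
"""


def samples_per_hour(vm_count, interval_seconds):
    """Expected samples in one hour, assuming one sample per VM per interval."""
    return vm_count * (3600 // interval_seconds)


parser = configparser.ConfigParser()
parser.read_string(CONF_TEXT)
workaround_interval = parser.getint("vars", "vm_sample_numa_interval")

# 15 s is the poll interval quoted in the comments on this bug.
default_rate = samples_per_hour(14, 15)
workaround_rate = samples_per_hour(14, workaround_interval)

print("samples/hour at 15 s:", default_rate)
print("samples/hour at %d s:" % workaround_interval, workaround_rate)
```

Under these assumptions the 15 s interval gives 3360 samples per hour, the same order as the 3171 collections observed, while the 600 s interval gives 84.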