Following up on the general scaling bug 1177634, I am opening a specific SLA bug as per https://bugzilla.redhat.com/show_bug.cgi?id=1177634#c46. The NUMA code introduced in 3.5 is very inefficient and, when enabled, significantly slows down the high-profile getAllVmStats call. Periodically parsing libvirt's private XML is a very problematic approach and should be handled properly; any missing APIs should be requested from the relevant component (libvirt). In any case, the parsing should be moved out of the stats call, which is supposed to only return information that is gathered asynchronously in a separate thread. (This is the "urgent" part of the bug, since it affects overall performance.)
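The split asked for above (expensive XML parsing done by a separate collector, the stats call only reading already-gathered data) can be sketched roughly as follows. This is a hypothetical Python illustration, not VDSM's actual code: `NumaSampler`, `parse_numa_info`, `get_all_vm_stats` and the `get_domain_xmls` callable are invented names, and reading only the `<numatune><memory>` element of the domain XML is an assumption about which NUMA data is needed.

```python
import threading
import xml.etree.ElementTree as ET

# Hypothetical sketch (not VDSM code): the slow NUMA parsing runs in a
# background sampler, and the stats call only reads the cached result.


def parse_numa_info(domain_xml):
    """Extract the numatune memory mode/nodeset from a domain XML string."""
    root = ET.fromstring(domain_xml)
    memory = root.find("./numatune/memory")
    if memory is None:
        return None
    return {"mode": memory.get("mode"), "nodeset": memory.get("nodeset")}


class NumaSampler:
    """Periodically refreshes per-VM NUMA info, off the stats path."""

    def __init__(self, get_domain_xmls, interval=15.0):
        # get_domain_xmls: assumed callable returning {vm_id: domain_xml}
        self._get_domain_xmls = get_domain_xmls
        self._interval = interval
        self._cache = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = None

    def sample_once(self):
        # The expensive part: fetch and parse every domain XML.
        fresh = {vm_id: parse_numa_info(dom_xml)
                 for vm_id, dom_xml in self._get_domain_xmls().items()}
        with self._lock:
            self._cache = fresh

    def _run(self):
        # Sample, then sleep for the interval unless asked to stop.
        while not self._stop.is_set():
            self.sample_once()
            self._stop.wait(self._interval)

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def cached(self, vm_id):
        with self._lock:
            return self._cache.get(vm_id)


def get_all_vm_stats(vm_ids, sampler):
    # Cheap: no XML parsing here, only a dictionary lookup per VM.
    return [{"vmId": vm_id, "numa": sampler.cached(vm_id)} for vm_id in vm_ids]
```

With this layout, the cost of the stats call no longer grows with the cost of XML parsing; parsing happens once per sampling interval regardless of how often getAllVmStats is polled, at the price of returning data that may be up to one interval old.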
In addition, see point 3 in https://bugzilla.redhat.com/show_bug.cgi?id=1185279#c1 for a NUMA issue in host monitoring.
3.5.1 is already full of bugs (over 80), and since none of these bugs were marked as urgent for the 3.5.1 release in the tracker bug, moving to 3.5.2.
The patch is posted, and the improvement was measured at about 12 ms per VM per call. With two NUMA-enabled VMs, the loop

for x in $(seq 100); do vdsClient -s 0 getAllVmStats >/dev/null; done

gave the following timings:

Old VDSM:      real 0m21.093s   user 0m11.998s   sys 0m1.690s
Updated VDSM:  real 0m18.485s   user 0m12.009s   sys 0m1.846s

And a control timing of two VMs without NUMA:

Control:       real 0m18.298s   user 0m11.878s   sys 0m1.699s

As you can see, the difference for 100 calls was about 2.6 seconds of real time (21.093 s vs. 18.485 s).
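To make the arithmetic behind these numbers explicit, the snippet below converts the bash `time` values quoted above to seconds and derives the per-VM, per-call saving (100 calls, two VMs). It is only a back-of-the-envelope check; `to_seconds` and `per_vm_per_call_ms` are illustrative helper names, not part of VDSM or vdsClient.

```python
import re


def to_seconds(bash_time):
    """Convert a bash `time` value such as '0m21.093s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", bash_time)
    if match is None:
        raise ValueError(f"unexpected time format: {bash_time!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def per_vm_per_call_ms(before, after, calls, vms):
    """Average wall-clock saving per VM per call, in milliseconds."""
    return (to_seconds(before) - to_seconds(after)) / (calls * vms) * 1000


# "real" values from the comment above.
old, updated, control = "0m21.093s", "0m18.485s", "0m18.298s"

saving = to_seconds(old) - to_seconds(updated)
print(f"total saving over 100 calls: {saving:.3f} s")  # 2.608 s
print(f"per VM per call: {per_vm_per_call_ms(old, updated, 100, 2):.1f} ms")  # 13.0 ms
gap = to_seconds(updated) - to_seconds(control)
print(f"remaining gap to non-NUMA control: {gap:.3f} s")  # 0.187 s
```

On these figures the saving works out to roughly 13 ms per VM per call, in the same range as the ~12 ms quoted, and the patched VDSM ends up within about 0.19 s (over 100 calls) of the non-NUMA control.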
To be clear, all NUMA-related code was introduced in 3.5, so it should not affect 3.4; the issue seen there is something different.
The main issue is fixed.
Verified on vdsm-4.17.0-822.git9b11a18.el7.noarch. Ran two VMs with two CPUs.

Without NUMA:
[root@alma06 ~]# time for x in $(seq 100); do vdsClient -s 0 getAllVmStats >/dev/null; done
real 0m14.549s
user 0m11.643s
sys 0m2.316s

With NUMA:
[root@alma06 ~]# time for x in $(seq 100); do vdsClient -s 0 getAllVmStats >/dev/null; done
real 0m14.570s
user 0m11.632s
sys 0m2.370s
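For comparison with the pre-fix measurement, the same kind of calculation can be applied to the verification run: the remaining NUMA overhead expressed as a relative difference and as time per VM per call. This is a rough sketch over a single pair of runs; with no repeated measurements, whether the difference is pure noise is an assumption, not something these numbers prove. `numa_overhead` is an illustrative name.

```python
def numa_overhead(with_numa_s, without_numa_s, calls, vms):
    """Return (total seconds, percent of baseline, ms per VM per call)."""
    overhead_s = with_numa_s - without_numa_s
    relative_pct = overhead_s / without_numa_s * 100
    per_vm_per_call_ms = overhead_s / (calls * vms) * 1000
    return overhead_s, relative_pct, per_vm_per_call_ms


# "real" times from the verification run on alma06 (100 calls, two VMs).
total, pct, per_call = numa_overhead(14.570, 14.549, calls=100, vms=2)
print(f"overhead: {total:.3f} s ({pct:.2f}% of baseline)")  # 0.021 s (0.14% of baseline)
print(f"per VM per call: {per_call:.3f} ms")  # 0.105 ms
```

That is about 0.1 ms per VM per call, two orders of magnitude below the ~12 ms measured before the fix.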
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0362.html