Bug 1182094 - vdsm NUMA code not effective, slowing down statistics retrieval
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assignee: Martin Sivák
QA Contact: Artyom
URL:
Whiteboard:
Depends On:
Blocks: 1177634 1185279 1220113
Reported: 2015-01-14 12:39 UTC by Michal Skrivanek
Modified: 2016-03-09 19:29 UTC

Fixed In Version: ovirt-3.6.0-alpha1.2
Doc Type: Bug Fix
Doc Text:
Previously, NUMA statistics were collected every time VDSM was queried for host statistics. This resulted in higher load and unnecessary delays, because collecting the data required executing an external process, which is time consuming. Now, NUMA statistics collection has been moved to the statistics threads, and the host statistics query reports the last collected result.
Clone Of:
Clones: 1220113
Environment:
Last Closed: 2016-03-09 19:29:13 UTC
oVirt Team: SLA
Target Upstream Version:




Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2016:0362 | normal | SHIPPED_LIVE | vdsm 3.6.0 bug fix and enhancement update | 2016-03-09 23:49:32 UTC
oVirt gerrit | 36906 | master | MERGED | Move NUMA collecting code to stats thread | Never
oVirt gerrit | 38564 | master | MERGED | Cache the result of numaUtils.getVcpuPids | Never

Description Michal Skrivanek 2015-01-14 12:39:21 UTC
Following up on the general scaling bug 1177634, opening a specific SLA bug as per https://bugzilla.redhat.com/show_bug.cgi?id=1177634#c46

The NUMA code introduced in 3.5 is very inefficient and, when enabled, significantly slows down the high-profile getAllVmStats call.

Periodic parsing of libvirt's private XML is a very problematic approach and should be handled correctly; any missing APIs should be requested from the relevant components (libvirt).

In any case, it should be moved out of the stats call, which is supposed to only report information that is gathered asynchronously in a separate thread (this is the "urgent" part of the bug, since it affects overall performance).
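
The pattern asked for here (collect in a separate thread, let the stats call only return the last collected value) can be sketched roughly as follows. This is a minimal illustration and not vdsm's actual code: the `CachedSampler` class, its method names, and the `slow_collect` stand-in for the NUMA collection are all hypothetical.

```python
import threading
import time


class CachedSampler:
    """Run an expensive collector in a background thread and cache its last result.

    Hypothetical sketch; not taken from vdsm.
    """

    def __init__(self, collect, interval):
        self._collect = collect        # expensive callable (stand-in for NUMA collection)
        self._interval = interval      # seconds to wait between samples
        self._lock = threading.Lock()
        self._last = None              # most recent result, None until first sample
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # The slow work happens only here, never in the caller of last().
        while not self._stop.is_set():
            result = self._collect()
            with self._lock:
                self._last = result
            self._stop.wait(self._interval)

    def last(self):
        """Return the most recently collected result without collecting again."""
        with self._lock:
            return self._last


if __name__ == "__main__":
    def slow_collect():
        time.sleep(0.05)               # pretend to run an external process
        return {"collected_at": time.time()}

    sampler = CachedSampler(slow_collect, interval=0.1)
    sampler.start()
    time.sleep(0.3)
    print(sampler.last())              # cheap: only reads the cached value
    sampler.stop()
```

The trade-off is that the stats call may report data that is up to one sampling interval old, in exchange for never paying the collection cost on the request path.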

Comment 1 Michal Skrivanek 2015-01-23 11:29:35 UTC
In addition, see point 3 in https://bugzilla.redhat.com/show_bug.cgi?id=1185279#c1 for the NUMA issue in host monitoring.

Comment 2 Eyal Edri 2015-02-25 08:40:14 UTC
3.5.1 is already full of bugs (over 80), and since none of these bugs were added as urgent for the 3.5.1 release in the tracker bug, moving to 3.5.2.

Comment 6 Martin Sivák 2015-03-10 16:48:55 UTC
The patch is posted, and the improvement was measured at about 12 ms per VM per call.

With two NUMA-enabled VMs, timing 100 calls with the following command showed this difference:

time for x in $(seq 100); do vdsClient -s 0 getAllVmStats >/dev/null; done

Old VDSM:

real	0m21.093s
user	0m11.998s
sys	0m1.690s

Updated VDSM:

real	0m18.485s
user	0m12.009s
sys	0m1.846s

And a control timing of two VMs without NUMA:

real	0m18.298s
user	0m11.878s
sys	0m1.699s

As you can see, the time difference for 100 calls was about 2.6 seconds (21.093 s versus 18.485 s).
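
The arithmetic behind these figures can be checked directly from the "real" timings above. The snippet below is only a back-of-the-envelope check; the helper name `per_vm_per_call_ms` is made up for illustration, and it looks at wall-clock time from a single run of each configuration.

```python
def per_vm_per_call_ms(old_real_s, new_real_s, calls, vms):
    """Wall-clock time saved per VM per call, in milliseconds."""
    return (old_real_s - new_real_s) * 1000.0 / (calls * vms)


# "real" timings quoted in this comment, in seconds, for 100 calls with two VMs
old_real = 21.093   # old VDSM, NUMA-enabled VMs
new_real = 18.485   # updated VDSM, NUMA-enabled VMs
control = 18.298    # two VMs without NUMA

saved = old_real - new_real
per_vm_call = per_vm_per_call_ms(old_real, new_real, calls=100, vms=2)
residual = new_real - control

print("saved over 100 calls: %.3f s" % saved)
print("saved per VM per call: %.2f ms" % per_vm_call)
print("remaining gap to non-NUMA control: %.3f s" % residual)
```

This works out to roughly 13 ms per VM per call, in the same range as the "about 12 ms" quoted above, with under 0.2 s left between the updated VDSM and the non-NUMA control over 100 calls.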

Comment 7 Martin Sivák 2015-03-10 16:50:22 UTC
Just to be clear: all NUMA-related code was introduced in 3.5, so it should not affect 3.4. The issue there is something different.

Comment 9 Martin Sivák 2015-05-11 08:36:24 UTC
The main issue is fixed.

Comment 10 Artyom 2015-05-26 14:14:58 UTC
Verified on vdsm-4.17.0-822.git9b11a18.el7.noarch
Ran two VMs with two CPUs, without NUMA:
[root@alma06 ~]# time for x in $(seq 100); do vdsClient -s 0 getAllVmStats >/dev/null; done

real    0m14.549s
user    0m11.643s
sys     0m2.316s

With NUMA:
[root@alma06 ~]# time for x in $(seq 100); do vdsClient -s 0 getAllVmStats >/dev/null; done

real    0m14.570s
user    0m11.632s
sys     0m2.370s

Comment 12 errata-xmlrpc 2016-03-09 19:29:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html

