Bug 1247075 - [scale] high vdsm threads overhead
Summary: [scale] high vdsm threads overhead
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assignee: Francesco Romani
QA Contact: Eldad Marciano
URL:
Whiteboard:
Depends On: 1279431
Blocks: 1265205
 
Reported: 2015-07-27 09:40 UTC by Michal Skrivanek
Modified: 2019-10-10 09:59 UTC
CC List: 18 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, excessive thread usage in VDSM, combined with the Python runtime architecture, caused poor VDSM performance on multicore hosts. VDSM now supports CPU affinity, so that its processes can be pinned to specific cores. CPU usage is reduced as a result of pinning VDSM threads to a smaller number of cores.
Clone Of:
Clones: 1265205
Environment:
Last Closed: 2016-03-09 19:43:02 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2116711 0 None None None 2016-01-08 18:15:07 UTC
Red Hat Product Errata RHBA-2016:0362 0 normal SHIPPED_LIVE vdsm 3.6.0 bug fix and enhancement update 2016-03-09 23:49:32 UTC
oVirt gerrit 45282 0 None None None Never
oVirt gerrit 45738 0 master MERGED scale: limit cpu usage using cpu-affinity Never
oVirt gerrit 46502 0 ovirt-3.6 MERGED scale: limit cpu usage using cpu-affinity Never
oVirt gerrit 46522 0 ovirt-3.5 MERGED scale: limit cpu usage using cpu-affinity Never

Description Michal Skrivanek 2015-07-27 09:40:49 UTC
While testing on a 240-core machine it was found that the Python threading overhead is very significant. Pinning the vdsm process to fewer cores helps with the GIL battle.

Initial observation/test:

My goal was to test the default behavior and then try to pin vdsm thread execution to a smaller number of cores. The expectation was that Python threading suffers a lot on multi-core hosts due to the way Python implements thread switching (old but helpful [1]) - limiting execution to a handful of cores should help without impacting the actual execution time, as nothing runs during the GIL battle anyway.

In the short period when the machine was available I was able to try running >100 VMs on a single beefy host with 240 cores on stock RHEV 3.5.3.

tested pinning all vdsm threads to cores while running 110 idle VMs:
it was hard to get a stable state, with SDs bouncing (lots of lvm processes from time to time), but there were some relatively stable periods when things sort of worked. Total CPU usage was very low.
240 cores - fluctuating between several hundred and several thousand percent (up to 10000%); the 10s average still fluctuates between 1000-2500%
120 cores - pretty much the same
20 cores - fluctuates a lot, from several hundred to 2000%, average 500%
5 cores - 50-500%, 10s average ~120%
2 cores - oscillates around a 50% average, rarely peaks above 100%
1 core - average around 30%, rarely peaks close to 100%

in all cases everything ran fine (apart from the terribly verbose default output - 180MB per hour, a lot of storage-related stuff, SLA's infamous metadata thing, etc.)

then I started an additional 40 VMs
- on 240 cores this caused system instability and unresponsiveness; basically the whole thing exploded while VMs were being started at a pace of 1 per minute, getting worse until I managed to cancel the remaining pending startups.
- on 2 cores it worked OK; all of them started within a few minutes with no overload anywhere, vdsm settling at a 75% average (2 cores), 45% (1 core)
(for fun I started these 40 new VMs on 2 cores to get to 150 VMs and then re-enabled all 240, and CPU usage skyrocketed to 6000% on average, with peaks at 15000%)

This is all on code without the 3.6 thread reduction from bulk stats in vdsm and without mom separated from vdsm, so our stock 3.6 numbers will certainly be better.
But we still have quite a few threads (with a few storage domains, let's say 50), and it may still make sense to restrict vdsm to only 1 or 2 cores by default. The difference is _huge_, without any obvious drawbacks - worth investigating…

[1] http://www.dabeaz.com/python/UnderstandingGIL.pdf
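
For illustration only (an editor's sketch, not part of the original report): on Python 3.3+ the kind of pinning used in the test above can also be applied from inside the process itself via os.sched_setaffinity. vdsm at this time ran on Python 2, where that call is not available, which is one reason an external tool such as taskset is the simpler route. The helper name below is hypothetical.

import os

def pin_current_process(cores):
    """Restrict the calling process (and all of its threads) to `cores`.

    `cores` is an iterable of CPU indexes, e.g. [0, 1]. Hypothetical
    helper, not actual vdsm code.
    """
    os.sched_setaffinity(0, set(cores))   # pid 0 means "this process"
    return os.sched_getaffinity(0)        # report the resulting mask

# e.g. pin_current_process([0, 1]) -> {0, 1}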

Comment 1 Michal Skrivanek 2015-07-27 09:42:38 UTC
It makes sense to have a vdsm.conf option to pin vdsm to a certain number of cores.

We can consider backporting (disabled by default) if the changes are not too intrusive.

Comment 2 Ilanit Stein 2015-08-20 10:59:04 UTC
Michal,

Can you please set the severity?

Thanks,
Ilanit

Comment 3 Michal Skrivanek 2015-08-20 16:39:09 UTC
I didn't investigate much deeper, but the benefits are quite significant/visible, so it's worth trying to do something configurable at least (e.g. disabled by default; default setting of 2 cores, 0 and 1; drop the pinning in cpopen so child processes are not constrained)

Infra's thoughts?

Comment 4 Oved Ourfali 2015-08-24 08:40:10 UTC
Dan/Yaniv - thoughts?

Comment 5 Francesco Romani 2015-08-25 08:19:37 UTC
First patch posted. It is still very much WIP/RFC, but the concepts are there, so it is good for a first evaluation.

Comment 6 Francesco Romani 2015-08-28 09:39:20 UTC
Two possible approaches:

1. expose the affinity syscalls at the Python level, then let vdsm itself adjust its affinity
   (this is what patch 45282 starts to do).
2. just use the taskset tool at startup


1. is much more complex than 2., and 2. is probably good enough. It needs a bit of tinkering, however, to make it easily configurable.
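
A minimal sketch of approach 2, assuming hypothetical names and paths (the real patches may differ): wrap the daemon startup in taskset, with an empty setting meaning "no pinning".

import os

TASKSET = "/usr/bin/taskset"

def exec_pinned(cpu_list, argv):
    """Re-exec `argv` under taskset, restricted to `cpu_list`.

    `cpu_list` is a taskset-style list such as "1" or "0,1"; an empty or
    None value disables pinning entirely. Hypothetical helper, not the
    actual vdsm code.
    """
    if cpu_list:
        argv = [TASKSET, "--cpu-list", cpu_list] + list(argv)
    os.execv(argv[0], argv)

# e.g. exec_pinned("1", ["/usr/share/vdsm/daemonAdapter", "/usr/share/vdsm/vdsm"])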

Comment 7 Dan Kenigsberg 2015-09-09 14:40:14 UTC
Given the numbers in comment 0, what is the benefit of making this configurable?

What is the case against

 ExecStart=taskset 1 @VDSMDIR@/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null "@VDSMDIR@/vdsm"

Comment 8 Francesco Romani 2015-09-09 14:55:31 UTC
(In reply to Dan Kenigsberg from comment #7)
> Given the numbers in comment 0, what is the benefit of making this
> configurable?
> 
> What is the case against
> 
>  ExecStart=taskset 1 @VDSMDIR@/daemonAdapter -0 /dev/null -1 /dev/null -2
> /dev/null "@VDSMDIR@/vdsm"

Configurability? It is pretty brutal to always pin VDSM to cpu #1. Maybe the admin does not want pinning at all (!), or maybe they want to pin it to a different cpu.

That said, this is not the hardest part (for some definition of "hard").
The hardest part is making sure that the helper processes spawned by VDSM are NOT pinned to the same CPU. There are unexpected woes; please check the last comments about pidStat/pgrep: https://gerrit.ovirt.org/#/c/45738/9/tests/utilsTests.py
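
A sketch of the child-process side of that problem, assuming Python 3 and a hypothetical helper (the actual vdsm/cpopen code differs): widen the affinity back to all CPUs between fork() and exec() so helpers are not stuck on the daemon's core.

import os
import subprocess

def spawn_unpinned(cmd):
    """Run `cmd` with CPU affinity reset to every online CPU."""
    def _reset_affinity():
        # Runs in the child between fork() and exec(); assumes CPUs are
        # numbered contiguously from 0.
        os.sched_setaffinity(0, range(os.cpu_count()))
    return subprocess.Popen(cmd, preexec_fn=_reset_affinity)

# e.g. spawn_unpinned(["/usr/sbin/lvm", "vgs"])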

Comment 9 Francesco Romani 2015-09-09 15:06:38 UTC
restoring other NEEDINFO

Comment 10 Dan Kenigsberg 2015-09-10 06:29:14 UTC
(In reply to Francesco Romani from comment #8)
> Configurability? It is pretty brutal to always pin VDSM to cpu #1. Maybe the
> admin do not want pinning (!), maybe he want to pin it to a different cpu.

I admit it seems brutal, but I'm asking why an admin would want to avoid pinning when there's such a big CPU benefit and no reported downsides.

Comment 11 Yaniv Kaul 2015-09-13 09:23:49 UTC
danken - I'd just make sure not everyone is pinning themselves to processor 0 to solve such issues... How about randomly selecting which core to pin to?
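
A tiny sketch of that idea (hypothetical, Python 3): pick the core(s) to pin to at random from the currently allowed set instead of hard-coding CPU 0.

import os
import random

def random_cores(count=1):
    """Choose `count` distinct CPUs at random from the allowed set."""
    return set(random.sample(sorted(os.sched_getaffinity(0)), count))

# e.g. os.sched_setaffinity(0, random_cores(2))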

Comment 12 Michal Skrivanek 2015-09-17 07:34:30 UTC
flagging as blocker - this is way too important for large setups

Comment 14 Francesco Romani 2016-01-20 08:46:20 UTC
I'm not sure that the doc_text is the proper way to convey documentation about this BZ, as per the last discussions. I added it just in case.

Comment 15 Eldad Marciano 2016-02-17 14:15:25 UTC
Verified on top of vdsm-4.17.20-0.el7ev.noarch.
The host has 144 cores and one VM per core (144 VMs).
vdsm runs at 60% CPU utilization on average; there are some peaks to 90-100%, but not above that.

Test case:
144 VMs on 144 cores, for a 15-minute duration.
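
One way to double-check the pinning during such a test (not necessarily how it was verified here): read the kernel's view of the process affinity from /proc. A small illustrative script:

import sys

def allowed_cpus(pid):
    """Return the Cpus_allowed_list field from /proc/<pid>/status."""
    with open("/proc/%d/status" % pid) as status:
        for line in status:
            if line.startswith("Cpus_allowed_list"):
                return line.split(":", 1)[1].strip()

if __name__ == "__main__":
    # usage: python check_affinity.py <vdsm pid>
    print(allowed_cpus(int(sys.argv[1])))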

Comment 16 Eldad Marciano 2016-02-17 14:16:28 UTC
Moreover, it has been tested both with and without mom (enabled / disabled).

Comment 18 errata-xmlrpc 2016-03-09 19:43:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html

