While testing on a 240-core machine it was found that the Python threading overhead is very significant. Pinning the vdsm process to fewer cores helps with the GIL battle.

Initial observation/test: my goal was to test the default behavior and then try to pin vdsm thread execution to a smaller number of cores. The expectation was that Python threading suffers a lot on multi-core hosts due to the way Python implements thread switching (old but helpful [1]); limiting execution to a handful of cores should help without impacting the actual execution time, as nothing runs during the GIL battle anyway.

In the short period when the machine was available I was able to try running >100 VMs on a single beefy host with 240 cores on stock RHEV 3.5.3.

Tested pinning all vdsm threads to cores while running 110 idle VMs. It was hard to get a stable state, with SDs bouncing (lots of lvm processes from time to time), but there were some relatively stable periods when things sort of worked. Total CPU usage was very low:
- 240 cores: fluctuating between several hundred and several thousand percent (up to 10000%); the 10s average still fluctuates between 1000-2500%
- 120 cores: pretty much the same
- 20 cores: fluctuates a lot, from several hundred to 2000%, average 500%
- 5 cores: 50-500%, 10s average ~120%
- 2 cores: oscillates around a 50% average, rarely peaks above 100%
- 1 core: average around 30%, rarely peaks close to 100%

In all cases the VMs were running fine (apart from the terribly verbose default log output, 180MB per hour: lots of storage-related stuff, SLA's infamous metadata thing, etc.).

Then I started an additional 40 VMs:
- On 240 cores this caused system instability and unresponsiveness; basically the whole thing exploded while VMs were started at a pace of 1 per minute, getting worse until I managed to cancel the remaining pending startups.
- On 2 cores it worked OK: all of them started within a few minutes, with no overload anywhere, vdsm settling at a 75% average (2 cores), 45% (1 core). (For fun I started these 40 new VMs on 2 cores to get to 150 VMs and then re-enabled all 240 cores; CPU usage skyrocketed to 6000% on average, with peaks at 15000%.)

This is all on code without the 3.6 thread reduction from bulk stats in vdsm and without MOM separated from vdsm, so our stock 3.6 numbers will certainly be better. But we still have quite a few threads (with a few storage domains, let's say 50), and it may still make sense to restrict to 1 or 2 cores only by default. The difference is _huge_, without any obvious drawbacks; worth investigating…

[1] http://www.dabeaz.com/python/UnderstandingGIL.pdf
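The effect described above can be reproduced on any multi-core Linux box with a small sketch (assuming Python 3, where `os.sched_setaffinity`/`os.sched_getaffinity` wrap the affinity syscalls; the benchmark itself is illustrative, not vdsm code):

```python
import os
import threading
import time


def cpu_bound(n):
    # Pure-Python busy loop: it holds the GIL whenever it runs, so
    # multiple threads constantly fight over the GIL across cores.
    while n:
        n -= 1


def run_threads(num_threads=4, work=2_000_000):
    """Run `num_threads` CPU-bound threads and return the wall-clock time."""
    threads = [threading.Thread(target=cpu_bound, args=(work,))
               for _ in range(num_threads)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)        # CPUs we may currently run on
    unpinned = run_threads()
    os.sched_setaffinity(0, {min(allowed)})  # pin the whole process to one core
    pinned = run_threads()
    os.sched_setaffinity(0, allowed)         # restore the original mask
    print("unpinned: %.2fs  pinned to 1 core: %.2fs" % (unpinned, pinned))
```

On hosts with many cores the pinned run is typically no slower (and often faster), mirroring the CPU-usage drop reported above: serializing the threads on one core removes the cross-core GIL handoffs without reducing the useful work done.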
It makes sense to have a vdsm.conf option to pin to a certain number of cores. We can consider backporting it (disabled by default) if the changes are not too intrusive.
Michal, Can you please set the severity? Thanks, Ilanit
I didn't investigate much deeper, but the benefits are quite significant and visible, so it is worth at least making something configurable (e.g. disabled by default; default setting of 2 cores, 0 and 1; drop the pinning in cpopen so child processes are not constrained). Infra's thoughts?
Dan/Yaniv - thoughts?
First patch posted. Still very much WIP/RFC, but the concepts are there, so it is good for a first evaluation.
Two possible approaches:
1. Expose the affinity syscalls at the Python level, then let vdsm itself adjust its affinity (this is what patch 45282 starts to do).
2. Just use the taskset tool at startup.
Option 1 is much more complex than option 2, and option 2 is probably good enough. It needs a bit of tinkering, however, to make it easily configurable.
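For reference, a minimal sketch of approach 1, assuming Python 3, where `os.sched_setaffinity` exposes the syscall directly (on the Python 2 codebase of the time this would need a ctypes wrapper). The config value format here is hypothetical, not vdsm's actual schema:

```python
import os


def apply_cpu_affinity(conf_value):
    """Apply a vdsm.conf-style CPU affinity setting to this process.

    conf_value: "" to leave affinity alone (pinning disabled), or a
    comma-separated CPU list such as "0,1".  Illustrative format only.
    Returns the resulting affinity set, or None if pinning is disabled.
    """
    if not conf_value:
        return None
    cpus = {int(c) for c in conf_value.split(",")}
    # sched_setaffinity on pid 0 affects every thread of the calling process.
    os.sched_setaffinity(0, cpus)
    return os.sched_getaffinity(0)
```

Approach 2 amounts to prefixing the daemon's command line with `taskset` instead, which requires no code changes but pins unconditionally unless the unit file itself is made configurable.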
Given the numbers in comment 0, what is the benefit of making this configurable?

What is the case against

ExecStart=taskset 1 @VDSMDIR@/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null "@VDSMDIR@/vdsm"
(In reply to Dan Kenigsberg from comment #7)
> Given the numbers in comment 0, what is the benefit of making this
> configurable?
>
> What is the case against
>
> ExecStart=taskset 1 @VDSMDIR@/daemonAdapter -0 /dev/null -1 /dev/null -2
> /dev/null "@VDSMDIR@/vdsm"

Configurability? It is pretty brutal to always pin vdsm to CPU #1. Maybe the admin does not want pinning (!); maybe they want to pin it to a different CPU.

That said, this is not the hardest part (for some definition of "hard"). The hardest part is making sure that helper processes spawned by vdsm are NOT pinned to the same CPU. There are unexpected woes; please check the last comments about pidStat/pgrep: https://gerrit.ovirt.org/#/c/45738/9/tests/utilsTests.py
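The "helper processes must not inherit the pinning" part can be handled at spawn time, by widening the child's mask between fork() and exec(). A sketch, assuming Python 3 on Linux (`spawn_unpinned` is an illustrative helper, not vdsm's actual cpopen API):

```python
import os
import subprocess


def spawn_unpinned(cmd, **kwargs):
    """Start a helper process with its CPU affinity widened back to all
    online CPUs, so it does not inherit the parent's narrow mask."""
    def _reset_affinity():
        # Runs in the child after fork() and before exec().  The kernel
        # intersects the requested mask with the CPUs the task is
        # actually allowed to use (e.g. by its cpuset).
        os.sched_setaffinity(0, range(os.cpu_count()))

    return subprocess.Popen(cmd, preexec_fn=_reset_affinity, **kwargs)
```

Note that `preexec_fn` runs Python code in the forked child, which is the kind of territory where the pidStat/pgrep surprises mentioned above can appear; a production implementation would need the same care cpopen takes around fork/exec.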
restoring other NEEDINFO
(In reply to Francesco Romani from comment #8)
> Configurability? It is pretty brutal to always pin VDSM to cpu #1. Maybe the
> admin do not want pinning (!), maybe he want to pin it to a different cpu.

I admit it seems brutal, but I'm asking why an admin would want to avoid pinning, when there's such a big CPU benefit and no reported downsides.
danken - I'd just make sure not everyone is pinning themselves to processor 0 to solve such issues... How about randomly selecting which core to pin to?
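Random selection could look like the following sketch (picking from the CPUs the process is currently allowed to use, so a fleet of hosts does not all converge on processor 0; the helper name is illustrative):

```python
import os
import random


def pick_random_cpus(count=2):
    """Pick `count` distinct CPUs at random from those currently allowed
    for this process, to spread pinned daemons across cores."""
    allowed = sorted(os.sched_getaffinity(0))
    return set(random.sample(allowed, min(count, len(allowed))))
```

The chosen set could then be passed to `os.sched_setaffinity` (or rendered as a `taskset -c` list) at daemon startup.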
flagging as blocker - this is way too important for large setups
Not sure the doc_text is the proper way to convey documentation about this BZ, as per the last discussions. I added it just in case.
Verified on top of vdsm-4.17.20-0.el7ev.noarch. The host has 144 cores and one VM per core (144 VMs). vdsm runs at 60% CPU utilization on average; there are some peaks to 90-100%, but not above that. Test case: 144 VMs on 144 cores, for a 15-minute duration.
Moreover, it has been tested with and without MOM (disabled/enabled).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0362.html