Bug 1247075

| Field | Value |
|---|---|
| Summary | [scale] high vdsm threads overhead |
| Product | Red Hat Enterprise Virtualization Manager |
| Component | vdsm |
| Version | 3.5.1 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Reporter | Michal Skrivanek <michal.skrivanek> |
| Assignee | Francesco Romani <fromani> |
| QA Contact | Eldad Marciano <emarcian> |
| CC | ahoness, bazulay, danken, fromani, gklein, istein, lpeer, lsurette, mgoldboi, michal.skrivanek, mkalinin, mtessun, pstehlik, pzhukov, ybronhei, ycui, yeylon, ykaul |
| Target Milestone | ovirt-3.6.0-rc |
| Target Release | 3.6.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| Doc Text | Previously, excessive thread usage in VDSM, combined with the Python runtime architecture, caused poor VDSM performance on multicore hosts. VDSM now supports CPU affinity so that its processes can be pinned to specific cores; pinning VDSM threads to a smaller number of cores reduces CPU usage. |
| Clones | 1265205 (view as bug list) |
| Last Closed | 2016-03-09 19:43:02 UTC |
| Type | Bug |
| oVirt Team | Virt |
| Bug Depends On | 1279431 |
| Bug Blocks | 1265205 |
Description by Michal Skrivanek, 2015-07-27 09:40:49 UTC:
Makes sense to have a vdsm.conf option to pin VDSM to a certain number of cores. We can consider backporting (disabled by default) if the changes are not too intrusive.

---

Michal, can you please set the severity? Thanks, Ilanit.

---

I didn't investigate much deeper, but the benefits are quite important and visible, so it is worth trying to do something configurable at least (e.g. disabled by default, defaulting to 2 cores, 0 and 1, and dropping the pinning in cpopen so child processes are not constrained). Infra's thoughts? Dan/Yaniv, thoughts?

---

First patch posted. It is still very much WIP/RFC, but the concepts are there, so it is good for a first evaluation. Two possible approaches (both sketched at the end of this report):

1. Expose the affinity syscalls at the Python level, then let vdsm itself adjust its affinity (this is what patch 45282 starts to do).
2. Just use the taskset tool at startup.

Approach 1 is much more complex than approach 2, and 2 is probably good enough. It needs a bit of tinkering, however, to make it easily configurable.

---

Dan Kenigsberg (comment #7): Given the numbers in comment 0, what is the benefit of making this configurable? What is the case against

    ExecStart=taskset 1 @VDSMDIR@/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null "@VDSMDIR@/vdsm"

---

Francesco Romani (comment #8), in reply to Dan Kenigsberg from comment #7:

Configurability? It is pretty brutal to always pin VDSM to cpu #1. Maybe the admin does not want pinning (!); maybe they want to pin it to a different CPU. That said, this is not the hardest part (for some definition of "hard"). The hardest part is to make sure that helper processes spawned by VDSM are NOT pinned to the same CPU (see the child un-pinning sketch at the end of this report). There are unexpected woes; please check the last comments about pidStat/pgrep: https://gerrit.ovirt.org/#/c/45738/9/tests/utilsTests.py

Restoring the other NEEDINFO.

---

In reply to Francesco Romani from comment #8:

> Configurability? It is pretty brutal to always pin VDSM to cpu #1. Maybe the
> admin does not want pinning (!); maybe they want to pin it to a different CPU.

I admit it seems brutal, but I'm asking why an admin would want to avoid pinning when there is such a big CPU benefit and no reported downsides.

---

danken - I'd just make sure not everyone is pinning themselves to processor 0 to solve such issues... How about randomly selecting which core to pin to? (Sketched at the end of this report.)

---

Flagging as blocker: this is way too important for large setups.

---

Not sure the proper way to convey documentation about this BZ is the doc_text, as per the last discussions. I added it just in case.

---

Verified on top of vdsm-4.17.20-0.el7ev.noarch. The host has 144 cores and one VM per core (144 VMs). VDSM runs at 60% CPU utilization on average, with some peaks to 90-100% but never above. Test case: 144 VMs on 144 cores, for a 15-minute duration. Moreover, it has been tested both with and without MOM (disabled/enabled).

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html
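
---

To make the trade-off from the thread concrete, here is a minimal Python sketch of the two approaches Francesco compares. The function names are illustrative, not vdsm's actual helpers; note that `os.sched_setaffinity` only exists on Python 3.3+, so the Python 2 runtime vdsm used at the time would have needed a ctypes or C-extension wrapper around the syscall.

```python
import os

def pin_self(cpus):
    # Approach 1 (the direction of patch 45282): adjust affinity from
    # inside the daemon via the sched_setaffinity(2) syscall.
    os.sched_setaffinity(0, cpus)  # pid 0 means "the calling process"

def exec_pinned(argv, cpu_list="1"):
    # Approach 2: re-exec the daemon under taskset at startup. Simpler,
    # but children inherit the mask unless it is reset after fork.
    os.execvp("taskset", ["taskset", "--cpu-list", cpu_list] + list(argv))
```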
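The "hardest part" from comment #8, keeping helper processes off the daemon's pinned core, comes down to widening the affinity mask in the child between fork and exec. Below is a sketch under the assumption that CPU ids are contiguous; vdsm's actual fix routes spawning through its own wrappers (cpopen), so `run_helper` here is purely hypothetical.

```python
import multiprocessing
import os
import subprocess

def _unpin():
    # Runs in the child after fork, before exec: widen the inherited mask
    # back to every online core (assuming ids 0..N-1) so helpers such as
    # qemu-img or lvm are not stuck on the core the daemon is pinned to.
    os.sched_setaffinity(0, range(multiprocessing.cpu_count()))

def run_helper(cmd):
    # Hypothetical wrapper around subprocess for illustration only.
    return subprocess.Popen(cmd, preexec_fn=_unpin)
```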
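The random-core idea raised in the thread, so that every pinned daemon on a host does not pile onto the same CPU, could look like the following; again a hypothetical sketch, not the shipped behavior.

```python
import os
import random

def pin_to_random_core():
    # Pick one core out of those the process is currently allowed to use,
    # then pin the whole process to it.
    core = random.choice(sorted(os.sched_getaffinity(0)))
    os.sched_setaffinity(0, {core})
    return core
```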